Save money by virtualizing Oracle
November 9, 2011
I wrote an internal EMC memo on licensing issues with Oracle on VMware as I get a lot of questions on this topic. But I’d like to expand the question a bit. After all, my blog is named “Dirty Cache” which could also be substituted with “Dirty Cash” – and as said, my mission is to lower cost and drive up service levels for my customers…
Here is my internal memo (slightly edited for the blog and updated with a few corrections). Again, I want to make it clear that these are my own opinions based on (limited) customer experiences. I might be completely wrong, and that’s why my blog has a disclaimer ;-)
Use this information at your own risk – don’t shoot the messenger.
How should we license Oracle database on VMware?
Beefed up question:
How can we save money on licensing and other expenses by virtualizing Oracle?
There seems to be a lot of confusion on licensing when customers consider running Oracle databases on VMware. Part of the confusion is caused by Oracle on purpose (classic FUD) by suggesting that licensing is more expensive on VMware than on physical servers. The reality couldn’t be more different – I strongly believe that many customers can actually save on database licenses by going virtual. But to understand how to achieve this, you need to know a few things – I hope I can clear this up in a short explanation. I will keep the discussion to Oracle database licenses and ignore application / middleware licensing etc. for now.
Oracle customers typically license their basic database by one out of three options:
- License by CPU (core) – the more CPU cores, the more licenses are needed. A processor core factor applies, depending on the type of CPU; it can be 0.25, 0.5, 0.75 or 1.0.
- License by named user – the more named users, the more licenses are needed. The number of CPUs is not important, and neither is the total number of databases. Typically one license pack per 25 users.
- Enterprise (site) License – the customer negotiates a contract for the whole company and afterwards can deploy as many databases on as many servers / CPUs as he wants.
If a customer licenses by named user or by enterprise (site) license, then it does not matter whether they run virtual or physical. But no license savings are possible either without re-negotiating their contracts. I don’t want to go as far as suggesting that customers change their license models, so we leave this as-is for now.
In my experience, most enterprise customers use either CPU licensing or enterprise (site) contracts. Some have different licensing methods for different business units. Oracle can be very creative in customer-specific contracts so expect to find a different situation for each individual customer.
But let’s assume CPU licensing for the sake of this discussion.
Maintenance & support
Users typically buy the CPU licenses but then have to pay maintenance for the time they use the licenses. Yearly maintenance cost is about 20-25% of license (list price). I have no information on typical discounts. I expect customers to get at least 50% discount off the price list (but only on licenses, not on maintenance AFAIK).
Database Edition and options
The plain database license comes in 3 versions (for servers):
- Standard Edition One – Maximum 2 processors, no options allowed. Only used for testing and very small deployments
- Standard Edition (SE) – Maximum 4 processors, no options allowed. Only used for smaller sizes and workloads (but stay tuned)
- Enterprise Edition (EE) – No limitations and on top of EE, you can have many licensed features. Most customers will use this, at least for production databases
On top of the basic Database license, most customers use a set of options, each requiring additional licenses per CPU. The most common options are:
- Real Application Clusters (RAC) – allows many servers running the same database (active-active clustering) to allow scale out performance and high availability.
- Real Application Clusters One Node – same but one database can only run actively on one node. For high availability only.
- Active Data Guard – remote replication using log shipping. Note that standard Data Guard is free, but Active Data Guard allows the standby database to be opened for read-only purposes and offers some extra features.
- Partitioning – allows tables to be split up in smaller chunks. Absolutely required when running large databases and no downtime can be tolerated. Eases administration work and offers some performance benefits.
- Real Application Testing – allows workloads to be recorded and re-played on another database to do performance and functionality testing
- Advanced Compression – allows database blocks to be compressed – requiring less storage and boosting performance (in most cases).
- Diagnostics Pack / Tuning Pack – provides automated reports. Oracle AWR (Automatic Workload Repository – the basis for Oracle’s performance reports) is part of Diagnostics Pack.
In my experience, nearly all customers have partitioning. Most customers have tuning and/or diagnostics pack. Some customers have RAC. Some customers have the other options. There are more options available but these are the most common.
Many customers have 3 or more options – sometimes the options cost more than the base database license – especially if they use RAC they will have most of the other options, too.
Running on a cluster
If a database runs on a cluster, then Oracle assumes the database can make use of any processor in the cluster. This is independent of the kind of cluster used (so it can be Microsoft Cluster, HP MC/Service Guard, VMware, Oracle RAC, etc).
This is basically the foundation for all the confusion. For example, suppose you deploy a VMware farm (cluster) of 16 servers whose virtual machines run all kinds of stuff (file/print, Exchange, apps, etc.), and only one tiny virtual machine in the corner, with a single virtual CPU, runs a small Oracle database. You would expect to pay for only one CPU core – but Oracle’s reasoning is that this tiny VM can be moved dynamically (with VMotion a.k.a. Live Migration) to any node in the cluster and onto any processor. Therefore, all CPUs have to be fully licensed by Oracle. So in this case, running the single database on a (small) physical server would be cheaper than running it on a VM in the farm.
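To make that reasoning concrete, here is a minimal sketch of the counting rule as described above. The number of cores per host is an assumption for illustration only (the example does not state it), and core factors and discounts are ignored.

```python
# Sketch of the "license every core in the cluster" rule described above.
# The cores-per-host values are assumed for illustration; they are not from the post.

def licensable_cores(hosts_in_cluster: int, cores_per_host: int) -> int:
    """Every physical core in the cluster counts, regardless of how small the VM is."""
    return hosts_in_cluster * cores_per_host

# One tiny Oracle VM somewhere in a shared 16-host farm (assumed 8 cores per host).
shared_farm = licensable_cores(hosts_in_cluster=16, cores_per_host=8)

# The same database on one small physical server (assumed 4 cores).
small_physical = licensable_cores(hosts_in_cluster=1, cores_per_host=4)

print(f"Shared farm:     {shared_farm} cores to license")
print(f"Physical server: {small_physical} cores to license")
```

The size of the VM never appears in the calculation, which is exactly why a mixed-workload farm is the expensive place to put a small database.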
Total cost of the stack
In a typical database server deployment, the cost of the database licensing is far greater than the cost of the hardware + OS licenses combined. I have no hard numbers but I assume the average database licensing cost (plus options) is 10 times larger than the cost of the server + OS.
So a $5,000 server would typically require $50,000 on licenses. Then because maintenance is 25% yearly, the total cost of licenses over a 3 to 5 year period is even higher – so for a 5 year TCO the total license cost might be $75,000 (assumption – could also be closer to $100,000). Let me know if you can provide better, real-world numbers.
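As a rough check on those numbers, the sketch below adds yearly maintenance to the up-front license cost. It assumes a deliberately simple model: maintenance is a flat percentage of the full $50,000 every year, with no discounts or price increases. Real contracts will differ, so treat it as an illustration rather than a pricing tool.

```python
# Simplified license TCO: up-front license plus flat yearly maintenance.
# Assumption: maintenance is charged on the full license amount each year.

def license_tco(license_cost: float, maintenance_rate: float, years: int) -> float:
    """Return license cost plus maintenance over the given number of years."""
    return license_cost + license_cost * maintenance_rate * years

LICENSE_COST = 50_000  # from the text: roughly 10x a $5,000 server

for rate in (0.20, 0.25):
    for years in (3, 5):
        total = license_tco(LICENSE_COST, rate, years)
        print(f"{rate:.0%} maintenance over {years} years: ${total:,.0f}")
```

Under these assumptions the totals run from $80,000 (20% for 3 years) to $112,500 (25% for 5 years), so the $100,000 end of the estimate above looks like the more realistic one if maintenance is charged on the full amount.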
It is very hard to size a typical Oracle database based application. There are no good methods or calculations to figure out how much CPU power, disk I/O and memory is needed to run a given app. So historically, project teams size their database servers for peak loads, and because they cannot predict how big the peak load is, they double the resources “just in case”. The end result is that most database servers are way oversized in terms of CPU and memory. (see my earlier post on this conservative behaviour: Monkey Business)
Most physically deployed database servers average about 10-15% CPU load (or less). However, they will peak to higher loads at certain times, such as Monday morning when many users log in, or when month/quarter/year-end batch processing is started, etc.
Then, the utilization numbers can be influenced by other tasks of the processors. Some common causes of “artificially high” CPU loads on database servers:
- CPU is involved in storage mirroring (i.e. Host Level Mirroring – using Oracle ASM or a Unix volume manager)
- CPU is involved in file transfers over the IP network
- Backup (non-serverless, using CPU, Network and I/O bandwidth)
- Customers run the application server on the same machine driving up CPU load – This can drive up CPU load from 10% to 90% or more !!
- Same for Middleware and Enterprise Service Buses (Think Oracle BEA, IBM WebSphere, SAP NetWeaver, etc)
- A bunch of monitoring/management agents burn CPU cycles (Tivoli, BMC, HP OpenView, CA, etc). Each agent may be consuming only 1%, but add it up and you have another 5-10% overhead.
- Administrators generate database dumps / exports and run their own reports, scripts and tools. They run ad-hoc queries as well that should not be on production.
- Poorly tuned database servers cause paging and other CPU overhead – hard to diagnose but driving up CPU and I/O significantly.
- Database admin tasks (table reorganizations, (re)building indexes, converting tablespaces, …)
- And so on…
All of these cause the processors, expensively licensed for database processing, to do other stuff.
So if a server is running at 15% utilization, then the utilization caused by the database workload itself might only be 10% and the rest is caused by other stuff (whether really needed or not).
Needless to say that Oracle likes customers to use their expensive licensed CPUs for other tasks because it forces them to buy additional CPUs sooner and therefore drive their license revenues.
Isn’t life great for an Oracle rep? ;-)
Number of databases
Most customers run many databases. For the average enterprise size customer that I visit, 100+ databases is a normal number. A big global that I visited runs 3000+ Oracle databases worldwide (and this is only the scope of this specific project team). Imagine the cost of licensing all these databases on all individual servers…
Why so many? Well, customers do not like to share multiple applications on one database (and often this is not even supported). So if you run SAP ERP, Oracle JD Edwards, your own banking app and a few others, they all require their own production database.
For each production database, you might find an acceptance environment, test system, development server, maybe a staging area to load data into the data warehouse, maybe a firefighting environment, a standby system for Disaster Recovery, a training system and so on. Customers will rarely share production environments on the same server (unless virtualized or at least with workload management segregation). Sometimes they share a few databases for non-production on a server. So for, say, 100 databases, the average customer runs between 30 and 50 (physical) servers.
Power of big numbers
It does not require rocket science to understand that many of these databases do not require peak performance at the same time. A development system typically drives workload during daytime (when developers are coding new application features). A data warehouse runs queries during the day and loads in the evening but might be idle at night. For a production system it depends on the business process. An acceptance system might sit idle for weeks and then suddenly peak for a few days preparing for a new version deployment into the live production system. And so on.
So what if you could share resources across databases – without influencing code levels, security, stability and so on?
If that were possible – you would not size for “peak load times two” anymore. You would size for what you expect and assume an average utilization of, say, 70% over the whole landscape. If one database needs extra horsepower, there is enough power available somewhere in the cluster.
How much license cost would you save by bringing down the number of CPUs so that utilization goes up from 10% to 70%?
What would be the effect on power, cooling, floor space, hardware investments, time-to-market?
What would be the business advantage of not limiting production performance of a single server, by whatever was sized during initial deployment? Risk avoidance?
What would be the business advantage of solving future performance issues by just adding the latest and greatest Intel server in the cluster and VMotion the troubled database over?
Wasn’t this exactly why we started server virtualization in the first place about 8 years ago? And why EMC acquired VMware?
Wouldn’t you think the average Oracle sales rep is scared of losing license revenue when his customer starts considering running his databases on a virtual (cloud) platform? Would it make sense for him to drive his customers mad with FUD around licensing, certification, support issues and whatever he can think of to prevent his customers going this way? Even threatening to drop all support if they continue in that direction? (I know this has really happened on some occasions…)
If Oracle is scared of losing license revenue, wouldn’t you think there is a huge potential for savings for our customers here?
The journey to the private database cloud
So how should we deal with this?
A few starting points
- Oracle supports VMware. Period. Any other claim of Oracle reps can be taken with a grain of salt (to be more specific: it’s nonsense).
- Oracle does NOT certify VMware. Then again, Oracle does not certify anything except their own hard- and software. But IMO, support is all you need and the discussion around certification leads nowhere. Classic FUD.
- Oracle might ask the customer to recreate issues on a physical server if they suspect problems with the hypervisor. Isn’t it great that we can do this easily with Replication Manager? ;-)
- Oracle only supports Oracle RAC on VMware for one specific version (11.2.0.2 and higher). Any other (lower?) version with RAC is not recommended on VMware because of support issues. Expected to change in the future.
- Both EMC and VMware offer additional support guarantees for customers deploying Oracle on VMware. So where Oracle pulls back, EMC and VMware will fix any issue anyway.
- Oracle and EMC have a Joint Escalation Center to solve customer issues. Nobody seems to realize this… More in a future post. More info here: h7424-optimizing-oracle-dba-ep.pdf
- Performance is no longer an issue. With VMware vSphere 5, a single virtual machine can have 32 virtual processors and 1 TB of RAM, and can drive 1 million IOPS. Only the most demanding workloads would not fit in this footprint. But with customers running hundreds of databases, maybe we should start with the 95%+ that do fit and make significant savings there. By the time we’re done, VMware will have vSphere 6 and who knows what happens then.
How to get around the licensing issue
As said, Oracle requires licenses for all servers in a cluster. So how do you limit the number of licenses? By deploying an Oracle-only VMware cluster. Only run Oracle databases here. No apps, no middleware, no file servers, and try to move everything off that does not relate to database processing. No host replication, no storage mirroring, etc. Ditch all management agents you don’t need and move other stuff out-of-band.
Say you have a legacy environment with 10 servers, each with 16 cores, so you have 160 cores licensed with Oracle Enterprise Edition and a bunch of options. Average CPU load is probably 15%, but let’s assume 20% to be conservative.
I claim that a single VMware cluster with 3 servers each with 32 cores will easily do the job. Now we have 3 * 32 = 96 cores to be licensed. 96/160 = 0.6 = 60% so we saved 40% on licensing right away. Probably the average CPU load on the whole cluster will still be much less than 70% so we can gradually add a bunch more databases until we average out on 70%.
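The arithmetic behind this claim can be sketched as follows. One assumption is mine and not from the text: a core in the new servers does at least as much work as a legacy core, which is probably pessimistic for newer hardware.

```python
import math

# Figures from the text.
legacy_cores = 10 * 16      # 10 servers x 16 cores = 160 licensed cores
legacy_load = 0.20          # assumed average CPU load (conservative)
new_cores = 3 * 32          # 3 servers x 32 cores = 96 cores
target_load = 0.70          # desired average utilization of the cluster

# License ratio and saving on core count alone (before any core factor).
license_ratio = new_cores / legacy_cores
license_saving = 1 - license_ratio

# Assumption (not from the text): one new core does the work of one legacy core.
busy_cores = legacy_cores * legacy_load        # cores' worth of real work
projected_load = busy_cores / new_cores        # average load on the new cluster

# Smallest core count that would carry the same work at the target load.
min_cores_at_target = math.ceil(busy_cores / target_load)

print(f"Cores to license: {new_cores} of {legacy_cores} ({license_ratio:.0%})")
print(f"License saving: {license_saving:.0%}")
print(f"Projected average load: {projected_load:.0%}")
print(f"Cores needed at {target_load:.0%} load: {min_cores_at_target}")
```

With these inputs the new cluster would run at roughly a third of its capacity, which is the headroom that allows more databases to be added before the 70% average is reached.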
If the old system was not running Intel x86 but SPARC, PA-RISC or POWER CPUs, then the processor core factor was probably 1.0 or 0.75.
Intel x86 has a factor of 0.5. So for 96 cores (Intel) you would need to pay for 48 full licenses. That is another 33% savings (coming from a factor of 0.75) or even 50% (coming from 1.0).
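A small sketch of the core factor calculation, using the factors quoted above. Rounding a fractional result up to a whole license is an assumption on my part; check the current Processor Core Factor table and your own contract for the actual rule.

```python
import math

def licenses_required(cores: int, core_factor: float) -> int:
    """Processor licenses = cores x core factor.

    Assumption: fractional results are rounded up to a whole license.
    """
    return math.ceil(cores * core_factor)

def saving_vs(old_factor: float, new_factor: float) -> float:
    """Relative license saving per core when moving between core factors."""
    return 1 - new_factor / old_factor

NEW_CORES = 96
INTEL_FACTOR = 0.5  # factor quoted in the text for Intel x86

intel_licenses = licenses_required(NEW_CORES, INTEL_FACTOR)
print(f"{NEW_CORES} Intel cores need {intel_licenses} processor licenses")

for old_factor in (0.75, 1.0):
    print(f"Coming from factor {old_factor}: "
          f"{saving_vs(old_factor, INTEL_FACTOR):.0%} fewer licenses per core")
```

This saving comes on top of the reduction in core count, because it applies per core.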
The savings of 40% (or more) on licensing will easily justify an investment in a nice new EMC storage infrastructure with EFDs, FAST-VP, VPLEX and all other goodies. Plus VMware licenses. Do you think the discussion will be about price per Gigabyte, competing with our friends at HDS or NetApp, if we just saved our customer millions in Oracle licenses?
But the story does not end here.
Let’s assume the customer needed High Availability and scale-out performance and was running Oracle RAC. RAC is the most expensive licensed option and you need at least two for a two-node cluster. But VMware allows for HA (High Availability clustering) as well. Using VMware HA instead of RAC, you would have to fail over and recover the database in case of an outage – if you cannot tolerate this, then by all means – stick with RAC (only for mission critical databases!) and you might want to consider RAC stretched clusters with VPLEX. But most customers can live with 5 minutes of downtime in case a server CPU fails and in that case, replacing RAC with VMware HA can save them another big bunch of dollars.
Let’s assume that with virtualization you justified the investment in a nice shiny EMC infrastructure with Flash drives to replace the competitive storage gear. Now the Oracle cluster is no longer limited by storage I/Os (EMC is the only one that can get maximum performance out of Flash drives) and you can drive more workload out of the same 3 VMware servers in the cluster. But you can also replace host mirroring (where applicable). You can implement snapshot backups to get the backup I/O load away from the production servers. You removed the middleware and apps stuff from the database servers – reducing CPU utilization and allowing even more headroom for DB consolidation – all without buying extra licenses from Oracle.
You want even more savings?
What if you create TWO database clusters for VMware? One for Production (running Oracle Enterprise Edition (EE) with all the options you need) and one for Non-prod (running Oracle Standard Edition (SE) without options – good enough for test/dev and smaller, non-mission critical workloads). I bet the number of non-prod, non-mission-critical databases will be much more than for mission-critical, high performance production. By removing the expensive options AND moving from Enterprise to Standard Edition, you saved another ton of money on Oracle licensing as Standard Edition is much cheaper than Enterprise Edition (including options). But be aware – the devil is in the details and using Standard Edition is not for the faint-of-heart (for example, you could no longer clone a partitioned database to a SE enabled server because of the missing license and functionality). Still if you are keen on saving as much as possible, then this might be the final silver bullet…
Do you run a huge Enterprise Data warehouse (maybe considering an overkill and over-hyped database “appliance” such as Oracle Exadata)? Having trouble with loading times and query performance? See if you can replace it (partly) with Greenplum – saving another bag of money and speeding up Business Intelligence queries (plus opening up new possibilities for non-structured data). But be careful, in an Oracle-religious environment, people might not like you for replacing Oracle…
I have already had this discussion with a few enterprise customers, and found that although the story is easy in theory, the reality is different. If a customer has already purchased the 160 CPU licenses from Oracle, then the Oracle rep will not happily give money back in return for the shelfware licenses. So in that case the customer could only save on maintenance and support. But with enough licenses on the shelf, he would not have to purchase any more for the next 5 to 10 years. So I sometimes talk about cost avoidance instead of immediate savings. And again, if Oracle is licensed by user or by site license, then saving on licenses will be a tough discussion. Even so, the savings on power, cooling, hardware and floorspace would be significant enough to proceed anyway.
And don’t forget the other benefits of moving to a private (virtualized) cloud: they are no different for Oracle than for other business applications.
For this approach to work, we need customers who are willing to work with us and be open on how they negotiated their contracts with Oracle, and a team of database engineers accepting the challenge to make it happen. If internal politics or “database religion” cause significant (political, not technical) roadblocks then you will get nowhere.
It’s not an easy task but the rewards can be massive. We’re only just starting to figure out how to convince our customers and drive this approach. Feedback welcome and let me know if you need support.
Landing page with Oracle’s price list: http://www.oracle.com/us/corporate/pricing/price-lists/index.html
Download “US Oracle Technology Commercial Price List” for the database license document. Read the fine print because it’s not always as simple as it seems.
Processor Core Factor information: Oracle Processor Core Factor Information
Oracle pricing (wikipedia): Oracle Database Pricing (Wikipedia)
Database Editions and features: Oracle DB edition and features comparison
Update: found another goodie here
And a post from our Virtual Geek: http://virtualgeek.typepad.com/virtual_geek/2011/07/even-more-reasons-to-run-oracle-on-vmware.html
And our Oracle Storage Guy (Jeff Browning) posted my article on his blog: http://oraclestorageguy.typepad.com/oraclestorageguy/2011/11/oracle-licensing-on-vmware-no-magic.html
Reference customers who virtualized Oracle on VMware: Customer Successes Virtualizing Oracle Database on VMware
VMware whitepaper on Oracle licensing: VMware: Understanding Oracle Licensing in VMware environments
Oracle RAC on VMware – Deployment Guide
Oracle Databases on VMware – Best Practices Guide
DSP Managed Services – Oracle on VMware whitepapers (requires registration)