Oracle on VMware – The Final Frontier

A question I almost always ask my customers is where they currently stand on IT transformation and their journey to the cloud. Of course such a strategy does not work very well on bare metal: some kind of isolation between services and physical hardware is required. That naturally includes virtualization, as well as some kind of container technology, as-a-Service paradigms, changes in IT administration and operations, etc.

Nearly always the answer includes “We already virtualized everything! … Well, ehm, except Oracle….”

TL;DR: There are no more roadblocks for virtualizing Oracle, including license issues. See the last section “Mythbusting” of this post for a summary on myths and truths.

Top virtualization objections

The reasons for avoiding virtualization of database servers are typically the usual suspects:

  • Bad experience with previous consolidation attempts
  • Expected performance overhead / sizing limitations
  • Technical issues / more complexity
  • Different strategy for DB consolidation (such as going with Pluggable Databases / Multitenant)
  • No need because we have a ULA / PULA / Special contract

Most of these have been covered on my blog before so I will focus on the two biggest showstoppers:

    • Virtualizing Oracle not supported on the preferred platform (i.e. VMware)
    • The Oracle on VMware licensing dragon (The claim from Oracle that once you run Oracle on VMware, you have to fully license ALL vSphere/ESX hosts in your entire datacenter – including clusters that don’t run Oracle at all – also known as Galaxy Licensing.)

 

Oracle Virtualization benefits

So why would one want to virtualize Oracle databases anyway?

The quick answer: Massive cost reduction as well as operational benefits.

A small recap of the stuff I’ve written before:

By virtualizing Oracle it is often possible to reduce the number of processors required to run consolidated workloads. This is because individual servers no longer have to be sized for their own peak loads: workloads can dynamically move between servers, which allows a much higher average CPU utilization. There are a number of design rules to keep an eye on, and there are some notable exceptions, but in general this is how it works. Moving to a virtualized environment also brings a number of additional operational benefits.
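To make the sizing argument concrete, here is a minimal sketch. The three workloads and their CPU demand figures are made up for illustration; the point is only that the peak of the combined load is usually lower than the sum of the individual peaks, because the peaks rarely coincide.

```python
# Hypothetical CPU demand (in cores) per time interval for three database servers.
# All names and numbers are invented for illustration.
workloads = {
    "erp":       [2, 2, 8, 3],   # peaks in the third interval
    "reporting": [6, 1, 1, 1],   # peaks in the first interval
    "batch":     [1, 1, 2, 7],   # peaks in the last interval
}

def cores_sized_individually(demand):
    """Every server is sized for its own peak load."""
    return sum(max(series) for series in demand.values())

def cores_sized_consolidated(demand):
    """A shared pool is sized for the peak of the combined load."""
    combined = [sum(values) for values in zip(*demand.values())]
    return max(combined)

individual = cores_sized_individually(workloads)
consolidated = cores_sized_consolidated(workloads)
print(f"sized per server: {individual} cores, sized as a pool: {consolidated} cores")
```

In this made-up case the pool needs roughly half the cores. Real environments need headroom for failover and for peaks that do coincide, so treat this as an illustration of the principle, not a sizing method.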

Why VMware? Because IMHO it’s currently the only platform that is both capable of dynamically balancing workloads and stable enough for mission critical workloads.
With some exceptions, other hypervisors or alternative consolidation strategies (such as Oracle Multitenant a.k.a. Pluggable Databases) simply don’t yet offer enough efficiency benefits, are not supported, or lack the required features or maturity.

Note that I have also seen customers being successful with IBM Power systems (AIX LPARs), and I have yet to see how well Linux KVM is up to the job – please let me know if you achieved good results with KVM!

Oracle now supported on VMware

In September last year, during Oracle OpenWorld, Oracle and VMware announced a cloud agreement regarding Hybrid Cloud deployments using VMware on Oracle’s cloud. As part of the agreement Oracle now also officially supports Oracle deployments on VMware (even if they are not part of a Hybrid Cloud strategy). Note that Oracle supported VMware before, but with some restrictions and requirements, most notably having to move back to bare metal in case of certain issues.

For more info: Oracle Cloud VMware Solution – A New beginning of a Strategic relationship

One small step for VMware & Oracle, one giant leap for VMware & Oracle mutual customers

This new agreement eliminates the first major roadblock in virtualizing Oracle.

The Final Frontier

With the support issue resolved, there remains one final barrier to go virtual: the licensing dragon.

This could be a big one as many customers don’t want to take the supposed risk of becoming non-compliant with their licensing, and they want to avoid any trouble with audits, legal issues and ultimately the risk of having to pay a huge amount of money to get rid of such licensing issues.

Needless to say that in the past I have been in customer engagements where we secured the “technical win” where the customers expressed their preference for one of our converged or hyperconverged cloud platforms to run Oracle, but lost anyway due to these licensing risks. With the advantage of these platforms gone, all we can do is offer just a bunch of servers and/or disks, and our real value-add is gone. In that case, unfortunately, we often lose because we don’t hold the licensing/discount wildcard and we have no further unique selling points.

A lot has been written about why this does not make sense and how you can avoid this issue (see my previous post on the matter), but the usual comment is that even though we all agree, Oracle does not accept this as a valid isolation strategy, and our customers don’t want to take the risk and would rather pay much more to avoid legal disputes.

Why does Oracle not accept it? My take is that Oracle doesn’t want customers to run as efficiently as possible, as this reduces its license and maintenance revenue, so it will use any kind of FUD tactics to scare customers away from innovation and optimization.

With the Oracle / VMware announcement, nothing has been mentioned about licensing – so even with the support issue gone, the perceived licensing problems remain. Let’s take a deeper look.

Introducing LicenseFortress

Around mid 2018 I started working with a company named LicenseFortress as they somehow could solve the licensing stalemate issue. The idea is as follows:

  • LicenseFortress signs off the proposed architecture to verify it does not contain potential license violations. They look at things like affinity rules, VLAN isolation, tracking audit logs etc (essentially what I recommended in my License Dragon post). If your architecture is one of DellEMC’s CI or HCI platforms, then this is easy as these architectures are well known for them.
  • A Virtual Appliance (LicenseFortress Discovery – you can try it for 60 days) is deployed that continuously monitors the customer’s environment for compliance issues and notifies immediately if any rules are broken. This goes beyond just bean counting the number of CPU cores – it also checks for license violations not related to running on VMware (for example, accidentally running an in-memory query). So LF Discovery enforces license compliance at all times.
  • A financial guarantee is provided to protect customers against licensing claims from Oracle. See “Premier” under Subscriptions for more info
  • If Oracle LMS requests an audit, we can be sure the environment is compliant
  • If Oracle claims this is not the case, LicenseFortress assists with technical as well as legal support to refute the claims
  • If Oracle sues the customer (which we consider highly unlikely to happen but it’s a theoretical possibility) LicenseFortress offers legal support and guides the entire process
  • If the customer would lose the case and is forced to buy additional licenses and support, this is also covered in the financial guarantee. LicenseFortress is backed by a large insurance company for these potential large claims.
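To illustrate the kind of rule such continuous monitoring could evaluate, here is a minimal sketch of one check: are all Oracle VMs running on hosts that are licensed for Oracle? This is not how LicenseFortress Discovery works internally (I have no insight into its implementation); the host names, VM names and the function are hypothetical.

```python
# Hypothetical inventory snapshot; in practice this would come from the
# virtualization platform's inventory and audit logs.
licensed_hosts = {"esx01", "esx02"}           # hosts licensed for Oracle
oracle_vms = {"oradb1", "oradb2"}             # VMs with Oracle installed
vm_placement = {                              # VM name -> host it currently runs on
    "oradb1": "esx01",
    "oradb2": "esx03",                        # drifted to an unlicensed host
    "web1": "esx03",
}

def find_placement_violations(placement, oracle_vms, licensed_hosts):
    """Return the Oracle VMs that currently run on an unlicensed host."""
    return sorted(
        vm for vm, host in placement.items()
        if vm in oracle_vms and host not in licensed_hosts
    )

violations = find_placement_violations(vm_placement, oracle_vms, licensed_hosts)
print(violations)
```

A real monitor would also need the placement history, not just the current state, since a VM that ran on an unlicensed host yesterday is just as relevant in an audit as one that runs there today.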

The end result is that:

  • The customer no longer has to worry about being compliant – the required proof is constantly maintained by LicenseFortress Discovery
  • The financial risk of being non-compliant is covered even in case of legal disputes
  • Protection against any kind of license violations – not just Oracle on VMware related

I really think this approach is brilliant and unique – so I have become a big advocate within Dell resulting in working together on multiple occasions: At DellEMC we work with LicenseFortress to remove any financial risk (with respect to license compliance) when our customers want to go with Oracle on VMware.

Let me phrase that again:

We work together to remove ANY financial risk running Oracle virtualized on our platforms

(with respect to license compliance of course)

I’m expecting more information / announcements on this anytime soon – stay tuned!

Mythbusting

Some myths and other statements regarding licensing:

VMware has performance overhead

Kind of true. There is a small overhead (we measured about 4% on ESX version 5), but with modern CPU enhancements and improvements to the hypervisor, current systems will have even less.

Scaling issues

On vSphere 6.7 a single VM can scale to 128 vCPU, 6TiB RAM, 62TB per virtual LUN (source: vInfrastructure blog). Not many workloads are too large for these limits. Of course the physical hardware has to support this.

Additional complexity

Well, yes you’re introducing another layer (the hypervisor). Which is why you should choose a mature, proven platform for which many tools and integrations are available. With the right tools in place this does not have to be a problem – and many companies have virtualized most of their workloads already so no additional skills/tools need to be added.

Different strategy for database consolidation

Maybe – depends on which one. One commonly mentioned alternative is Oracle Multitenant – see my post on that for more info.

No need because we have a ULA / Other special agreement

Depends on the conditions of course – sometimes it could make sense to delay a consolidation project to maximize the number of CPUs before ULA certification. Beware however of the pitfalls (work with a license consulting company for guidance on this). My take is that in the long run, reducing the amount of processing power needed to run your real estate is a good thing, but timing may be critical.

The myth “We can deploy as many servers/options/VMs as we want, we have a ULA” often does not hold true in the long term.

Oracle is not supported on VMware

Simply untrue. Oracle has supported running on VMware for years – and with the recent announcement, even the requirement for reproducing on physical hardware has gone.

Oracle needs to be licensed on ALL VMware hosts

Simply untrue. You can license by physical host even within the same VMware cluster if you like, i.e. only licensing specific hosts for Oracle within the very same cluster (i.e. sub-clusters). I verified this with LicenseFortress and they offer their guarantee on sub-licensed clusters. I also have reference customers who are doing exactly this (without Oracle’s approval) and there are no compliance issues whatsoever.
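A quick worked example shows why this matters financially. The sketch below assumes processor-based licensing where the license count is the number of physical cores multiplied by a core factor; I assume a factor of 0.5, which is what Oracle’s core factor table commonly lists for x86 processors, but check your own contract. The cluster size (8 hosts, 2 sockets of 12 cores each) is hypothetical.

```python
import math

# Assumption: core factor for x86 CPUs; verify against your own contract.
X86_CORE_FACTOR = 0.5

def processor_licenses(hosts, sockets_per_host, cores_per_socket,
                       core_factor=X86_CORE_FACTOR):
    """Processor licenses needed when all cores of the given hosts are licensed."""
    total_cores = hosts * sockets_per_host * cores_per_socket
    return math.ceil(total_cores * core_factor)

# Hypothetical cluster: 8 hosts, of which only 2 are dedicated to Oracle.
whole_cluster = processor_licenses(hosts=8, sockets_per_host=2, cores_per_socket=12)
sub_cluster = processor_licenses(hosts=2, sockets_per_host=2, cores_per_socket=12)
print(f"whole cluster: {whole_cluster} licenses, Oracle sub-cluster: {sub_cluster} licenses")
```

Under these assumptions, licensing only the two Oracle hosts needs a quarter of the licenses of the full cluster, provided the Oracle VMs are demonstrably kept on those hosts.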

Oracle needs to be licensed on all sockets

Untrue. According to LicenseFortress we can license a single socket on a multi-socket host – but special care must be taken to prevent Oracle software from accidentally running on the wrong socket. There are ways to enforce this.

Oracle needs to be licensed on all cores of a socket

True until proven otherwise, although not everybody agrees. See for example House of Brick – Oracle Licensing: Soft Partitioning on VMware is Your Contractual Right. However, LicenseFortress doesn’t allow this in their guarantee (as of today) as it is considered too high a risk.

If you really need this, also consider buying servers with fewer cores per socket, or using Oracle Standard Edition, or even a different database (EDB Postgres for example) to achieve more cost efficiency.


With the last frontier gone, it’s time to leave the past behind and start the next chapter in the journey to the cloud – migrating the Oracle workloads.

Contact me for more details if you’re interested.

Special thanks to the guys at LicenseFortress for reviewing this blog post’s contents and providing valuable feedback!

Further reading

Monin – Carelessly running Oracle, even on VMware and Nutanix

LicenseFortress blog

This post first appeared on Dirty Cache by Bart Sjerps. Copyright © 2011 – 2020. All rights reserved. Not to be reproduced for commercial purposes without written permission.

9 Responses to Oracle on VMware – The Final Frontier

  1. Matt K. says:

    Very good post Bart.

    On this topic (VMware has performance overhead)…one could argue that for some DB workloads or parts of the DB workload running Oracle on VMware, it will execute faster than on physical.

    The basis for this is if a DB’s memory resides in a single NUMA node/CPU package (ie. L1, L2, LLC, DIMMs), the host based processing work (logical reads/CBC work, SQL parsing, executing PLSQL, …) does not have to reach over the UPI/QPI link to access memory on a remote NUMA node.

    Think of these round figures for speed of accessing local & remote cache & memory: L1 cache takes ~4 cycles, L2 cache takes ~12 cycles, LLC cache takes ~26-30 cycles, DIMMs take ~80 cycles, remote memory takes ~310 cycles.

    Hmmmm, a thoughtful thought.

    • Bart Sjerps says:

      Hi Matt,

      Interesting point. Indeed we have seen that on a core-by-core basis, single socket machines are usually faster for that reason.

      Not sure under which conditions a DB on a VMware VM would have better performance than the same on bare metal. Maybe (on a 2 socket machine) where the virtual cores are less than half the physical and VMware keeps all CPU threads on the same socket?

      Guess this is a good point in consolidation scenarios. Would like to see some evidence by numbers if you have any :)

      Thanks!

      • Matthew Kaberlein says:

        Hey Bart

        Not sure under which conditions a DB on a VMware VM would have better performance than the same on bare metal. Maybe (on a 2 socket machine) where the virtual cores are less than half the physical and VMware keeps all CPU threads on the same socket? ==> Think of this…

        An ESXI host has 2 sockets & 384GB RAM, each socket has 12 cores & 192GB RAM. So 2 NUMA nodes in the host.

        A DB needs at max, 8 vCPUs & 32GB RAM. With this, the NUMA & CPU schedulers would place this DB VM to run on a single NUMA node. So all CPU, cache & memory accesses would come from the local NUMA node. No QPI, UPI links nor remote memory/cache overhead to read a DB block.

        Unfortunately no empirical data to substantiate this, but it sure does make sense.

    • That’s a NUMA property, not a virtualisation property.

      So stating that running oracle on vmware executes faster is not true, and in my opinion makes it confusing.

      If the hypervisor for running oracle is restricted to a single socket (I say hypervisor, workload restriction is not a vmware property, but something that most hypervisors can do), it is true that memory lookups from another socket across the inter-socket link are prevented, and thus that increased overhead for doing that is prevented.

      There is nothing preventing you from managing processes and memory access at the operating system level to limit it to a single socket or even a single core, accomplishing the prevention from additional overhead from inter-socket communication.

      VMware, or any type 1 hypervisor for that matter, does impose additional overhead. However, it’s implemented very smart to minimise that overhead. Any user land execution is executed at no overhead, however any system call does impose some overhead, because not only does it need to make the context switch, but the hypervisor must intercept and make sure the virtual machine scoped execution does not interfere with other virtual machine’s resources.

      This also means that one workload could have a totally different virtualisation overhead from another.

      • Bart Sjerps says:

        Hi Frits,

        My take is that, as long as the overhead percentage wise is very small (say up to a few % at most), then it doesn’t matter much.

        I don’t care if VMware would have 1% overhead where another hypervisor would have 2% or 0.5%. More important to me is the ability of dynamically moving workloads (VMs) around without noticeable effects for the end users, as well as stability, scalability, manageability, 3rd party tooling integration etc and this is where some other platforms are lagging behind.

        Good point on NUMA, I already had my doubts on how VMware could speed up things where you could not somehow do this on bare metal. At best, a hypervisor has zero impact, not negative :)

        It’s a bit like the filesystem (ZFS) flame wars a while back. Some people were claiming ZFS was many times *faster* than other file systems. IMO this cannot be true, turned out they were depending on the caching side effects – which you can also achieve on other filesystems with a few tricks. The ideal FS has zero overhead (but again, not negative).

        • Hypervisors and overhead: I fully agree. If you need/want to virtualise, then your choice of hypervisor shouldn’t impose significantly more overhead than other hypervisors, and should be easy to monitor and use. To be honest, most virtualisation use in companies that I seen is treat it as a black box, so “set and forget”, not many fundamentally include it in their monitoring and ACT on the monitoring.

          The confusion on NUMA optimisation and vmware/hypervisor introduced NUMA optimisation was the main reason for posting a comment. This could give people a completely wrong perception. Details matter.

          To be honest, I seen something with hypervisors in general that actually could slightly improve certain workloads, where slightly is less than 5 percent or so, for which I build the hypothesis that this is because the costly system calls (costly from a hypervisor perspective), these are grouped and then executed, which could overall have slightly less overhead than executing these one at a time.

          Oh yes, the ZFS wars. Sun/oracle people were so brainwashed that ZFS was the answer to everything, up to the point that systems clearly suffering from IO latency problems could not be diagnosed as ZFS not being the culprit. Don’t get me wrong, I very much like ZFS, and it brought great flexibility, but because of the design there simply are cases where that design makes it less optimal. One of these is the inability to do direct IO. The true believers would point out that you could do all kinds of settings to getting closer to direct IO latency, I would say to not use it where direct IO is preferred. And such a case is the Oracle database, which makes it a bit painful, as that is an Oracle product, and oracle being the owner of ZFS too.

  2. marciolguedes says:

    Really you are living in 2020? Do yo know X7 and X8 Machines? Oracle Kernel 4.1XX give better perfomance and security than Red Hat.
    On Cloud is speaking PT not on TB. If i understand your text above.

    • I have no clue to what you are actually responding.
      There is a pertinent absence of facts here.
      Kernel A is better than B is exactly like blue is a more beautiful colour than red. Unless you can point to actual verifiable facts.
      If you can’t solve your problems on-premises, it’s guaranteed you can’t solve them in the cloud, because you didn’t solve your problem in the first place. There is no magic in cloud, cloud is like the pot of gold at the end of the rainbow.

      • Bart Sjerps says:

        Don’t feed the trolls :)
        I just approved the comment because it’s sort of on-topic (anything not on-topic or spam goes to /dev/null directly anyway) but it did not make much sense to me either.
        But if this guy is happy with his X7/8 machines due to drinking too much Kool-Aid then let him be.
