Why clone databases for firefighting

As more and more customers move their mission-critical Oracle database workloads to virtualized infrastructure, I often get asked how to deal with Oracle’s requirement to reproduce issues on a physical environment (especially if they use VMware as the virtualization platform – as mentioned in Oracle Support Note # 249212.1).

In some cases, database engineers are still reluctant to move to VMware for that specific reason. But the discussion is not new – I remember a few years ago I was speaking in Vienna to a group of customers and partners from Eastern Europe, and these were the days we still had VMware ESX 3.5 as the state-of-the-art virtualization platform. Performance was a bit limited (4 virtual CPUs max, some I/O overhead and memory limitations) but for smaller workloads it was stable enough for mission-critical databases. So I discussed the “reproduce on physical in case of problems” issue and I stated that I had never heard of any customer who really had to do this. Immediately someone in the audience raised his hand and said, “well, I had to do that once!” – Duh, so much for my story…

Let’s say that very often I learn as much from my audience as (hopefully) the other way around ;-)

Later I heard of a few more occasions where customers actually were asked by Oracle support to “reproduce on physical” because of suspected problems with the VMware hypervisor. In all of the cases I am aware of, the root cause turned out to be elsewhere (Operating System or configuration) but having to create a copy in case of issues is a scary thought for many database administrators – as it could take a long time and if you have strict SLAs then this might bite back at you.

So what is my take on this?

Continue reading

The Zero Dataloss Myth

In previous posts I have focused on the technical side of running business applications (except my last post about the Joint Escalation Center). So let’s teleport to another level and have a look at business drivers.

What happens if you are an IT architect for an organization, and you ask your business people (your internal customers) how much data loss they can tolerate in case of a disaster? I bet the answer is always the same:

“zero!”

This relates to what is known in the industry as Recovery Point Objective (RPO).

Ask them how much downtime they can tolerate in case something bad happens. Again, the consistent answer:

“none!”

This is equivalent to Recovery Time Objective (RTO).
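To make the two terms concrete, here is a minimal Python sketch. The timestamps and the function names (`data_loss_window`, `downtime`) are made up for illustration and do not come from any product; the only point is that the RPO is the target upper bound for the first number and the RTO for the second.

```python
from datetime import datetime, timedelta


def data_loss_window(disaster: datetime, last_recoverable_copy: datetime) -> timedelta:
    """Everything written after the last recoverable copy is lost.

    The RPO is the maximum value the business accepts for this window.
    """
    return disaster - last_recoverable_copy


def downtime(disaster: datetime, service_restored: datetime) -> timedelta:
    """Time during which the service was unavailable.

    The RTO is the maximum value the business accepts for this duration.
    """
    return service_restored - disaster


# Hypothetical incident: the last usable copy was taken 15 minutes before
# the disaster, and the service was back two hours after it.
disaster = datetime(2024, 1, 1, 12, 0)
last_copy = datetime(2024, 1, 1, 11, 45)
restored = datetime(2024, 1, 1, 14, 0)

loss = data_loss_window(disaster, last_copy)
outage = downtime(disaster, restored)
print(f"data loss window: {loss}, downtime: {outage}")
```

Asking for “zero” and “none” means both values must be zero for every conceivable incident, which is exactly where the cost discussion starts.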

Now if you are in “Jukebox mode” (business asks, you provide, no questions asked) then you try to give them what they ask for (RPO = zero, RTO = zero). Which makes many IT vendors and communication service providers happy, because this means you have to run expensive clustering software, and synchronous data mirroring to a D/R site using pricey data connections.

If you are in “Consultative” mode, you try to figure out what the business really wants, not just what they ask for. And you wonder if their request is feasible at all, and if so, what the cost is of achieving these service levels.

Continue reading

Oracle snapshots and clones with ZFS

Another Frequently Asked Question: Is there any disadvantage for a customer in using Oracle/SUN ZFS appliances to create database/application snapshots in comparison with EMC’s cloning/snapshot offerings?

Oracle marketing is pushing materials where they promote the ZFS storage appliance as the ultimate method for database cloning, especially when the source database is on Exadata. Essentially the idea is as follows: backup your primary DB to the ZFS appliance, then create snaps or clones off the backup for testing and development (more explanation in Oracle’s paper and video). Of course it is marketed as being much cheaper, easier and faster than using storage from an Enterprise Storage system such as those offered by EMC.

Oracle YouTube video

Oracle White paper

In order to understand the limitations of the ZFS appliance you need to know the fundamental workings of the ZFS filesystem. I recommend you look at the Wikipedia article on ZFS (here http://en.wikipedia.org/wiki/ZFS) and get familiar with its basic principles and features. The ZFS appliance is based on the same filesystem but due to it being an appliance, it’s a little bit different in behaviour.

So let’s see what a customer gets when he decides to go for the Sun appliance instead of EMC infrastructure (such as the Data Domain backup deduplication system or VNX storage system).

Continue reading

Data Guard protecting from EMC block corruptions?

Today I was giving a training session to fellow EMC colleagues on some Oracle fundamentals. One of the things that was mentioned is something I have heard several times before: Oracle is claiming that EMC SRDF (a data mirroring function of EMC Symmetrix enterprise storage systems, mainly used for enterprise disaster recovery) cannot detect certain types of data corruption where Oracle Data Guard can. Ouch. The trouble with this statement is that it is half-true (and half-truths are the most dangerous).
Continue reading

Stretched clustering basics

Before showing my preferred solution for Oracle stretched high availability clusters, first some clustering basics.

Active/passive versus Active/Active clusters

NASA cluster

Most clustering software is based on active/passive scenarios. You have a system (say, a database) that is running on a set of resources (e.g. a server, to keep it simple) and you have another system (a standby system) that is ready to take over the workload (failover) but is not actually running it at the same time.

An Active/Active system in general means both systems are active at the same time. There is some confusion as this can mean that the standby system is used for other processing (say, A is production and B is standby but currently running acceptance or testing environments).

By my definition, active/active clustering describes a cluster where all cluster nodes (systems) are processing against the same data set at the same time. There aren’t many products that can do this, especially in the database world. One of the few exceptions is Oracle RAC.
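The distinction can be sketched in a few lines of Python. This is only an illustration of the definition above, not how any clustering product works; the `Node` class and the `classify` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    # Data set the node is currently processing; None means the node is idle.
    dataset: Optional[str]


def classify(nodes: List[Node], dataset: str) -> str:
    """Classify a cluster with respect to one data set."""
    working_on_it = [n for n in nodes if n.dataset == dataset]
    # Active/active: every node processes the same data set at the same time.
    if len(nodes) > 1 and len(working_on_it) == len(nodes):
        return "active/active"
    # Active/passive: exactly one node runs the workload; the others are
    # idle or busy with something else (for example a test environment).
    if len(working_on_it) == 1:
        return "active/passive"
    return "other"


idle_standby = [Node("A", "prod"), Node("B", None)]
busy_standby = [Node("A", "prod"), Node("B", "test")]
shared_data = [Node("A", "prod"), Node("B", "prod")]

print(classify(idle_standby, "prod"))
print(classify(busy_standby, "prod"))
print(classify(shared_data, "prod"))
```

Note that the second case, where the standby node runs a test environment, is still active/passive under this definition, because only one node touches the production data set.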

Continue reading

Desktop security: Application data got blurred

In the old days, when I started messing around with computers for fun as a young geek guy, computer security was pretty simple.

Amiga 2000

In those times we were using 8 or 16-bit PCs with MS-DOS (for the poor guys) or, for the wealthy like myself, Commodore Amiga or comparable computers with real magic inside (who else around 1988 had 4-channel 8-bit stereo sound, 4096 colors, coprocessors for audio and graphics, true multitasking, a mouse-driven GUI handling multiple screens and windows, capable of running a word processor, graphics editor, sound tracker and some other stuff, all at the same time in 512 KB RAM?)

Continue reading