The Zero Dataloss Myth

In previous posts I have focused on the technical side of running business applications (except my last post about the Joint Escalation Center). So let’s teleport to another level and have a look at business drivers.

What happens if you are an IT architect for an organization, and you ask your business people (your internal customers) how much data loss they can tolerate in case of a disaster? I bet the answer is always the same:

“zero!”

This relates to what is known in the industry as Recovery Point Objective (RPO).

Ask them how much downtime they can tolerate in case something bad happens. Again, the consistent answer:

“none!”

This is equivalent to Recovery Time Objective (RTO).
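To make the two terms concrete, here is a minimal sketch in Python. The function name, the timestamps and the five-minute replication lag are made up for illustration; they do not come from any real incident or product. The sketch measures what was actually achieved after an incident, which you can then compare with the RPO and RTO the business asked for.

```python
from datetime import datetime, timedelta


def achieved_rpo_rto(last_recoverable_write, disaster, service_restored):
    """Return (data loss window, downtime) for a single incident."""
    # Everything written after the last recoverable write is lost:
    # this is the figure to compare with the RPO.
    data_loss_window = disaster - last_recoverable_write
    # Time between the disaster and the service being back:
    # this is the figure to compare with the RTO.
    downtime = service_restored - disaster
    return data_loss_window, downtime


# Hypothetical incident: replication was lagging 5 minutes behind
# and the restart at the D/R site took 2 hours.
disaster = datetime(2024, 1, 1, 12, 0, 0)
loss, down = achieved_rpo_rto(
    last_recoverable_write=disaster - timedelta(minutes=5),
    disaster=disaster,
    service_restored=disaster + timedelta(hours=2),
)
print(f"data lost: {loss}, downtime: {down}")
```

An answer of “zero” and “none” means both of these values would have to be zero for every possible incident.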

Now if you are in “Jukebox mode” (business asks, you provide, no questions asked), you try to give them what they ask for (RPO = zero, RTO = zero). This makes many IT vendors and communication service providers happy, because it means you have to run expensive clustering software and synchronous data mirroring to a D/R site over pricey data connections.

If you are in “Consultative” mode, you try to figure out what the business really wants, not just what they ask for. You also ask whether their request is feasible at all and, if so, what it would cost to achieve these service levels.


Oracle snapshots and clones with ZFS

Another Frequently Asked Question: Is there any disadvantage for a customer in using Oracle/Sun ZFS appliances to create database/application snapshots in comparison with EMC’s cloning/snapshot offerings?

Oracle marketing is pushing materials that promote the ZFS storage appliance as the ultimate method for database cloning, especially when the source database is on Exadata. Essentially the idea is as follows: back up your primary DB to the ZFS appliance, then create snaps or clones off the backup for testing and development (more explanation in Oracle’s paper and video). Of course it is marketed as being much cheaper, easier and faster than using an Enterprise Storage system such as those offered by EMC.

Oracle Youtube video

Oracle White paper

In order to understand the limitations of the ZFS appliance you need to know the fundamental workings of the ZFS filesystem. I recommend you look at the Wikipedia article on ZFS (here http://en.wikipedia.org/wiki/ZFS) and get familiar with its basic principles and features. The ZFS appliance is based on the same filesystem but, because it is an appliance, it behaves a little differently.

So let’s see what a customer gets when he decides to go for the Sun appliance instead of EMC infrastructure (such as the Data Domain backup deduplication system or VNX storage system).


Oracle Stretched Cluster with VPLEX (update)

One request I got back after my series on Oracle RAC stretched clusters was whether I could summarize again why anybody would choose VPLEX for storage replication over other solutions. My attempt was to describe the principles of VPLEX in enough detail for techies to understand it. For non-geeks, I will try to explain it as briefly as possible.

Data Guard or Storage based replication?

A comparison between Oracle (Active) Data Guard and EMC replication for disaster recovery purposes

Panic Button
This is an article I wrote a while ago for customers’ Database Administrators (DBAs) and application managers, to help them select the right Disaster Recovery tools for their business applications.
It is slightly modified to reflect new insights and to make it more readable on the web.


Through the wormhole with Stretched Clusters

Last year, EMC announced a new virtualization product called VPLEX. VPLEX allows logical storage volumes to be accessible from multiple locations. It boldly goes beyond existing storage virtualization solutions (including those from EMC) in that it is not just a storage virtualization cluster but rather a storage federation platform: one virtualized storage volume, built from one or more physical storage volumes, is dynamically accessible from multiple locations, as if they were connected through a wormhole.

Wormhole in space

Stretched Clusters – Alien storage

In my previous posts I described how Oracle ASM can be used to build stretched clusters. I also pointed to some limitations of that scenario. But I am far from the first to do so – and some of EMC’s competitors have attempted to build products, features and solutions to overcome some of the limitations of host mirroring.

A while ago, some guys I met from an EMC partner confronted me with the question of why EMC, the market leader in external storage and a premium Oracle technology partner, had not offered a solution for these limitations. They pointed to a number of products from competitors that – allegedly – solved the problem already. They also pointed to the architectural simplicity of these solutions.

Alien Storage

At that time I had no good answer (which does not happen to me very often). I was not aware of how these products worked, so I asked some questions about that. In that period I was also confronted by our enterprise customers, who started demanding an EMC solution for stretched clustering – so I started digging. Could it be that EMC had been overtaken by some of these alien storage start-up companies in continuously available storage solutions? It seemed to be the case.

Stretched clustering basics

Before showing my preferred solution for Oracle stretched high availability clusters, first some clustering basics.

Active/passive versus Active/Active clusters

NASA cluster


Most clustering software is based on active/passive scenarios. You have a system (say, a database) that is running on a set of resources (say, a server, to keep it simple) and you have another system (a standby system) that is ready to take over the workload (failover) but is not actually running it at the same time.

An Active/Active system in general means both systems are active at the same time. This causes some confusion, because it can also mean that the standby system is merely used for other processing (say, A is production and B is standby but currently running acceptance or testing environments).

By my definition, active/active clustering describes a cluster where all cluster nodes (systems) are processing against the same data set at the same time. There aren’t many products that can do this, especially in the database world. One of the few exceptions is Oracle RAC.
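The three cases above can be told apart with a few lines of Python. This is only a toy model to illustrate the definitions: the `Node` class, the `classify` function and the data set names are hypothetical and have nothing to do with how any real clustering product works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    dataset: Optional[str]  # data set the node is processing now, None if idle


def classify(nodes, production):
    """Classify a cluster by what its nodes are doing right now."""
    on_production = sum(1 for n in nodes if n.dataset == production)
    busy = sum(1 for n in nodes if n.dataset is not None)
    if on_production >= 2:
        # Several nodes work against the same data set at the same time.
        return "active/active"
    if busy >= 2:
        # Both nodes are busy, but only one runs production.
        return "active/passive, standby running other work"
    return "active/passive"


print(classify([Node("A", "prod"), Node("B", None)], "prod"))
print(classify([Node("A", "prod"), Node("B", "test")], "prod"))
print(classify([Node("A", "prod"), Node("B", "prod")], "prod"))
```

Only the last cluster is active/active by my definition; the second one is what often gets sold as active/active.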


Limitations of host-based mirroring for stretched clusters

For data mirroring, EMC SRDF is sometimes used in a setup where both servers write to one location only (the “far” server writes across dark fibre links to the local storage). EMC has similar tools (MirrorView, RecoverPoint, etc.) for storage platforms other than Symmetrix.


SRDF cluster with passive target


Extreme availability with Oracle stretched clusters

Some of my customers have been pushing for more availability in their Oracle database applications. They want to eliminate downtime completely even if they experience a site failure. Whether this is a real business requirement or a technology push, I’m not sure – I guess a bit of both.


Most of these customers have already implemented Oracle RAC (Real Application Clusters), which provides them with active/active server clustering for Oracle. If one of the servers in a RAC cluster fails, the others just keep running – no restart or recovery involved. This is a High Availability option, typically used within a single site.

For Disaster Recovery, most customers have some sort of storage replication (e.g. EMC SRDF/Synchronous or SRDF/Async), or they use Oracle Data Guard, which replicates data at the Oracle database level. This protects against site failures and offers zero or near-zero data loss (for committed transactions in Oracle; the non-committed transactions are rolled back during the restart, and this is exactly one of the problems, by the way).