Data Guard protecting from EMC block corruptions?
April 16, 2012 20 Comments
Today I was giving a training to fellow EMC colleagues on some Oracle fundamentals. One of the things that was mentioned is something I have heard several times before: Oracle is claiming that EMC SRDF (a data mirroring function from EMC Symmetrix enterprise storage systems mainly to provide enterprise disaster recovery functions) cannot detect certain types of data corruption where Oracle Data Guard can. Ouch. The trouble with this statement is that it is half-true (and these ones are the most dangerous).
Now I am not disqualifying Data Guard as a valid database replication tool, and both EMC SRDF and Data Guard have advantages and disadvantages as I have described before, but I hate being confronted with misconceptions (possibly FUD but let’s hope it’s just misunderstanding of technology) about our products, especially if they have been protecting our customers since 1995 against disasters (and I can tell you first hand these DO happen, unexpectedly and in a way never anticipated – and in all of the cases I have seen, SRDF worked as designed and protected our customer from severe data loss).
So here is a bit more explanation on this statement. The reasoning is as follows. Consider a primary site with Oracle Database A and EMC Symmetrix A. Symmetrix A replicates (mirrors) the data to the failover site B with standby database server B and Symmetrix B. The data is in sync and everything is working nicely.
Now something happens in the I/O stack (say, flaky HBA or half-broken cable) and a database write is malformed and written with wrong contents to Symmetrix A. Symmetrix A happily replicates the faulty data to site B and the corruption goes undetected (SRDF is a data mirroring tool; so it’s intentionally based on GIGO )
Compare with a situation where Oracle Data Guard replicates from site A to site B by shipping redo and archive logfiles.
As the log shipping mechanism does not involve database writes, the same corruption will mess up the production database A but not the failover database B (because database writes are not mirrored).
Preliminary (but faulty) conclusion: Data Guard is better than SRDF because it prevents you from block corruptions.
Any wrong information in that statement? No. It is factually correct. SRDF is not designed to correct or even detect logical corruptions caused elsewhere (for that you have a whole bunch of other tools and methods).
The problem is, that the statement does not tell you the full story. Back to the Data Guard scenario again. Let’s assume for a moment that the corruption caused by broken cables is not happening on the primary, but on the standby site. So the (correct) database write goes to Symmetrix A, and Data Guard sends the (correct) data to database B where its database writer writes (initially correct) data that gets corrupted and written to the storage subsystem B (which can be any type of storage for that matter).
How is Data Guard now going to detect and prevent the corruption from happening? I think it will not do this at all, and the corrupted block will sit happily waiting for the database to fail-over (because of some kind of outage or disaster at site A) and only then the corruption is detected (right at the point when you have other issues to deal with, it’s a disaster, remember?). This can be three months later when all backups and archive logs to recover the corruption might have been expired and overwritten (plus they might have existed in the wrong – now toasted – location anyway). It may be detected earlier (if the standby database is doing database verifies of some sort frequently) but even then it is not Data Guard that detects the corruption, but some manually kicked off verification process.
Consider the same scenario again with EMC SRDF. Database A is writing (correct) data to Symmetrix A which is using SRDF to write the (correct) data to database B. As there is no active database writer at site B there can be no corruption (where this *was* possible when using Data Guard).
So my take on this? By using Data Guard you “protect” (quotes intentionally and meant cynical) from one type of corruption but you silently introduce the possibility of another one.
I keep having problems trying to see that as an advantage over SRDF.