Oracle snapshots and clones with ZFS
August 29, 2012
Another Frequently Asked Question: Is there any disadvantage for a customer in using Oracle/SUN ZFS appliances to create database/application snapshots in comparison with EMC’s cloning/snapshot offerings?
Oracle marketing is pushing materials where they promote the ZFS storage appliance as the ultimate method for database cloning, especially when the source database is on Exadata. Essentially the idea is as follows: backup your primary DB to the ZFS appliance, then create snaps or clones off the backup for testing and development (more explanation in Oracle’s paper and video). Of course it is marketed as being much cheaper, easier and faster than using storage from an Enterprise Storage system such as those offered by EMC.
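As a rough sketch of that promoted workflow (the NFS mount point and file name formats below are invented for illustration, not taken from Oracle's paper): the appliance exports an NFS share which the database server mounts, and RMAN writes image copies there so they can later be snapped and cloned.

```shell
# Hypothetical setup: the ZFS appliance exports an NFS share that is
# mounted at /zfs_backup on the database server. Back up production
# as image copies so the resulting files can be snapshotted/cloned
# for test and development.
rman target / <<EOF
backup as copy database format '/zfs_backup/ora_df%U';
backup archivelog all format '/zfs_backup/ora_al%U';
EOF
```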
In order to understand the limitations of the ZFS appliance you need to know the fundamental workings of the ZFS filesystem. I recommend the Wikipedia article on ZFS (http://en.wikipedia.org/wiki/ZFS) to get familiar with its basic principles and features. The ZFS appliance is based on the same filesystem, but being an appliance its behaviour is a little different.
ZFS is effectively a combination of a volume manager and a file system. Compared to a classic volume manager, a ZFS “Zpool” is much like an LVM volume group. In the Zpool you have a default filesystem (named the same as the pool) and you can optionally create additional filesystems within the same pool. A ZFS file system cannot span multiple pools.
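For illustration (the pool, disk and filesystem names here are invented), a pool with several filesystems could be laid out like this:

```shell
# Create a pool from four disks (device names are illustrative)
zpool create dbpool mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0

# The pool itself is a filesystem (dbpool); add separate
# filesystems for data, logs and indices within the same pool
zfs create dbpool/data
zfs create dbpool/logs
zfs create dbpool/indices

# All three filesystems draw from the pool's shared free space
zfs list -r dbpool
```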
ZFS snapshots happen at the ZFS filesystem level. So if you have multiple filesystems in the pool for a given database (say, one for data, one for logs, and one for indices) you cannot create a crash-consistent snapshot of that database using ZFS snaps [ update: slightly incorrect, see comments below ]. It gets worse if your database spans not only multiple ZFS filesystems but also multiple pools. In those cases you need to fall back to Oracle’s Hot Backup methods and use a bunch of scripting to be able to recover the cloned database afterwards (EMC, on the other hand, offers technology to create snapshots for backups without even going into hot backup mode).
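Where the database does span multiple filesystems or pools, the scripting alluded to above looks roughly like this sketch (pool and filesystem names invented, error handling omitted):

```shell
#!/bin/sh
# Hypothetical hot-backup-plus-snapshot script for a database
# spread over two pools. The individual snapshots are NOT atomic
# across pools, hence the need for hot backup mode around them.
sqlplus -s / as sysdba <<EOF
alter database begin backup;
EOF

# One snapshot per filesystem, per pool
zfs snapshot dbpool/data@clone1
zfs snapshot dbpool/indices@clone1
zfs snapshot logpool/redo@clone1

sqlplus -s / as sysdba <<EOF
alter database end backup;
alter system archive log current;
EOF
```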
One size fits all
In one Zpool all volumes must have similar behaviour (in terms of performance and size). This means you cannot effectively mix & match multiple drive types in the pool. A customer looking for some kind of storage tiering needs a different Zpool for every tier – which brings back the consistency problems mentioned above. Automatic data movement across the tiers à la FAST-VP is not possible. Oracle suggests they have some kind of tiering (they call it “Hybrid Storage Pools”) but it’s nothing more than one disk type with different sorts of (dirty) cache (DRAM and Flash cache). Marketing ain’t reality.
Also if you want to use SATA (or other low-cost, high-capacity, low-iops) disks for backup then you must have SATA disks for the clone databases as well (remember you cannot create snaps from one zpool to another zpool). So how do you perform an acceptance performance stress test if the production database is on fast Fibre Channel or SAS disks (or even on Flash drives) when the acceptance database is on slow SATA? It just isn’t gonna work…
If the creation of snaps drives up the storage utilization of the file systems beyond 70-80% then the performance of the appliance will slow down, or at least become very unpredictable (according to SUN/Oracle best practices you should not go over 80% allocation on the ZFS filesystems). Of course you can monitor the pool but we all know how that works – somebody kicks off the creation of another snap before leaving for home – or the acceptance test suddenly starts allocating huge chunks of new tablespace data at 3am – Murphy is always around. Note that the performance of backups (100% write) will suffer so the backup window of the primary (production) database might be seriously affected as well.
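Watching allocation is easy enough with the standard commands (pool name assumed) – the problem is that a human has to actually watch them at 3am:

```shell
# Pool-level allocation -- best practice says stay under ~80%
zpool list -o name,size,alloc,free,cap dbpool

# Per-filesystem view, including space held hostage by snapshots
zfs list -o name,used,avail,usedbysnapshots -r dbpool
```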
The snap space in ZFS is shared by the primary data on the same drives, so databases using snapshots cannot be isolated iops-wise from the spindles used for backup or other purposes. There is no way to isolate or prioritize I/O workloads within the zpool.
If, by even more Murphy intervention, the allocation reaches 100%, then the clone databases will abort – as will all running backup jobs writing to the same Zpool. Depending on the case the backup job could just hang forever or fail (I’m not sure which one is worse). If you had a separate snap pool (i.e. in a logically separate area) then the snapshots would be affected (in terms of availability and performance) but not the primary data (that’s why EMC uses separate snap areas).
To avoid filling the pool you therefore need lots of empty space hanging around (on energy-consuming, floorspace-hogging, pricey spinning disks) – bye-bye TCO and efficiency.
If you make a snapshot of (Exadata) HCC-stored tables, then the test and development environments they are talking about need to be on Exadata as well (otherwise they cannot use the HCC-compressed data for testing purposes). No Virtual Machines for testing (not even on Oracle VM) unless you drop HCC on the primary DB or do a painful conversion each time after backup. But Oracle will happily sell you another Exadata so don’t worry.
There is no GUI driven tool such as EMC Replication Manager so everything needs to be scripted (and I know from past experience that in such scripts, the devil is in the details).
Risk on backups
You must have the test and development databases on the same storage system that you use for your backup data. By not physically isolating the backup target, by not restricting user access, and by abusing it for other purposes, you put your backups at risk (i.e. the ZFS appliance – holding your last-line-of-defence backup data – suddenly gets accessed by Unix/Database admins who mess around with security, filesystem and NFS export settings, etc. – you need to be aware of this risk).
Snaps off primary database
You need to have at least one 100% full copy (i.e. RMAN backup or Data Guard standby) of the production database (if it’s on Exadata) before you can make snaps. No direct snaps off primary DB – unless you put your primary database completely on the ZFS appliance (I promise to write in a future post on why you might not want this)
And of course all other ZFS limitations for databases apply (fragmentation performance issues, deduplication doesn’t work well, etc) but I’ll leave that for a future post ;)
[update] Matt Ahrens pointed out (see comments below, thanks!) that it is possible to create consistent snapshots of multiple ZFS filesystems within a pool using the “-r” option (recursive) or using “projects”. If I find time I will test to see if that works. I still don’t see how you could set up a database across different storage tiers (thus multiple zpools) – i.e. FC and SATA disk and maybe even some Flash – and then create consistent snaps. I also failed to mention that ZFS snapshots are read-only, so you first have to clone from a snapshot before you can use it to run Oracle databases. For me the capability of making snaps directly off a database and directly mounting those read/write on a test DB was such a no-brainer that I missed that one completely :-)
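For reference, a minimal sketch of what Matt describes (names invented): a recursive snapshot is taken atomically across all filesystems under the target, and since snapshots are read-only you clone them into writable filesystems before mounting them for a test database.

```shell
# One atomic, consistent snapshot of dbpool and every
# filesystem beneath it
zfs snapshot -r dbpool@test1

# Snapshots are read-only: clone them into writable filesystems
# before handing them to a test database
zfs clone dbpool/data@test1 dbpool/testdata
zfs clone dbpool/logs@test1 dbpool/testlogs
zfs clone dbpool/indices@test1 dbpool/testindices
```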