Tales from the past – Disaster Recovery testing

A long time ago in a datacenter far, far away….

Turmoil has engulfed the IT landscape. Within the newly formed digital universe,
corporate empires are becoming more and more
dependent on their digital data and computer systems.
To avoid downtime when getting hit by an evil strike, the corporations are
starting to build disaster recovery capabilities in their operational architectures.

While the congress of the Republic endlessly debates whether
the high cost of decent recovery methods is justified,
the Supreme CIO Chancellor has secretly dispatched a Jedi Apprentice,
one of the guardians of reliability and availability,
to validate existing recovery plans…

Another story from my days as a UNIX engineer in the late nineties. I obfuscated all company and people names to protect their reputation and to avoid disclosing sensitive information, but former colleagues might recognize parts of the stories, or maybe everything. Also, some of it happened a long time ago and I cannot be sure everything I say is factually correct. Human memory is notoriously unreliable.

In those days, our company still relied on tape backup as its only Disaster Recovery (DR) strategy. The main datacenter had a bunch of large tape silos, where, on a daily basis, trays of tapes were unloaded, packed and labeled in a small but strong suitcase, and sent to an off-site location (Pickup Truck Access Method) so the invaluable data could be salvaged in case our entire datacenter went up in flames.


Introducing Outrun for Oracle

Overview

If you want to get your hands dirty with Oracle database, the first thing you have to do is build a system that actually runs Oracle database. Unless you have done that several times before, chances are that this will take considerable time spent on trial and error, several reinstalls, fixing install problems and dependencies, and so on. For someone who is reasonably experienced on Linux but has no prior Oracle knowledge, it would probably take anywhere from a full working day (8 hours, best case) to many days. I have also witnessed people actually giving up.

Even for experienced users, doing the whole process manually over and over again is very time consuming, and deploying five or more systems by hand is a guarantee that each one of them is slightly different – and thus a candidate for subtle problems that happen on one but not the others. Virtualization and consolidation are all about consistency and making many components behave as if they were only one.

There are literally dozens of web pages (such as blog posts) that contain detailed instructions on how to set up Oracle on a certain platform. Some examples:

The Gruff DBA – Oracle 12cR1 12.1.0.1 2-node RAC on CentOS 6.4 on VMware Workstation 9 – Introduction
Pythian – How to Install Oracle 12c RAC: A Step-by-Step Guide
Martin Bach – Installing Oracle 12.1.0.2 RAC on Oracle Linux 7-part 1

Even if you follow the guidelines in such articles, you are likely to run into problems due to running a different OS, a different Oracle version, network problems, and so on. Not to mention that the “best practices” provided by various vendors are often not honoured, because they tend to be overlooked due to information overload…

Some people have suggested using automated deployment tools such as Ansible (e.g. Frits Hoogland – Using Ansible for executing Oracle DBA tasks) but there are (as far as I know) no complete out-of-the-box solutions.

EMC has published several white papers and reference architectures with instructions on how to set up Oracle to run best on EMC. Still, some of the papers are not step-by-step manuals, so you have to extract configuration details manually from various (sometimes conflicting) sources and convert them into configuration file entries, commands, etc.

So I decided a while ago to go for a different approach, and build a virtual appliance that does all of these things for you while still offering (limited) flexibility across different platforms and versions, and in configuration preferences.


Tales from the past – Overheated Datacenter

A long time ago in a datacenter far, far away….

It is a period of digital revolution.

Rebel Dot Com companies, striking from hidden basements and secret lofts,
have won their first fights against long-standing evil corporate empires.

During the battles, rebel geeks have managed to invent secret technology to
replace the corporations’ old ultimate weapons,
such as snail mail and public telephone networks currently powering the entire planet.

Contracted by the Empire’s sinister CIOs, the UNIX Engineer and author of this blog
races against the clock across the UNIX root directories,
to prepare new IT infrastructure for the upcoming battle –
while at the same time, trying to keep the old weapons of mass applications available and running
as best as he can to safeguard the customers’ freedom in the digital galaxy.

In the late nineties, before I switched to the light side of the Force and joined EMC, I was a UNIX engineer working as a contractor for financial institutions. This is the first in a number of stories from that period and later. I obfuscated all company and people names to protect their reputation and to avoid disclosing sensitive information, but former colleagues might recognize parts of the stories, or maybe everything. Also, some of it happened a long time ago and I cannot be sure everything I say is factually correct. Human memory is notoriously unreliable.

It was late on a Friday afternoon.

Everyone in my department had already left for the weekend, but I was working on a critical infrastructure project with a tight deadline; otherwise I guess I would have left already, too.

At some point I needed to re-install a UNIX server, which in those days was done by physically booting it from an install CD – so I needed to go to the datacenter room and get physical console access to get that going. I walked to the datacenter floor, which hosted several large UNIX systems, a mainframe, a number of EMC Symmetrix storage systems, network gear, and lots of Intel servers, mostly running Windows NT and maybe a few running Novell.

There were large tape libraries for backup, lots of server racks, fire extinguishers and whatever you typically find in a large datacenter floor like that. I used my keycard to open the door to the datacenter and stepped in… The first thing I thought was, wow, it’s warm in here…


Comparing database replication features

It’s still a hot topic in my customer conversations: Should we use Oracle Data Guard or something else for providing disaster recovery?
I wrote an explanation a while ago. Recently I also created a PowerPoint slide comparing various features – in an attempt to be as unbiased as possible (I think I partly succeeded ;)

I’ve put the comparison in a static page on my blog and will update it any time I get new insights or think I can improve it otherwise.

View the comparison here: Comparing DR features

This post first appeared on Dirty Cache by Bart Sjerps. Copyright © 2011 – 2015. All rights reserved. Not to be reproduced for commercial purposes without written permission.

The Oracle Parking Garage

Oracle parking garage

(Thanks to House of Brick Technologies)

 

Oracle Data Placement on XtremIO

Many customers these days are implementing Oracle on XtremIO so they benefit from excellent, predictable performance and other benefits such as inline compression and deduplication, snapshots, ease of use etc.

Those benefits come at a price, and if you just consider XtremIO on a usable gigabyte basis, it does not come cheap. Things change if you calculate the savings due to those special features. Still, customers are trying to get the best bang for the buck, and so one customer asked me whether it would make sense to place only Oracle datafiles on XtremIO and leave everything else on classic EMC storage. This would mean redo logs, archive logs, control files, temp tables, binaries and everything else, *except* the datafiles, would be stored on an EMC VNX or VMAX. The purpose of course is to only have things that require fast random reads (the tables) on XtremIO.

I can clearly see the way of thinking, but my response was to change the layout slightly. I highly recommend placing everything that is needed to make up a database in a consistent way on the same storage box.


Oracle ASM vs ZFS on VNX

In my last post on ZFS I shared results of a lab test where ZFS was configured on Solaris x86 and using XtremIO storage. A strange combination maybe but this is what a specific customer asked for.

Another customer requested a similar test with ZFS versus ASM but on Solaris/SPARC and on EMC VNX. Also very interesting as on VNX we’re using spinning disk (not all-flash) so the effects of fragmentation over time should be much more visible.

So with support of the local administrators, I performed a test similar to the one before: start on ASM and get baseline random and sequential performance numbers, then move (copy) the tablespace to ZFS so you start off with as little fragmentation as possible. Then run random read/write followed by sequential read, multiple times, and see how the I/O behaves.

Oracle ASM vs ZFS on XtremIO

Background

In my previous post on ZFS I showed how ZFS causes fragmentation for Oracle database files. At the end I promised (sort of) to come back to the topic of how this affects database performance. In the meantime I have been busy with many other things, but ZFS issues still sneak up on me frequently. Eventually, I was forced to take another look at this because two separate customers asked for ZFS comparisons against ASM at the same time.

The account team for one of the two customers asked if I could perform some testing on their lab environment to show the performance difference between Oracle on ASM and on ZFS. As things happen in this business, things were already rolling before I could influence the prerequisites and the suggested test method. Promises were already made to the customer and I was asked to produce results yesterday.

Without knowledge of the lab environment, the customer requirements or even details of the test environment they had set up. Typical day at the office.

In addition to that, ZFS requires a supported host OS – so Linux is out of the question (the status on kernel ZFS for Linux is still a bit unclear and certainly it would not be supported with Oracle). I had been using FreeBSD in my post on fragmentation – because that was my platform of choice at that point (my Solaris skills are, at best, rusty). Of course Oracle on FreeBSD is a no-go so back then, I used NFS to run the database on Linux and ZFS on BSD. Which implicitly solves some of the potential issues whilst creating some new ones, but alas.

Solaris x86

This time the idea was to run Oracle on Solaris (x86) that had both ZFS and ASM configured. How to perform a reasonable comparison that also shows the different behavior was unclear and when asking that question to the account team, the conference call line stayed surprisingly silent. All that they indicated up front is that the test tool on Oracle should be SLOB.


Featherweight Linux VNC services

This article describes how to set up a very lightweight VNC service under CentOS/Red Hat.


Intro

In Red Hat Enterprise Linux (and derivatives, I use CentOS) you can run a VNC service to allow graphical connections to a Linux system. I was looking for a very lightweight VNC service (no fancy desktop with all the bells and whistles, just something that lets me do some stuff that requires an X session and run an Xterm – such as installing Oracle or running Swingbench, without using another host with an X client). In other words, a typical service for virtual machines that run as servers (such as database servers, web servers, etc).

CentOS standard method

I tried the standard documented way to do this in CentOS: CentOS Virtual Network Computing using the standard tigervnc-server method, but found a few issues with the way they set it up:

  • For every user requiring VNC services, you need to customize the configuration
  • If one user deletes or corrupts his VNC password file, the whole service stops working (fix via normal SSH login but requires skilled user)
  • If a user messes up his xstartup file he is locked out (fix via normal SSH login but requires skilled user)
  • Users need 2 passwords: for their (own) VNC service, and the usual one for Linux
  • Their X window and VNC processes are always running and thus eating resources even if not used
  • If their X session hangs (e.g. window manager killed, or simple logout) it’s hard or even impossible to clean up and restart (see section 4 in the mentioned article: Recovery from a logout) without resetting the whole VNC service
  • Every user requires a separate, unique TCP port
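
To illustrate the last point: by convention a VNC server on display :N listens on TCP port 5900 + N, so a per-user setup ends up with one display and one port per user. The following is only a minimal sketch of that bookkeeping; the user names and the function name are hypothetical and not taken from the CentOS documentation.

```python
# Sketch: per-user VNC displays and the TCP ports they occupy.
# The user names are hypothetical; 5900 is the conventional VNC base port.
VNC_BASE_PORT = 5900


def assign_displays(users, first_display=1):
    """Give every user a unique display number and the matching TCP port."""
    return {
        user: {"display": f":{n}", "port": VNC_BASE_PORT + n}
        for n, user in enumerate(users, start=first_display)
    }


if __name__ == "__main__":
    for user, cfg in assign_displays(["alice", "bob", "carol"]).items():
        print(user, cfg["display"], cfg["port"])
```

Every additional user means another port to open in the firewall and another number to hand out, which is part of why this approach does not scale well.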

All in all, nice and easy for a small test server with a few users, but no good for larger environments. The good thing is that the desktops are persistent, i.e. you may disconnect and reconnect later and the VNC session will be as you left it. And you can install lighter desktop environments (twm or openmotif) instead of the huge and heavy Gnome desktop.

But I was looking for something better.


Fun with Linux UDEV and ASM: Using UDEV to create ASM disk volumes

Because of the many discussions and the confusion around partitioning, disk alignment and its sibling issue, ASM disk management, here is an explanation of how to use UDEV. As an extra, I present a tool that manages some of this stuff for you.

The questions could be summarized as follows:

  • When do we have issues with disk alignment and why?
  • What methods are available to set alignment correctly and to verify?
  • Should we use ASMlib or are there alternatives? If so, which ones and how to manage those?

I’ve written 2 blogposts on the matter of alignment so I am not going to repeat myself on the details. The only thing you need to remember is that classic “MS-DOS” disk partitioning, by default, starts the first partition on the disk at the wrong offset (wrong in terms of optimal performance). The old partitioning scheme was invented when physical spinning rust was formatted with 63 sectors of 512 bytes per disk track. Because you need some header information for boot block and partition table, the smart guys back then thought it was a good idea to start the first block of the first data partition on track 1 (instead of track 0). These days we have completely different physical disk geometries (and sometimes even different sector sizes, another interesting topic) but we still have the legacy of the old days.

If you’re not using an Intel X86_64 based operating system then chances are you have no alignment issues at all (the only exception I know is Solaris if you use “fdisk”, similar problem). If you use newer partition methods (GPT) then the issue is gone (but many BIOSes, boot methods and other tools cannot handle GPT). As MSDOS partitioning is limited to 2 TiB (http://en.wikipedia.org/wiki/Master_boot_record) it will probably be a thing of the past in a few years but for now we have to deal with it.
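
The 2 TiB limit follows from simple arithmetic: MBR partition entries store sector addresses and sizes as 32-bit numbers, and classic sectors are 512 bytes. A quick sketch (the function and constant names are just for illustration):

```python
# Sketch: where the MS-DOS (MBR) partitioning size limit comes from.
SECTOR_SIZE = 512   # bytes, the classic sector size
LBA_BITS = 32       # MBR partition entries store 32-bit sector numbers


def mbr_max_bytes(sector_size=SECTOR_SIZE, lba_bits=LBA_BITS):
    """Largest size addressable with lba_bits-wide sector numbers."""
    return (2 ** lba_bits) * sector_size


max_bytes = mbr_max_bytes()
print(max_bytes // 2**40, "TiB")
```

With 4096-byte sectors the same 32-bit fields would reach further, which is one reason sector size is an interesting topic of its own.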

Wrong alignment causes some reads and writes to be broken in 2 pieces causing extra IOPS. I don’t have hard numbers but a long time ago I was told it could be an overhead of up to 20%. So we need to get rid of it.
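
To make the splitting effect concrete, here is a rough sketch that counts how many I/Os cross a boundary of the underlying storage. It assumes the legacy layout puts the first partition at sector 63 (one 63-sector track in) and that modern tools start at sector 2048; the 8 KiB I/O size and 64 KiB storage chunk size are assumptions chosen for illustration, not measurements, and the result says nothing about the 20% figure mentioned above.

```python
# Sketch: how a misaligned partition start splits I/Os across storage chunks.
# I/O size and chunk size below are illustrative assumptions.
SECTOR = 512  # bytes


def start_offset(start_sector):
    """Byte offset of the partition start on the disk."""
    return start_sector * SECTOR


def is_aligned(start_sector, boundary):
    """True if the partition starts on a multiple of `boundary` bytes."""
    return start_offset(start_sector) % boundary == 0


def split_fraction(start_sector, io_size, chunk_size, n_ios=1024):
    """Fraction of back-to-back I/Os that straddle a chunk boundary."""
    base = start_offset(start_sector)
    splits = 0
    for k in range(n_ios):
        pos = (base + k * io_size) % chunk_size  # position inside the chunk
        if pos + io_size > chunk_size:           # I/O spills into next chunk
            splits += 1
    return splits / n_ios


if __name__ == "__main__":
    for sector in (63, 2048):
        print(sector, is_aligned(sector, 4096),
              split_fraction(sector, io_size=8192, chunk_size=65536))
```

The actual overhead depends on the real I/O sizes and on how the storage system lays out its data, so treat the numbers as an illustration of the mechanism only.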

ASM storage configuration

ASM does not use OS file systems or volume managers but has its own way of managing volumes and files. It “eats” block devices and these block devices need to be read/write for the user/group that runs the ASM instance, as well as the user/group that runs Oracle database processes (a public secret is that ASM is out-of-band and databases write directly to ASM data chunks). ASM does not care what the name or device numbers are of a block device, neither does it care whether it is a full disk, a partition, or some other type of device as long as it behaves as a block device under Linux (and probably other UNIX flavors). It does not need partition tables at all but writes its own disk signatures to the volumes it gets.
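
The two requirements above (it must be a block device, and it must be read/write for the relevant user or group) can be checked with a few lines of Python. This is a minimal sketch, not part of any Oracle tooling; the function names and the device path in the usage note are hypothetical, and it only looks at classic owner/group/other permission bits (no ACLs, no root override).

```python
# Sketch: check whether a device looks usable as an ASM volume for a user.
# Only classic permission bits are considered; names are hypothetical.
import os
import stat


def asm_device_problems(mode, dev_uid, dev_gid, user_uid, user_gids):
    """Return a list of reasons why the device would not be usable."""
    problems = []
    if not stat.S_ISBLK(mode):
        problems.append("not a block device")
    # Same order the kernel uses: owner class, then group, then other.
    if dev_uid == user_uid:
        ok = (mode & stat.S_IRUSR) and (mode & stat.S_IWUSR)
    elif dev_gid in user_gids:
        ok = (mode & stat.S_IRGRP) and (mode & stat.S_IWGRP)
    else:
        ok = (mode & stat.S_IROTH) and (mode & stat.S_IWOTH)
    if not ok:
        problems.append("no read/write access")
    return problems


def check_path(path, user_uid, user_gids):
    """Convenience wrapper that stats a real path."""
    st = os.stat(path)
    return asm_device_problems(st.st_mode, st.st_uid, st.st_gid,
                               user_uid, user_gids)
```

For example, `check_path("/dev/sdb1", uid, gids)` (with a hypothetical device path) would be run once for the user that owns the ASM instance and once for the user that runs the database processes; both should come back with an empty list.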

[ Warning: Lengthy technical content, Rated T, parental advisory required ]

