Announcing qdda 2.0

It’s almost a year since I blogged about qdda (the Quick & Dirty Dedupe Analyzer).

qdda is a tool that scans any Linux disk or file (or multiple disks) and predicts the potential thin, dedupe and compression savings if you moved that disk or file to an All Flash array like DellEMC XtremIO or VMAX All-flash. In contrast to similar (usually vendor-supplied) tools, qdda runs completely independently. It does NOT require registration or sending a binary dataset back to the mothership (which would be a security risk). Anyone can inspect the source code and run it, so there are no hidden secrets.

It’s based upon the most widely deployed database engine, SQLite, and uses MD5 hashing and LZ4 compression to produce data reduction estimates.
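
To illustrate the core idea (a minimal sketch, not qdda's actual implementation): split the input into fixed-size blocks, fingerprint each block to count duplicates, and compress each block to estimate the compression ratio. The block size, the scan() function and the use of zlib as a stand-in for LZ4 are all my own assumptions here:

```python
import hashlib
import zlib

BLOCKSIZE = 16384  # hypothetical block size; flash arrays dedupe at a fixed block granularity

def scan(path):
    """Rough thin/dedupe/compression estimate for one file or block device."""
    hashes = set()             # unique block fingerprints (qdda keeps these in SQLite)
    blocks = zero = compressed = 0
    with open(path, 'rb') as f:
        while (block := f.read(BLOCKSIZE)):
            blocks += 1
            if block.count(0) == len(block):
                zero += 1      # all-zero blocks are never written: thin provisioning savings
                continue
            hashes.add(hashlib.md5(block).digest()[:8])  # truncated MD5 identifies duplicates
            compressed += len(zlib.compress(block))      # zlib stands in for LZ4 here
    if hashes:
        print(f"blocks={blocks} zero={zero} "
              f"dedupe={(blocks - zero) / len(hashes):.2f}x "
              f"compress={(blocks - zero) * BLOCKSIZE / compressed:.2f}x")

scan('/tmp/testfile')
```

Real tools refine this in many ways (per-array block sizes, compression bucketing, histograms), but counting unique fingerprints and summing compressed sizes is the essence of the estimate.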

The reason it took a while to follow up is that I spent a lot of evening hours almost completely rewriting the tool. A summary of the changes:

  • Runs completely as a non-privileged user (i.e. ‘nobody’), making it safe to run on production systems
  • Increased the hash to 60 bits so it scales to at least 80 terabytes without compromising accuracy
  • Decreased the database space consumption by 50%
  • Multithreading, with separate readers, workers and a single database updater, allowing qdda to use multiple CPU cores (a sketch of this pipeline follows below the list)
  • Many other huge performance improvements (qdda has been demonstrated to scan data at about 7 GB/s on a fast server, where the bottleneck was I/O; it could theoretically handle double that bandwidth before maxing out on database updates)
  • Very detailed embedded man page (manual). The qdda executable itself can show its own man page (on Linux with ‘man’ installed)
  • Improved standard reports and detailed reports with compression and dedupe histograms
  • Option to define your own custom array definitions
  • Removed dependencies on system libraries (SQLite, LZ4 and others) so qdda runs on almost any Linux system and can be downloaded as a single executable (no more need to install RPM packages)
  • Many other small improvements and additions
  • Completely moved to GitHub – where you can also download the binary
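
Here is a rough sketch of that reader/worker/updater split. qdda itself does this in C++; this Python version, with its made-up hash_and_compress() step and queue sizes, only illustrates the shape of the pipeline:

```python
import hashlib
import queue
import threading
import zlib

NWORKERS = 4
read_q = queue.Queue(maxsize=64)   # raw blocks from the reader
db_q = queue.Queue(maxsize=64)     # (hash, compressed size) results for the updater

def hash_and_compress(block):
    # CPU-bound step done by the worker pool (zlib stands in for LZ4 here)
    return hashlib.md5(block).digest(), len(zlib.compress(block))

def reader(path, blocksize=16384):
    with open(path, 'rb') as f:
        while (block := f.read(blocksize)):
            read_q.put(block)
    read_q.put(None)                       # end-of-stream marker

def worker():
    while (block := read_q.get()) is not None:
        db_q.put(hash_and_compress(block))
    read_q.put(None)                       # let the next worker see the marker
    db_q.put(None)                         # tell the updater this worker is done

def updater():
    finished, results = 0, []
    while finished < NWORKERS:             # single writer: no database lock contention
        item = db_q.get()
        if item is None:
            finished += 1
        else:
            results.append(item)           # qdda would insert into SQLite here
    print(f"{len(results)} blocks processed")

threads = [threading.Thread(target=reader, args=('/tmp/testfile',)),
           threading.Thread(target=updater)]
threads += [threading.Thread(target=worker) for _ in range(NWORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The single updater is the key design choice: SQLite handles one writer well, so all database inserts are funneled through one thread while hashing and compression scale across cores. (Note that in CPython the GIL limits such scaling, although hashlib and zlib release it for large buffers; qdda's C++ threads have no such limit.)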

Read the overview and animated demo on the project homepage here: https://github.com/outrunnl/qdda

HTML version of the detailed manual page: https://github.com/outrunnl/qdda/blob/master/doc/qdda.md

As qdda is licensed under the GPL, it comes with no guarantee whatsoever. My recommendation is to use it for learning purposes or for a first what-if analysis; if you’re interested in data reduction numbers from the vendor, ask them for a formal analysis using their own tools. That said, I did a few comparison tests and the data reduction numbers were within 1% of the results from vendor-supported tools. The man page has a section on accuracy explaining the differences.
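
As a back-of-the-envelope sanity check on the 60-bit hash claim (my own estimate, not taken from the qdda docs, and assuming a hypothetical 16 KiB block size), the birthday bound gives the expected number of false block merges:

```latex
% Expected collisions among n blocks with a b-bit hash (birthday bound):
\[
  E[\text{collisions}] \approx \frac{n^2}{2^{b+1}}
\]
% 80 TB at 16 KiB per block, with b = 60:
\[
  n = \frac{80 \cdot 2^{40}}{2^{14}} \approx 5.4 \times 10^9
  \qquad\Rightarrow\qquad
  E \approx \frac{(5.4 \times 10^9)^2}{2^{61}} \approx 13
\]
```

A dozen or so false merges among five billion blocks skews the dedupe estimate by parts per billion – far below the ~1% deviation from vendor tools mentioned above.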

Continue reading

Oracle Data Placement on XtremIO

Many customers these days are implementing Oracle on XtremIO to benefit from excellent, predictable performance and other advantages such as inline compression and deduplication, snapshots, and ease of use.

Those benefits come at a price, and if you consider XtremIO on a usable-gigabyte basis alone, it does not come cheap. Things change once you calculate the savings from those special features. Still, customers are trying to get the best bang for the buck, so one customer asked me whether it would make sense to place only the Oracle datafiles on XtremIO and leave everything else on classic EMC storage. This would mean redo logs, archive logs, control files, temp files, binaries and everything else *except* the datafiles would be stored on an EMC VNX or VMAX. The purpose, of course, is to have only the things that require fast random reads (the tables) on XtremIO.

I can clearly see the reasoning, but my response was to change the layout slightly: I highly recommend placing everything needed to make up a consistent database on the same storage box.

Continue reading

Putting an end to the password jungle

With my blog audience all being experts in the IT industry (I presume), I think we are all too familiar with the problems of classic password security mechanisms.

Humans are just not good at remembering long, meaningless strings of tokens, especially when they need to be changed every few months and we have to keep track of many of them at the same time.
Some security experts blame humans. They say you should create strong passwords, never use a single password for different purposes, and never write them down on paper – or worse – store them in unencrypted form somewhere on your computer.

I disagree. I think the fundamental problem lies within information technology itself. We invented computers to make life easier for ourselves – well, actually, that’s not true; ironically, we invented them primarily to break military encryption codes. But the widespread adoption of computing happened because of the promise of making our lives easier.

I myself use a password manager (KeePass) to make my life a bit easier. There are many password manager tools available, and they solve part of the problem: keeping track of which password was used for which purpose. I now only need to remember one (hopefully strong enough) password to access the password database, and from there I just use the tool to log me in to websites, corporate networks and other services (let’s refer to all of those as “cloud servers”).

The many problems with passwords

The fundamental problem remains – even when using a password manager: passwords are no good for protecting our sensitive data or identity.

Continue reading

The public transport company needs new buses

A public transport company in a city called Galactic City needs to replace its aging city buses with new ones. It asks three bus vendors what they have to offer and whether they can do a live test to see if their claims about performance and efficiency hold up.

The transport company uses the city buses to move people between different locations in the city. The average trip distance is about 2 km. The vendors all prepare their buses for the test. The buses are the latest and greatest, with the most efficient and powerful engines and state-of-the-art technology.

Continue reading

Oracle RAC on VPLEX now certified

Last week EMC announced that Oracle RAC on VPLEX stretched clusters is now officially supported and certified by Oracle!

News Summary:

  • Oracle has certified that EMC® VPLEX™ METRO in a stretch cluster configuration can provide Oracle Real Application Clusters (Oracle RAC) customers with an easy-to-deploy, active/active solution, as they transform from single- to dual-site environments.
  • Having passed Oracle’s rigorous testing standards, the EMC VPLEX METRO solution can enable Oracle RAC to be easily configured over extended distances while enabling simultaneous access to the same data at both locations.

This is the final step in a process to help customers that have been asking for true active/active support over distance for their mission-critical Oracle Database business processes.

For those who are not yet familiar with this solution, here is a small summary:

  • Customers have been searching for ways to survive datacenter failures (i.e. “disasters”) without the need to recover and restart their databases, such that no component failure – or even a complete site failure – would lead to database downtime
  • This was not possible before, except with complex configurations based on host mirroring using Oracle ASM or a third-party volume manager (note that competing storage virtualization products from other storage vendors also do not offer this full capability – even though their marketing might make it seem so)
  • EMC VPLEX offers this functionality which is now completely certified and supported by Oracle, and the solution avoids risk by making the stretched cluster deployment as easy as a basic Oracle RAC install
  • The VPLEX solution offers additional benefits, including better performance, better recovery from issues such as component or link failures, and a complete solution for the whole application stack, not just Oracle
  • Note that AFAIK this solution should also work for IBM DB2 (but I haven’t confirmed)

The full news release can be found here: http://www.emc.com/about/news/press/2012/20120517-04.htm

A full series of blog posts on this solution can be found here: https://bartsjerps.wordpress.com/category/vplex/

The VPLEX witness (the final component of VPLEX that made this possible) was announced last year at EMC World 2011. Typically we see market adoption start between 1 and 1.5 years after new technology is brought to market. I am working with a few customers myself who are on the verge of starting a project with this; hopefully by the end of the year we will have a set of good customer references!

Update: The new white paper can be found here: http://www.emc.com/collateral/software/white-papers/h8930-vplex-metro-oracle-rac-wp.pdf

Update 2: VPLEX support mentioned (briefly) on Oracle’s website: http://www.oracle.com/technetwork/database/enterprise-edition/tech-generic-linux-new-086754.html

Update 3: Demos available on EMC Demo Center:

EMC VPLEX Metro for Oracle RAC Solution Overview
Oracle RAC with VPLEX Metro Site Failure
Oracle RAC with VPLEX Metro Solution Overview
Oracle RAC with VPLEX Storage Failure

If you’re a frequent reader of my blog you might recognize familiar pictures there ;)

Oracle Stretched Cluster with VPLEX (update)

One request I got after my series on Oracle RAC stretched clusters was whether I could summarize again why anybody would choose VPLEX for storage replication over other solutions. In that series I tried to describe the principles of VPLEX in enough detail for techies to understand. For non-geeks, I will try to explain it as briefly as possible.
Continue reading

Monkey Business

Maybe you have heard the story of the Monkey Experiment. It is about an experiment with a bunch of monkeys in a cage, a ladder, and a banana. At a certain point one of the monkeys sees the banana hanging up high and starts climbing the ladder, and then the researcher sprays all the monkeys with cold water. The climbing monkey tumbles down before even getting the banana, looks puzzled, and waits until he’s dry again and his ego is back on its feet. He tries again, same result: all monkeys are sprayed wet. Some of the others try a few times until they learn: don’t climb for the banana or you will get wet and cold.

The second part of the experiment is more interesting. The researcher removes one of the monkeys and replaces him with a fresh, dry monkey with an unharmed ego. After a while the newcomer spots the banana, wonders why the other monkeys are so stupid as not to go for it, and gives it a try. But when he reaches the ladder, the other monkeys kick his ass and make it very clear he is not supposed to do that. After the new monkey is conditioned not to go for the banana, the researcher replaces the “old” monkeys, one by one, with new ones. Every new monkey goes for the banana until he learns not to.

Eventually the cage is full of monkeys who know that they are not allowed to climb the ladder to get the banana. None of them knows why – it’s just the way it is and always has been…
Continue reading

Through the wormhole with Stretched Clusters

Last year, EMC announced a new virtualization product called VPLEX. VPLEX allows logical storage volumes to be accessible from multiple locations. It boldly goes beyond existing storage virtualization solutions (including those from EMC) in that it is not just a storage virtualization cluster but a storage federation platform, allowing one virtualized storage volume – built from one or more physical storage volumes – to be dynamically accessible from multiple locations, as if connected through a wormhole.

Continue reading

Information Lifecycle Management and Oracle databases – part 3

Archiving and purging old data

In the end, the only way to seriously reduce the effective size of a database (after using all innovations at the infrastructure level) is to move data out of the database onto something else. This goes a bit against Oracle’s preferred approach, as they propose keeping as much of the application data in the database for as long as possible (I wonder why…)

We can separate archiving methods into two categories:

  • Methods that don’t change the RDBMS representation and just move tables or records to a different location in the same or another database (a minimal sketch of this follows below the list);
  • Methods that convert database records into something else and remove them from the database layer completely.
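
A minimal sketch of the first category: move old rows to an archive table inside a single transaction, so the copy and the delete succeed or fail together. The table and column names are hypothetical, and Python's built-in sqlite3 stands in for an Oracle connection:

```python
import sqlite3

def archive_old_rows(db_path, cutoff_date):
    """Move rows older than cutoff_date from 'orders' to 'orders_archive'.
    Table and column names are made up; the pattern is what matters."""
    con = sqlite3.connect(db_path)
    try:
        with con:  # one transaction: copy and delete commit (or roll back) together
            con.execute("""CREATE TABLE IF NOT EXISTS orders_archive
                           AS SELECT * FROM orders WHERE 0""")
            con.execute("""INSERT INTO orders_archive
                           SELECT * FROM orders WHERE order_date < ?""",
                        (cutoff_date,))
            con.execute("DELETE FROM orders WHERE order_date < ?",
                        (cutoff_date,))
    finally:
        con.close()

archive_old_rows('prod.db', '2010-01-01')  # hypothetical database and retention date
```

In a real Oracle environment this would more likely be done with partitioning features (such as exchanging or moving old partitions), but copy-then-delete in one transaction is the essence of the first category; the second category would replace the INSERT target with an export to some external format.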

Continue reading