QDDA update – 2.2


Quick post to announce that QDDA version 2.2 has been published on GitHub and in the Outrun-Extras YUM repository.

Reminder: The Quick and Dirty Dedupe Analyzer is an Open Source Linux tool that scans disks or files block by block to find duplicate blocks and compression ratios, so that it can report – in detail – the expected data reduction rate on a storage array that supports deduplication and compression. It can be downloaded as a standalone executable (QDDA download), installed as an RPM package via YUM, or compiled from source (QDDA Install).
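To make "block by block" a bit more concrete, here is a minimal Python sketch of the general idea. It is not qdda's actual code (qdda is written in C++); it assumes fixed 8 KiB blocks, counts all-zero blocks as thin-provisioned free space, uses MD5 to spot duplicates and Python's `zlib` (DEFLATE) as the compressor. The function name `scan` and the block size are illustrative choices.

```python
import hashlib
import zlib
from collections import Counter


def scan(data: bytes, block_size: int = 8192) -> dict:
    """Estimate thin, dedupe and compression savings for a byte buffer.

    Illustrative sketch only: fixed-size blocks, MD5 to detect duplicates,
    zlib (DEFLATE) as the compressor. Not qdda's implementation.
    """
    zero_block = bytes(block_size)
    occurrences = Counter()  # MD5 digest -> how often the block was seen
    compressed = {}          # MD5 digest -> compressed size of that unique block
    total = zeros = 0

    for offset in range(0, len(data), block_size):
        # Pad the last partial block with zeros so every block has the same size
        block = data[offset:offset + block_size].ljust(block_size, b"\0")
        total += 1
        if block == zero_block:  # all-zero blocks take no space on a thin array
            zeros += 1
            continue
        digest = hashlib.md5(block).digest()
        occurrences[digest] += 1
        if digest not in compressed:  # compress each unique block only once
            # Incompressible data can grow slightly; never count more than a block
            compressed[digest] = min(len(zlib.compress(block)), block_size)

    used = total - zeros
    unique = len(occurrences)
    stored = sum(compressed.values())
    return {
        "total": total,
        "zero": zeros,
        "unique": unique,
        "dedupe_ratio": used / unique if unique else 1.0,
        "compress_ratio": unique * block_size / stored if stored else 1.0,
    }


if __name__ == "__main__":
    # Four identical blocks, two zero blocks, one other compressible block
    sample = b"A" * 8192 * 4 + bytes(8192) * 2 + bytes(range(256)) * 32
    print(scan(sample))
```

A real array's ratios depend on its own block size and compression scheme, which is why the tool works with per-array definitions rather than one generic calculation.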

QDDA 2.2 adds:

  • DellEMC VMAX and PowerMAX support (using the DEFLATE compression algorithm)
  • bash-completion (by pressing Tab on the command line, RPM version only)
  • Improved options for custom storage definitions
  • Internal C++ code improvements (not relevant to the normal user)

Note to other storage vendors: If you’d like your array to be included in the tool, drop me a note with dedupe/compression algorithm details and I’ll see what is possible.


Announcing qdda 2.0

It’s almost a year since I blogged about qdda (the Quick & Dirty Dedupe Analyzer).

qdda is a tool that lets you scan any Linux disk or file (or multiple disks) and predicts the potential thin, dedupe and compression savings if you were to move that disk/file to an All-Flash array like DellEMC XtremIO or VMAX All-Flash. In contrast to similar (usually vendor-based) tools, qdda can run completely independently. It does NOT require registration or sending a binary dataset back to the mothership (which would be a security risk). Anyone can inspect the source code and run it, so there are no hidden secrets.

It’s based upon the most widely deployed database engine, SQLite, and uses MD5 hashing and LZ4 compression to produce data reduction estimates.
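A database fits this job because deduplication boils down to "how many times have I seen this hash?". The Python sketch below uses the standard `sqlite3` module to show the idea; the table name `kv`, its columns and the 60-bit truncation of the MD5 digest are assumptions for illustration, not qdda's actual schema. LZ4 is left out because it is not part of Python's standard library.

```python
import hashlib
import sqlite3


def block_key(block: bytes) -> int:
    # Keep 60 bits of the MD5 digest so the key fits a signed 64-bit SQLite INTEGER
    return int.from_bytes(hashlib.md5(block).digest()[:8], "big") >> 4


def tally(blocks) -> tuple:
    """Return (total_blocks, unique_blocks) using an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per distinct hash with its occurrence count
    db.execute("CREATE TABLE kv (hash INTEGER PRIMARY KEY, blocks INTEGER NOT NULL)")
    for block in blocks:
        key = block_key(block)
        db.execute("INSERT OR IGNORE INTO kv (hash, blocks) VALUES (?, 0)", (key,))
        db.execute("UPDATE kv SET blocks = blocks + 1 WHERE hash = ?", (key,))
    total, unique = db.execute(
        "SELECT COALESCE(SUM(blocks), 0), COUNT(*) FROM kv"
    ).fetchone()
    db.close()
    return total, unique
```

The dedupe ratio is then total / unique; grouping on the count column (for example `SELECT blocks, COUNT(*) FROM kv GROUP BY blocks`) would give a simple dedupe histogram.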

It took a while to follow up because I spent many evening hours almost completely rewriting the tool. A summary of changes:

  • Run completely as a non-privileged user (i.e. ‘nobody’) to make it safe to run on production systems
  • Increased the hash to 60 bits so it scales to at least 80 terabytes without compromising accuracy
  • Decreased the database space consumption by 50%
  • Multithreading so there are separate readers, workers and a single database updater which allows qdda to use multiple CPU cores
  • Many other major performance improvements (qdda has been demonstrated to scan data at about 7 GB/s on a fast server; the bottleneck was I/O, and theoretically it could handle double that bandwidth before maxing out on database updates)
  • Very detailed embedded man page (manual). The qdda executable itself can show its own man page (on Linux with ‘man’ installed)
  • Improved standard reports and detailed reports with compression and dedupe histograms
  • Option to define your own custom array definitions
  • Removed system dependencies (SQLite, LZ4 and other libraries) so that qdda runs on almost any Linux system and can be downloaded as a single executable (no more need to install RPM packages)
  • Many other small improvements and additions
  • Completely moved to GitHub – where you can also download the binary
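Why 60 bits is enough can be checked with the birthday approximation: with n stored blocks and a b-bit hash, the expected number of colliding block pairs is about n(n-1)/2 / 2^b. The Python sketch below assumes 80 TB (decimal) of data and an 8 KiB block size; the block size and the way the MD5 digest is truncated are assumptions for illustration.

```python
import hashlib

HASH_BITS = 60


def hash60(block: bytes) -> int:
    """Top 60 bits of the MD5 digest (truncation scheme assumed for illustration)."""
    return int.from_bytes(hashlib.md5(block).digest()[:8], "big") >> (64 - HASH_BITS)


def expected_collisions(capacity_bytes: float, block_size: int, bits: int = HASH_BITS) -> float:
    """Birthday approximation: expected number of distinct block pairs sharing a hash."""
    n = capacity_bytes / block_size
    return n * (n - 1) / 2 / 2 ** bits


if __name__ == "__main__":
    blocks = 80e12 / 8192  # assumption: 80 TB scanned in 8 KiB blocks
    pairs = expected_collisions(80e12, 8192)
    print(f"{blocks:.3g} blocks, ~{pairs:.0f} expected false matches "
          f"({pairs / blocks:.1e} of all blocks)")
```

Under these assumptions the result is a few dozen false matches among roughly ten billion blocks, well under a millionth of a percent, which is negligible for a data reduction estimate.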

Read the overview and animated demo on the project homepage here: https://github.com/outrunnl/qdda

HTML version of the detailed manual page: https://github.com/outrunnl/qdda/blob/master/doc/qdda.md

As qdda is licensed under the GPL, it comes with no warranty whatsoever. My recommendation is to use it for learning purposes or for a first what-if analysis; if you’re interested in data reduction numbers from a vendor, ask them for a formal analysis using their own tools. That said, I did a few comparison tests and the data reduction numbers were within 1% of the results from vendor-supported tools. The man page has a section on accuracy explaining the differences.


The Quick and Dirty Deduplication Analyzer

The best thing about being me… There are so many “me”s.

— Agent Smith, The Matrix Reloaded

One of our customers reported less than optimal space savings on XtremIO running Oracle. In order to test various scenarios with Oracle I was in search of a deduplication analysis method or tool – only to find out that there was nothing available that qualified.

TL;DR: QDDA is an Open Source tool I wrote to analyze Linux files, devices or data streams for duplicate blocks and compression estimates. It can quickly give you an idea of how much storage savings you could get using a modern All-Flash Array like XtremIO. It is safe to use on production systems and allows quick analysis of various test scenarios giving direct results, and even works with files/devices that are in use. No registration or uploading of your confidential data is required.


The IOPS race is over

Infrastructure has always been a tough place to compete in. Unlike applications, databases or middleware, infrastructure components are fairly easy to replace with another make and model, so vendors try to show off their product as better than the competition’s.

In the case of storage subsystems, the important metric has always been performance, and IOPS (I/O operations per second) in particular.

I remember a period when competitors of our high-end arrays (EMC Symmetrix, these days usually just called EMC VMAX) tried to artificially boost their benchmark numbers by limiting the data access pattern to only a few megabytes per front-end I/O port. This caused their array to handle all I/O in the small memory buffer cache of each I/O port – none of the I/Os were really handled by either central cache memory or backend disks. This way they could push their IOPS numbers much higher than ours. Of course no real-world application would ever store only a few megabytes of data, so the numbers were pure bogus – but marketing-wise it was an interesting move, to say the least.

With the introduction of the first Sun-based Exadata (the Exadata V2) in late 2009, Oracle also joined the IOPS race and claimed a staggering one million IOPS. Awesome! So the gold standard was now 1 million IOPS, and the other players had to play along with the “mine’s bigger than yours” vendor contest.

Baking a cake: trading CPU for IO?

Sometimes I hear people claim that by using faster storage, you can save on database licenses. True or false?

The idea is that many database servers suffer from I/O wait – which means that the processors are waiting for data to be transferred to or from storage, and in the meantime no useful work can be done. Given the expensive licenses needed to run commercial database software, usually licensed per CPU core, this leads to a loss of efficiency.
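On Linux you can see how much of this waiting happens: the kernel reports cumulative CPU time per state in `/proc/stat`, and the fifth value on the aggregate `cpu` line is iowait, i.e. time the CPU sat idle while I/O was outstanding. A minimal Python sketch that compares two snapshots (the function names are illustrative):

```python
import time


def cpu_times(stat_text: str) -> list:
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counters."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):  # per-core lines are 'cpu0', 'cpu1', ...
            return [int(v) for v in line.split()[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_share(before: str, after: str) -> float:
    """Fraction of CPU time spent in iowait between two /proc/stat snapshots."""
    deltas = [a - b for b, a in zip(cpu_times(before), cpu_times(after))]
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice];
    # guest time is already included in user/nice, so only the first 8 are summed.
    total = sum(deltas[:8])
    return deltas[4] / total if total else 0.0


def sample_iowait(interval: float = 1.0) -> float:
    """Measure the iowait share on the local Linux machine over `interval` seconds."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        after = f.read()
    return iowait_share(before, after)
```

Keep in mind that iowait is a form of idle time: a CPU in iowait could run other work if any were runnable, so a high value points at storage latency rather than proving the cores are saturated.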

Let’s see if we can visualise the problem with an everyday example – baking a cake.
 
 


Oracle ASM vs ZFS on XtremIO

Background

In my previous post on ZFS I showed how ZFS causes fragmentation for Oracle database files. At the end I promised (sort of) to come back to how this affects database performance. In the meantime I have been busy with many other things, but ZFS issues still sneak up on me frequently. Eventually I was forced to take another look, because two separate customers asked for ZFS comparisons against ASM at the same time.

The account team for one of the two customers asked if I could perform some testing in their lab environment to show the performance difference between Oracle on ASM and on ZFS. As usual in this business, things were already rolling before I could influence the prerequisites and the suggested test method. Promises had already been made to the customer and I was asked to produce results yesterday.

Without knowledge of the lab environment, the customer requirements or even details of the test environment they had set up. Typical day at the office.

In addition, ZFS requires a supported host OS – so Linux is out of the question (the status of kernel ZFS on Linux is still a bit unclear, and it would certainly not be supported with Oracle). I had used FreeBSD in my post on fragmentation – because that was my platform of choice at that point (my Solaris skills are, at best, rusty). Of course Oracle on FreeBSD is a no-go, so back then I used NFS to run the database on Linux and ZFS on BSD. That implicitly solves some of the potential issues whilst creating some new ones, but alas.

Solaris x86

This time the idea was to run Oracle on Solaris (x86) with both ZFS and ASM configured. How to perform a reasonable comparison that also shows the different behavior was unclear, and when I asked the account team that question, the conference call line stayed surprisingly silent. All they indicated up front was that the test tool on Oracle should be SLOB.


Getting the Best Oracle performance on XtremIO

(Blog repost from Virtual Storage Zone – Thanks to @cincystorage)

UPDATE: I’ll say it again because there seems to be some confusion: THIS IS A REPOST!

Original content is from the Virtual Storage Zone blog (not mine). Just reposted here because it’s interesting and related to Oracle, performance and EMC storage. Enjoy…

XtremIO is EMC’s all-flash scale-out storage array designed to deliver the full performance of flash. The array is designed for 4k random I/O, low latency, inline data reduction, and even distribution of data blocks. This even distribution of data blocks leads to maximum performance and minimal flash wear. You can find all sorts of information on the architecture of the array, but I haven’t seen much discussion about achieving maximum performance from an Oracle database on XtremIO.

The nature of XtremIO ensures that any Oracle workload (OLTP, DSS, or hybrid) will have high performance and low latency; however, we can maximize performance with some configuration options. Most of what I’ll be talking about is around RAC and ASM on Red Hat Linux 6.x in a Fibre Channel Storage Area Network.

Read the full blogpost here.

 

The public transport company needs new buses

A public transport company in a city called Galactic City needs to replace its aging city buses with new ones. It asks three bus vendors what they have to offer and whether they can do a live test to see if their claims about performance and efficiency hold up.

The transport company uses the city buses to move people between different locations in the city. The average trip distance is about 2 km. The vendors all prepare their buses for the test. The buses are the latest and greatest, with the most efficient and powerful engines and state of the art technology.


Getting the most out of your server resources


As an advocate of database virtualization, I often challenge customers to consider whether they are using their resources in an optimal way.

And so I usually claim, often in front of a skeptical audience, that physically deployed servers hardly ever reach an average utilization of more than 20 percent (thereby wasting over 80% of the expensive database licenses, maintenance and options).

Magic is really only the utilization of the entire spectrum of the senses. Humans have cut themselves off from their senses. Now they see only a tiny portion of the visible spectrum, hear only the loudest of sounds, their sense of smell is shockingly poor and they can only distinguish the sweetest and sourest of tastes.

– Michael Scott, The Alchemyst

About one time in three, someone in the audience objects that they achieve much better utilization than my stake-in-the-ground 20 percent, and uses that as a reason (valid or not) for not having to virtualize their databases, for example with VMware.


Announcing my Openworld 2013 presentation material

Last Tuesday I had the privilege to present at Oracle Openworld 2013 together with Sam Marraccini (the guy with the big smile here in the pic) from EMC’s Flash products division. Sam introduced the various EMC Flash offerings we have, and I discussed some experiences and best practices from the field. We got lots of interaction with the audience and many questions (at one point I counted about 5 hands raised simultaneously), which meant I ran out of time before finishing some of the best practices I had planned to discuss at the end. But interaction is always better than just us talking, so I got the feeling the session was successful – although I’d like to hear from people in the audience what their thoughts are (feel free to comment!)

When people started taking snapshots of the slides with their iPhones, we promised the audience to make the slides available ASAP. So here they are. They will probably also be available via Oracle’s OOW pages in due time.