QDDA update – 2.2


Quick post to announce that QDDA version 2.2 has been published on GitHub and in the Outrun-Extras YUM repository.

Reminder: The Quick and Dirty Dedupe Analyzer is an Open Source Linux tool that scans disks or files block by block to find duplicate blocks and compression ratios, so that it can report in detail the expected data reduction rate on a storage array that supports deduplication and compression. It can be downloaded as a standalone executable (QDDA download), installed as an RPM package via YUM, or compiled from source (QDDA Install).
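The core idea of a block-level scan can be sketched in a few lines of Python. This is a minimal illustration, not qdda's actual C++ implementation: it assumes a fixed 8 KiB block size (real arrays each use their own), counts all-zero blocks as thin provisioning savings, and counts distinct MD5 digests to estimate deduplication. The function name `scan` is hypothetical.

```python
import hashlib
import io
from collections import Counter

BLOCKSIZE = 8192  # assumption: 8 KiB blocks; each array type defines its own size


def scan(stream, blocksize=BLOCKSIZE):
    """Read a binary stream block by block.

    Returns (total, zero, unique): all blocks, all-zero blocks, and
    distinct non-zero blocks (by MD5 digest).
    """
    zero_block = bytes(blocksize)
    counts = Counter()
    total = zero = 0
    while True:
        block = stream.read(blocksize)
        if not block:
            break
        # Pad a short final block with zeros so all blocks have equal size
        block = block.ljust(blocksize, b"\0")
        total += 1
        if block == zero_block:
            zero += 1  # all-zero blocks need no physical space on a thin array
            continue
        counts[hashlib.md5(block).digest()] += 1
    return total, zero, len(counts)


# Synthetic input: block A three times, block B once, two zero blocks
a, b, z = b"A" * BLOCKSIZE, b"B" * BLOCKSIZE, bytes(BLOCKSIZE)
total, zero, unique = scan(io.BytesIO(a + a + z + b + a + z))
print(total, zero, unique)                       # 6 2 2
print("thin ratio:", total / (total - zero))     # 1.5
print("dedupe ratio:", (total - zero) / unique)  # 2.0
```

To scan a real file or device, pass it opened in binary mode, e.g. `with open(path, "rb") as f: scan(f)`; only read access is needed.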

QDDA 2.2 adds:

  • DellEMC VMAX and PowerMAX support (using the DEFLATE compression algorithm)
  • bash-completion (by pressing TAB on the command line; RPM version only)
  • Improved options for custom storage definitions
  • Internal C++ code improvements (not relevant to the normal user)
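Version 2.2 uses DEFLATE to model VMAX/PowerMAX compression. Many arrays do not store a compressed block at its exact compressed size but round it up to one of a few fixed slot sizes ("buckets"), so the bucket layout matters as much as the algorithm. A minimal sketch using Python's `zlib` (a DEFLATE implementation); the block size, compression level and bucket sizes are hypothetical and do not describe any real array.

```python
import os
import zlib

BLOCKSIZE = 8192
# Hypothetical bucket (slot) sizes in bytes; a real array defines its own
BUCKETS = [1024, 2048, 4096, 6144, 8192]


def bucket_size(block: bytes) -> int:
    """DEFLATE-compress one block and round up to the smallest bucket that fits."""
    compressed = len(zlib.compress(block, 6))
    for size in BUCKETS:
        if compressed <= size:
            return size
    return BLOCKSIZE  # compressed data larger than the block: store it uncompressed


def compression_ratio(blocks) -> float:
    """Raw bytes divided by the bytes allocated in buckets."""
    raw = sum(len(b) for b in blocks)
    stored = sum(bucket_size(b) for b in blocks)
    return raw / stored


repetitive = b"x" * BLOCKSIZE        # compresses to a few dozen bytes
random_data = os.urandom(BLOCKSIZE)  # practically incompressible
print(bucket_size(repetitive), bucket_size(random_data))            # 1024 8192
print(round(compression_ratio([repetitive, random_data]), 2))       # 1.78
```

Note how the repetitive block, although it compresses to a few dozen bytes, still occupies a full 1 KiB bucket; that rounding is one reason real-world ratios differ from raw compressor output.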

Note to other storage vendors: If you’d like your array to be included in the tool, drop me a note with dedupe/compression algorithm details and I’ll see what is possible.


Announcing qdda 2.0

It’s almost a year since I blogged about qdda (the Quick & Dirty Dedupe Analyzer).

qdda is a tool that lets you scan any Linux disk or file (or multiple disks) and predicts the potential thin provisioning, dedupe and compression savings if you were to move that disk/file to an All-Flash array like DellEMC XtremIO or VMAX All-Flash. In contrast to similar (usually vendor-based) tools, qdda can run completely independently. It does NOT require registration or sending a binary dataset back to the mothership (which would be a security risk). Anyone can inspect the source code and run it, so there are no hidden secrets.

It’s based upon the most widely deployed database engine, SQLite, and uses MD5 hashing and LZ4 compression to produce data reduction estimates.
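A rough sketch of how a SQLite database can hold the hash data: one row per distinct block hash with a reference count, from which the dedupe ratio and a dedupe histogram follow from plain SQL. Table and column names are hypothetical, and truncating MD5 to 60 bits so the value fits a signed 64-bit SQLite INTEGER is an assumption for illustration; qdda's real schema may differ.

```python
import hashlib
import sqlite3


def hash60(block: bytes) -> int:
    """First 64 bits of the MD5 digest, shifted right to keep 60 bits."""
    return int.from_bytes(hashlib.md5(block).digest()[:8], "big") >> 4


def build_db(blocks) -> sqlite3.Connection:
    """Store one row per distinct hash together with its block count."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE kv (hash INTEGER PRIMARY KEY, blocks INTEGER NOT NULL)")
    for block in blocks:
        h = hash60(block)
        # Insert the hash if it is new, then bump its reference count
        db.execute("INSERT OR IGNORE INTO kv (hash, blocks) VALUES (?, 0)", (h,))
        db.execute("UPDATE kv SET blocks = blocks + 1 WHERE hash = ?", (h,))
    return db


def dedupe_ratio(db: sqlite3.Connection) -> float:
    total, unique = db.execute("SELECT SUM(blocks), COUNT(*) FROM kv").fetchone()
    return total / unique


def dedupe_histogram(db: sqlite3.Connection):
    """How many distinct hashes occur once, twice, three times, ..."""
    return db.execute(
        "SELECT blocks, COUNT(*) FROM kv GROUP BY blocks ORDER BY blocks"
    ).fetchall()


db = build_db([b"a" * 4096] * 3 + [b"b" * 4096])
print(dedupe_ratio(db))      # 2.0
print(dedupe_histogram(db))  # [(1, 1), (3, 1)]
```

Keeping only a hash and a counter per distinct block means the database grows with the amount of unique data, not with the raw capacity scanned.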

The reason it took a while to follow up is that I spent many evening hours almost completely rewriting the tool. A summary of changes:

  • Run completely as non-privileged user (i.e. ‘nobody’) to make it safe to run on production systems
  • Increased the hash to 60 bits so it scales to at least 80 terabytes without compromising accuracy
  • Decreased the database space consumption by 50%
  • Multithreading, with separate readers, workers and a single database updater, which allows qdda to use multiple CPU cores
  • Many other major performance improvements (qdda has been demonstrated to scan data at about 7 GB/s on a fast server; the bottleneck was I/O, and in theory it could handle double that bandwidth before maxing out on database updates)
  • Very detailed embedded man page (manual). The qdda executable itself can show its own man page (on Linux with ‘man’ installed)
  • Improved standard reports and detailed reports with compression and dedupe histograms
  • Option to define your own custom array definitions
  • Removed system dependencies (SQLite, LZ4 and other libraries), so qdda runs on almost any Linux system and can be downloaded as a single executable (no more need to install RPM packages)
  • Many other small improvements and additions
  • Moved completely to GitHub, where you can also download the binary
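Why 60 bits is enough can be checked with the birthday approximation: with n hashed blocks and a b-bit hash, the expected number of colliding pairs is about n^2 / 2^(b+1). The sketch below assumes a 16 KiB block size and decimal terabytes, both chosen only for illustration.

```python
def expected_collisions(capacity_bytes: float, blocksize: int, hash_bits: int) -> float:
    """Birthday approximation: expected colliding pairs ~ n^2 / 2^(bits + 1)."""
    n = capacity_bytes / blocksize
    return n * n / 2 ** (hash_bits + 1)


TB = 10**12
BLOCKSIZE = 16384  # assumption: 16 KiB blocks

blocks = 80 * TB / BLOCKSIZE
print(f"{blocks:.3g} blocks")                                # 4.88e+09 blocks
print(f"{expected_collisions(80 * TB, BLOCKSIZE, 60):.1f}")  # 10.3
print(f"{expected_collisions(80 * TB, BLOCKSIZE, 32):.2g}")  # 2.8e+09
```

About ten colliding pairs among nearly 4.9 billion blocks is far too small to move the dedupe estimate, whereas a 32-bit hash would be useless at this scale because there are more blocks than possible hash values.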

Read the overview and animated demo on the project homepage here: https://github.com/outrunnl/qdda

HTML version of the detailed manual page: https://github.com/outrunnl/qdda/blob/master/doc/qdda.md

As qdda is licensed under the GPL, it comes with no warranty whatsoever. My recommendation is to use it for learning purposes or for a first what-if analysis; if you're interested in data reduction numbers from the vendor, ask them for a formal analysis using their own tools. That said, I did a few comparison tests and the data reduction numbers were within 1% of the results from vendor-supported tools. The man page has a section on accuracy explaining the differences.


The Quick and Dirty Dedupe Analyzer – Part 1 – Hands on

As announced in my last blog post, qdda is a tool that analyzes potential storage savings by scanning data and giving a deduplication, compression and thin provisioning estimate. The results indicate whether a modern All-Flash Array (AFA) like Dell EMC XtremIO would be worth considering.

In this (lengthy) post I will go over the basics of qdda and run a few synthetic test scenarios to show what's possible. The next posts will cover more advanced scenarios such as running against Oracle database data, multiple nodes, and more exotic ones such as ZFS storage pools.
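Synthetic test data with a known answer is a good way to check any dedupe analyzer. A minimal sketch, assuming an 8 KiB block size: it builds `unique` distinct blocks, repeats each one `copies` times in random order (so the expected dedupe ratio is exactly `copies`), and fills only the first `random_bytes` of each block with random data so compressibility can be tuned roughly. The function name and output file name are hypothetical.

```python
import os
import random
import tempfile

BLOCKSIZE = 8192  # assumption: 8 KiB blocks


def make_test_data(unique: int, copies: int, random_bytes: int = BLOCKSIZE) -> bytes:
    """Return unique*copies blocks; each distinct block appears `copies` times.

    Each block starts with `random_bytes` random bytes followed by zeros, so
    smaller values make the data more compressible.
    """
    distinct = [os.urandom(random_bytes).ljust(BLOCKSIZE, b"\0") for _ in range(unique)]
    blocks = distinct * copies
    random.shuffle(blocks)  # scatter the duplicates across the file
    return b"".join(blocks)


# Hypothetical file: 256 distinct blocks x 4 copies, each block half random
path = os.path.join(tempfile.gettempdir(), "dedupe-test.img")
with open(path, "wb") as f:
    f.write(make_test_data(256, 4, BLOCKSIZE // 2))
```

Scanning such a file should report a dedupe ratio of 4; any deviation points at a block-size mismatch between the generator and the analyzer.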

[ Warning: Lengthy technical content, Rated T, parental advisory required ]


The Quick and Dirty Deduplication Analyzer

The best thing about being me… There are so many “me”s.

— Agent Smith, The Matrix Reloaded

One of our customers reported less than optimal space savings on XtremIO running Oracle. In order to test various scenarios with Oracle I was in search of a deduplication analysis method or tool – only to find out that there was nothing available that qualified.

TL;DR: QDDA is an Open Source tool I wrote to analyze Linux files, devices or data streams for duplicate blocks and compression estimates. It can quickly give you an idea of how much storage savings you could get using a modern All-Flash Array like XtremIO. It is safe to use on production systems and allows quick analysis of various test scenarios giving direct results, and even works with files/devices that are in use. No registration or uploading of your confidential data is required.


Oracle ASM vs ZFS on XtremIO

Background

In my previous post on ZFS I showed how ZFS causes fragmentation of Oracle database files. At the end I promised (sort of) to come back to how this affects database performance. In the meantime I have been busy with many other things, but ZFS issues still sneak up on me frequently. Eventually I was forced to take another look, because two separate customers asked for ZFS comparisons against ASM at the same time.

The account team for one of the two customers asked if I could perform some testing in their lab environment to show the performance difference between Oracle on ASM and on ZFS. As usual in this business, things were already rolling before I could influence the prerequisites and the suggested test method. Promises had already been made to the customer and I was asked to produce results yesterday.

All this without knowledge of the lab environment, the customer requirements or even details of the test environment they had set up. Typical day at the office.

In addition, ZFS requires a supported host OS, so Linux is out of the question (the status of kernel ZFS on Linux is still a bit unclear, and it would certainly not be supported with Oracle). I had used FreeBSD in my post on fragmentation because that was my platform of choice at that point (my Solaris skills are, at best, rusty). Of course Oracle on FreeBSD is a no-go, so back then I used NFS to run the database on Linux and ZFS on BSD. This implicitly solves some of the potential issues whilst creating new ones, but alas.

Solaris x86

This time the idea was to run Oracle on Solaris (x86) with both ZFS and ASM configured. How to perform a reasonable comparison that also shows the different behavior was unclear, and when I asked the account team that question, the conference call line stayed surprisingly silent. All they indicated up front was that the test tool on Oracle should be SLOB.


Getting the Best Oracle performance on XtremIO

(Blog repost from Virtual Storage Zone – Thanks to @cincystorage)

UPDATE: I’ll say it again because there seems to be some confusion: THIS IS A REPOST!

Original content is from the Virtual Storage Zone blog (not mine). Just reposted here because it’s interesting and related to Oracle, performance and EMC storage. Enjoy…

XtremIO is EMC's all-flash scale-out storage array designed to deliver the full performance of flash. The array is designed for 4K random I/O, low latency, inline data reduction, and even distribution of data blocks. This even distribution of data blocks leads to maximum performance and minimal flash wear. You can find all sorts of information on the architecture of the array, but I haven't seen much discussion about achieving maximum performance from an Oracle database on XtremIO.
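The idea behind "even distribution of data blocks" can be illustrated with content-based placement: if a block's location is derived from a fingerprint of its content, even highly regular data spreads evenly across all targets. This is a generic sketch, not XtremIO's actual algorithm; SHA-1, the eight targets and the `placement` function are assumptions for illustration.

```python
import hashlib
from collections import Counter

BLOCKSIZE = 4096  # 4K blocks, matching the I/O size mentioned above
TARGETS = 8       # hypothetical number of SSDs or controllers


def placement(block: bytes, targets: int = TARGETS) -> int:
    """Map a block to a target using a fingerprint of its content."""
    digest = hashlib.sha1(block).digest()
    return int.from_bytes(digest[:8], "big") % targets


# Highly regular input: block i is its own index repeated to fill 4 KiB
blocks = (i.to_bytes(8, "big") * (BLOCKSIZE // 8) for i in range(80_000))
counts = Counter(placement(b) for b in blocks)
print(sorted(counts.items()))  # each of the 8 targets receives roughly 10,000 blocks
```

A side effect of content-based placement is that identical blocks always map to the same target.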

The nature of XtremIO ensures that any Oracle workload (OLTP, DSS or hybrid) will have high performance and low latency; however, we can maximize performance with some configuration options. Most of what I'll be talking about concerns RAC and ASM on Red Hat Linux 6.x in a Fibre Channel Storage Area Network.

Read the full blogpost here.