The Quick and Dirty Deduplication Analyzer

The best thing about being me… There are so many “me”s.

— Agent Smith, The Matrix Reloaded

One of our customers reported less than optimal space savings on XtremIO running Oracle. In order to test various scenarios with Oracle I was in search of a deduplication analysis method or tool – only to find out that there was nothing available that qualified.

TL;DR: QDDA is an Open Source tool I wrote to analyze Linux files, devices or data streams for duplicate blocks and compression estimates. It can quickly give you an idea of how much storage savings you could get using a modern All-Flash Array like XtremIO. It is safe to use on production systems and allows quick analysis of various test scenarios giving direct results, and even works with files/devices that are in use. No registration or uploading of your confidential data is required.
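To give a feel for what this kind of analysis involves, here is a minimal Python sketch of block-level deduplication and compression estimation. It is not QDDA's actual implementation. The 8 KiB block size, the SHA-256 hash, the zlib compressor and the function name analyze are all assumptions chosen for illustration, and a real array may use different block sizes and algorithms.

```python
import hashlib
import zlib
from typing import BinaryIO

BLOCK_SIZE = 8192  # assumed block size for illustration only


def analyze(stream: BinaryIO, block_size: int = BLOCK_SIZE) -> dict:
    """Estimate dedupe and compression ratios for a stream of fixed-size blocks."""
    seen = set()          # hashes of blocks encountered so far
    total = 0             # number of blocks read
    raw_unique = 0        # bytes in unique blocks before compression
    compressed = 0        # bytes in unique blocks after compression
    while True:
        block = stream.read(block_size)
        if not block:
            break
        total += 1
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            # Only unique blocks would be stored, so only those are compressed.
            seen.add(digest)
            raw_unique += len(block)
            compressed += len(zlib.compress(block))
    unique = len(seen)
    return {
        "total_blocks": total,
        "unique_blocks": unique,
        "dedupe_ratio": total / unique if unique else 0.0,
        "compress_ratio": raw_unique / compressed if compressed else 0.0,
    }
```

You could call it on a file opened in binary mode, for example analyze(open("/path/to/datafile", "rb")), where the path is a placeholder. Reading is non-destructive, but a pipe or socket may return short reads, which would shift the block boundaries in this simple version.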

Looking forward: 2016

We’re already over one week into 2016 and I realize I haven’t done much blogging lately.

One of the things that kept me busy is development on Outrun, and the joint Oracle / EMC Solution Center (OSC) on which I intend to write a bit more going forward.

Something I did about a year ago (without mentioning it too much) is upgrade my WordPress.com account to Professional. Not that I really need the extra add-ons, but I want my readers not to be disturbed by ads. OK, there are ad blockers, but not everyone uses them, and on some platforms (iOS) you simply can’t. Dirty Cash well spent (and no, I don’t get it reimbursed by my employer, in case you were wondering: my blog is mine, mine only, and independent).

Given that the number of page views on Dirty Cache passed a quarter million last year (thanks to all my readers), can you imagine the savings in bandwidth and productivity loss by not showing ads? ;-)

So what else can you expect from me this year?

Of course, more about running Oracle on EMC and why I think that’s a pretty good idea. As the competition with Oracle is heating up, I intend to write more about the differences between the solutions of both companies, debunk some marketing and competitive claims, and more. I also hope to find time to maintain the wiki on the Outrun site. In addition to Outrun documentation, it might be a good place to put Oracle / EMC related howtos, best practices, FAQs and more.

You also might be wondering what’s going to happen around Oracle / EMC solutions during the Dell / EMC acquisition… Me too. But we can’t (and are not allowed to) comment on it until the merger is final. Until then, business as usual. When the time is right I’ll comment on new Dell / EMC / Oracle stuff where possible.

Putting an end to the password jungle

With my blog audience all being experts in the IT industry (I presume), I think we are all too familiar with the problems of classic password security mechanisms.

Humans are just not good at remembering long meaningless strings of tokens, especially if they need to be changed every so many months and we have to keep track of many of them at the same time.
Some security experts blame humans. They say you should create strong passwords, not use a single password for different purposes, not write them down on paper – or worse – in an unencrypted form somewhere on your computer.

I disagree. I think the fundamental problem is within information technology itself. We invented computers to make life easier for ourselves – well, actually, that’s not true, ironically we invented them primarily to break military encryption codes. But the widespread adoption of computing happened because of the promise of making our lives easier.

I myself use a password manager (KeePass) to make my life a bit easier. There are many password manager tools available, and they solve part of the problem: keeping track of what password was used for what purpose. I now only need to remember one (hopefully, strong enough) password to access the password database and from there I just use the tool to log me in to websites, corporate networks and other services (let’s refer to all of those as “cloud servers”).

The many problems with passwords

The fundamental problem remains – even when using a password manager: passwords are no good for protecting our sensitive data or identity.

Linux Disk Alignment Reloaded

My all-time top post by pageviews is the one on Linux disk alignment: How to set disk alignment in Linux. In that post I showed an easy method to set and check disk alignment under Linux.

Looking back and forward

I have been enjoying a short holiday in which I decided to totally disconnect from work for a while and re-charge my battery. So while many bloggers and authors in our industry were making predictions for 2013, I was doing some other stuff and blogging was not part of that ;-)

Now that we survived the end of times, let’s look back and forward a bit. I don’t want to burn myself making crazy predictions about this year, but I would still like to present some thoughts for the longer term. Stay tuned…

The Zero Dataloss Myth

In previous posts I have focused on the technical side of running business applications (except my last post about the Joint Escalation Center). So let’s teleport to another level and have a look at business drivers.

What happens if you are an IT architect for an organization, and you ask your business people (your internal customers) how much data loss they can tolerate in case of a disaster? I bet the answer is always the same:

“zero!”

This relates to what is known in the industry as Recovery Point Objective (RPO).

Ask them how much downtime they can tolerate in case something bad happens. Again, the consistent answer:

“none!”

This is equivalent to Recovery Time Objective (RTO).

Now if you are in “Jukebox mode” (business asks, you provide, no questions asked) then you try to give them what they ask for (RPO = zero, RTO = zero). Which makes many IT vendors and communication service providers happy, because this means you have to run expensive clustering software, and synchronous data mirroring to a D/R site using pricey data connections.

If you are in “Consultative” mode, you try to figure out what the business really wants, not just what they ask for. And you wonder if their request is feasible at all, and if so, what the cost is of achieving these service levels.

Wikipedia blackout

Just to inform you that tomorrow (Wednesday, Jan. 18th, 2012), some of the links on my blog might not work due to Wikipedia’s one-day blackout in protest against SOPA. (I use Wikipedia a lot, both as a great resource to learn from myself and to point my readers to for more information on certain topics.)

I think Wikipedia touches on a real problem: governments (pushed by lobbyist groups) are pushing for an internet where you have to be cautious about what you say or publish. Best case, you might get blacklisted. Worst case? Figure it out for yourself.

I live in the Netherlands, where something similar is currently going on: organizations are trying to restrict people from accessing certain information sources (in this case, the Pirate Bay). Whether the Pirate Bay (or any other source of information for that matter) violates the law or not is (IMO) a different discussion. But if people (or organizations) want to restrict access by ordering ISPs (Internet service providers, a.k.a. the mailman) to blacklist those sites (i.e. check your mail for offending content) instead of chasing the publishers of illegal materials of any kind, then we are well on our way to a different internet. An internet that is no longer free. I strongly oppose that.

So, Wikipedia (and others), you have my full support.

Click the “STOP SOPA” banner (top right hand corner) if you want to learn more.

More info:

https://www.eff.org/deeplinks/2011/12/fight-blacklist-toolkit-anti-sopa-activists

https://www.eff.org/search/site/sopa

Managing Performance Expectations

Got this joke from a Dutch colleague (thanks Rohan ;-)

A customer is complaining that his shiny new storage system does not perform and (as usual) blames the storage vendor.

But sometimes you have to wonder if a customer uses a system for what it was designed for…
