Baking a cake: trading CPU for IO?
February 1, 2016
Sometimes I hear people claim that by using faster storage, you can save on database licenses. True or false?
The idea is that many database servers are suffering from IO wait – which actually means that the processors are waiting for data to be transferred to or from storage – and in the meantime, no useful work can be done. Given the expensive licenses that are needed for running commercial database software, usually licensed per CPU core, this then leads to loss of efficiency.
Let’s see if we can visualise the problem here with a real-world example: baking a cake.
The recipe of the cake mentions the ingredients:
- 200 g butter
- 200 g sugar
- 200 g flour
- 4 eggs
- 2 teaspoons of baking powder
- vanilla sugar
- lemon juice
In contrast, what do you need to process a business transaction? Something like this maybe?
- 1 million database CPU cycles
- 100,000 app server CPU cycles
- 100 IO requests of 8K each
- 2 Megabytes of memory
- 1 Megabyte of network transfers
Let’s simplify things by looking at two resources only:
|Cake|Business transaction|
|---|---|
|200g butter|1M CPU cycles|
|200g flour|100 IO requests|
Let’s also assume that butter is 10 times more expensive than flour, much like CPU cycles are more expensive than IO requests (per transaction).
If I want to reduce the cost of baking cakes, could I reduce the amount of butter if I added more flour? I don’t think as a customer I would buy such a cake. So, can you reduce the number of required CPU cycles by adding more IO? I guess we all understand that it will not work out…
But the original idea was to eliminate overhead. So let’s say we can buy butter in packages of 1kg (think “servers with X processors”) and flour in packages of 500g (storage with X “spindles”). We only need to bake one cake, so we buy one of each. After baking the cake we still have 800g of expensive butter left (underutilized resources) and 300g of (inexpensive) flour.
By applying a consolidation strategy (a bit of a strange word when baking cakes, but you get the idea) we could use the same oven to bake 2 cakes instead of one at the same time. Which leaves us with 600g unused butter and 100g unused flour. The efficiency is increasing, but now I’m limited by availability of inexpensive resources (flour).
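The leftover arithmetic can be checked with a few lines of Python. This is only a sketch of the analogy: the package sizes and per-cake amounts come from the text above, while the function name `leftovers` and its parameters are made up for illustration.

```python
# Amounts per cake in grams, as in the recipe above.
BUTTER_PER_CAKE = 200  # stands in for CPU cycles per transaction
FLOUR_PER_CAKE = 200   # stands in for IO requests per transaction


def leftovers(cakes, butter_stock=1000, flour_stock=500):
    """Return (butter_left, flour_left) in grams after baking `cakes` cakes.

    Defaults model one 1kg package of butter and one 500g package of flour.
    """
    butter_left = butter_stock - cakes * BUTTER_PER_CAKE
    flour_left = flour_stock - cakes * FLOUR_PER_CAKE
    if butter_left < 0 or flour_left < 0:
        raise ValueError("not enough ingredients for that many cakes")
    return butter_left, flour_left


for n in (1, 2):
    print(n, "cake(s):", leftovers(n))
```

Asking for a third cake raises an error, because the flour runs out long before the butter does.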
To improve the efficiency I need to make sure that all of the butter is used. There are a few things I could do to achieve that:
- Buy butter in smaller packages (smaller server with fewer CPUs)
- Buy more flour so I can bake more cakes (more available IOPS due to Flash storage)
Sometimes, the flash vendors (including EMC), in their unlimited enthusiasm, forget the first option. So let’s say we go for option 2 and we buy 2 packages of flour (1kg in total) so we can bake 5 cakes without any butter left. Great!
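To see which ingredient is the bottleneck, here is a minimal sketch under the same assumptions (200g of each ingredient per cake). The helper `max_cakes` is hypothetical and only mirrors the numbers used in this post.

```python
def max_cakes(butter_g, flour_g, per_cake_g=200):
    """Return (number of cakes, limiting ingredient) for the given stock in grams."""
    by_butter = butter_g // per_cake_g  # cakes the butter (CPU) allows
    by_flour = flour_g // per_cake_g    # cakes the flour (IO) allows
    if by_butter < by_flour:
        limiting = "butter"
    elif by_flour < by_butter:
        limiting = "flour"
    else:
        limiting = "balanced"
    return min(by_butter, by_flour), limiting


# One package of each: flour (IO) is the bottleneck.
print(max_cakes(1000, 500))
# Two packages of flour: both ingredients are used up completely.
print(max_cakes(1000, 1000))
```

Option 1 from the list works the same way in the other direction: shrinking the butter stock to match the flour also makes the result balanced, just at a lower number of cakes.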
Now, the consumption of cakes is not driven by how much I can bake, but how much I can sell in the bakery – during opening hours!
Think about it: peak production capacity is meaningless unless we can deliver the products (transactions) at the maximum rate all the time. That would mean opening the bakery 24×7 and having as many customers come shopping for cake at 3am as at 3pm, and on Sundays as well as on Tuesdays. Cakes can be stored for maybe a day or so. Business transactions need to be consumed immediately.
So, the consumption of business transactions is not driven by how much I have available but by how much the business needs. Which is why I all too often see customers buying very expensive database appliances, capable of driving millions of transactions per minute, only to see them sit mostly idle due to limited business demand.
If my bakery can only sell 2 cakes out of 5, how does that help? In other words, the capability of driving more workload doesn’t necessarily mean it gets consumed.
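Put in numbers, the point looks like this. The sketch below assumes that utilisation is simply cakes sold divided by cakes the bakery could produce; the function name `utilisation` is invented for this example.

```python
def utilisation(capacity, demand):
    """Fraction of production capacity that is actually consumed."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    sold = min(capacity, demand)  # we cannot sell more than we can bake
    return sold / capacity


# Capacity for 5 cakes, customers for only 2.
print(utilisation(capacity=5, demand=2))
# Capacity for 2 cakes, same demand: nothing sits idle.
print(utilisation(capacity=2, demand=2))
```

Raising capacity from 2 to 5 cakes while demand stays at 2 lowers utilisation from 100% to 40%.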
Driving up the potential production capacity by removing bottlenecks does not automatically lead to better efficiency. As said, the number of transactions actually consumed is driven by demand, not by supply, although over time demand may increase once the supply (of information) has become frictionless.
This post first appeared on Dirty Cache by Bart Sjerps. Copyright © 2011 – 2016. All rights reserved. Not to be reproduced for commercial purposes without written permission.