The dangers of performance testing …

October 19, 2011

I spend a lot of my time talking to people about RAID performance numbers. Almost everyone has their own bent on this subject, and many organisations have specific tests they run against hardware to “check” performance.

The real problem is … what exactly do the numbers mean, and how do they relate to your real data? An example comes to mind from when I sat with a customer in a foreign land, using a Linux performance testing tool that I was not particularly familiar with. The customer had an absolute bent on IOPS, and was very keen to see the performance of our card under the single performance metric test they commonly use.

The answer: 1.9 million IOPS

I smiled, and should have left there and then. That’s possibly the highest number of IOPS any machine has ever produced, let alone a single RAID card. I could sell a million of these things!

However … common sense took over and I asked the customer how much RAM was in their system. Due to language barriers this took a bit of time, but finally they cottoned on to what I was asking, and the penny dropped as to why I was asking it.

Answer – 96GB (in a machine with two 8-core processors).

So where do you think all those IOPS came from? The simple answer is that the system RAM had cached the entire performance test file, and every I/O was being read from system RAM – the test was not going anywhere near the hard disks, because the test file was only 60GB in size.

While this might seem like a pretty funny example (I certainly thought it was), it is a simple illustration of the fact that performance testing is fraught with dangers and variables. Add to that the fact that you really need to test a data pattern that is as close as possible to your real-world application, and it all gets very messy from here on in.

I have had plenty of customers quote MB/sec speeds to me – when I know that customer is running a SQL database, and MB/sec is meaningless for that particular application. IOPS are the go here. I also come across plenty of customers who play with the variables of their testing software – doing such things as setting a queue depth of 128 when their application is never going to get anywhere near that number (very few applications are so well written, or so parallel in nature, that they even reach a queue depth of 16) … again, a meaningless test that will produce all sorts of numbers that will never be matched in the real world.
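The relationship between the two metrics is simple arithmetic – a rough sketch (ignoring protocol and controller overheads) of why a small-block random workload can post huge IOPS numbers alongside unimpressive MB/sec figures:

```python
# Throughput (MB/s) = IOPS x block size. A SQL database doing 4KB random
# reads can look "slow" in MB/sec while doing excellent IOPS, which is why
# quoting MB/sec for that workload is meaningless.

def throughput_mb_per_sec(iops: float, block_size_kb: float) -> float:
    return iops * block_size_kb / 1024  # KB -> MB

# 20,000 IOPS at a 4KB block size is only about 78 MB/s ...
print(throughput_mb_per_sec(20_000, 4))     # 78.125
# ... while a 1MB sequential stream needs just 500 IOPS to hit 500 MB/s.
print(throughput_mb_per_sec(500, 1024))     # 500.0
```

The same logic works in reverse: quoting IOPS for a large-block streaming workload (video capture, backups) is just as misleading as quoting MB/sec for a database.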

Therefore my recommendation to most customers is to study their applications first, and try to work out what the software will be doing under heavy load (and this one is totally out of my field – I’m not going to study every customer’s individual application). Then they can set relevant metrics in their testing software to determine what sort of performance they “should” get from any given hardware configuration. Be careful what you ask for – you might just get numbers that look good but don’t mean anything. The trick is knowing “what” to ask for.

There are “lies, damned lies and statistics” – a quote commonly attributed to Benjamin Disraeli, the famous British politician.

I tend to agree.



When the small things in life matter …

October 19, 2011

Quite some time ago Adaptec released a technology called “Hybrid RAID”. This is the combination of an SSD and a spinning disk in a RAID1 or RAID10 array.

Initially it was a little unclear who would use this technology, with gamers, workstations and entry-level servers seeming like the immediate candidates. However, as we have progressed with this technology it has become clear that the top end of town (datacenters and corporates) is extremely interested. It takes a little bit of an explanation to see why …

If you are building a 16-drive server and want to put the OS on fast disks (e.g. SSDs), there are not many users who will run their server boot drive on just one disk – so a mirror is generally the accepted way of protecting this portion of the data puzzle on a server (no matter whether datacenter or home user).

The problem becomes one of capacity and slots. If you use up two slots for SSDs for your boot drive, then you only have (in this example) 14 drive slots left for data storage.

So let’s look at some maths.

Server 1
2 x 30GB SSDs for OS
14 x 2TB drives for data
(to keep the maths easy for this example, we’ll say that 2TB drives are in fact 2000GB – which of course they are not)

Server 2
1 x 30GB SSD for OS
15 x 2TB drives for data

Capacity of Server 1 is:
30GB for OS
13 x 2000GB for data (losing one disk for RAID5 parity) = 26000GB

Capacity of Server 2 is:
30GB for OS
(this is made up of 30GB from the SSD mirrored with 30GB from the first 2000GB hard drive – leaving 1970GB of usable space on that drive)
14 x 1970GB for data (losing one disk for RAID5 parity) = 27580GB

That’s a 6.08% increase in capacity.
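Those sums are easy to reproduce as a quick calculation (a sketch of the example above, treating 2TB as 2000GB as stated):

```python
# Usable data capacity of each server layout from the example.
DRIVE_GB = 2000
OS_CHUNK_GB = 30

# Server 1: 2 SSDs mirrored for the OS, 14 HDDs in RAID5 for data
# (RAID5 loses one drive's worth of capacity to parity).
server1_data_gb = (14 - 1) * DRIVE_GB

# Server 2: 1 SSD mirrored with a 30GB slice of the first HDD,
# then all 15 HDDs contribute their remaining 1970GB to the RAID5.
server2_data_gb = (15 - 1) * (DRIVE_GB - OS_CHUNK_GB)

gain_pct = (server2_data_gb - server1_data_gb) / server1_data_gb * 100
print(server1_data_gb, server2_data_gb, round(gain_pct, 2))  # 26000 27580 6.08
```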

Of course, you could always make a RAID5 array of the 30GB chunks not used on the other 14 hard drives, which would give an additional 390GB of usable space (13 x 30GB after parity) – might be handy for swap files etc., but we’ll discount this usage for the moment.

Server 1 costs … 2 x 30GB SSD + 14 x 2TB HDD
Server 2 costs … 1 x 30GB SSD + 15 x 2TB HDD

Just looking at Google for pricing indicates that server 2 will be cheaper than server 1, especially if a good SLC SSD is used for the boot drive.

Now a 6.08% increase in capacity might not sound like much if you are just buying one server, but when you are tasked with purchasing large amounts of storage, it makes a big difference. If you can reduce the number of physical servers in your datacenter you reduce rack space, running costs, cooling costs – everything is impacted by having less hardware in your datacenter.

As far as cost is concerned, this can add up to massive savings for large organisations. Each server is cheaper in the first place, fewer servers are required, there are fewer servers to power, and fewer servers means lower cooling costs – it’s win, win, win when it comes to the financial side of these calculations.

Now that’s a smart use for this simple technology.