Hard Disk Short-stroking (Part 2): RAID 0 (Striping) – Understanding why disk combinations matter

Introduction

In part 1 (Understanding the physical performance characteristics of hard disks) we looked at why the physical properties of hard drives affect throughput, i.e. more data can be read from a track on the outer edge of the disk than from one on the inner edge, so reading the same amount of data there needs less disk head movement.

In this article I will show what happens when we start combining disks by striping, i.e. the “0” in RAID levels such as RAID 1+0, 5+0 and 6+0.

Recap: short stroking is simply using just a subset of the available tracks on the disk – using the tracks on the outer edge of the disk.

Connectivity Protocol – SAS / SATA

I won’t go into too much detail, but SAS and SATA are methods for communicating with your storage devices. The current levels allow 6Gbit/sec. You can connect a SATA device to a SAS connector, but you may only get 3Gbit/sec. Be very careful how you connect disks: SAS and SATA are both point-to-point protocols, as opposed to a shared bus approach like the original SCSI. A lot of servers will multiplex a single SAS port (6Gbit/sec) across multiple hard drives, and those drives are then collectively limited to that 6Gbit/sec. I recently hit this myself with a new Dell T610: the drive cage takes 8 drives and has 2 ports that each multiplex 4 drives, and I accidentally put all 4 of my drives on the same port, thus limiting my throughput! It is always good to check your IO performance to see if it approaches what you are expecting!
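As a rough way to work out what to expect before measuring, the shared-port limit described above can be sketched in a few lines of Python. This is only a back-of-envelope estimate: the function names are made up for illustration, the per-drive figure of 110MB/sec is a hypothetical sequential rate, and the sketch assumes 8b/10b link encoding (10 bits on the wire per byte of data), so a 6Gbit/sec link carries roughly 600MB/sec of payload.

```python
def link_payload_mb_per_sec(link_gbits):
    """Approximate payload bandwidth of a SAS/SATA link in MB/sec.

    Assumption: 8b/10b encoding, i.e. 10 bits on the wire per byte of data.
    Protocol overhead is ignored, so real figures will be somewhat lower.
    """
    return link_gbits * 1000 / 10


def shared_port_throughput(link_gbits, drives, drive_mb_per_sec):
    """Best-case combined throughput (MB/sec) of several drives behind one port."""
    demand = drives * drive_mb_per_sec              # what the drives could deliver
    ceiling = link_payload_mb_per_sec(link_gbits)   # what the shared port can carry
    return min(demand, ceiling)


if __name__ == "__main__":
    # Hypothetical drives, each able to stream 110MB/sec.
    for gbits in (3, 6):
        for drives in (4, 8):
            total = shared_port_throughput(gbits, drives, drive_mb_per_sec=110)
            print(f"{drives} drives on one {gbits}Gbit/sec port: ~{total:.0f}MB/sec")
```

If the figure you measure is well below the estimate for the way you think the drives are cabled, that is a hint that they are sharing a port, or that a link has negotiated a lower speed, somewhere along the path.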

The RAID effect

Our requirement is to store 600GB of data. Consider the figure below: ShortStroking2SingleVStripedPerf

The diagram above shows two configurations: the first is a single disk, the second is two disks (same model) in RAID 0. When two disks are used the data is striped equally across them, so fewer tracks are used on each individual disk, and because those are the outer tracks this gives better throughput. Striping is performed by chopping the data up into equal parts. If you are writing 600GB, each stripe (the amount of data written across all disks in the array) might be 128KiB; for two disks the strip (the amount of data written to each individual disk) would then be 128/2 = 64KiB.
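To make the stripe/strip arithmetic concrete, here is a minimal Python sketch of how a logical offset on a RAID 0 array could map to a disk and an offset on that disk. It uses the definitions from the paragraph above (stripe = data written across all disks, strip = the share for one disk). The function name `locate` and the simple round-robin layout are assumptions for illustration; real controllers differ in detail, and some vendors use "stripe size" to mean the per-disk value.

```python
KIB = 1024


def locate(offset, n_disks, stripe_size=128 * KIB):
    """Map a logical byte offset on a RAID 0 array to (disk index, byte offset on that disk).

    Assumes a plain round-robin layout: strip 0 of each stripe goes to disk 0,
    strip 1 to disk 1, and so on.
    """
    strip_size = stripe_size // n_disks                   # per-disk share of one stripe
    stripe_no, within_stripe = divmod(offset, stripe_size)
    disk, within_strip = divmod(within_stripe, strip_size)
    return disk, stripe_no * strip_size + within_strip


if __name__ == "__main__":
    for logical in (0, 64 * KIB, 128 * KIB, 200 * KIB):
        disk, physical = locate(logical, n_disks=2)
        print(f"logical {logical // KIB:>3}KiB -> disk {disk}, offset {physical // KIB}KiB")

    # The 600GB requirement ends up as 600 / 2 = 300GB on each of the two disks.
    print(600 / 2, "GB per disk")
```

Each disk only ever holds half of the logical address range, which is why the 600GB of data reaches only half as far in from the outer edge as it would on a single disk.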

Logical Array/Partition/Volume

There are a few different terms for this. An array is usually created on the disk controller: it is a container across the disks you choose, with the RAID level you require. That array is then presented to the operating system as a logical disk, and on that logical disk you can create logical volumes.

Example

Given four 1TB disks, we create a RAID-0 array of 4TB, and on that array we create two logical volumes of 2TB each (see diagram below). ShortStroking2RAID0WithLogicalPartitions

Arrays and logical volumes are created from the outer edge of the disk inwards. In the example above, logical partition C: was created first so it gets the outer edge; E: was created second so it runs from where C: finishes inwards. The physics of the disk sets our throughput boundaries: the throughput range for drive C: is 110-90MB/sec, while E: is lower at 90-55MB/sec.
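The placement described above can be sketched as a small calculation of which band of each member disk a volume occupies. This is an illustrative model only: `volume_bands` is a made-up helper, and it assumes volumes are allocated strictly in creation order from the outer edge inwards, with every volume spread evenly over all disks in the array. Fractions are of disk capacity, with 0.0 being the outer edge.

```python
def volume_bands(disk_size_gb, n_disks, volumes):
    """Return {volume name: (start, end)} as fractions of each member disk's capacity.

    volumes is a list of (name, size_gb) tuples in creation order.
    0.0 is the outer edge of the disk, 1.0 the innermost track.
    """
    bands = {}
    used_per_disk_gb = 0.0
    for name, size_gb in volumes:
        start = used_per_disk_gb / disk_size_gb
        used_per_disk_gb += size_gb / n_disks      # each disk holds an equal share
        if used_per_disk_gb > disk_size_gb:
            raise ValueError(f"volume {name} does not fit on the array")
        bands[name] = (start, used_per_disk_gb / disk_size_gb)
    return bands


if __name__ == "__main__":
    # Four 1TB disks in RAID 0, two 2TB volumes, C: created first.
    for name, (start, end) in volume_bands(1000, 4, [("C:", 2000), ("E:", 2000)]).items():
        print(f"{name} occupies {start:.0%} to {end:.0%} of every disk")
```

For the example array this puts C: on the outer half of every disk and E: on the inner half, which matches the two throughput ranges quoted above.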

Think about the scenario above. Imagine a system builder creating a single RAID 10 array across all your disks, then giving you two logical partitions: C: at, say, 256GB for your system partition and E: with the rest for your data. You install Windows on C: (the optimal part of your disks) and you put your SQL data on E: (the less optimal part of the disks). Think for a moment: how often should a box hosting SQL Server page? (Rarely, ideally never.) How much disk activity will you see on C:? (Very little, if any, except at boot time.) Are we really bothered about fast boot times? (No, it’s a server!)

By configuring the four disks differently (as two RAID-0 arrays of two disks each) you can get at the optimal part of the disks for both logical volumes. The trade-off is that you store more data on each disk, because instead of chopping the data up into 4 pieces you are chopping it up into 2.
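The trade-off can be put into numbers with a short Python sketch. It is a simplified comparison, not a benchmark: `per_disk_extent` is a hypothetical helper, the 256GB system partition and 600GB of data are borrowed from earlier in the article purely for illustration, and plain RAID 0 is assumed in both layouts.

```python
def per_disk_extent(n_disks, volumes, target):
    """Return (start_gb, end_gb) on each member disk for the target volume.

    volumes is a list of (name, size_gb) in creation order; allocation is
    assumed to run from the outer edge (0GB) inwards.
    """
    start_gb = 0.0
    for name, size_gb in volumes:
        share_gb = size_gb / n_disks           # equal share per member disk
        if name == target:
            return start_gb, start_gb + share_gb
        start_gb += share_gb
    raise KeyError(target)


if __name__ == "__main__":
    # Layout A: one 4-disk array, system partition first, data second.
    a = per_disk_extent(4, [("C:", 256), ("E:", 600)], "E:")
    # Layout B: the data has a 2-disk array to itself.
    b = per_disk_extent(2, [("E:", 600)], "E:")
    print(f"4-disk array, data after C:: {a[0]:.0f}GB to {a[1]:.0f}GB on each disk")
    print(f"2-disk array, data alone:    {b[0]:.0f}GB to {b[1]:.0f}GB on each disk")
```

With these figures the data in the two-disk layout starts right at the outer edge but reaches 300GB into each disk, whereas in the four-disk layout it starts 64GB in and stops at 214GB. Which one wins depends on how much data you have relative to the size of the disks.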

ShortStroking2RAID0WithLogicalPartitionsSeparateDisks

One suggestion I make to people building a new server: create your arrays up front and install the OS on the last array, so that the OS (something that doesn’t do much IO) sits on the inner part of the disks, thus freeing up the optimal part for your data.

Summary

Hopefully you can now see that the “physics” of hard disks means performance drops off towards the inner tracks, and that disk placement is crucial to getting optimal performance. You need to think about how you configure your RAID arrays. Be wary of the engineer who just tells you to RAID 10 the lot without even questioning what the layout of your SQL Server will be!

In Part 3 of this series we enter the world of files (specifically SQL Server data and log files) and put into practice what we’ve learnt already.