Short-stroking: Understanding the physical performance characteristics of Hard Disks (Part 1)

Article Series

As part of my thesis for my Business Intelligence masters (see http://reportingbrick.com) I did a lot of research around Solid State and Hard Disk drives; this series of articles reflects that research.

First I’ll talk about the physical properties of hard drives, then progress through to their effect on how SQL Server performs in relation to IO.

Background

A Hard Disk has a spindle with multiple platters attached to it. Each platter is circular with two sides, and each side has its own head. All heads move together over the same part of the disk; they do not move independently. The idea behind multiple platters and sides is to increase the amount of data held, and thus the amount that can be retrieved from a single head position (track positioning). For more information see: http://www.hdd-tool.com/hdd-basic/hard-disk-and-hard-drive-physical-components.htm.

[Image: hard disk internals]

Data density is constant regardless of where the data is held on the disk. There is, however, more surface area on the outer edge of the disk than the inner. The surface area of the disk is broken up into tracks; think of tracks as equally spaced concentric rings, like the rings of a tree trunk. Each track has a number of fixed-size sectors, usually 512 bytes each. It follows that there are more sectors on the outer edge of the disk than the inner, so we can store more data on an outer track than on an inner one, and we can therefore read or write more data from an outer track before a track change needs to occur. A track change requires the disk head(s) to move.

Tracks

Outer Edge—————————————————————Centre

01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22
Tracks (assuming single sided platter)

For the sake of the maths, and to get across the concept that the closer a track is to the centre of the disk the fewer sectors it can hold, imagine that, from the outside in, the number of 512-byte sectors on a track is 100*(23-{tracknumber}). Thus: 2200 for track #1, 2100 for track #2, 2000 for track #3 ... 100 for track #22.

If we now translate the above into a real example, say we wanted to read 2200 sectors from the disk, and imagine that each head move, including a move between adjacent tracks, adds 5 ms of latency; then:

If data is located on track #1 (it all fits on a single track) then the head positioning latency is just 5 ms;
If data starts on track #10 and is contiguous then it will also be on track #11, because #10 holds 1300 sectors and #11 holds 1200, so the latency will be 2 x 5 ms (the original move to the first track, then the move to track #11).
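
The two cases above can be reproduced with a small sketch of the toy model. The 22 tracks, the 100*(23-{tracknumber}) sector count and the 5 ms per head move all come from the example; the function names are made up for illustration, and the sketch assumes the read starts at the beginning of its first track.

```python
TRACKS = 22    # toy model: track 1 is the outer edge, track 22 the centre
SEEK_MS = 5    # assumed latency per head move, as in the example


def sectors_on_track(track: int) -> int:
    """Sectors held by a track in the toy model: 100 * (23 - track)."""
    if not 1 <= track <= TRACKS:
        raise ValueError(f"track must be between 1 and {TRACKS}")
    return 100 * (23 - track)


def tracks_needed(start_track: int, sectors: int) -> int:
    """Count the tracks touched by a contiguous read from the start of start_track."""
    remaining = sectors
    track = start_track
    count = 0
    while remaining > 0:
        if track > TRACKS:
            raise ValueError("read runs past the innermost track")
        remaining -= sectors_on_track(track)
        track += 1
        count += 1
    return count


def positioning_latency_ms(start_track: int, sectors: int) -> int:
    """One head move per track touched, each costing SEEK_MS."""
    return tracks_needed(start_track, sectors) * SEEK_MS


print(positioning_latency_ms(1, 2200))   # 5  (fits on track #1)
print(positioning_latency_ms(10, 2200))  # 10 (spans tracks #10 and #11)
```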

Against a real disk this behaviour can be clearly seen below (using the HDDScan utility).

[Image: HDDScan sequential read results]

The outer edge of the disk is LBA 0 (far left) and the inner edge of the disk is at the far right. There is a real drop in performance that increases as we approach the centre of the disk. This is sequential read; you can only imagine how severe the drop would be for random read.

Short-stroking

The term is so straightforward that once you know what it is you will wonder what all the fuss was about. Storage vendors have been short-stroking since disks first came on the market, and still are.

Simply put, short stroking is where you only use a certain number of tracks on the disk: the tracks that provide the best performance. So for a 500GB drive you may only use 100GB of it.
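
To put rough numbers on that trade-off, here is a minimal sketch that reuses the toy 22-track model from earlier (100*(23-{tracknumber}) sectors per track). The function name and the choice of 5 tracks are only for illustration; a real drive has vastly more tracks and a different geometry.

```python
from typing import Tuple

TRACKS = 22  # toy model: track 1 is the outer edge, track 22 the centre


def sectors_on_track(track: int) -> int:
    """Sectors held by a track in the toy model."""
    return 100 * (23 - track)


def short_stroke(tracks_used: int) -> Tuple[float, int]:
    """Return (fraction of total capacity, longest head stroke in tracks)
    when only the outermost `tracks_used` tracks hold data."""
    if not 1 <= tracks_used <= TRACKS:
        raise ValueError(f"tracks_used must be between 1 and {TRACKS}")
    total = sum(sectors_on_track(t) for t in range(1, TRACKS + 1))
    used = sum(sectors_on_track(t) for t in range(1, tracks_used + 1))
    return used / total, tracks_used - 1


fraction, stroke = short_stroke(5)
print(f"{fraction:.1%} of capacity, longest stroke {stroke} tracks")
# 39.5% of capacity, longest stroke 4 tracks
```

In this toy model the outer 5 of 22 tracks hold about 40% of the sectors, while the longest possible head stroke drops from 21 tracks to 4.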

Consider: why have multiple platters per disk? Why have double sided platters? The simple answer is to reduce the amount of disk head movement, because you can chop the data up. For instance, given a 400MB chunk of data, 1 double sided platter would store 200MB per side, which might, for maths' sake, be 6 tracks per side. However, if you had 4 platters you could store 50MB per side, which might mean all the data is held in the same track on each platter side; that reduces disk head movement.

A question you may have is: why not just fill the disk with platters? The short answer is physics. The disk head rides above the platter; if it were to touch the platter, corruption would occur. When a hard disk is shut down, the heads are “parked” in a specific place on the disk. So, why are we stuck at 15Krpm rotation speed? The short answer again is physics: the spinning platters cause turbulence, which affects the disk heads. Helium filled disks reduce the turbulence, allowing extra platters.

Windows Logical Partitions/Hardware RAID Volumes

Volumes are created from the outer edge of the disk inwards. For example, given a 1TB drive, create a 250GB volume and it will be created on the outer edge of the disk; create another 250GB volume and it will be created after the first. Basically, volumes are placed outer to inner.
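
As a rough illustration of that outer-to-inner placement, the sketch below maps volume sizes to LBA ranges, with LBA 0 at the outer edge. It assumes 512-byte sectors, decimal gigabytes, and perfectly contiguous volumes with no partition table or alignment overhead; the function name is hypothetical.

```python
SECTOR_BYTES = 512
GB = 10**9  # decimal gigabytes, an assumption for the sake of round numbers


def layout_volumes(disk_gb, volume_gbs):
    """Return a (first_lba, last_lba) pair for each volume, placed in order
    from LBA 0 (the outer edge) inwards."""
    total_sectors = disk_gb * GB // SECTOR_BYTES
    next_lba = 0
    layout = []
    for size_gb in volume_gbs:
        sectors = size_gb * GB // SECTOR_BYTES
        if next_lba + sectors > total_sectors:
            raise ValueError("volumes do not fit on the disk")
        layout.append((next_lba, next_lba + sectors - 1))
        next_lba += sectors
    return layout


# Two 250GB volumes on a 1TB drive: the second starts a quarter of the way in.
for first, last in layout_volumes(1000, [250, 250]):
    print(first, last)
```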

To demonstrate the effect of the physical disk geometry, create 5 logical partitions (see example below):

[Image: example logical volumes]

Using IOMeter, create a file that fills up the first logical volume, then copy that file to each of the other volumes. You are now in a position to test and see disk geometry in effect. The chart below compares the sequential read performance on each volume; see how the position of the logical volume on the disk affects performance.

[Chart: sequential read performance per logical volume]
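
If you want a quick check without a benchmarking tool, the following is a minimal sketch that times a sequential read of one file per volume. It is not how IOMeter measures, and the function names and the 1 MiB block size are arbitrary choices. Note that the operating system's file cache can inflate the figures for a recently written or recently read file, so treat the output as indicative only.

```python
import time

BLOCK_BYTES = 1024 * 1024  # read in 1 MiB blocks (an arbitrary choice)


def sequential_read_mb_per_s(path: str) -> float:
    """Read the whole file front to back and return throughput in MB/s."""
    total = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffering, not the OS file cache.
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(BLOCK_BYTES)
            if not block:
                break
            total += len(block)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against zero
    return total / elapsed / 1_000_000


def compare_volumes(paths):
    """Map each file path to its measured sequential read throughput."""
    return {path: sequential_read_mb_per_s(path) for path in paths}


# Example with hypothetical paths, one copy of the test file per volume:
# print(compare_volumes([r"E:\test.dat", r"F:\test.dat", r"G:\test.dat"]))
```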

RAID – Striping (RAID x+0 e.g. 0, 10, 50 etc.)

Nothing much to say here other than that, again, the data is being chopped up, just like across platters.

Consider a 400MB file on 10 disks in RAID 0: essentially 40MB gets stored per disk, and if each disk has 4 double sided platters then 5MB is stored per platter side.

All of these efforts are aimed at reducing disk head movement.
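
The arithmetic for both the platter example and the RAID 0 example is the same division, sketched below. It assumes the data is spread perfectly evenly across every platter side and ignores stripe size; the function name is made up for illustration.

```python
def mb_per_platter_side(total_mb, disks=1, platters_per_disk=1, sides_per_platter=2):
    """Evenly divide the data across every platter side of every disk."""
    surfaces = disks * platters_per_disk * sides_per_platter
    return total_mb / surfaces


print(mb_per_platter_side(400, platters_per_disk=1))            # 200.0
print(mb_per_platter_side(400, platters_per_disk=4))            # 50.0
print(mb_per_platter_side(400, disks=10, platters_per_disk=4))  # 5.0
```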

Conclusion

Short stroking is purely and simply a method to reduce disk head movement, thus reducing latency.

In practical terms we can use short stroking to dramatically improve IO rates. However, what about all that wasted capacity? Most systems are designed around peak IO that occurs at specific times of the day. If your system has those peaks, look for the troughs and use the extra capacity for things like backups, which can be performed away from the peak IO window.

In Part 2 we will focus on why file position matters.