April 7, 2014
At times, quite often now actually (perhaps it’s because I’m getting older and grumpier), the industry I’m part of really embarrasses me. All too often folk just quote things they’ve seen on Wikipedia or heard via word of mouth without even a basic understanding of the “fact” they are quoting, a “fact” on which they may be basing a decision that could cost anywhere from a few hundred to a couple of thousand pounds!
In this post I’m going to cover why a 10K SAS disk can do more than 170 IOps (note: 15K disks are faster still). One of the “facts” quoted to me in the past couple of weeks is that a 10K hard drive can only do 170 IOps. Yes, a 10K drive might only do 170 IOps, but that is an average for a fully random workload across the entire disk. The real IOps capability depends entirely on what you are doing and where on the drive the data is in the first place!
What do I mean by “what you are doing”? With hard drives you are up against the physics of disk geometry: the tracks on the outside of the platters are longer and hold more data per track than those on the inside. If you need to reposition the disk head because your data is spread over multiple tracks you also suffer seek and sync (rotational) latency: seek to the correct track, then wait for the platter to rotate until the head is over the piece of data to start reading. Just a side note: platters are double sided (two heads per platter) and there are multiple platters inside a disk; see the manufacturer’s specs for more info.
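To see where a figure like 170 IOps comes from, here is a minimal sketch of the usual back-of-envelope estimate for fully random, one-at-a-time IO: average seek time plus average rotational latency (half a revolution). The function name and the seek times are my illustrative assumptions, not manufacturer figures, and transfer time is ignored.

```python
def random_iops(rpm: float, avg_seek_ms: float) -> float:
    """Rough IOps estimate for fully random IO at a queue depth of 1."""
    # On average the platter has to turn half a revolution before the
    # wanted sector arrives under the head.
    rotational_ms = 0.5 * 60_000.0 / rpm
    # Each IO pays one seek plus one rotational wait (transfer time ignored).
    service_ms = avg_seek_ms + rotational_ms
    return 1000.0 / service_ms


# Assumed seek times, for illustration only: a long average seek when the
# data is spread across the whole disk, a short one when it sits in a
# narrow band of tracks.
whole_disk = random_iops(rpm=10_000, avg_seek_ms=3.0)
narrow_band = random_iops(rpm=10_000, avg_seek_ms=1.0)

print(f"whole disk:  {whole_disk:.0f} IOps")
print(f"narrow band: {narrow_band:.0f} IOps")
```

With these assumed numbers the whole-disk case lands right around the oft-quoted 170, while shortening the seek alone lifts it noticeably. Sequential IO, which pays almost no seek or rotational wait per request, is a different story altogether.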
By no means is this post exhaustive, but it does give you an idea of what is really going on and what performance to really expect based on what you think your workload is.
Note: This research is specific to using the hard disk for SQL Server storage access patterns.
At this point I’ll refer you to another blog post I wrote a while back when I was documenting short stroking: http://dataidol.com/tonyrogerson/2013/04/04/short-stroking-understanding-the-physical-performance-characteristics-of-hard-disks-part-1/
The figure above was created using HDDScan over a 300GB 10K SAS connected disk. It clearly shows the effect of disk geometry; note this is best case for the drive, reading from the outside in. You can clearly see the drop off in throughput as you read data from the outer edge to the inner edge of the disk: 130MB per second down to just 80MB per second, a drop of 50MB per second! Remember the outer edge of the disk has more data per track than the inner, i.e. fewer track changes on the outer than the inner.
Very briefly: short stroking is a way of bounding the disk performance. If you only use the disk up to the 100 millionth LBA then your performance range will be 130 to 125MB per second, but you’d only be using 16% of the disk (48GB).
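The short stroking arithmetic can be sketched as follows. This assumes 512-byte sectors and that “300GB” means 300 decimal gigabytes; the helper name is mine. Depending on whether you count in GB or GiB, the result lands at roughly 48GiB and 16 to 17% of the disk.

```python
SECTOR_BYTES = 512          # assumed sector size (one LBA = one sector)
DISK_BYTES = 300 * 10**9    # assumption: "300GB" is 300 decimal gigabytes


def short_stroke_usage(max_lba: int) -> tuple[float, float]:
    """Return (GiB used, fraction of the disk) when only LBAs below max_lba are used."""
    used_bytes = max_lba * SECTOR_BYTES
    return used_bytes / 2**30, used_bytes / DISK_BYTES


used_gib, used_fraction = short_stroke_usage(100_000_000)
print(f"{used_gib:.1f} GiB used, {used_fraction:.1%} of the disk")
```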
Test: IOMeter 20GiB file on a populated 10K 300GB disk
Details of test:
Pair of 10K SAS connected 300GB disks in a RAID 1 configuration connected to a P410i – the controller cache was set to 0% read cache (so no read cache) and the on disk cache was disabled.
The partition tested was the C: drive, which holds the system, application, page file and some database files; out of 300GB there was 100GB free.
I created a 20GiB file through IOMeter for the test, so roughly 7% of the disk. Why 20GiB? It simulates a fairly big table with a good few million rows. I am assuming the table would be kept defragmented using INDEX REBUILD or REORGANIZE (ALTER INDEX statement).
Why 20GiB and not the entire disk? With SQL Server you query tables and, as I’ve said above, your maintenance routines would hopefully put the data around the same area of the disk. (I’m not going to go into internal versus external fragmentation; google that, it is important, but for this particular post, which disproves the claim that you can only get 170 IOps out of a 10K disk, it’s not that relevant.)
You would not have a single 300GB table on a pair of 300GB disks in a RAID 1 – that is a design pattern you just wouldn’t do in SQL Server because you’d not get the performance you’d want.
The table below shows the details of some tests made through IOMeter and the results. QD refers to “Queue Depth”, i.e. the number of outstanding IOs against the storage device. A higher queue depth helps improve performance; a QD of 1 is where the IO is synchronous, i.e. the requestor waits for the IO to complete before continuing with work.
The first result, with a QD of 1 and an IO size of 128KiB, is interesting. I did that specifically to find out approximately where the file is located on disk; I can reconcile that throughput (108MB/s) with the output from HDDScan:
The file is pretty much where I’m expecting it to be: quite far into the disk.
Looking back at the table, some items to note:
For sequential IO (which is predominantly what you will be doing with SQL Server) it is the throughput off disk that is the challenge: we max out on throughput rather than IOps. That makes sense because the head movement is the same whether you are pulling 8KiB or 64KiB off the disk. In our test we are getting 14,090 IOps for an 8KiB IO at 100% sequential read with no read cache, a bit more than 170!
As you can see, once you start dealing with random workloads, and with a mix of read/write and sequential/random workloads, things begin to change. There is little point enabling the read cache on the disk controller because SQL Server has its own buffer pool and is a damn sight more clever at knowing what data it wants; you won’t be repeating a read either. Write cache, on the other hand, is a fantastic way to balance out the load: during the SQL Server checkpoint process not only does SQL Server order its writes, but when those writes reach the disk controller cache they can be put down to disk efficiently.
Anyway, hopefully this has given you some insight into the reality that you can get more than 170 IOps out of a 10K disk, let alone a 15K one! It has certainly helped me get something off my chest. I sincerely look forward to the day when the rip-off prices of flash drives in the enterprise space come down to sensible levels. That said, we’ll all be up in the cloud by then; thankfully the cloud is built on commodity storage so I will get my day [eventually].
I could easily make this post into a white paper. For the sake of brevity, if you have specific points or questions leave them in the comments or email me directly and I’ll address them (with research proof).
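As a quick sanity check on the sequential result quoted above, IOps and throughput are tied together by the IO size, which is why a throughput-bound disk can post thousands of small IOs per second. A minimal sketch; the 64KiB figure it prints is derived purely for illustration on the assumption that throughput stays the same, it is not a measured result from the test.

```python
KIB = 1024
MIB = 1024 * 1024


def throughput_mib_s(iops: float, io_kib: float) -> float:
    """Throughput implied by a given IOps rate at a given IO size."""
    return iops * io_kib * KIB / MIB


def iops_at_throughput(mib_s: float, io_kib: float) -> float:
    """IOps you would see at a given IO size if throughput is the limit."""
    return mib_s * MIB / (io_kib * KIB)


# Measured in the test: 14,090 IOps at 8KiB, 100% sequential read.
seq_8k_mib_s = throughput_mib_s(14_090, 8)

# Illustration only: the same throughput expressed as 64KiB IOs.
seq_64k_iops = iops_at_throughput(seq_8k_mib_s, 64)

print(f"8KiB sequential:  {seq_8k_mib_s:.1f} MiB/s")
print(f"64KiB equivalent: {seq_64k_iops:.0f} IOps")
```

The 8KiB result works out at about 110MiB/s, which sits close to the 108MB/s seen with the 128KiB reads, as you would expect when throughput rather than IOps is the ceiling.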
UPDATE 2014-04-07, 14:15
Forgot to mention that the 128KiB IO size came about because HDDScan defaults to 256 blocks per read; each block is a 512 byte sector, thus 512 * 256 is 128KiB.