

SATA IOPS Measurement

Posted in Storage Interconnects & RAID, Advisor - Tom Treadway by Tom Treadway

Question to the Storage Advisors, from Michael K.: I’ve heard great things about SATA-based DASD and JBOD devices. I’ve found tons of information about the data transfer rate, but haven’t been able to find any hard data on how many sustained and burst IOPS such systems can handle.

Michael, IOPS are very dependent on the access pattern. For this reason it’s normal to measure the “worst case” IOPS, which are achieved on short random IOs. The IOPS can be calculated by adding the time for an average seek to the time for a half rotation, and then inverting the sum. Note that the transfer time doesn’t come into play because it’s insignificant compared to the seek and rotation time.

For example, the 500GB Western Digital SE16 SATA drive (WD5000KS) has a rotational rate of 7200 RPM. By inverting that number you get one rotation per 8.3ms. Therefore a half rotation is about 4.2ms. This is also noted on the spec sheet as “Average Latency”.

Next, the average seek times for reads are listed as 8.9ms, while writes are 10.9ms. That difference may seem odd at first, but drives eke out a little extra performance on reads by attempting to read the data before the head has settled on the track (and that settling for writes takes ~2ms). So, let’s say that you have an even mix of reads and writes, and therefore we’ll assume the average seek time is 9.9ms.

Adding 4.2ms and 9.9ms gives an average random IO latency of 14.1ms. Since the drive can process (i.e., access the media for) just one command at a time, you can invert 14.1ms to get 71 IOPS.
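The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `random_iops` and the `read_fraction` parameter are made up for this example, and it uses the exact half-rotation time (about 4.17ms) rather than the rounded 4.2ms.

```python
def random_iops(rpm, read_seek_ms, write_seek_ms, read_fraction=0.5):
    """Estimate worst-case random IOPS for a single drive.

    Assumes one command is serviced at a time and ignores transfer time,
    as in the calculation above.
    """
    half_rotation_ms = (60_000 / rpm) / 2          # average rotational latency
    # Weight the seek time by the read/write mix.
    seek_ms = read_fraction * read_seek_ms + (1 - read_fraction) * write_seek_ms
    return 1000 / (seek_ms + half_rotation_ms)     # IOs per second

# WD5000KS figures quoted above: 7200 RPM, 8.9ms read seek, 10.9ms write seek.
mixed = random_iops(7200, 8.9, 10.9)                      # 50/50 mix
reads = random_iops(7200, 8.9, 10.9, read_fraction=1.0)   # 100% reads
writes = random_iops(7200, 8.9, 10.9, read_fraction=0.0)  # 100% writes
print(f"mixed: {mixed:.0f}  reads: {reads:.0f}  writes: {writes:.0f}")
```

With these inputs the 50/50 mix comes out at about 71 IOPS, with pure reads a little higher and pure writes a little lower.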

You can start playing games to increase the IOPS - such as shortening the seeks by accessing a smaller section of the disk, or making the IO sequential so that requests are serviced from the drive cache. With these tricks you can get the IOPS to exceed 100,000!

Now you can see that it’s difficult to measure and specify IOPS. The number is somewhere between 71 and 100K. For this reason it’s normal to use the worst-case, random-access-pattern value of ~71.

TT

6 Responses to “SATA IOPS Measurement”

  1. maobo Says:

    Quote: Next, the average seek times for reads are listed as 8.9ms, while writes are 10.9ms. That difference may seem odd at first, but drives eke out a little extra performance on reads by attempting to read the data before the head has settled on the track (and that settling takes ~2ms).

    This made me a little confused. Which is better: read or write? 8.9ms vs 10.9ms sounds like read is better. But later you say that reads need the head to settle on the track, which will hurt performance. Does this mean read is not better?

  2. Tom Says:

    Maobo, I see how my words may have been confusing. I tweaked them in the original post to clarify my point.

    Reads are better because they can start reading before the head is settled. Writes MUST have the head settled so that the data is written to the middle of the track.

    TT

  3. Craig Says:

    This isn’t about SATA specs, but after reading this site I’m working through the math for several calculations, and this is what I’m up against. My vendor (large computer manufacturer with a two letter name logo) publishes drive specs like the following:

    Seek Time (typical reads, including settling)
      Single Track: 0.4 ms
      Average: 3.8 ms
      Full-Stroke: 8.0 ms
    Rotational Speed: 15,000 rpm

    Now, from what you teach, 15,000 RPM would “invert” to 0.004 seconds for a single rotation? Now I need to add to that my seek time. Here is where I’m lost: which one do I use? I’m guessing Average, but then I’m not sure what “full stroke” is.

    Please let me know if I’m wandering off base someplace. Thanks. I’m just trying to calculate the IOPS for my SCSI drive to compare to some SATA drives. Thanks again!

  4. Tom Says:

    Craig,

    That’s correct. 15K RPM translates to a 4ms rotation. So when you figure access times you will use half this value, or 2ms, because on average you only have to wait half a rotation for your data.

    To get average access time you want to add this 2ms to the average seek time of 3.8ms. If you want to characterize a read-only access pattern, then the seek time will be a little less because reads don’t have to wait for the head to settle. Likewise, writes are a little more than average because they DO have to wait for the head to settle. But I don’t know how your drive vendor characterizes “average”. They may be talking about a 50/50 mix of reads and writes, or they may be talking about an average worst case, which would be writes. If that’s confusing, just use 3.8ms and you’ll be close enough.

    Full stroke is defined as a full seek from track 0 to the last track. This very rarely happens and is therefore a somewhat meaningless number.

    BTW, this method for calculating access time and IOPS is only applicable to a random workload spread evenly across the entire disk. With a properly defragmented disk most of the seeks will be constrained to a smaller area. For this reason some vendors measure IOPS with the range constrained to just 10% of the disk. So, with that said, I’m not sure if your drive vendor is defining average as seeking halfway across the disk or some smaller range. From what I understand about disks, most of an average seek involves acceleration and deceleration, and therefore either definition of “average” would give you roughly the same number.

    Lastly, some folks also measure IOPS using short sequential access patterns. The only seeking that occurs is when switching from one cylinder to the next. If the test is 100% reads then most of the IOs come from the OS/controller/drive read cache. And if the test is 100% writes then the IOs are combined in an OS/controller/drive write-back cache to cause large bursts of data to disk.

    I hope this helps, Craig. Hopefully I didn’t confuse things by answering more than you asked.

    TT

  5. Craig Says:

    No confusion. Thanks Tom!

    Interesting tidbit I noticed while running some numbers. Even if your drive can “do” 170 IOPS (the math on mine works out to about 172.4 IOPS), as soon as you put it in a RAID1 the performance *drops* to 113.3 IOPS. Of course we’re assuming 50:50 R/W, as well as a few other things like randomness…

    You know, when you think about RAID1 there is a performance hit on writes, but somehow when you see your 172 IOPS drop to 113 IOPS it really makes you go: “HUH?”…

    Anyway, just thought I’d share that painfully obvious observation.

    My formula:
    IOPS = 1/(((1/(RPM/60))/2) + S)

    Sorry I can never remember my order of operations so I put everything in parens!
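As a quick check, the formula can be transcribed directly into Python. The function name `craig_iops` is made up for this sketch, and it assumes S is the average seek time expressed in seconds, which is what the units of the formula imply.

```python
def craig_iops(rpm, seek_s):
    """Transcription of the formula above.

    rpm    -- rotational speed in revolutions per minute
    seek_s -- average seek time in seconds (assumed unit for S)
    """
    return 1 / (((1 / (rpm / 60)) / 2) + seek_s)

# 15,000 RPM drive with a 3.8 ms average seek, as in the vendor specs above.
result = craig_iops(15_000, 0.0038)
print(round(result, 1))
```

For the 15,000 RPM drive this comes out to about 172.4, matching the figure quoted above.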

    Thanks again! Love the site!

  6. Tom Says:

    Craig,

    It would be interesting to test your RAID-1 performance with 100% reads and 100% writes. If you’re getting 170 IOPS on a single drive, then on a RAID-1 I would expect reads to be 340 IOPS and writes to be 170 IOPS. Your 50:50 mix should be closer to 255 IOPS.

    There are a few reasons why you might be seeing only 113 IOPS. The first is that the drives just aren’t being kept busy enough. For example, let’s look at a simple 100% read test. If it takes 8 concurrent commands (just a wild ass guess for the sake of this example) to allow one drive to hit 170 IOPS, then it might take 16 commands to allow two drives to hit 170 IOPS EACH, or 340 IOPS total. That of course assumes that the reads are distributed evenly.

    Another reason is that the RAID stack isn’t properly balancing the IO across the two drives. For example, some rookie RAID-1 algorithms may simply try to keep the number of commands even across the two disks, and all new commands could simply be sent to the drive with the fewest current IOs.

    As soon as the newbie designer implements this algorithm they’ll find out that short sequential reads suck. The end result of this algorithm is that alternating commands will go to the two drives. In other words, one drive might get all the even block number requests while the other gets the odd block number requests. Performance will be basically cut in half.

    So a compromise algorithm is to send commands to the drive whose head is closest to the new request. That sounds simple, but it’s pretty difficult to get right.
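The two read-balancing policies described above can be compared with a toy model. Everything here is an assumption made for illustration, not how any real RAID stack works: two drives, a fixed queue depth of 2, commands completing in submission order, ties going to drive 0, and a head that ends up just past the block it last read. The function names are hypothetical.

```python
from collections import deque

def fewest_outstanding(blocks, queue_depth=2):
    """Toy 'rookie' policy: send each read to the drive with the fewest
    outstanding commands (ties go to drive 0). The oldest command is
    assumed to complete once queue_depth commands are in flight."""
    outstanding = [0, 0]
    in_flight = deque()          # drive index per in-flight command, oldest first
    assignment = []
    for _ in blocks:
        if len(in_flight) == queue_depth:
            outstanding[in_flight.popleft()] -= 1   # oldest command completes
        drive = 0 if outstanding[0] <= outstanding[1] else 1
        outstanding[drive] += 1
        in_flight.append(drive)
        assignment.append(drive)
    return assignment

def closest_head(blocks):
    """Toy compromise policy: send each read to the drive whose head is
    nearest the requested block (ties go to drive 0)."""
    heads = [0, 0]
    assignment = []
    for block in blocks:
        drive = 0 if abs(heads[0] - block) <= abs(heads[1] - block) else 1
        heads[drive] = block + 1     # head ends just past the block it read
        assignment.append(drive)
    return assignment

def non_contiguous(blocks, assignment):
    """Count requests that do not start where that drive's head already is."""
    heads = [0, 0]
    count = 0
    for block, drive in zip(blocks, assignment):
        if heads[drive] != block:
            count += 1
        heads[drive] = block + 1
    return count

blocks = list(range(8))              # a short sequential read stream
rookie = fewest_outstanding(blocks)
nearest = closest_head(blocks)
print("fewest outstanding:", rookie, non_contiguous(blocks, rookie))
print("closest head:      ", nearest, non_contiguous(blocks, nearest))
```

In this model the fewest-outstanding policy alternates drives on every request, so neither drive ever sees a contiguous stream, while the closest-head policy keeps the whole stream on one drive. A real implementation would also have to handle multiple streams and true head-position estimates, which is where it gets difficult.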

    Anyway, I only bring this up because that’s possibly the problem that is causing your IOPS to be lower than expected.

    As far as the equation … Yep! That’s exactly it.

    TT
