

SAS drive performance

Posted in Storage Interconnects & RAID, Advisor - Tom Treadway by Tom Treadway

Question to the Storage Advisors, from David R.: With SAS, is spindle speed as important as it was with older technologies? For example, which would perform faster: six 15K RPM drives or eight 10K RPM drives?

David, it really depends on where the performance bottlenecks are – in the application, the OS, the filesystem, the driver, the card interconnect (PCI), the IO controller, the storage interconnect, or the drives. In real life the bottlenecks are often in the application, but in artificial benchmark testing, which is unfortunately all that matters to most people, the test is configured to push the bottleneck onto the controller or drives.

So let me throw out some numbers to establish a baseline.

First, the PCI interface is typically an 8x PCIe connection which is capable of 2GB/s burst throughput. After removing overhead, sustained throughput is probably closer to 1.2GB/s.

Next, the SAS interface is a collection of point-to-point connections that typically run from the controller to a drive, and each of these connections runs at 300MB/s. SAS is pretty efficient, so the real life throughput is only reduced slightly to, let’s say, 270MB/s. Also, it’s common for a SAS controller to have eight such connections, for a total sustained throughput of 2.1GB/s.

So you’ve got 1.2GB/s between the controller and the OS, and 2.1GB/s between the controller and the drives. With a RAID-5 array the IO load on the drive side is often higher than on the OS side, so if you squint then those two throughputs are balanced close enough for this example. Plus, as you’ll see later, these numbers are big enough to not really matter.
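As a rough sketch, the interface budget above can be reproduced with a few lines of Python. The 250MB/s per PCIe lane and the 40% bus overhead are assumptions chosen to reproduce the 2GB/s burst and 1.2GB/s sustained figures; they are not measured values.

```python
# Back-of-the-envelope interface budget, in MB/s, one direction only.
PCIE_LANES = 8
PCIE_MB_PER_LANE = 250         # assumed per-lane rate after 8b/10b encoding
PCIE_OVERHEAD_PCT = 40         # assumed bus overhead (gives ~1.2GB/s sustained)

SAS_LINKS = 8
SAS_MB_PER_LINK = 270          # sustained, down from the 300MB/s line rate

pcie_burst_mb = PCIE_LANES * PCIE_MB_PER_LANE
pcie_sustained_mb = pcie_burst_mb * (100 - PCIE_OVERHEAD_PCT) // 100
sas_sustained_mb = SAS_LINKS * SAS_MB_PER_LINK

print(f"PCIe burst:     {pcie_burst_mb} MB/s")
print(f"PCIe sustained: {pcie_sustained_mb} MB/s")
print(f"SAS sustained:  {sas_sustained_mb} MB/s")
```

Changing the overhead percentage shows how sensitive the host-side number is to that one assumption, while the drive-side number depends only on the link count.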

Now let’s look at the drives that are attached to the controller. I found the specs for two Cheetah drives on the Seagate website.

    Seagate Cheetah 15K.5 (300GB)
    Average Latency = 2ms
    Average Seek Time = 3.7ms
    Average Transfer Rate = 120MB/s
    Seagate Cheetah 10K.7 (300GB)
    Average Latency = 3ms
    Average Seek Time = 5ms
    Average Transfer Rate = 88MB/s

To be honest, I used the average seek time and transfer rate in the numbers above. Reads have faster seek times than writes because drives start reading before the head actually settles on the track. Likewise, the transfer rate is higher on the outer tracks than the inner tracks. So I just used average values; it won’t matter for this example and it’s still a fair comparison.

The first thing you will notice is that the drive transfer rate of 88-120MB/s is 2-3X lower than the 270MB/s supported by the SAS wire. Now, the drive may burst data at close to line speed out of its cache, but most transfers will still need to come from the media. It’s easy to see in this example that a 15K drive will be roughly 35-40% faster than a 10K drive (120MB/s vs. 88MB/s) in long sequential IO, such as during video processing.

But does this hold true as the drive count increases? Let’s start with a fully loaded controller with 8 drives. Now we’re talking about the controller seeing a total of roughly 700-960MB/s (8 x 88MB/s to 8 x 120MB/s). Depending on the RAID type and whether the IO is predominantly read or write, this is easily over the limit of most current controllers. So now we’re getting to the point where there will be no noticeable difference in drive types. And as you start to add SAS expanders, increasing the drive count per wire to greater than one, the difference is completely eliminated.
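The sequential numbers can be checked with a short script. This is only a sketch using the average transfer rates from the spec list; it assumes throughput scales linearly with drive count and ignores RAID overhead and any controller limit.

```python
# Sequential throughput: per drive, relative to one SAS link, and for 8 drives.
SAS_LINK_MB = 270                      # sustained MB/s per SAS link
DRIVE_MB = {"15K": 120, "10K": 88}     # average media transfer rate, MB/s
DRIVE_COUNT = 8

speed_ratio = DRIVE_MB["15K"] / DRIVE_MB["10K"]
array_mb = {name: mb * DRIVE_COUNT for name, mb in DRIVE_MB.items()}

for name, mb in DRIVE_MB.items():
    link_use = mb / SAS_LINK_MB        # fraction of one link a drive can fill
    print(f"{name}: {mb} MB/s per drive, {link_use:.0%} of one link, "
          f"{array_mb[name]} MB/s across {DRIVE_COUNT} drives")

print(f"15K vs 10K sequential ratio: {speed_ratio:.2f}")
```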

We can do the same analysis with short random IO, as found with databases, using the latency and average seek times. The time spent to access just a few blocks of data is predominantly composed of the time spent seeking the drive to the correct track and then, on average, the latency of a half disk rotation to get to the data. Therefore the average access time for these 15K and 10K drives is 5.7ms and 8ms, respectively. If you invert the values you get 175 IOPS (IOs per second) and 125 IOPS, respectively. The time spent to actually transfer the data on the SAS wire is only a few microseconds, so the SAS interface can easily support many thousands of IOPS.
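Here is a minimal sketch of that calculation. The function names are made up for illustration, and the model counts only seek time plus half a rotation, leaving out transfer time and command overhead.

```python
def half_rotation_ms(rpm):
    """Average rotational latency: the time for half a revolution, in ms."""
    return 60_000 / rpm / 2

def random_iops(seek_ms, latency_ms):
    """Short random IOs per second for one drive: 1 / (seek + latency)."""
    return 1000 / (seek_ms + latency_ms)

# (spindle speed in RPM, average seek time in ms) from the spec list
DRIVES = {"15K": (15_000, 3.7), "10K": (10_000, 5.0)}

for name, (rpm, seek_ms) in DRIVES.items():
    latency_ms = half_rotation_ms(rpm)
    iops = random_iops(seek_ms, latency_ms)
    print(f"{name}: access {seek_ms + latency_ms:.1f} ms, about {int(iops)} IOPS")
```

Note that the 2ms and 3ms latency figures in the spec list fall straight out of the spindle speed, which is why RPM matters so much for small random IO.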

Again, you can see that a 15K drive is roughly 40% faster than a 10K drive for short random IOs (175 vs. 125 IOPS). And, again, that comparison can break down as the drive count is increased and therefore the total IOPS coming from the array of drives exceeds the limit of the RAID controller. This controller limit is roughly 3000-6000 IOPS, which can be satisfied by 20-40 drives.
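To see where that drive count comes from, divide the controller limit by the per-drive IOPS. The sketch below assumes the drives scale linearly and uses the rough 3000-6000 IOPS controller range quoted above; the helper name is hypothetical.

```python
import math

def drives_to_saturate(controller_iops, per_drive_iops):
    """Smallest number of drives whose combined IOPS reaches the controller limit."""
    return math.ceil(controller_iops / per_drive_iops)

PER_DRIVE_IOPS = {"15K": 175, "10K": 125}
CONTROLLER_LIMITS = (3000, 6000)       # rough range, not a measured figure

for name, iops in PER_DRIVE_IOPS.items():
    low, high = (drives_to_saturate(limit, iops) for limit in CONTROLLER_LIMITS)
    print(f"{name}: {low} to {high} drives to saturate the controller")
```

The exact counts land a little either side of 20-40 depending on the drive type, which is close enough for this kind of estimate.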

OK, the summary: Assuming the bottleneck is not in the application, OS, filesystem, etc., and you have a small number of drives, then you will most certainly see a big difference in drive performance. As the number of drives increases, the bottleneck will move from the drive subsystem to the controller, application, OS, etc., and the difference will become insignificant.

And to specifically answer your question: using the numbers above, six 15K drives are slightly faster than eight 10K drives. The 15K array would produce 720MB/s and 1050 IOPS, while the 10K array would produce 704MB/s and 1000 IOPS. That’s pretty close, so you’ll probably want to base your buying decision on the price of the drives, the number of drive bays, the total required capacity, etc.
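The comparison is easy to rerun for other drive counts. This sketch uses a hypothetical `Drive` record holding the per-drive figures derived earlier, and it assumes perfectly linear scaling with no controller, RAID, or OS bottleneck, so treat the output as an upper bound.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Drive:
    name: str
    transfer_mb: int   # average sequential transfer rate, MB/s
    iops: int          # short random IOs per second

def array_totals(drive, count):
    """Return (MB/s, IOPS) for `count` identical drives, assuming linear scaling."""
    return drive.transfer_mb * count, drive.iops * count

CHEETAH_15K = Drive("Cheetah 15K.5", transfer_mb=120, iops=175)
CHEETAH_10K = Drive("Cheetah 10K.7", transfer_mb=88, iops=125)

for drive, count in ((CHEETAH_15K, 6), (CHEETAH_10K, 8)):
    mb, iops = array_totals(drive, count)
    print(f"{count} x {drive.name}: {mb} MB/s, {iops} IOPS")
```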

I hope that helps, David. Good question, and thanks for asking.

TT

7 Responses to “SAS drive performance”

  1. nobody Says:

    Where did you get 1.2GByte/s and the overhead requirements for PCIe 8x?

    From http://en.wikipedia.org/wiki/PCI_Express
    “Each lane utilizes two low voltage differential (LVDS) signaling
    pairs at 2.5 gigabaud. Transmit and receive are separate differential
    pairs, for a total of 4 data wires per lane.”

    “PCIe 1x is often quoted to support a data rate of 250 MB/s (238
    MiB/s) in each direction, per lane. This figure is a calculation from
    the physical signalling rate (2.5 Gbaud) divided by the encoding
    overhead (10bits/byte.) This means a 16 lane (x16) PCIe card would
    then be theoretically capable of 250 * 16 = 4 GB/s (3.7 GiB/s) in each
    direction.”

    To summarise, PCIe 1x is 2.5Gbps each way (dual simplex). After you
    calculate in overhead (20%) you will have approximately 2Gbps, or
    250MB/s, to work with. PCIe 8x is 20Gbps - 20% overhead (16Gbps) each way.

  2. Tom Says:

    Nikolas, I used 8 x 2.5Gb/s, or 8 x 250MB/s, to get to 2GB/s. I ignored the ability to transfer concurrently in both directions because benchmarks are often set up to test just one direction at a time. But you are certainly correct that I should/could have started with 4GB/s instead of 2GB/s. I’ve also found that when you start to mix reads and writes of large data transfers, other optimizations in the RAID cards start to break down, causing a bottleneck.

    The 250MB/s takes into account the 8b10b conversion, which is the 20% overhead you referred to. Then I pulled out another 40% of bus overhead to get down to 150MB/s. In hindsight, that was probably over-zealous. PCIe is much more efficient than that. The wiki entry you quoted claims the overhead to be closer to 5%. So let’s say that the wire throughput is 238MB/s. That makes the 8x single direction throughput closer to 1.9GB/s rather than the 1.2GB/s that I originally used. I’m ok with that. It doesn’t really change the argument, but I tend to agree with you.

    FYI, SAS is also full duplex, but I didn’t take that into account either since drives can only have one command accessing the media. I’ll ignore the possibility of reading and writing concurrently from the drive cache, assuming that any drives actually support that.

    Thanks for keeping me honest! :-)

  3. Danny Larouche Says:

    It seems there is an error in your document. You are saying that SAS has 8 connections at 270MB/s for a total sustained 2.1Gb/s!!!!

    1MB=8Mb, so 1 connection at 270MB/s would give 2.1Gb/s and 8 connections would give 17.28Gb/s

  4. Tom Says:

    Danny, I checked what I wrote. I said 2.1GB/s, not 2.1Gb/s. I got to this number by simply multiplying 270MB/s by 8 to get 2160MB/s, and divided that by 1024 to get 2.1GB/s.

    Is that right, or did I screw something up?

    Thanks for reading!

  5. Mike Says:

    Hi Tom,
    I am working on quantifying the differences between Fiber SCSI and SAS drives and what we can expect when we start using them in conjunction or as a replacement of their bulkier predecessors connected to a SAN. I was hoping, you might be willing to consider for a moment how a SAN might be able to overcome some of these bottlenecks you mentioned “drive subsystem to the controller, application, OS, etc., ” and then equate that to the summary numbers you put together related to the differences between 10k and 15k in how they seem to be really close in IOPS as quoted:

    The 15K array would produce 720MB/s and 1050 IOPS, while the 10K array would produce 704MB/s and 1000 IOPS. That’s pretty close, so you’ll probably want to base your buying decision on the price of the drives, the number of drive bays, the total required capacity, etc.

    The reason I ask to differentiate is that I have been able to generate over 2,200 IOPS on a SAN using iSCSI with 6 SAS 15k drives using RAID 5 and zero caching; I get closer to your numbers inside a server connected to a 64bit PCIe card. In the same test via iSCSI on a SAN with 6 fiber SCSI 15k drives I get almost exactly 1050 IOPS without utilizing the cache. I know these numbers are horribly crude; I just wanted to explain why I think there might be a big difference between the 2 and to see your reaction to the potential of eliminating the bottlenecks mentioned above.

    Thanks,
    Mike

  6. Tom Says:

    Mike, let me make sure I understand what you’re saying…

    With six drives on a PCI RAID card you’re able to get close to the theoretical max - about 6000 IOPS. But with those same drives behind an iSCSI controller performance drops to 2200 IOPS. I guess that makes sense. Your iSCSI controller must not be doing a very good job of processing the TCP/IP code stack. This is a ton of code, and the only way to make it faster is with a high-end x86 CPU or a TCP/IP offload engine, i.e., a TOE.

    BTW, I assume you’re testing with short random reads. Random writes to a RAID-5 will be a lot slower - about 1/4 the speed - because of the read/modify/writes.

    You also mention that you’re getting just 1050 IOPS with plain ol’ FC drives. Is that right? Or is there a RAID controller involved? That’s a very low number. The FC stack is much more efficient than iSCSI, and it’s always offloaded to hardware, so it’s hard to imagine why FC would be slower than iSCSI.

    Can you give me more info? Maybe we can work through this to understand where the bottlenecks are.

    Thanks,
    TT

  7. Vic Says:

    I would have made the same decision between the 6 and 8 drives, given that the maximum sustained output is bottlenecked by the rest of the subsystem. But you are forgetting performance when the load is not at the maximum of the total system, as when a database performs a series of transactions that run sequentially rather than in parallel. You are not reaching anywhere near the capacity of the system. What you essentially have are small data requests, and the 50% speed advantage in seek time and latency of each 15K drive makes the difference. You can still support the same number of users in both setups, but your 12 hour nightly job will take only 9 hours with the 15K drives.
