

How long is a worm?

Posted in Storage Interconnects & RAID, Advisor - Tom Treadway by Tom Treadway

Question to the Storage Advisors, from Prahlad: While most of us are more or less clear on calculating disk IOPS, is there a similar way of calculating cache IOPS? How can we calculate cache IOPS for a storage subsystem? Also will RAID calculations have an impact on cache IOPS?

Prahlad, thanks for opening that can of worms. ;-)

Let me start by saying that a real measurement of cache hit IOPS can be useful. Together with the to-disk number, it bounds real-world performance by defining (a) the maximum possible performance (out of cache) and (b) the minimum possible performance (to disk). That said, it’s also often used to mislead buyers with a number that is unachievable in real-world tests.

For example, you will occasionally see a product that claims to get 10K IOPS from disk (I’m just making up numbers) and 100K IOPS from cache. So what do you, as a potential buyer, do with that? If you knew that you were getting a 50% cache hit rate then you might be tempted to guess that average IOPS would be 55K. But will the delayed response of the cache misses affect the application’s ability to issue more commands that may be cache hits? Maybe the result is much lower than 55K.
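To put a rough number on "much lower", here is a small sketch using the same made-up figures. It rests on a big simplifying assumption that is mine, not a measurement: the application keeps only one command outstanding at a time, and each command takes 1/IOPS seconds. The function names are hypothetical. Real workloads with deeper queues will land somewhere else, so treat this as an illustration of why the simple average is optimistic.

```python
def naive_average_iops(hit_rate, cache_iops, disk_iops):
    """Weighted average of the two IOPS figures (the tempting guess)."""
    return hit_rate * cache_iops + (1.0 - hit_rate) * disk_iops


def serialized_average_iops(hit_rate, cache_iops, disk_iops):
    """Average the time per IO instead of the rate.

    Assumes one IO in flight at a time, so every slow miss delays
    the next command, even if that command would have been a hit.
    """
    avg_seconds_per_io = hit_rate / cache_iops + (1.0 - hit_rate) / disk_iops
    return 1.0 / avg_seconds_per_io


# Made-up numbers from the post: 10K IOPS from disk, 100K from cache, 50% hits.
naive = naive_average_iops(0.5, 100_000, 10_000)
serialized = serialized_average_iops(0.5, 100_000, 10_000)

print(f"naive guess:      {naive:,.0f} IOPS")
print(f"one IO at a time: {serialized:,.0f} IOPS")
```

Under that assumption the misses dominate the elapsed time, and the result comes out at roughly a third of the 55K guess.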

And how are cache hits measured? Usually the “to disk” numbers are measured with random IOPS. (There’s also the question of whether the disk is short-stroked to eliminate disk bottlenecks, but that’s a different issue.) But it’s become common to measure the cache hit numbers with short sequential reads. This is done because it’s dang hard to get cache hits on random reads. You could reduce the access range to some value less than the cache size, but then you’d have to do the same thing with cache miss numbers, and then you might start hitting the drive cache, which will really start to confuse things. And there’s also the little issue that “smart” RAID controllers try to avoid caching random reads because the chance that the OS wants to re-read those same blocks is low, especially since the OS has a read-cache that is much closer to the application, and therefore much faster, than the RAID cache.

So, back to the point, cache hit numbers are often measured with short sequential reads, causing the RAID card’s read-ahead cache to kick in. But that’s an entirely different test, and it can’t be compared to numbers that you get from a random-read-to-disk test. Additionally, the driver or controller may start coalescing those short reads into larger, more efficient reads. Now you’re seeing the benefit of coalescing, not caching. The 100K IOPS referred to earlier should be called the “max coalesced IOPS”, not the “max cache hit IOPS”.

Thus the worms. ;-)

But back to your question, the effect of RAID on cache (or the effect of cache on RAID) is significant – especially for writes to a RAID-5. Write-back cache, with the required battery backup, is the main purpose of a cache on a RAID card.

First imagine random writes to a RAID-5. Each host write is converted to two disk reads and two disk writes. The disks rapidly become the bottleneck and the host IO is throttled by how fast the disk IOs can complete. Caching is just a temporary buffer, and once it fills up you reach a steady state that is only slightly faster than a non-cache controller. (Write-back cache does help a little due to sorting, but I’ll save that for another post.)

Now imagine short sequential writes to a RAID-5. Without the cache you have the same four IOs as in the random case, but it’s even worse because the writes are usually exactly one revolution later than the reads. A full revolution is worse than most seek times!
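For a feel of what a missed revolution costs, the time for one full rotation is just 60 seconds divided by the spindle speed. The sketch below works that out; the three spindle speeds are illustrative values I picked, not figures from any particular drive, and `revolution_ms` is a hypothetical helper name.

```python
def revolution_ms(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000.0 / rpm


# Illustrative spindle speeds (assumptions, not taken from a datasheet).
for rpm in (7_200, 10_000, 15_000):
    print(f"{rpm:>6} RPM -> {revolution_ms(rpm):.2f} ms per revolution")
```

Compare those numbers against the average seek time on your own drive's spec sheet to see how the rotational penalty stacks up.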

But with a cache, those short sequential writes will build up into a highly efficient full stripe write. Rather than four IOs per host write, it’s only slightly more than one IO depending on the number of drives in the array. And there are no missed disk rotations! The performance difference between write-back and write-thru cache for short sequential writes is often 10X to 100X!
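The "four IOs" versus "slightly more than one IO" comparison can be sketched as simple counting. The assumption here (mine, for illustration) is that each host write covers exactly one chunk on one drive, so a full stripe on an N-drive RAID-5 is N-1 data chunks plus one parity chunk. The function names are hypothetical, and this only counts disk IOs; it says nothing about the missed rotations, which are a separate part of the gap.

```python
def rmw_disk_ios_per_host_write():
    """Read-modify-write: read old data, read old parity,
    write new data, write new parity."""
    return 4


def full_stripe_disk_ios_per_host_write(num_drives):
    """Full stripe write on RAID-5: N disk writes (N-1 data + 1 parity)
    service N-1 chunk-sized host writes, and no reads are needed."""
    if num_drives < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    return num_drives / (num_drives - 1)


for n in (3, 5, 8):
    full = full_stripe_disk_ios_per_host_write(n)
    print(f"{n} drives: {rmw_disk_ios_per_host_write()} IOs (read-modify-write) "
          f"vs {full:.2f} IOs (full stripe) per host write")
```

The more drives in the array, the closer the full stripe case gets to one disk IO per host write.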

Prahlad, I probably didn’t answer your question, but it’s just too complicated for my meager brain. Hopefully I threw enough words at you to help you form your own answer.

Good luck, and thanks for reading!
TT
