A tale of multiple RAID-6s

Posted in Storage Interconnects & RAID, Advisor - Tom Treadway by Tom Treadway

There have been several posts recently regarding the reliability of RAID-5 and RAID-6, and how each applies to SATA and SAS drives. Now that your head is sufficiently spinning, it’s probably worth going back and explaining how RAID-6 is defined. Unlike the other RAID levels, RAID-6 varies considerably from vendor to vendor.

First, let’s start with a diagram of how RAID-5 is laid out. With this baseline, the RAID-6 diagrams will make more sense.

RAID-5

As you can see, the parity is built by XOR’ing the data stripes on the same horizontal row. Note that RAID-5 can have either right-to-left (as shown) or left-to-right slanting parity. Also, there are various methods for the data layout from one horizontal row to the next. Regardless of these details, all RAID-5s have the same basic reliability and performance characteristics.
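To make the XOR relationship concrete, here is a minimal Python sketch of one horizontal row. The helper name `xor_blocks` and the sample byte values are made up for illustration; a real controller does this on full stripes in hardware.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One horizontal row: three data stripes (illustrative values) and their parity.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Simulate losing the drive that holds data[1], then rebuild it by
# XOR'ing the surviving data stripes with the parity stripe.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

The same function both builds the parity and rebuilds a missing stripe, which is why a single XOR engine is enough for RAID-5.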

One of the first commercially available versions of RAID-6, created by IBM Almaden, is called EvenOdd. There is a rumor that a similar scheme is used by NetApp. The advantage of this scheme is that it’s based on XOR. The disadvantage is that it has a few hot spots in certain diagonal blocks that cause very poor short write performance.

RAID-6 EvenOdd

A more common version of RAID-6 is based on Reed-Solomon encoding. The math is rather complex and requires a Galois Field lookup which, due to performance issues, isn’t feasible for a typical RAID IOP. Therefore R-S requires specialized hardware. Also, this encoding scheme is rumored to be similar to the ADG scheme used by HP.

RAID-6 Reed-Solomon
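The post does not give the encoding itself, so the following is only a sketch of the widely used "P+Q" formulation, written in Python for readability. P is the ordinary XOR parity and Q is a weighted sum over the Galois Field GF(2^8). The field polynomial 0x11d, the generator 2, and all function names are assumptions chosen for illustration; they are not taken from any product mentioned here.

```python
# Log/antilog lookup tables for GF(2^8), assuming polynomial 0x11d and generator 2.
EXP = [0] * 510
LOG = [0] * 256
value = 1
for i in range(255):
    EXP[i] = value
    LOG[value] = i
    value <<= 1
    if value & 0x100:
        value ^= 0x11D
for i in range(255, 510):
    EXP[i] = EXP[i - 255]  # duplicate the table so index sums need no modulo

def gf_mul(a, b):
    """Multiply two field elements using the lookup tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]

def gf_div(a, b):
    """Divide a by b in the field (b must be nonzero)."""
    if a == 0:
        return 0
    return EXP[LOG[a] - LOG[b] + 255]

def compute_pq(data):
    """Return (P, Q) for a list of equal-length data blocks."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        coeff = EXP[i]  # 2**i in the field: one distinct weight per drive
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)

def recover_two(data, x, y, p, q):
    """Rebuild lost data blocks x and y (x != y) from the survivors, P and Q."""
    zero = bytes(len(p))
    partial = [zero if i in (x, y) else d for i, d in enumerate(data)]
    pxy, qxy = compute_pq(partial)  # parity of the surviving data only
    gx, gy = EXP[x], EXP[y]
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        a = p[j] ^ pxy[j]  # equals Dx ^ Dy
        b = q[j] ^ qxy[j]  # equals gx*Dx ^ gy*Dy
        dx[j] = gf_div(b ^ gf_mul(gy, a), gx ^ gy)
        dy[j] = a ^ dx[j]
    return bytes(dx), bytes(dy)

# Illustrative four-drive data row; drives 1 and 3 are then "lost".
data = [b"RAID", b"-6 i", b"s fu", b"n!!!"]
p, q = compute_pq(data)
damaged = [data[0], None, data[2], None]
dx, dy = recover_two(damaged, 1, 3, p, q)
assert (dx, dy) == (data[1], data[3])
```

Only the hardest case, two lost data blocks, is shown. Losing a data block plus P or Q, or losing P and Q together, is simpler and omitted. Note that every byte of Q costs a table lookup on top of the XOR, which is the overhead discussed above.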

The following RAID-6 scheme called X-Code is interesting in that (a) it’s based on XOR, and (b) it interleaves rows of parity within rows of data. Unfortunately this scheme only works on a prime number of drives.

RAID-6 X-Code

The last scheme is an Adaptec proprietary scheme similar to X-Code because it uses XOR and has row parity, but it works on any drive count. Patents are still pending, so the actual relationship of the data to the parity is not shown.

RAID-6 Adaptec

The advantage of this scheme is that it does not require specialized hardware and is backwards compatible to older hardware. And the performance characteristics are very good.

So as you can see, not all RAID-6 is alike. There are pros and cons to each, but they all support two simultaneous drive failures.

TT

12 Responses to “A tale of multiple RAID-6s”

  1. Jon Toigo Says:

    Great post. I am cross-linking from DrunkenData.com.

  2. Jon Toigo Says:

    Tom,

    What are the practical impacts of each RAID 6 variant? Does application I/O run faster or slower over one or another? Does RAID set restore happen faster or slower over one or another?

    Please advise.

  3. Tom Says:

    Jon,

    In general, if implemented correctly, you’re going to see roughly the same performance in all the versions of RAID-6. I’d like to say that Adaptec’s version is faster, but I can’t. We’ve been able to maintain 90% of the long sequential write performance in going from RAID-5 to RAID-6, but that’s more due to internal optimizations than due to the algorithm itself. The advantage of our algorithm is that it works on all legacy hardware with a simple XOR engine.

    On short random writes all the algorithms should give a 50% hit in performance, except for EvenOdd, where hitting blocks on the diagonal will cause an 80-90% hit in performance. Stay away from EvenOdd.

    Reads should be identical between all the algorithms, unless someone screwed something up in the implementation. The performance of RAID-6 reads should equal that of RAID-5.

    Lastly, don’t even think about running Reed-Solomon in software RAID, such as Linux’s DMRAID. The data conversion involved in the Galois Field math has tremendous overhead that can bring a powerful x86 to its knees. If you run performance numbers you might think, hmm, this is pretty dang good. But if you check the CPU utilization you’ll find that it’s pegged at 100%, leaving nothing for your applications.

    Regarding the restore (rebuild) time - good question. But that’s a little trickier since it heavily depends on the actual implementation. For example, everyone pretty much does RAID-5 the same way, but you’ll notice huge differences in restore time. On paper, the RAID-6 varieties should be pretty similar; however, I expect to see big differences when we finally get a chance to run all the competitive products.

    I’ll try to post some real life data as it becomes available.

    TT

  4. Joe Fagan Says:

    Tom,

    Nice RAID-6 discussion. I can’t follow the argument that the need for hardware assist for Reed-Solomon encoding is a disadvantage. Given that it’s more flexible (protects against m drive failures with m ‘parity’ drives), more efficient than some alternatives when n is odd, and works at a stripe level, why don’t the RAID vendors recognise it as an opportunity to differentiate themselves?

    RAID-6 is not just a “fad-du-jour” - it will be around forever! It can’t be improved upon (space-efficiency wise) - so why not start making some serious hardware and forget backward compatibility. Anyway, compatibility is needed only if you’re trying to support multiple raid cores or codes. Pick the best one and stick some hardware down.

    Besides, once polynomial arithmetic is down it opens the possibility for maybe hardware assisted encryption and/or compression and lots more.

    Hardware I say - Hardware!!!

    Joe

  5. Tom Says:

    Joe,

    The reason I say that hardware assist is a disadvantage for R-S is that most chips don’t have it yet. I’m contrasting this to XOR-based schemes that can effectively do RAID-6 with practically any hardware produced in the last 10 years. Of course this is only a temporary disadvantage as R-S becomes more and more common. But we’re not there yet!

    And I do certainly agree that R-S is more flexible for supporting >2 drive failures. I think we’re still pretty far away from needing that, but I’ve often imagined a box full (dozens and dozens) of 2.5″ or maybe 1″ drives delivering an insane number of IOPS. :-) Tolerating more than 2 drive failures would be a necessary feature. Of course you would need a lot of cache to try and hide that nasty RAID-6 RMW overhead on random writes.

    Viva la RAID-6!

    TT

  6. Ghen Says:

    (n,k,t) parameters of RS in RAID-6?

  7. Tom Says:

    Hmm. I’ve never seen a RAID-6 RS system described in terms of n, k and t. But I certainly could be wrong.

    It’s my understanding that this definition applies better to a serial stream of data where each individual code word (such as a byte of data) is expanded during the encoding process (for example from 8 to 10 bits) to allow a specific number of corrupted bits (in this case, 1 bit) to be detected and corrected.

    In this example, the original data (k) is 8, the new data (n) is 10, and the number of correctable bits (t) is 1. This is sometimes represented as (10,8,1). Also, t is always less than or equal to (n-k)/2.

    This particular encoding is interesting because it’s used for FC, SAS, SATA, and PCIe, and it’s usually referred to as 8b10b.

    The reason I said that this doesn’t seem to apply to disk-based RAID-6 is that multiple blocks of data are used to create two additional blocks of redundancy. For example, a 10 drive RAID-6 will have eight drives of data and two drives of parity. (The data and parity are rotated in practice, therefore there really aren’t dedicated data and parity drives, but that’s just an irrelevant detail.)

    If (n,k,t) did apply to RAID-6, and I’m just guessing here, I would think that t is equal to 2 since two drive failures are supported. But I have no idea what n and k would be. From the example, it wouldn’t be 10 and 8, respectively, because t would be calculated to be just 1.

    Ghen, since you asked about (n,k,t), do you have any idea how to apply it? Any other math whizzes out there who can help?

    TT

  8. Joseph Malicki Says:

    Tom,

    Great article! It’s good to see such clear information on how RAID 6 works.

    Regarding RS and RAID-6 with (n,k,t): In your example, n and k would be 10 and 8, and t is 1. Reed-Solomon error correction can detect and correct up to t data errors (corrupted bits), but up to 2t /erasures/, where an erasure means you know which data is missing or incorrect (with errors, such as corrupted bits, you don’t necessarily know which data is corrupt).

    Since a lost hard drive is an erasure, because you know which drive is not there, up to 2 drives could be corrected.

    See http://en.wikipedia.org/wiki/Reed-Solomon_error_correction for more detail on erasures vs. errors.

    Joe

  9. Tom Says:

    Joe, that’s right. RAID-6 is typically used to correct erasures. Data errors “should” never occur unless there is a bug in the drive or RAID card - and those don’t happen. ;)

  10. Dave Glarborg Says:

    Tom,

    I am just re-entering the RAID world, after a several year hiatus (used to use it on DEC Alpha systems). Today is the first time I’ve read about RAID 6, and I appreciate the clear description. One thing I’m not clear about is the effective storage for the array. With striping, effective storage is (n/2), with RAID 5 it’s usually (n-1) [where n is the number of physical drives]. The descriptions seem to imply that in each case the RAID 6 effective storage is (2/3 * n), as each method seems to use one parity entity (row or column) for each 2 data entities. Is this correct?

    Thanks,
    Dave.

  11. Tom Says:

    Hi, Dave. Welcome back to the RAID world.

    The effective storage on RAID-6 depends on the implementation method and the drive count. In the “Adaptec RAID-6” method shown above, the overhead is exactly two drives if the drive count is even, and therefore the effective storage is n-2. However, if the drive count is odd then the overhead is slightly greater than two drives.

    For this reason most RAID controllers are moving to a more standard Reed-Solomon implementation where the overhead is always two, regardless of drive count. In fact, R-S is extensible to more than two drive failures with an equal number of overhead disks.

    TT

  12. Pq65 Says:

    Great description of the RAID levels. Yes, NetApp’s RAID-DP is based on EVENODD, with one substantial difference from other RAID 6 approaches based on EVENODD: RAID-DP incorporates horizontal parity in its diagonal parity calculations, which means that it does fewer XOR operations than other implementations need, and that matters.
