Adaptec Trusted Storage


Yet another RAID-10 vs RAID-5 question

Posted in Storage Interconnects & RAID, Advisor - Tom Treadway by Tom Treadway

Question to the Storage Advisors, from John: The age-old question… RAID 5 vs. RAID 0+1, with a twist on spindles. Here’s the deal: multiple Progress databases, much more read-intensive than write-intensive. Which is faster: (2) RAID-0 sets of 4 disks that are then mirrored (8 disks total), or all 8 disks in a RAID-5 config? I’m trying to figure out if more disk spindles outperform fewer spindles without the RAID-5 overhead.

John, good question. First, let’s talk about the RAID-0 and RAID-1 combination. You mention mirroring two 4-drive RAID-0s, but the more typical approach is to stripe four 2-drive RAID-1s. The end result is the same as far as performance and capacity go, but reliability and rebuild time are better with the latter (striped RAID-1s). The reason is that if a single drive fails in the mirrored RAID-0 case, the entire set of four drives in that RAID-0 is typically taken off-line. From then on the array is running on a single RAID-0 with no redundancy, so one more failure on the surviving side takes it down, and all four drives of the off-line set will have to be rebuilt. With striped RAID-1s, however, a single failed drive causes just one mirror pair to be rebuilt. In fact, you could have one drive in each RAID-1 fail and still have the array on-line.
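To make the reliability difference concrete, here is a minimal Python sketch that enumerates every possible two-drive failure in an 8-drive array and counts how many leave the data on-line. It assumes the two layouts described above: four striped 2-drive RAID-1 pairs (RAID 1+0) versus two mirrored 4-drive RAID-0 sets (RAID 0+1), and that a RAID-0 set goes off-line as soon as any of its members fails. The drive numbering and function names are hypothetical, chosen only for illustration.

```python
from itertools import combinations

DRIVES = range(8)

def striped_mirrors_survives(failed):
    """RAID 1+0: drives (0,1), (2,3), (4,5), (6,7) are mirror pairs.
    Data is lost only if both drives of some pair have failed."""
    pairs = [(d, d + 1) for d in range(0, 8, 2)]
    return not any(a in failed and b in failed for a, b in pairs)

def mirrored_stripes_survives(failed):
    """RAID 0+1: drives 0-3 form one RAID-0, drives 4-7 the other.
    Any failure takes its whole RAID-0 off-line, so data survives
    only while at least one RAID-0 has no failed members."""
    side_a, side_b = set(range(0, 4)), set(range(4, 8))
    return not (side_a & failed) or not (side_b & failed)

def count_survivable(survives, n_failures=2):
    """Return (survivable cases, total cases) for n simultaneous failures."""
    cases = [set(c) for c in combinations(DRIVES, n_failures)]
    return sum(survives(c) for c in cases), len(cases)

print("RAID 1+0:", count_survivable(striped_mirrors_survives))   # (24, 28)
print("RAID 0+1:", count_survivable(mirrored_stripes_survives))  # (12, 28)
```

Under these assumptions the striped mirrors survive 24 of the 28 possible two-drive failures (only the four "both halves of one pair" cases are fatal), while the mirrored stripes survive just 12, the cases where both failures land in the same RAID-0.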

As for your original question, it doesn’t actually matter which version of RAID-10 is used. Both versions provide the same performance, assuming that no drives have failed and that the host’s queue depth is large enough, and its access pattern random enough, to keep all the drives busy.

In your example the access pattern is mostly reads and mostly random (because it’s a database). Just for the sake of comparison, let’s say that it is 100% reads and 100% random. The end result is that all eight drives will see commands, or IOs. That means that if each drive can do 100 IOs per second (IOPS), then the RAID-10 can do 800 IOPS total.
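Here is a minimal sketch of why all eight drives stay busy on random reads. It assumes a hypothetical RAID-10 that stripes blocks across four mirror pairs (block number modulo 4 selects the pair) and alternates reads between the two drives of each pair in simple round-robin fashion; real RAID stacks may instead pick the drive closest to the data, as Tom notes in the comments. The stripe mapping, the 100 IOPS per drive, and the function name are illustrative assumptions, not any particular controller’s behavior.

```python
import random
from collections import Counter

PAIRS = 4            # four 2-drive RAID-1 pairs, striped together
PER_DRIVE_IOPS = 100

def distribute_reads(block_numbers):
    """Map each read to a physical drive.
    Pair p holds drives 2*p and 2*p + 1; reads within a pair alternate."""
    next_side = [0] * PAIRS          # round-robin pointer per mirror pair
    load = Counter()
    for block in block_numbers:
        pair = block % PAIRS         # striping: which mirror pair owns the block
        drive = 2 * pair + next_side[pair]
        next_side[pair] ^= 1         # the other copy serves the next read
        load[drive] += 1
    return load

rng = random.Random(42)
reads = [rng.randrange(1_000_000) for _ in range(80_000)]
load = distribute_reads(reads)
print(sorted(load.items()))          # roughly 10,000 reads on each of drives 0-7
# Article's estimate: every drive that serves reads contributes its full rate.
print("estimated array read IOPS:", len(load) * PER_DRIVE_IOPS)
```

Because both copies of each pair take turns, the two drives in a pair never differ by more than one read, so all eight spindles contribute rather than just one side of the mirror.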

On a RAID-5 with 100% reads there are no RAID-5 calculations other than the block redirection due to the striping, which is almost identical to the redirection in RAID-10. So the end result is that all eight drives are used (since parity is distributed across all drives), and therefore RAID-5 will do the same 800 IOPS as RAID-10.
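The same kind of sketch works for RAID-5. It assumes a hypothetical 8-drive layout in which the parity block moves one drive per stripe (a simplified rotating-parity scheme; real controllers use specific layout variants), so each stripe holds seven data blocks. Reads never touch parity, yet because the parity position rotates, the data blocks, and therefore the reads, still land on all eight drives.

```python
import random
from collections import Counter

DRIVES = 8
DATA_PER_STRIPE = DRIVES - 1   # one block per stripe holds parity

def locate(block):
    """Return the drive holding logical data block `block`
    in a simple rotating-parity RAID-5 (illustrative layout)."""
    stripe, index = divmod(block, DATA_PER_STRIPE)
    parity_drive = (DRIVES - 1) - (stripe % DRIVES)   # parity rotates each stripe
    # Data fills the remaining drives in order, skipping the parity drive.
    return index if index < parity_drive else index + 1

rng = random.Random(7)
load = Counter(locate(rng.randrange(1_000_000)) for _ in range(80_000))
print(sorted(load.items()))    # roughly 10,000 reads on each of drives 0-7
```

Each drive holds data in seven out of every eight stripes, which is why the read load comes out even across all eight spindles.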

So, if you’re doing 100% reads, the RAID levels are identical in performance.

But your database isn’t 100% reads, so let’s look at what happens with 100% writes. With RAID-10, each host write goes to two drives, so array performance drops to one-half, or 400 IOPS. With RAID-5, however, each host write is converted into four IOs (two reads and two writes), so array performance drops to one-quarter, or about 200 IOPS. (It’s a little more complicated than that, but this is a pretty good estimate.)

Now you’ll have to determine the real read/write ratio and calculate the harmonic mean, weighted by the read and write fractions, to estimate the performance impact of these writes. For example, if the ratio were 50:50, the RAID-10 would get around 533 IOPS and the RAID-5 around 320 IOPS.
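The arithmetic from the last two paragraphs fits in a few lines. This minimal sketch uses the same simplifications as the text: 100 IOPS per drive, every drive serving reads, and a fixed write penalty of two disk IOs per host write for RAID-10 and four for RAID-5. The function name and parameters are hypothetical.

```python
def effective_iops(drives, per_drive_iops, read_fraction, write_penalty):
    """Weighted harmonic mean of read-only and write-only array IOPS.

    read_fraction: share of host commands that are reads (0.0-1.0).
    write_penalty: disk IOs per host write (2 for RAID-10, 4 for RAID-5).
    """
    read_iops = drives * per_drive_iops         # every drive serves reads
    write_iops = read_iops / write_penalty      # each write costs several disk IOs
    write_fraction = 1.0 - read_fraction
    return 1.0 / (read_fraction / read_iops + write_fraction / write_iops)

for ratio in (1.0, 0.5, 0.0):
    r10 = effective_iops(8, 100, ratio, write_penalty=2)
    r5 = effective_iops(8, 100, ratio, write_penalty=4)
    print(f"{ratio:.0%} reads: RAID-10 {r10:.0f} IOPS, RAID-5 {r5:.0f} IOPS")
# 100% reads: RAID-10 800 IOPS, RAID-5 800 IOPS
# 50% reads: RAID-10 533 IOPS, RAID-5 320 IOPS
# 0% reads: RAID-10 400 IOPS, RAID-5 200 IOPS
```

Plugging in your measured read fraction gives a first-order estimate; as noted above, real arrays add effects (caching, seek patterns) that this model ignores.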

The bottom line is that the more writes you have, the more RAID-5 will hurt performance. But if writes are rare, the improved capacity of RAID-5 may justify the slight performance hit.
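For the capacity side of the trade-off, here is a back-of-the-envelope sketch under the usual assumptions: RAID-10 stores every block twice, and RAID-5 gives up one drive’s worth of space to parity. The function name is hypothetical, and it ignores hot spares and controller metadata.

```python
def usable_drives(drives, level):
    """Drives' worth of usable capacity (ignores hot spares and metadata)."""
    if level == "RAID-10":
        return drives // 2      # every block is stored twice
    if level == "RAID-5":
        return drives - 1       # one drive's worth of space holds parity
    raise ValueError(f"unsupported level: {level}")

for level in ("RAID-10", "RAID-5"):
    print(level, usable_drives(8, level), "of 8 drives usable")
# RAID-10 4 of 8 drives usable
# RAID-5 7 of 8 drives usable
```

With eight drives, that is four drives of usable space for RAID-10 versus seven for RAID-5.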

Good luck!

TT

7 Responses to “Yet another RAID-10 vs RAID-5 question”

  1. John Flick Says:

    I guess I meant (2) RAID 1’s, 4 disks, mirrored. Don’t know why I can’t get that past me.

    So… the answer is that our current setup is actually reading from all the drives? I thought it would only read from one set of the drives (i.e., 4 spindles). If it’s reading from all 8 spindles, then we’re probably getting the best performance we can get. I actually don’t need the storage space; I just thought that RAID 5 with all 8 drives would read faster. It doesn’t sound like I’d get any performance boost at all if I changed from my current RAID 10 to RAID 5. Sounds like all the spindles are being used as is? Just wanted to confirm. Your column is awesome, by the way.

  2. Tom Says:

    John, thanks for the love.

    Yeah, your RAID stack “should” be reading from both drives in the mirror. It usually has an algorithm that makes sure the reads are distributed, such as a simple round robin, or sometimes a “which drive is closest to the data” algorithm.

    Just to be sure, do you see the access lights flickering on all eight drives? If so, you’re good to go. If you don’t have lights, then sometimes you can just feel the drive to see if it’s seeking.

    TT

  3. Dave Says:

    Ok, pardon me for throwing in a related Q…..

    I am not convinced that measuring the quantity of IOPS gives an accurate picture of performance. Given the speed of modern SCSI devices and controllers, the IOPS penalty of RAID-5 is almost negligible compared to having to write the data twice in RAID-1(0). For example, offloading the parity calculation to the controller card causes only a small hiccup in throughput. What’s important to me is start-to-finish time. For a reasonably large amount of data, having to write it twice in a RAID-1(0) environment compares poorly with having to write only about 1.33 times the amount of data in RAID-5 (assuming the parity is 1/3 the size of the original data). I will certainly concede that for small data blocks there would be some disadvantage in the RAID-5 scenario. Another area of discussion would be the asynchronous nature of SCSI writes in a multi-spindle RAID-5. The main advantage here is that other devices in the RAID set can be used simultaneously while one of them is busy, which is not true of RAID-1(0).

    So what am I missing?

  4. Tom Says:

    Dave, when one speaks of IOPS, the access pattern is typically short and random. So the parity overhead of RAID-5 is HUGE for writes. Each host write becomes four disk IOs, plus two sets of XOR operations. So a heavy random write environment on a RAID-5 is typically about half the speed of a RAID-10.

    Also, the comment about RAID-5 parity overhead being about 1/3 is only true for a 3-drive RAID-5. For example, a 10-drive RAID-5 has a 10% overhead. And this overhead really only affects performance in long sequential IO, which is usually not measured in IOPS. Long IO is measured in throughput, or MB/s. So to your point, which I agree with, long sequential writes to a RAID-5 are actually faster than those to a RAID-10.

    But I don’t think I agree with your comment about async writes. (Or maybe I just didn’t understand it.) In both RAID-10 and RAID-5, each short random host write will cause two drives to be touched. In the case of RAID-10, each drive is written once. In the case of RAID-5, each drive is read once and written once. Meanwhile, all of the other drives in the array are available for other host commands. So multiple simultaneous host commands are supported in both RAID levels.

    TT

  5. Andy Says:

    Hi Tom,

    Here’s a tricky one for you. Which is quicker (mainly for reads on large SQL Server databases)?

    One 12-disc RAID-10 array (hardware RAID).

    Or two 6-disc RAID-10 arrays, this time mirrored as one logical disc by Windows 2000 Server software RAID? Ideally I’d like maximum redundancy by making the RAID-10 array itself redundant, so that if a pair of mirrored discs goes on one array we should still be okay.

    … And a quickie.

    I’ve currently got 6 discs in RAID-10, and I wasn’t sure whether I’d be better off leaving all the discs on the same SCSI channel or splitting them. I’ve got an Adaptec SCSI RAID 2230SLP card, which provides 640 MB/s across two channels. Does splitting the discs across channels slow down a RAID-10 array? Do the discs have to be on the same channel by design?

    Much appreciated,

    Andy.

  6. Tom Says:

    Hi, Andy. Good questions.

    Regarding the 12-drive RAID-10 compared to the mirrored 6-drive RAID-10s, the read performance should be identical. In either case, every drive should participate equally in the IO traffic - assuming the RAID stacks properly balance the workload. So if each drive can provide 100 IOPS, then both array configurations should deliver 1200 IOPS total. If you see differences in performance, then it’s due to issues in the RAID stacks and has nothing to do with the actual array configuration.

    The downside of the mirrored 6-drive RAID-10 config is that (a) you get half the capacity, and (b) writes will have half the IOPS of the 12-drive RAID-10 because each data block is written four times. The upside is that you can withstand a minimum of two drive failures as opposed to just one.

    Regarding splitting the array across busses, this will absolutely not hurt performance. And it could help performance if you’re bottlenecked on the SCSI bus. But typically the bottleneck is in the drives or the OS, so splitting rarely helps. This is especially true if you’re doing mostly random IO, because the SCSI bus is hardly being utilized. Most of the time is spent just waiting for the drives to seek.

    The main reasons to use two busses are (a) the IO load is mostly long sequential and exceeds the single bus speed, (b) you simply run out of device IDs (15) on a single bus, or (c) you want to spread your mirrors across two busses in order to protect against a SCSI cable failure.

    I hope that helps. Thanks for reading.

    TT

  7. Steven Says:

    Hi,

    Very interesting thread!

    I’m just wondering whether more disks are really better…

    a) Regarding reliability, each additional disk increases the probability of a failure
    (just as each platter seems to increase the failure probability).

    b) What about the latency of sending (splitting) signals to multiple disks
    and bundling them back together?

    Is there a break-even point when adding disks to a RAID-0?
    I see a step up from a single disk to a 2-disk RAID-0, and from 2 to 4…
    but what about from 4 to 8, or even 8 to 16?

    My personal experience is that newer disks with fewer
    platters and higher density give more of a performance increase
    than adding disks to an existing RAID-0
    (“subjective waiting time” on long sequential IO).

    …maybe I’m wrong?
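In his reply to Dave, Tom says each short random host write to a RAID-5 becomes four disk IOs plus two sets of XOR operations. This minimal sketch shows the read-modify-write sequence behind that count, assuming a hypothetical in-memory `Stripe` in which each block is a small `bytes` value: read the old data and old parity, XOR the old data out of the parity and the new data in, then write the new data and new parity. The class, method, and counter names are illustrative only.

```python
from dataclasses import dataclass
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

@dataclass
class Stripe:
    data: list             # one block per data drive
    parity: bytes = b""
    disk_ios: int = 0      # physical reads + writes issued by small_write

    def __post_init__(self):
        # Initial full-stripe parity: XOR of all data blocks (not counted as IO).
        self.parity = reduce(xor, self.data)

    def small_write(self, index: int, new: bytes) -> None:
        old_data = self.data[index]      # IO 1: read old data
        old_parity = self.parity         # IO 2: read old parity
        self.disk_ios += 2
        # XOR 1 removes the old data from the parity; XOR 2 adds the new data.
        new_parity = xor(xor(old_parity, old_data), new)
        self.data[index] = new           # IO 3: write new data
        self.parity = new_parity         # IO 4: write new parity
        self.disk_ios += 2

stripe = Stripe([b"\x01\x02", b"\x10\x20", b"\xaa\xbb"])
stripe.small_write(1, b"\xff\x00")
print(stripe.disk_ios)                              # 4
print(stripe.parity == reduce(xor, stripe.data))    # True
```

By comparison, the same small write on a RAID-10 is just two writes, one to each drive of the mirror pair, with no reads and no XOR work, which is where the write-penalty figures of two versus four come from.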
