Storage Advisors | Adaptec Trusted Storage


Software RAID: It’s not just about performance

Posted in Storage Interconnects & RAID, Advisor - Tom Treadway by Tom Treadway

Question to the Storage Advisors, from Bill: I’m torn between RAID 5 and 10. I want a killer workstation, and money is not the deciding factor. I have four 10,000 rpm Raptor drives. I want speed and fault tolerance. I know that with RAID-5, the performance of a third-party card is far superior to that of an on-motherboard controller, but what about RAID-10? I know there is far less computation with RAID-10, so is the performance difference as great?

Bill, are you talking about a software solution, or hardware RAID implemented on the motherboard? I think you’re talking about software RAID for the motherboard IO chips, so I’m going to go in that direction.

Honestly, the RAID-10 performance with software RAID will typically be as good as or better than with hardware RAID. [See caveats below.] You probably have plenty of x86 CPU cycles to spare, and RAID-10 doesn’t really take that many cycles to begin with.
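To put "RAID-10 doesn’t take many cycles" in concrete terms, here is a minimal Python sketch of the address arithmetic a RAID-10 stack does for one write. It assumes a hypothetical layout for Bill’s four drives: drives 0 and 1 mirror each other, drives 2 and 3 mirror each other, consecutive chunks alternate between the two pairs, and the chunk is 128 blocks. Real implementations differ in detail, but the cost is the same order: a divmod and a few additions per request, with no parity math at all.

```python
def raid10_targets(lba, n_drives=4, chunk_blocks=128):
    """Return the (drive, drive_lba) pairs that a write to logical block lba touches.

    Hypothetical layout: drives (0, 1) mirror each other, as do (2, 3), ...;
    consecutive chunks alternate between the mirror pairs (striping).
    """
    pairs = n_drives // 2
    chunk, offset = divmod(lba, chunk_blocks)
    pair = chunk % pairs                                  # which mirror pair holds this chunk
    drive_lba = (chunk // pairs) * chunk_blocks + offset  # position on each drive of that pair
    first = 2 * pair
    return [(first, drive_lba), (first + 1, drive_lba)]  # same block goes to both mirrors


print(raid10_targets(0), raid10_targets(128), raid10_targets(261))
```

A read needs only one of the two returned targets, which is why a RAID-10 stack can spread reads across both mirrors.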

But the question of software vs hardware RAID goes way beyond the performance measurements. I wrote briefly on the topic in an earlier post, but allow me to repeat myself and make a few more points.

First, how do the features compare? For example, how do they handle dissimilar drives? Do they allow you to morph online from one RAID level to another or to add additional drives as your requirements change? Do they support hotspares and background rebuilds?

Maybe more importantly, are they bootable? It’s tricky to make a software RAID bootable. Typically you will have to play tricks with putting the boot loader on non-redundant drives, or maintaining a copy on each drive. There are complicated ways to do this under Linux, but I don’t know how to do it with Windows.

And back to performance, what’s the write throughput? The “easiest” way to improve write performance is to implement a write-back cache. But I say “easiest” somewhat facetiously because getting the implementation right is dang hard. And how do you implement a battery in a SW RAID write-back cache? You can’t.
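A toy model makes the battery point concrete. In this Python sketch (the class and method names are hypothetical, and the "disk" is just a dict), a write is acknowledged as soon as it lands in cache memory, which is where the speedup comes from. On power loss, acknowledged data that has not been flushed survives only if the cache is battery-backed. A software RAID cache living in host RAM corresponds to `battery_backed=False`.

```python
class ToyWriteBackCache:
    """Toy write-back cache in front of a disk (a dict), not a real driver."""

    def __init__(self):
        self.disk = {}    # persistent storage: lba -> data
        self.dirty = {}   # acknowledged but not yet on disk: lba -> data

    def write(self, lba, data):
        self.dirty[lba] = data   # acknowledge immediately, before the disk is touched
        return "ack"

    def read(self, lba):
        return self.dirty.get(lba, self.disk.get(lba))

    def flush(self):
        self.disk.update(self.dirty)
        self.dirty.clear()

    def power_loss(self, battery_backed=False):
        """Without a battery, dirty data that was already acknowledged is lost."""
        if not battery_backed:
            self.dirty.clear()
        # With a battery, dirty data is retained and can be flushed after power returns.


cache = ToyWriteBackCache()
cache.write(1, b"flushed")
cache.flush()
cache.write(2, b"acked only")
cache.power_loss()
print(cache.read(1), cache.read(2))
```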

With all that said, you probably don’t need hardware RAID on a workstation. Just make sure you thoroughly test the software solution. Fail a few drives and make sure the system keeps running. Then try to boot. If you can pass those tests a few times, then you’re “probably” ok with software RAID.

Good luck.

TT

12 Responses to “Software RAID: It’s not just about performance”

  1. ming zhang Says:

    I agree with the claim about write-back cache, but I feel you might have missed what has happened in the Linux world recently.

    FYI, Linux MD supports morphing from one RAID level to another. Adding another drive to resize the array dynamically is supported. Hot spares shared among multiple RAIDs and background rebuild are supported as well.

    Supporting drives of slightly different sizes is just a little trick: reserving enough space at the end of each drive does the job.
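Here is a minimal sketch of that trick, assuming sizes in bytes, a hypothetical 1% reserve, and a 64 KiB chunk: size each array member a little below the smallest drive, so a slightly smaller replacement drive still fits. This is not how Linux MD or mdadm computes sizes internally; the function names are made up and it only illustrates the arithmetic.

```python
def member_size(drive_sizes, reserve_percent=1, chunk=64 * 1024):
    """Per-drive size to use for an array member, leaving slack at the end.

    drive_sizes: byte sizes of the drives going into the array.
    reserve_percent: hypothetical safety margin left unused at the end of each drive.
    chunk: round down to a whole number of chunks (64 KiB assumed here).
    """
    smallest = min(drive_sizes)
    target = smallest * (100 - reserve_percent) // 100  # integer math, no float rounding
    return target - target % chunk


def replacement_fits(new_drive_size, size_per_member):
    """A replacement only has to hold one member's worth of data."""
    return new_drive_size >= size_per_member


size = member_size([1_000_000_000, 1_010_000_000])
print(size, replacement_fits(995_000_000, size))
```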

  2. Tom Says:

    Yep, I agree that Linux MD supports morphing, etc. MD has come a long way, and there is an impressive list of features on the roadmap. I believe there was also an item regarding non-volatile writes, or maybe something about the RAID-5 write hole; I forget the details.

    MD is great for superusers that like to build and tweak their system, but I think it’s insane to consider putting mission critical data on it.

    Just my opinion.

    TT

  3. Bill Says:

    Forgive my ignorance, but I’m not quite sure what the definition of a “software RAID” is. I was really asking about the RAID capability using the on-board RAID controller on the motherboard (i.e. Intel ICH7R controller on an ASUS P5WDG2 WS Professional MB).

    On the machine I currently have, I have a 3rd-party controller card in a PCI-X slot running my 4 Raptor drives in a RAID 5. Is this what you consider hardware RAID? How is this different from having the ICH7R chipset on the MB do the same thing? (BTW, I’ve read comparisons of the two and know that the 3rd-party controller card is way better.)

    So I guess the pressing question on my mind is the performance difference between a controller card and the ICH7R onboard controller for RAID-10.

  4. Tom Says:

    Bill, hardware RAID is typically defined as the RAID stack running on a dedicated CPU, while software RAID is defined as the stack running on the host x86 CPU, such as in the driver or filesystem. The RAID stack is the code that knows how to issue writes to both drives in a RAID-1, for example, or do a read/modify/write for a RAID-5.
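To illustrate what "the RAID stack" means, here is a minimal Python model of the two operations Tom names. Drives and stripes are plain dicts and lists of byte strings, and the function names are hypothetical. The RAID-1 write sends the same block to every mirror; the RAID-5 small write reads old data and old parity, then writes new data and new parity = old parity XOR old data XOR new data.

```python
def xor_bytes(*blocks):
    """Byte-wise XOR of equal-length blocks (the parity operation)."""
    out = bytearray(len(blocks[0]))
    for blk in blocks:
        for i, b in enumerate(blk):
            out[i] ^= b
    return bytes(out)


def raid1_write(mirrors, lba, data):
    """RAID-1: the same block is written to every mirror."""
    for drive in mirrors:
        drive[lba] = data


def raid5_small_write(stripe, data_idx, parity_idx, new_data):
    """RAID-5 read/modify/write of one data block within a stripe."""
    old_data = stripe[data_idx]      # read 1
    old_parity = stripe[parity_idx]  # read 2
    stripe[parity_idx] = xor_bytes(old_parity, old_data, new_data)  # write 1
    stripe[data_idx] = new_data                                      # write 2


stripe = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
stripe.append(xor_bytes(*stripe))          # parity block for the stripe
raid5_small_write(stripe, 1, 3, b"\xaa\x55")
print(stripe[3] == xor_bytes(*stripe[:3]))  # parity still consistent
```

Whether this code runs on the host x86 (software RAID) or on a dedicated processor (hardware RAID) is exactly the distinction Tom draws; the XOR itself may also be offloaded, as with the ICH7R.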

    With that said, the ICH7R is not a hardware RAID solution. The only reason Intel put an “R” on the chipset name is that it contains XOR hardware for improving RAID-5 writes. But the Matrix RAID driver still runs on the host CPU.

    I don’t know the performance of the Intel Matrix/ICH7R product, so instead I wanted to talk about the general differences between software and hardware RAID and let you draw your own conclusions. Sorry if that was confusing. I tend to explain how to make a watch rather than simply tell you what time it is. :-)

    Thanks for reading. I hope this is helpful.

    TT

  5. maobo Says:

    As I understand it, software RAID-6 performance is about 70% of RAID-5 in normal mode. But if one disk fails, write performance degrades much more for RAID-6 than for RAID-5. For reads the performance is equal. I’m talking about the Linux MD implementation. Is that right?

  6. Tom Says:

    [My assumption for the following comments is that RAID-6 is implemented using a Reed-Solomon scheme with hardware support for the P+Q generation.]

    Regarding the performance of RAID-5 vs RAID-6, it really depends on access patterns. For example, on random reads the performance should be identical. On sequential reads the difference should be just one drive’s worth of throughput, i.e., around 100MB/s. Put another way, an 8-drive RAID-5 can get around 700MB/s on sequential reads while an 8-drive RAID-6 would be closer to 600MB/s.
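The sequential-read arithmetic can be written as a one-line estimate: on a large streaming read only the data chunks of each stripe are useful, so an N-drive array delivers roughly (N minus parity drives) times one drive’s throughput. This sketch assumes 100MB/s per drive and no bus, controller, or CPU bottleneck; the names are illustrative.

```python
PARITY_DRIVES = {"raid5": 1, "raid6": 2}


def seq_read_mb_s(n_drives, level, per_drive_mb_s=100):
    """Streaming-read estimate: only data chunks are useful, so parity drives don't count.

    Assumes large sequential reads, rotating parity, and no bus/controller/CPU bottleneck.
    """
    return (n_drives - PARITY_DRIVES[level]) * per_drive_mb_s


print(seq_read_mb_s(8, "raid5"), seq_read_mb_s(8, "raid6"))  # 700 600
```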

    Writes are where you can see the biggest difference. Random writes to a RAID-5 will require four disk IOs, while random writes to a RAID-6 will require six disk IOs. So you could see a 50% difference in performance. Depending on the hardware implementation, sequential writes can be much closer, usually proportional to the sequential read difference, but lower due to the extra memory access for computing P and Q.
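The 4-vs-6 I/O count translates directly into random-write throughput when the drives are the bottleneck. In this sketch the 150 IOPS per drive is an assumed figure, not a measurement. RAID-5 ends up with 1.5 times the random-write rate of RAID-6, which is where the 50% figure comes from.

```python
# Disk I/Os for one small random write done as read/modify/write:
#   RAID-5: read old data + old P, write new data + new P        -> 4
#   RAID-6: read old data + old P + old Q, write all three back   -> 6
IOS_PER_RANDOM_WRITE = {"raid5": 4, "raid6": 6}


def random_write_iops(n_drives, level, per_drive_iops):
    """Upper bound on small random writes/sec if the drives are the only limit."""
    total_drive_iops = n_drives * per_drive_iops
    return total_drive_iops / IOS_PER_RANDOM_WRITE[level]


# 150 IOPS per drive is an assumed figure for a 10K rpm drive.
r5 = random_write_iops(8, "raid5", 150)
r6 = random_write_iops(8, "raid6", 150)
print(r5, r6, r5 / r6)  # 300.0 200.0 1.5
```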

    Now let’s look at performance in degraded mode. If a drive fails, the data on that drive is reconstructed by reading all the data on the other drives in the same major stripe. That’s true regardless of whether it’s RAID-5 or RAID-6. If we can assume that P and Q are generated concurrently, then the only differences between RAID-5 and RAID-6 are the number of memory accesses and one additional drive access. So if you squint, maybe the difference is around 10-30%, depending on how many drives you have.
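Here is a minimal sketch of degraded-mode reconstruction for the single-parity (P) case: the missing block is the XOR of every surviving block in the stripe, so one degraded read costs reads from all the other drives. RAID-6 can use the same P-only rebuild when a single data drive is lost; its Reed-Solomon Q recovery for double failures is not shown. Function names are illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR across a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def degraded_read(stripe, failed_idx):
    """Rebuild the block on a failed drive from every surviving block in the stripe.

    Single-parity (P) reconstruction: it must read all n-1 survivors,
    which is why degraded reads cost so much more than normal ones.
    """
    survivors = [blk for i, blk in enumerate(stripe) if i != failed_idx]
    return xor_blocks(survivors)


data = [b"\x01\x02", b"\x10\x20", b"\xf0\x0f"]
stripe = data + [xor_blocks(data)]     # three data blocks plus P
print(degraded_read(stripe, 1) == data[1])
```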

    OK, so all of my comments so far concern hardware RAID, not Linux MD RAID. The performance of MD is VERY dependent on the CPU and memory speed of the host system. If the host can’t keep up with what a hardware RAID engine does, performance could drop off dramatically for optimal-mode (non-degraded) writes and for degraded-mode reads or writes.

    Hope that helps.

    TT

  7. maobo Says:

    If each column is a disk drive, and each number is a data block, and XX is a parity block, and if stripes are 4 blocks wide, then data on a raid5 might look like:
    1 5 9 XX
    2 6 10 XX
    3 7 11 XX
    4 8 12 XX
    13 17 XX 21
    14 18 XX 22
    15 19 XX 23
    16 20 XX 24
    This is what I think the stripe is:
    1 5 9 XX
    2 6 10 XX
    3 7 11 XX
    4 8 12 XX

    And the strip is 1 5 9 XX

    And the chunk size is 4 blocks; if one block is 4KB, then the chunk size is 16KB.
    Is that right? Thank you.

  8. Tom Says:

    Yep, that’s a correct layout. There are actually four or so different ways to do it, but they’re all very similar and the one you selected is common. The differences have to do with whether the parity starts on the left or right, and how the block count is restarted. (It’s hard to explain without a picture, and it’s really quite moot, so I won’t get into it.)
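To make the layout discussion concrete, here is a small Python sketch that maps logical data blocks to drives and reproduces maobo’s diagram exactly. It implements one of the variants Tom mentions: parity starts on the last drive and moves one drive to the left per major stripe, and data always restarts on the first drive. If I have Linux MD’s naming right, that variant is its "left-asymmetric" layout. The function names are hypothetical; in the code, a "chunk" is what Tom calls a minor stripe and a "stripe" is a major stripe.

```python
def locate_block(block, n_disks, chunk_blocks):
    """Map a 1-based logical data block to (disk, row) in a RAID-5 array.

    Parity for major stripe s sits on disk n_disks - 1 - (s % n_disks);
    the data chunks of that stripe fill the other disks left to right.
    """
    data_disks = n_disks - 1
    b = block - 1                               # work 0-based internally
    chunk, offset = divmod(b, chunk_blocks)     # chunk = minor stripe number
    stripe, dd = divmod(chunk, data_disks)      # stripe = major stripe number
    parity_disk = n_disks - 1 - stripe % n_disks
    disk = dd if dd < parity_disk else dd + 1   # step over the parity disk
    return disk, stripe * chunk_blocks + offset


def render(n_disks, chunk_blocks, n_stripes):
    """Draw the layout as text, one row per block offset, "XX" for parity."""
    grid = [["XX"] * n_disks for _ in range(n_stripes * chunk_blocks)]
    for block in range(1, n_stripes * (n_disks - 1) * chunk_blocks + 1):
        disk, row = locate_block(block, n_disks, chunk_blocks)
        grid[row][disk] = str(block)
    return [" ".join(cells) for cells in grid]


for line in render(n_disks=4, chunk_blocks=4, n_stripes=2):
    print(line)
```

The other variants differ only in where `parity_disk` starts and in how `disk` is derived from `dd`.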

    As far as the definition, your use of “stripe” is pretty common. I prefer the term major stripe just to be clear that it crosses all the drives. In other words, 1-12, plus the parity, is a major stripe. 13-24, plus the parity, is a major stripe. And so on…

    But I don’t think your use of strip is common. What you call a chunk is a more common definition of strip. And I prefer minor stripe. In other words, 1-4, 5-8, 9-12, 13-16, etc., are all minor stripes.

    As far as what you call strip (one block from each drive that comprises a “parity unit”), I’ve never actually seen a name for that.

    TT

  9. maobo Says:

    I have tested the performance of Linux raid6 with 8 SATA disks.
    The testbed is as follows:
    Bonnie++ version: 1.03
    CPU: Intel(R) Xeon(TM) CPU 3.00GHz
    Mem: 512MB
    SATA disk: WD2500YD
    SATA HBA card: Marvell 88SX6081
    And the results are as follows:
    ## bonnie++ -d /mnt -u root -s 2048 -m maobo
    … …
    Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    maobo           2G 33321  76 103650 27 47451  18 44065  87 193129 32 344.4   5
                       ------Sequential Create------ --------Random Create--------
                       -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                 files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                    16  2345  96 +++++ +++ +++++ +++  3130  98 +++++ +++ 12172  99
    maobo,2G,33321,76,103650,27,47451,18,44065,87,193129,32,344.4,5,16,2345,96,+++++,+++,+++++,+++,3130,98,+++++,+++,12172,99

    After one disk failure, the performance is:
    Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    maobo           2G 31661  72 82616  24 39827  17 43615  89 183464 30 268.9   5
                       ------Sequential Create------ --------Random Create--------
                       -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                 files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                    16  2585  98 +++++ +++ +++++ +++  3130  99 +++++ +++ 12015  99
    maobo,2G,31661,72,82616,24,39827,17,43615,89,183464,30,268.9,5,16,2585,98,+++++,+++,+++++,+++,3130,99,+++++,+++,12015,99

    For reference, see http://neil.brown.name/blog/20041215100345
    The performance seems strange to me. I also tested the performance of a single disk: read 59473 KB/s; write 50935 KB/s.

    For comparison, here is RAID-0 across 6 disks:
    Version 1.03       ------Sequential Output------ --Sequential Input- --Random-
                       -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
    Machine       Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
    maobo           2G 41832  96 254851 78 83041  44 50799  97 341684 44 393.0   6
                       ------Sequential Create------ --------Random Create--------
                       -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
                 files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                    16  2632  99 +++++ +++ +++++ +++  3129  98 +++++ +++ 11945  98
    maobo,2G,41832,96,254851,78,83041,44,50799,97,341684,44,393.0,6,16,2632,99,+++++,+++,+++++,+++,3129,98,+++++,+++,11945,98

    That looks reasonable for RAID-0 with 6 disks.

    With RAID-6 on 8 disks, 6 disks’ worth of each stripe is data, so the theoretical read performance should be 6 * single-disk read = 59MB/s * 6 = 354MB/s. But the measured performance is only about half of that. Is that a problem with the SATA HBA or with the 133MHz PCI-X bus?
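The gap is easy to quantify from the bonnie++ block-read figures above. This sketch divides each measured rate by 6 times the single-disk read rate (59473 KB/s). The RAID-6 array reaches about 54% of that ideal, while the 6-disk RAID-0 on the same HBA and bus reaches about 96%. The function name is illustrative.

```python
def read_efficiency(measured_kb_s, single_disk_kb_s, data_disks):
    """Measured streaming read as a fraction of data_disks x one drive's read rate."""
    return measured_kb_s / (data_disks * single_disk_kb_s)


SINGLE_DISK_READ = 59473  # KB/s, single-disk figure quoted above

# Block-read figures from the bonnie++ runs above (KB/s); both have 6 data disks.
for name, measured in [("RAID-6, 8 disks", 193129), ("RAID-0, 6 disks", 341684)]:
    eff = read_efficiency(measured, SINGLE_DISK_READ, data_disks=6)
    print(f"{name}: {eff:.0%} of the 6-disk ideal")
```

Since RAID-0 gets close to the ideal through the same hardware path, the comparison at least hints that the bottleneck is above the HBA and bus.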

    The filesystem is ext3.

    I have also posted this message to the linux-raid mailing list. I posted it here as well because your explanations are very clear and easy to understand. Thank you!

    maobo

  10. Tom Says:

    Maobo, it was hard to follow the numbers that you posted, but I certainly agree with your comment that reads from an 8-drive RAID-6 should be about the performance of 6 drives combined. However, that is only true if you use a large enough transfer size to minimize the effect of command overhead. You should reach steady-state throughput at around 128KB. What transfer size did you use?
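Why transfer size matters can be shown with a simple fixed-overhead model: each command costs a constant overhead plus the time to move its data, so small transfers spend most of their time on overhead. In this sketch both parameters are illustrative assumptions: 60MB/s is roughly maobo’s single-disk read rate, and 0.1 ms per command is a made-up round number. Under those assumptions, 4KB transfers get well under half the drive’s streaming rate, while 128KB transfers get over 90% of it.

```python
def effective_mb_s(transfer_kb, media_mb_s=60.0, overhead_ms=0.1):
    """Throughput of back-to-back reads when each command costs a fixed overhead.

    media_mb_s (drive's streaming rate) and overhead_ms (per-command cost) are
    illustrative assumptions, not measured values for any particular drive.
    """
    transfer_mb = transfer_kb / 1024
    seconds_per_command = overhead_ms / 1000 + transfer_mb / media_mb_s
    return transfer_mb / seconds_per_command


for kb in (4, 16, 64, 128, 512):
    print(f"{kb:4d} KB: {effective_mb_s(kb):5.1f} MB/s")
```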

    Also, PCI-X 133 certainly shouldn’t be the problem. That bus will give you ~1GB/s burst and ~800MB/s sustained, and you’re seeing only about 190MB/s, well under even the 354MB/s you expect.
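The burst figure follows from the bus width and clock: PCI-X is 64 bits (8 bytes) wide, and "133 MHz" is nominally 133.33 MHz, so the theoretical peak is about 1067MB/s. This sketch just does that multiplication and compares it with the 354MB/s the array would need. The ~800MB/s sustained number is Tom’s estimate and is not derived here.

```python
BUS_WIDTH_BYTES = 8        # PCI-X is a 64-bit bus
CLOCK_HZ = 133_333_333     # "133 MHz" PCI-X clocks at about 133.33 MHz

peak_mb_s = BUS_WIDTH_BYTES * CLOCK_HZ / 1_000_000   # theoretical burst, ~1067 MB/s
expected_raid_read_mb_s = 6 * 59                      # 6 data disks x 59 MB/s = 354 MB/s

print(f"peak bus: {peak_mb_s:.0f} MB/s, array needs: {expected_raid_read_mb_s} MB/s")
print(f"headroom at ~800 MB/s sustained: {800 / expected_raid_read_mb_s:.1f}x")
```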

    Also, are you sure that all the drives are being kept busy? Besides making sure that the driver and SATA IO chip are capable of issuing commands to all drives concurrently, you also want to make sure that command queuing is used on the drives. This will eliminate the problem of a drive being starved while waiting for a new command.

    Lastly, maybe the problem is with the md driver. Can you try the same test with just six non-RAID’ed drives?

    TT

    P.S. You mention the filesystem you’re using, but you’re not going through the filesystem, are you? That could also be a limitation. I had assumed that you were accessing the drives through the raw or block driver. (Sorry, I’m not a Linux guru, and I may be using the wrong term.)

  11. maobo Says:

    Yes, I did go through the filesystem, because I used bonnie++ to test the RAID performance. I also tested a single disk: read 59MB/s, write 50MB/s.
    The RAID results are:
    raid6(8 disks): read:191MB/s write:104MB/s
    raid6(8 disks with one disk failure): read:186MB/s write:85MB/s
    raid6(8 disks with two disks failure): read:129MB/s write:82MB/s
    raid5(7 disks): read: 202MB/s write: 128MB/s
    raid0(6 disks): read: 341MB/s write: 254MB/s
    I will test the performance with other tools later. Can you advise me on which tool to use? IOMeter, or something else? Thank you!

    maobo

  12. Tom Says:

    We mostly use IOMeter under Windows. I know there’s a version for Linux, but I’m not that familiar with it.

    TT
