RAID-5: Pining for the fjords?

Posted in Storage Interconnects & RAID, Advisor - Tom by Tom Treadway

After all these years of hearing that it was just shagged out following a prolonged squawk, I think RAID-5’s time has finally come. Yep, it’s dead. It’s a stiff … bereft of life … off the twig … kicked the bucket … joined the bleedin’ choir invisible. IT’S AN EX-RAID LEVEL!!!

So, why is RAID-5 dying? First we should probably look at why we ever used it in the first place, specifically in relation to its simpler cousin, RAID-10. While they both protect against drive failure and data loss, RAID-10 is much more straightforward to implement and has much higher performance. (The details of this have been exhaustively covered in other papers so I won’t go into it here.) The only advantage of RAID-5 is that it requires less drive overhead for basically the same level of data protection. Each RAID-5 requires just one drive’s worth of storage to protect all the other drives in the array. For example, an 8-drive RAID-5 uses one drive’s worth of storage for parity and is therefore 87.5% efficient. However each RAID-1 requires exactly half the drives for protection, so an 8-drive RAID-10 is only 50% efficient.
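
The efficiency figures above are simple ratios. A minimal sketch of the arithmetic follows; the function names are made up for illustration, and it assumes all drives in the array are the same size.

```python
def raid5_efficiency(drives: int) -> float:
    """Usable fraction of a RAID-5: one drive's worth of capacity holds parity."""
    if drives < 3:
        raise ValueError("RAID-5 needs at least 3 drives")
    return (drives - 1) / drives


def raid10_efficiency(drives: int) -> float:
    """Usable fraction of a RAID-10: every drive is mirrored, so half is lost."""
    if drives < 4 or drives % 2:
        raise ValueError("RAID-10 needs an even number of drives, at least 4")
    return 0.5


for n in (4, 8, 12):
    print(f"{n} drives: RAID-5 {raid5_efficiency(n):.1%}, "
          f"RAID-10 {raid10_efficiency(n):.1%}")
```

The RAID-5 advantage grows with drive count, while RAID-10 stays at 50% no matter how many drives are added.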

Every year I’m told that drives are getting bigger and cheaper and therefore folks can afford to just throw away half their storage for the benefits of RAID-10. And that’s mostly true – especially for cheap, direct attach PATA or SATA drives. But do we ever have “too much” network storage in the office? No way! The IT folks are always complaining that they don’t have enough empty drive bays, enough power, enough switch ports, whatever, plus more drives just means more hardware that will break. There’s never enough room on the network shares for our “stuff”. So dropping from 80-90% efficiency to 50% just won’t fly. Regardless of how big drives become, we (with Microsoft’s eager help) will find ways of filling them. So it appears that the requirement for RAID-5 efficiency is here to stay.

And along comes RAID-6.

The easiest way to explain RAID-6 is that it’s just like RAID-5, but it supports two concurrent drive failures rather than just one. On the surface you might think that implementing RAID-6 simply requires XOR’ing the data onto a second parity drive. Well, that’s kind of right, but it’s reeeeeaaallllyy a lot harder than that. I’ll save those details for another time. Just take for granted that a RAID-6 will tolerate two drives failing without loss of user data.

Will I ever get two drives failing at once? MTBFs on currently shipping drives are up over a million hours – that’s 114 years! What’s the chance of two drives failing at once? Yeah, I agree the chance of that happening is slim, but remember that those drives are pretty large and during the time it takes to rebuild the array you’re vulnerable to data loss due to a second drive failure. And why did that drive fail to begin with? Maybe it wasn’t just a random drive failure. Maybe it was system related, such as a fan failing and temperatures rising, or noise on the power cables, or flakey cables. When taking environmental failures into account it’s common to reduce the second drive’s MTBF to 1/10th the value of the first drive. Now take into account all the systems you’ve installed or shipped. What’s the chance of just one of those systems experiencing a two drive failure? The chance of failure for each individual installation is still relatively low, but the chance that at least one of those installations will lose data can be pretty high.
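
To put rough numbers on that argument, here is a back-of-the-envelope sketch. Only the one-million-hour MTBF and the 1/10th derating come from the text above. The constant failure rate model, the 8-drive array, the 24-hour rebuild, the 5-year window and the 1000 installations are all assumptions picked for illustration, and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365


def mtbf_years(mtbf_hours):
    """Convert an MTBF in hours to years."""
    return mtbf_hours / HOURS_PER_YEAR


def p_second_failure(drives, rebuild_hours, mtbf_hours, derate=10.0):
    """Chance that a surviving drive fails while the array is rebuilding.

    The survivors' MTBF is divided by `derate` to reflect a shared
    environmental cause, as described in the text.
    """
    rate = (drives - 1) * derate / mtbf_hours  # failures per hour
    return -math.expm1(-rate * rebuild_hours)


def p_array_loss(drives, years, rebuild_hours, mtbf_hours, derate=10.0):
    """Chance one array suffers a two-drive failure within `years`."""
    # Expected number of first-drive failures in the period.
    first_failures = drives * years * HOURS_PER_YEAR / mtbf_hours
    loss_rate = first_failures * p_second_failure(
        drives, rebuild_hours, mtbf_hours, derate)
    return -math.expm1(-loss_rate)


def p_fleet_loss(p_single, systems):
    """Chance that at least one of `systems` independent arrays loses data."""
    return 1.0 - (1.0 - p_single) ** systems


# Assumed example: 8-drive arrays, 24-hour rebuild, 5 years, 1000 installations.
p_one = p_array_loss(drives=8, years=5, rebuild_hours=24, mtbf_hours=1_000_000)
p_any = p_fleet_loss(p_one, systems=1000)
print(f"MTBF: {mtbf_years(1_000_000):.0f} years")
print(f"one array: {p_one:.4%}, any of 1000 arrays: {p_any:.1%}")
```

Under these assumptions a single installation is very unlikely to lose data, but the chance that some installation in a large fleet does is far from negligible, which is the point of the paragraph above.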

A second way to get a two drive failure is purely human error. When a drive in a RAID-5 fails, a well-designed system will light a fault LED next to the failed drive. Assuming that the system is in use 24/7, the administrator will remove that failed drive from the live system in order to replace it with a new drive. Hopefully he or she is able to do that without (a) removing the wrong drive, or (b) yanking hard enough on the drive carrier to dislodge an adjacent drive. Of course neither should ever happen, but accidents do happen.

While those are both great reasons for using RAID-6, the single biggest reason is based on the chance of drive errors during an array rebuild after just a single drive failure. Rebuilding the data on a failed drive requires that all the other data on the other drives be pristine and error free. If there is a single error in a single sector, then the data for the corresponding sector on the replacement drive cannot be reconstructed. Data is lost. In the drive industry, the measurement of how often this occurs is called the Bit Error Rate (BER). Simple calculations will show that the chance of data loss due to BER is much greater than all the other reasons combined. Also, PATA and SATA drives have historically had much greater BERs, i.e., more bit errors per drive, than SCSI and SAS drives, causing some vendors to recommend RAID-6 for SATA drives if they’re used for mission critical data.
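
The "simple calculations" can be sketched as follows. This treats the bit error rate as the chance that any single bit read is unrecoverable and assumes errors are independent, which is a simplification. The rates of 1 in 10^14 and 1 in 10^15 and the 8 x 500GB array are illustrative assumptions, not figures from this post; check the datasheet for your own drives.

```python
import math


def p_read_error(bytes_read: float, ber: float) -> float:
    """Chance of at least one bit error while reading `bytes_read` bytes.

    `ber` is the probability that any single bit cannot be read correctly;
    errors are treated as independent.
    """
    bits = bytes_read * 8
    # 1 - (1 - ber) ** bits, computed in a numerically stable way.
    return -math.expm1(bits * math.log1p(-ber))


def p_rebuild_error(drives: int, drive_bytes: float, ber: float) -> float:
    """RAID-5 rebuild: every surviving drive must be read end to end."""
    return p_read_error((drives - 1) * drive_bytes, ber)


# Assumed example: 8 drives of 500GB each, two illustrative error rates.
for label, ber in (("1 in 10^14", 1e-14), ("1 in 10^15", 1e-15)):
    p = p_rebuild_error(drives=8, drive_bytes=500e9, ber=ber)
    print(f"BER {label}: {p:.1%} chance of an error during rebuild")
```

Because the exposure scales with the total number of bits read, both larger drives and higher drive counts push the probability up.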

RAID-6 sounds great. What’s the downside? In read operations the performance is basically identical to RAID-5 since there is no need to read or manipulate the parity data, assuming that the array contains no failed drives. And on long sequential write operations the overhead of calculating the additional parity is not significant compared to all the other data that is being written. The RAID-6 performance on a well-designed controller should be within 10% of the RAID-5 performance. The only significant degradation is seen on short random writes as typically seen in database updates. But this access pattern is also relatively poor on RAID-5, and therefore most administrators choose to run their databases on RAID-10 arrays. The bottom line is that in all the access patterns that matter, RAID-6 performance is close enough to RAID-5 performance to make the issue moot.

In the last six months most of the major RAID vendors have started shipping products incorporating RAID-6. While they all use different algorithms, the results are the same – they all support two drive failures. Some products require specialized hardware and others are backwards compatible with XOR-based hardware, but the performance and reliability of all the methods should be roughly the same. Eventually you can expect to see all hardware RAID controllers supporting some form of RAID-6, with the XOR-based controllers having an early advantage since they don’t require new hardware. Once everyone supports RAID-6, there really is no need for RAID-5.

The moral of the story? Don’t be fooled by the beautiful plumage, and make sure your Norwegian Blue hasn’t been nailed to the perch. And more importantly, don’t be distracted by all these silly Monty Python analogies and make sure you use RAID-6.

TT

10 Responses to “RAID-5: Pining for the fjords?”

  1. Tom Says:

    Mark, to get the benefit of two-drive failure protection you unfortunately have to dedicate two drives’ worth of storage for your parity. So a 4-drive array would drop from 75% storage efficiency as a RAID-5 to 50% as a RAID-6 - the same as RAID-10. Regardless of all the wonderful things I said about parrots and prolonged squawks, these low-drive count RAID-5s are probably the one place you’ll see RAID-5 continue to live.

    Regarding the performance, the reads would be roughly the same between RAID-5 and RAID-6. And long sequential write performance would be “almost” the same. We see about a 10% drop for very large drive counts, but a 4-drive array should be drive limited so that 10% will drop to almost zero. As I mentioned in the original post, short random writes will suffer the most. A RAID-5 short random write takes 4 IOs while a RAID-6 write takes 6 IOs. That is 50% more IOs per write, so you can expect a big drop in performance: roughly a third fewer writes per second if the drives are the bottleneck. Short random writes are already expensive on RAID-5, which is why databases typically aren’t run on a RAID-5.
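
The 4-versus-6 IO count can be sketched as below. The breakdown into reads and writes is the usual read-modify-write sequence for a small write; the comment above only gives the totals, so treat the breakdown and the function name as illustrative assumptions.

```python
def small_write_ios(parity_blocks: int) -> int:
    """IOs for one short random write using read-modify-write.

    Read the old data block and each old parity block, then write the new
    data block and each updated parity block.
    """
    reads = 1 + parity_blocks
    writes = 1 + parity_blocks
    return reads + writes


raid5_ios = small_write_ios(parity_blocks=1)
raid6_ios = small_write_ios(parity_blocks=2)

extra_io = raid6_ios / raid5_ios - 1         # additional IOs per write
relative_throughput = raid5_ios / raid6_ios  # writes/sec for a fixed IO budget

print(f"RAID-5: {raid5_ios} IOs, RAID-6: {raid6_ios} IOs")
print(f"extra IOs per write: {extra_io:.0%}")
print(f"RAID-6 writes per second relative to RAID-5: {relative_throughput:.0%}")
```

This only counts drive IOs. Controller caching and parity calculation cost are ignored, so real results will vary.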

    As far as your video workload, the IO should be fairly long and sequential, so I would expect that you wouldn’t see much of a difference in RAID-5 and RAID-6, assuming you can tolerate the drop in capacity.

    Lastly, all the new controllers will continue to support RAID-5. That “should” be a statement about all vendors’ RAID controllers, not just Adaptec’s.

    TT

  2. Mark Neal Says:

    Thanks, Tom. This helps. I’ll keep with RAID 5 for smaller arrays and consider RAID 6 for larger ones.

    I have heard some say that with RAID, data backup is redundant and a waste of time. With a properly running RAID 5 (or other level) array, do I need to back up my data? Are there conditions under which a RAID array will lose data, other than a simultaneous multiple-drive failure?

  3. Tom Says:

    Mark, they’re on crack! Of course they still need to back up. I guess they expect that the worst that will ever happen is they’ll lose one drive in a RAID-5 or two in a RAID-6. And if they look at my MTBF calculations in a later post they’ll probably think they have hundreds or thousands of years before they experience a data loss.

    But those are crazy thoughts. The reason we need RAID is to avoid downtime. It’s just damn inconvenient to rebuild the system from tape every time a drive fails.

    The reason we need BACKUP is to survive catastrophic failures, such as the building burning down, a virus wiping out your data, a disgruntled employee reformatting the disk, a newbie user deleting their files, or just plain old file version rollback. RAID won’t protect against any of those failures.

    Good question.

    TT

  4. Jef Says:

    Can a Raid 5 - 7 hotswap drive bay be broken up into two or more system drives

    on a Raid 5, can we actually just make for example, 7 drives out of 7 drive bays, instead of being concerned of mirrors ?

    If I don’t want mirrors, can I create 7 system drives?

    premise: gaining storage space, not mirrors

  5. Tom Says:

    By “system drives”, do you mean a logical drive, like a C: and D:? If so, then that level of partitioning is completely transparent to the RAID card, so yes, you can definitely do that. Just be aware that if you’re reading or writing sequentially on C: and D: it will actually appear to the RAID as two streams on opposite sides of the drives, and long seeks will occur. Performance could plummet if the drives can’t read ahead fast enough.

    As far as creating a RAID-5 out of 7 drives - yes, you can do that. But I don’t know what you meant about mirrors. Mirroring is a completely different RAID level and has nothing to do with RAID-5.

    And if you did build a 7-drive RAID-5, you could divide it into 7 system drives, C:, D:, E:, etc.

    Is that what you were asking?

    TT

  6. Stan Says:

    Interesting information Tom.. I recently built an 8 drive RAID-5 file server for my home media.. 2.1 TB usable ((8 - 1) x 300GB) and had a two drive failure yesterday… talk about bummed. It’s my own fault.. I was running too small a power supply and could measure the voltage sagging at the drive connectors.. I was intending on putting a bigger supply in “any day now”, but the supply choked before I did.. corrupting two of the drives on the way out. I now have a 650 watt Antec in there and the rails are regulating nicely. Your comments about backup really struck home for me.. as I had hoped to get by without doing backups on this array.. trying to rely on the reliability of RAID-5.. (all this stupidity without crack too) So with that said, what’s the home data whore like me supposed to do for backup solutions? I anticipate this volume being pretty close to full in time.. and how’s a guy supposed to keep it backed up without babysitting a tape drive (tape swapping) or spending a zillion dollars on a robot tape backup system?

  7. Tom Says:

    Stan, I feel your pain. I’ve ripped about 1000 CDs (all legally purchased, of course), and many of them had to be hand-tweaked to get the metadata correct. It would take me weeks to redo all of them if I had a drive crash as you experienced. So I’m ultra-paranoid about these things, and always keep one or two copies on USB drives. Also, I try to keep those drives offsite just in case the house burns down. I suppose losing my ripped CDs wouldn’t be my biggest concern if my house burned down, but I guess you can see where my head is at.

    As far as RAID-5 or 6, I have to admit that I use neither at home. Drives are so dang cheap that I just keep mirrored backups. If I have a drive failure then I just have to copy the data from the backup. It doesn’t bother me that it might take a few days. Of course the reason folks normally buy RAID is to avoid this down-time if a drive fails. But that’s just not my concern since I’m not running a business from my home.

    I’d guess I have about 2.5TB of storage spread throughout my home, but only maybe 800GB of real data (including my ripped music). The rest of the space is either unused or is used for backing up. (I have my machines automatically back up to each other.) If you have 2.1TB of real data, then backing up to USB drives is going to be a little expensive. What’s a 1TB drive nowadays? Maybe $750? So you’re looking at $1500 just to protect against drive failure (and software failure). That’s getting a little pricey.

    So I can tell you what I do. First, I use Symantec (Norton) Ghost to make nightly incremental backups of my non-music data, i.e., all of my C: drives. Then I use SecCopy to keep backups of my music current. SecCopy will update just the files that have changed, so you won’t have to copy 2.1TB of data every time you want to do a backup. And lastly, and probably the piece most interesting for you, is an offsite backup service from Carbonite. This is something I just started using recently, but for $50 a year you can back up your C: drive to the Carbonite storage servers (wherever they happen to be - I’ve heard the storage comes from Amazon, but I could be wrong) and all of the data is encrypted before it leaves your machine. The data trickles out REAL slow in the background, but eventually it all gets backed up. I think it took me about 14 days to back up 40GB. I’ve noticed absolutely no slowdown on my system and so far have been very happy. I wonder if Carbonite would notice if you started backing up 2.1TB of data? They probably have something in the license agreement that says they can cancel your service at any time, but perhaps it’s worth a shot.

    Good luck with everything.

    TT

  8. Kevin Says:

    Tom, I have 5 TB of unraided data that I need to raid.

    Would I be fine with a 12 drive raid 5 setup (500GB drives) or should I go raid 6?

    At what point does raid 5 get sketchy (Seen those 16 port cards, 24 port cards…)?

    I’m trying to do this with as few resources as possible, we’re a new and still cash-strapped video editing company…

  9. Tom Says:

    Kevin, that’s such a tough question. With any RAID-5 you face the probability that you will get an error during rebuild, causing loss of data. This probability increases as the drive count and drive capacity increase, so it’s hard to say when that probability becomes too high for you to accept. And of course this is the exact problem that RAID-6 solves.

    But you need to decide if you really care if you get an error during rebuild. These errors will cause you to lose data (assuming the controller supports bad block or bad stripe marking), but that’s not a problem if you have good backups. You can always just restore the corrupted file(s).

    Also, you need to look at performance. You might not be able to tolerate the drop in write performance on a RAID-6. A database application with heavy writes will see as much as a 50% hit. But a video app, such as you’re running, could see a hit of 10% or less. And of course if you’re mostly reading, not writing, then you won’t see any performance hit at all.

    Sorry. That wasn’t a definitive answer to your question, but hopefully it helps.

    TT

  10. Bob Says:

    Tom

    Is the rebuild time under R6 significantly better than or similar to R5?

    I have one vendor (who shall remain nameless) say it is.

    I thought the R6 mostly provided additional protection and little improvement - if any - in rebuild time?
