Living Loving MAID

Posted in Storage Applications, Storage Interconnects & RAID, Advisor - Tom Treadway by Tom Treadway

No, this isn’t the same maid you might have seen going around town in her aged Cadillac. And you’ll certainly find no reference to a purple umbrella or a fifty cent hat.

MAID stands for Massive Arrays of Idle Disks. The term was first coined in the January 2002 whitepaper of the same name, created by Messrs. Colarelli, Grunwald and Neufeld at the University of Colorado in Boulder.

Given CU’s proximity to StorageTek, one of the world’s leading manufacturers of large tape libraries, this whitepaper focused on replacing these tape libraries with a massive array of disks. For example, the StorageTek 9310 could support (at the time the whitepaper was written) up to 6000 tapes, each with a capacity of 60GB. Also at that time, a typical consumer-grade disk drive was 60GB, so the equivalent disk-based library would have to contain 6000 drives. How convenient.

The key to an affordable and power-efficient tape library is that there is only a limited number of actual tape drives in the library. In other words, all 6000 tapes are not being concurrently accessed by 6000 tape drives. For example, the 9310 can contain a maximum of 80 tape drives. Therefore, to stay in the power budget of the 9310, CU’s folks had to come up with a scheme that would avoid having all 6000 drives continually powered up, thereby providing cause for the word “Idle” to be included in the MAID acronym.

The MAID whitepaper goes on to explain two methods of implementation, with and without cache, but the basic premise of both methods is that a virtualization manager will manage a pool of passive drives that are spun up only as needed. Besides conserving power, this also reduces wear-and-tear on the drives, increasing their lifespan significantly.

The paper does not go into the details of the algorithm used to determine which drives are spun up or whether RAID is used to protect the data from drive failure. But clearly it is critical to answer both questions. Assuming the drives were in use 24/7, and using the methods and values described in an earlier post, the library technician should expect to replace a failed drive every 7 days! Even if the drives were only spun up 10% of the time, that’s still a drive replacement every 3 months. Clearly RAID has to be part of the solution.
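
For a rough feel of where a number like "every 7 days" comes from, here is a back-of-the-envelope sketch in Python. The 1,000,000-hour MTBF is my assumption for illustration (the values from the earlier post aren't reproduced here), and the model simply assumes failures accrue only while a drive is spun up.

```python
HOURS_PER_DAY = 24
ASSUMED_MTBF_HOURS = 1_000_000  # assumption for illustration, not a value from the post


def days_between_failures(num_drives, duty_cycle=1.0, mtbf_hours=ASSUMED_MTBF_HOURS):
    """Expected days between drive failures across the whole library.

    duty_cycle is the fraction of time each drive is spun up; the model
    assumes a drive only accumulates wear while it is powered.
    """
    powered_drive_hours_per_day = num_drives * duty_cycle * HOURS_PER_DAY
    return mtbf_hours / powered_drive_hours_per_day


print(round(days_between_failures(6000), 1))     # 6.9  (all drives spinning 24/7)
print(round(days_between_failures(6000, 0.10)))  # 69   (each drive spun up 10% of the time)
```

Under these assumptions the 24/7 case lands at about a week, and the 10% case at a couple of months, the same ballpark as the figures above. The exact numbers depend on the MTBF and method used.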

Enter Copan Systems.

In late 2004, Copan released the Revolution 200T, the world’s first, and seemingly only, disk-based backup product based on the MAID concept. There has been a flurry of activity lately as new versions of the Revolution were released, but in this post I’m going to try to steer away from doing a product review, and instead concentrate on describing the MAID-related aspects of their product.

I’ll start by refreshing your memory on the layout of a typical RAID-4. Admittedly RAID-4 isn’t used in real life, but it will help the reader visualize the MAID algorithms – or at least Copan’s interpretation of the MAID algorithms.

[Figure: RAID-4]
You can see in this picture that each horizontal row of stripes is used to calculate the parity. For example, minor stripes A, B, C and D are XOR’ed to create the first parity stripe. This pattern is repeated for each row, or major stripe. In fact, if you were to rotate the parity position within each major stripe, you’d have RAID-5. But for clarity, let’s leave out the rotation.
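
If it helps to see the XOR at work, here is a minimal Python sketch of one major stripe. The two-byte stripe contents and the helper name xor_bytes are made up for illustration; real minor stripes are many kilobytes.

```python
from functools import reduce


def xor_bytes(*stripes: bytes) -> bytes:
    """XOR equal-length byte strings together, byte by byte."""
    assert len({len(s) for s in stripes}) == 1, "stripes must be the same size"
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*stripes))


# One major stripe: four tiny stand-in minor stripes (illustrative values).
A, B, C, D = b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"
P = xor_bytes(A, B, C, D)  # parity stripe for this row

# If the disk holding C fails, XOR the survivors with the parity to rebuild it.
rebuilt_C = xor_bytes(A, B, D, P)
print(rebuilt_C == C)  # True
```

The same property is what lets any single failed disk in the row be reconstructed from the others.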

Now imagine doing a system image backup where the backup is written to stripes A through L. The minor stripe size could be selected to have each “chunk” of the backup cross an entire major stripe (A thru D), or fall within each stripe (first A, then B, etc.). To enable a clear and understandable comparison to MAID, let’s assume that each chunk falls entirely within a minor stripe. For example, the backup application is writing 64KB at a time, and the minor stripe size is 256KB.

When A is written by the application, the parity is updated via a method called Read/Modify/Write. This is a standard RAID-5 technique that won’t be rehashed here, but in summary: The new data A’ is XOR’ed with the old data A and the old parity P, creating the new parity P’.
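
As a quick sanity check of that summary, the following Python sketch compares the Read/Modify/Write shortcut against recomputing the parity from the whole row. The stripe values and function names are illustrative assumptions, not anything from a real RAID stack.

```python
def xor2(x: bytes, y: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(x, y))


def rmw_parity(old_data: bytes, new_data: bytes, old_parity: bytes) -> bytes:
    """New parity P' = A' xor A xor P, touching only the data disk and the parity disk."""
    return xor2(xor2(new_data, old_data), old_parity)


# Illustrative two-byte stripes standing in for A, B, C and D.
A, B, C, D = b"\x01\x02", b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"
P = xor2(xor2(A, B), xor2(C, D))  # existing parity for the row

A_new = b"\x7e\x81"  # the application overwrites A
P_new = rmw_parity(A, A_new, P)

# Recomputing from the full row gives the same parity, but needs B, C and D too.
P_full = xor2(xor2(A_new, B), xor2(C, D))
print(P_new == P_full)  # True
```

The point of the shortcut is that disks holding B, C and D never have to be read.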

Since the backup application is dumping a huge amount of data to the array, disks v thru z will practically see a continuous stream of data. Therefore disks v thru z will have to be continually powered up. And continually experiencing a slow MTBF-related death.

This is where it gets interesting.

Following the MAID principles, Copan rotated the RAID-4 algorithm by 90 degrees. Rather than stripe the data across multiple disks, they concatenated the disks as follows. Note that I show each disk having just three stripes, but imagine a real disk that has millions of these stripes.

[Figure: MAID]
Now imagine the same backup application writing to stripes A, B, and C. Using the Read/Modify/Write algorithm it’s only necessary to access disks v and z. Disks w, x and y can remain spun down. Once disk v fills up, it’s spun down and disk w spins up. Assuming the ratio of online virtual tapes to number of disks is low, it’s possible to keep a majority of the disks spun down at any one time.
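
To make the difference between the two layouts concrete, here is a small Python sketch that maps a logical stripe to its data disk under each scheme and lists which disks must be spinning for a write. The disk names follow the drawings (v through y for data, z for parity); the function names are my own, and this is a simplified model, not Copan's actual implementation.

```python
DATA_DISKS = ["v", "w", "x", "y"]
PARITY_DISK = "z"
STRIPES_PER_DISK = 3  # as in the drawing; a real disk has millions


def striped_disk(stripe_index: int) -> str:
    """RAID-4 style: consecutive stripes land on consecutive disks."""
    return DATA_DISKS[stripe_index % len(DATA_DISKS)]


def concatenated_disk(stripe_index: int) -> str:
    """MAID style: fill one disk completely before moving to the next."""
    return DATA_DISKS[stripe_index // STRIPES_PER_DISK]


def disks_touched(stripe_indices, locate):
    """Disks that must be spun up to write the stripes, parity disk included."""
    return sorted({locate(i) for i in stripe_indices} | {PARITY_DISK})


first_three = [0, 1, 2]  # stripes A, B and C
print(disks_touched(first_three, striped_disk))       # ['v', 'w', 'x', 'z']
print(disks_touched(first_three, concatenated_disk))  # ['v', 'z']
```

With the concatenated layout, a sequential backup stream stays on one data disk plus the parity disk until that data disk is full.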

And that is your basic MAID algorithm.

If you stare at this drawing long enough you’ll uncover several issues. First, isn’t disk z a bottleneck? And won’t z wear out faster than the other drives? And what if the access patterns to the virtual tapes weren’t uniform, and some disks got hit more than others? Yep. And Copan resolves those issues by a little magic they call Disk Aerobics. By examining individual work loads and idle times, as well as error thresholds, they are able to distribute the load more fairly.

But at this point we’re getting into more of a product review than a MAID algorithm review. So, to summarize:

MAID keeps only a subset of the drives powered at any one time, while still providing RAID redundancy. It trades the traditional performance of RAID-5 for lower power and higher MTBF. High IOs per second are traded for “decent” throughput with occasional 15-second delays as drives are spun up.

So while MAID should not be considered a replacement for traditional RAID algorithms in a typical server environment, clearly MAID-based storage is a great replacement for tape libraries.

Enjoy.

TT

April 4, 2006 update: Did I say “great replacement”? Ming’s comment below reminded me that I was maybe a little over-zealous in my praise. The Copan system is certainly an “acceptable” replacement in my opinion, but I should probably reserve the word “great” for a higher performance system. Thanks for the reality check, Ming!

2 Responses to “Living Loving MAID”

  1. ming zhang Says:

    if this is the maid algorithm, then this speed is not decent at all.

    most time virtual tape backup come with long sequential stream write. so if u write A B C D. in raid4/5, it is a full write and u need not read parity at all. so it can achieve 4x single disk speed here.

    but for maid, it will do r-m-w for each block. it will be VERY slow. start from LBA X, assume the block size is y, so u need to read Y from disk at X, seek back to X, write Y to disk at X. (luckily here that next read will start from X+Y so save a seek here) . i do not have a number here, but i would say this is much slower than a sequential throughput u can get from a single disk.

    so how could this to be decent?

  2. Tom Says:

    Copan does some tricks with the initial write where they’re able to “assume” that the data on the unused drives is zero, and therefore they don’t need to do a r/m/w. But they are certainly less clear what they do after the first disk fills up.

    I agree that the total throughput will be less than that of a single disk. If they do their algorithms PERFECTLY, then they might get half a disk’s throughput. Maybe that’s about 40MB/s?

    40MB/s is definitely less than the throughput of a high-end tape, but maybe Copan’s customers don’t care.

    Anybody out there actually use one of these things?

    TT
