

Best RAID level for small files

Posted in Storage Interconnects & RAID, Advisor - Tom Treadway by Tom Treadway

Question to the Storage Advisors, from Alessandro: I am trying to help an IT manager friend of mine to understand which solution would be the best one to store a LARGE amount of very small files. He is usually using what I know to be the worst solution in terms of performance (which are very important to him): Raid-5. I know the best thing would be RAID 1+0 because of seek times etc but I’d like to know what’s the best way to span files across more disks so that accesses and seeks could be distributed on more disks. Is the only way creating more raid and distributing files amongst them? I wasn’t able to find any kind of information on the net googling around or searching for benchmarks, white papers, best practices on emc, hp and other storage vendors’ websites.

Buon giorno, Alessandro. First let me congratulate you on a most excellent name. Speaking for the Toms of the world, we are envious. :-)

In your question you said you wanted to “store” a large number of small files. This sounds like a “write-mostly” environment – perhaps some sort of backup appliance. However, later you mention distributing accesses, which makes me think that a lot of reads are still happening – perhaps some sort of database of individual files.

Another piece of the puzzle is whether these accesses to the small files are sequential or random. If the access pattern is mostly writes, again like a backup appliance, then it might be a sequential pattern. However, if there are a lot of reads, it’s probably random.

I just don’t have enough information to answer you accurately. But let me take a shot at this…

If the environment is a lot of short random IO, with a high queue depth, then you’ll want each command to fall entirely within a single drive, that is, within one drive’s portion of a stripe. The goal is to have every drive processing a unique command. This means that if each drive can do 100 IOPS, then an array of eight drives can do 800 IOPS (temporarily ignoring RAID-5 write issues). Every time a command crosses the boundary between one drive’s portion of the stripe and the next, it will tie up two drives. If every command crossed such a boundary, the performance would drop to 400 IOPS. The real answer, with just occasional boundary crossings, is somewhere between 400 and 800 IOPS.
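To put numbers on that, here is a rough Python sketch. The function name and the simple “one drive or two drives per command” model are my own assumptions for illustration; nothing here is read from a real controller. It assumes the queue is deep enough to keep every drive busy.

```python
def array_iops(drives, iops_per_drive, crossing_fraction):
    """Estimate host IOPS for short random IO on a striped array.

    crossing_fraction is the share of commands (0.0 to 1.0) that straddle
    a boundary between two drives and so occupy two drives instead of one.
    Hypothetical model: ignores caching and RAID-5 write overhead.
    """
    if not 0.0 <= crossing_fraction <= 1.0:
        raise ValueError("crossing_fraction must be between 0 and 1")
    # On average each host command costs 1 drive operation, plus 1 more
    # for the fraction of commands that cross a boundary.
    drive_ops_per_command = 1.0 + crossing_fraction
    return drives * iops_per_drive / drive_ops_per_command


# Eight drives at 100 IOPS each, with no, some, and all commands crossing.
for fraction in (0.0, 0.25, 1.0):
    print(fraction, array_iops(8, 100, fraction))
```

The two extremes reproduce the 800 and 400 IOPS figures above; anything in between depends on how often your IO size and alignment cause a crossing.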

Still assuming short random IO, if the access pattern is mostly reads then all RAID levels should have roughly the same performance. Pick the RAID level that is the most cost effective.

However, if the access pattern has a lot of writes, then RAID-5 is probably the worst choice. Every small host write will be turned into two disk reads and two disk writes. Write performance will drop by 4X at a minimum – probably more. RAID-10 makes more sense for this environment.
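Here is a back-of-the-envelope Python sketch of that write penalty. The names are hypothetical, and the RAID-10 figure of two disk writes per host write (one per mirror side) is my assumption for the comparison. The model only counts disk operations; it ignores caching and the extra latency of doing the reads before the writes.

```python
# Disk operations generated by one short random host write.
RAID5_WRITE_DISK_OPS = 4   # read old data, read old parity, write both back
RAID10_WRITE_DISK_OPS = 2  # assumption: one write to each side of the mirror


def host_iops(drives, iops_per_drive, write_fraction, write_disk_ops):
    """Estimate host IOPS for short random IO with a given read/write mix.

    A host read is assumed to cost one disk operation; a host write costs
    write_disk_ops disk operations. Hypothetical model, no cache.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    disk_ops_per_command = (1.0 - write_fraction) + write_fraction * write_disk_ops
    return drives * iops_per_drive / disk_ops_per_command


# Eight drives at 100 IOPS each, from all reads to all writes.
for write_fraction in (0.0, 0.5, 1.0):
    raid5 = host_iops(8, 100, write_fraction, RAID5_WRITE_DISK_OPS)
    raid10 = host_iops(8, 100, write_fraction, RAID10_WRITE_DISK_OPS)
    print(write_fraction, round(raid5), round(raid10))
```

With all reads the two levels come out the same, which is why cost can decide. With all writes, RAID-5 lands at a quarter of the read rate in this model, while RAID-10 lands at half.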

But if your environment is mostly short sequential writes, as with a backup appliance, then the most important part of the solution is not the RAID level but a good battery-backed write-back cache on the controller. With a write-back cache and a RAID-5, the short host IOs will be combined into very efficient long sequential writes that can use a full-stripe-write algorithm: the parity is computed from the new data alone, so nothing has to be read from disk first. The result is that RAID-5 and RAID-10 will have similar performance. You won’t see the 4X hit described in the random IO case. You can read more about this in my previous post on the effects of cache on RAID performance.
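A simple operation count shows why the cache matters so much. This Python sketch is only an illustration under my own assumptions: “chunk” here means one drive’s portion of a stripe, the function names are made up, any leftover partial stripe is charged the full read-modify-write cost, and the RAID-10 cost of two disk writes per chunk is assumed. Real controllers are smarter than this, and counting operations says nothing about how fast each one completes.

```python
def raid5_disk_ops(drives, chunks_written, write_back_cache):
    """Count disk operations needed to write sequential chunks to RAID-5."""
    if not write_back_cache:
        # Every chunk is handled alone: 2 reads + 2 writes each.
        return chunks_written * 4
    data_chunks_per_stripe = drives - 1  # one chunk per stripe holds parity
    full_stripes, leftover = divmod(chunks_written, data_chunks_per_stripe)
    # A full-stripe write touches every drive once and needs no reads.
    # Simplification: leftover chunks fall back to read-modify-write.
    return full_stripes * drives + leftover * 4


def raid10_disk_ops(chunks_written):
    """Count disk operations for RAID-10: one write per mirror side."""
    return chunks_written * 2


# 70 sequential chunks on an eight-drive array.
print("RAID-5, no cache:  ", raid5_disk_ops(8, 70, write_back_cache=False))
print("RAID-5, write-back:", raid5_disk_ops(8, 70, write_back_cache=True))
print("RAID-10:           ", raid10_disk_ops(70))
```

In this count the cached RAID-5 no longer pays the 4X penalty and lands in the same neighborhood as RAID-10, which is the point of the paragraph above.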

Anyway, I hope that helped. Feel free to drop a comment if you want to dig into this.

Thanks,
TT

P.S. Note that I didn’t address the issue of files. The RAID card has no idea what a file is. All we can offer for tuning purposes is the assumption that a file size somehow corresponds to the command IO size seen by the RAID card.
