RAID 5 and database
Posted in General, Storage Applications, Platforms, Storage Interconnects & RAID, Storage Management, Application Environments, Advisor - Neil by Neil
Just finished a quick road trip to the other side of the country where I was espousing the benefits of our new MaxIQ product. As a consequence of talking to database integrators, we spent a lot of time discussing existing implementations that they were having problems with (and which were therefore candidates for MaxIQ).
Something that came to light on a regular basis was the fact that a lot of integrators use RAID 5 for every system they implement, whether it be a fileserver or a database server. Now RAID 5 is a good all-rounder. It’s great for fileserving and general server use, makes good use of the available disk space, and most people are comfortable enough with the technology to actually think they understand it.
So what’s wrong with that? Simply put, RAID 5 is (in general) no good for environments with small random writes. Since I was promoting MaxIQ, which is excellent for small random reads, naturally I found myself in environments where there were a lot of random writes. On almost all occasions customers were using RAID 5. Most were using SAS disks, which meant they had recognised the significance of the performance issues they faced on database servers and had opted to offset them with fast-spinning, fast-seeking disks.
Therefore we had a scenario where customers were trying to fix performance problems with hardware alone. Put a faster RAID card in the machine, put faster disks in the machine, add more RAM, improve the processor … but what about something as simple as using a different RAID level? RAID 5 is a great performer on many disk types, over a wide variety of read/write scenarios and data sizes, but it has one weakness. Random writes become slow because of the way RAID 5 handles writes that are smaller than a full stripe. For every small write the controller has to read the old data and the old parity, recalculate the parity, then write the new data and the new parity back to disk.
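To make the small-write cost concrete, here is a minimal sketch in Python of how a single-block update in a parity stripe could work. It is not how any particular RAID card is implemented; the function names (`xor_blocks`, `full_parity`, `small_write`) and the tiny two-byte blocks are assumptions made purely for illustration. The point is the count of disk operations: two reads and two writes for one logical write.

```python
from functools import reduce


def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    return bytes(x ^ y for x, y in zip(a, b))


def full_parity(stripe: list) -> bytes:
    """Parity of a whole stripe: XOR of all its data blocks."""
    return reduce(xor_blocks, stripe)


def small_write(stripe: list, parity: bytes, index: int, new_data: bytes):
    """Update one data block (hypothetical read-modify-write sketch).

    Returns (new_parity, io_count). The four I/Os are counted by hand
    in the comments; nothing here touches a real disk.
    """
    old_data = stripe[index]   # I/O 1: read old data block
    old_parity = parity        # I/O 2: read old parity block
    # Remove the old data from the parity, then fold in the new data.
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)
    stripe[index] = new_data   # I/O 3: write new data block
    # I/O 4: write new parity block (returned to the caller here)
    return new_parity, 4


# Three data blocks plus parity, as in a 4-disk RAID 5 stripe.
stripe = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = full_parity(stripe)
new_parity, io_count = small_write(stripe, parity, 1, b"\xaa\xbb")
```

The incremental parity matches what you would get by recomputing parity over the whole updated stripe, which is why the controller can get away with touching only two disks, but it still costs four operations per small write.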
So what to do? Simply put, for most database applications you should consider using RAID 10. RAID 10 does not do parity, but simply writes the same data to two separate disks within the array. With no old data or parity to read back and no parity to calculate, RAID 10 has faster random writes than RAID 5. Yes, there are scenarios where RAID 5 is good for databases (in mostly-read database environments), but in general, especially in the SMB market with accounting databases, SQL, Exchange etc, RAID 10 is a better option for database performance.
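As a rough back-of-envelope comparison, the commonly quoted rule of thumb is that a small random write costs about 4 disk operations on RAID 5 and 2 on RAID 10. The sketch below applies that rule; the function name `effective_iops`, the 8-disk array, the 150 IOPS per disk and the 50% write mix are all assumed figures for illustration, not measurements, and real controllers with write cache will behave differently.

```python
# Rule-of-thumb disk operations per small random write (assumed model).
WRITE_PENALTY = {"raid5": 4, "raid10": 2}


def effective_iops(disks: int, iops_per_disk: float,
                   write_fraction: float, level: str) -> float:
    """Estimate host-visible random IOPS for a given read/write mix.

    Each read costs one disk operation; each write costs
    WRITE_PENALTY[level] disk operations.
    """
    raw = disks * iops_per_disk
    penalty = WRITE_PENALTY[level]
    cost_per_host_io = (1 - write_fraction) + write_fraction * penalty
    return raw / cost_per_host_io


# Hypothetical array: 8 disks at 150 IOPS each, half the workload is writes.
raid5 = effective_iops(8, 150, 0.5, "raid5")
raid10 = effective_iops(8, 150, 0.5, "raid10")
print(f"RAID 5:  {raid5:.0f} IOPS")
print(f"RAID 10: {raid10:.0f} IOPS")
```

Set `write_fraction` to 0 and the two levels come out the same under this model, which fits the point that RAID 5 can be fine for mostly-read databases.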
Remember … you don’t have to make your entire disk structure the same RAID type (mix and match for your different data types). You can have more than 4 drives in a RAID 10, and you can have different disk types connected to the same card, so you can mix your RAID types across different physical arrays (just don’t put both disk types in the same array).
Point of the exercise? Hardware alone won’t give you the performance every time. It will help, but you need to keep an eye on your RAID type to ensure it matches your data set.
Ciao
Neil