<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="wordpress/1.5.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
>

<channel>
	<title>Storage Advisors</title>
	<link>http://storageadvisors.adaptec.com</link>
	<description>Storage Solutions for Real World IT Professionals</description>
	<pubDate>Wed, 03 Oct 2007 15:20:21 +0000</pubDate>
	<generator>http://wordpress.org/?v=1.5.2</generator>
	<language>en</language>

		<item>
		<title>RAID Stripe width</title>
		<link>http://storageadvisors.adaptec.com/2007/09/07/raid-stripe-width/</link>
		<comments>http://storageadvisors.adaptec.com/2007/09/07/raid-stripe-width/#comments</comments>
		<pubDate>Fri, 07 Sep 2007 23:43:39 +0000</pubDate>
		<dc:creator>Steve</dc:creator>
		
	<category>General</category>
	<category>Storage Applications</category>
	<category>Storage Interconnects &#038; RAID</category>
	<category>Storage Management</category>
	<category>Advisor - Steve Rogers</category>
		<guid>http://storageadvisors.adaptec.com/2007/09/07/raid-stripe-width/</guid>
		<description><![CDATA[Does the number of disks in a RAID-5 array affect the performance ?]]></description>
			<content:encoded><![CDATA[	<p>Does the number of disks in a RAID-5 array affect the performance of the array?<br />
I received a question from a reader wanting to know whether the number of drives in a RAID set affects the performance of the array. The short answer is yes, it most certainly does!<br />
The number of drives is commonly referred to as the stripe width: the number of parallel stripes that can be written to or read from simultaneously. This is of course equal to the number of disks in the array, so a four-disk striped array has a stripe width of four. Read and write performance of a striped array increases as stripe width increases, all else being equal. The reason is that adding more drives to the array increases the parallelism of the array, allowing access to more drives simultaneously. As an example, you will generally get better transfer performance from an array of eight 18 GB drives than from an array of four 36 GB drives of the same drive family, all else being equal.</p>
	<p>I will also point out that adding more drives to your stripe width is a two-edged sword, especially with today’s disk capacities increasing, soon to be 1TB per disk.  The more high-capacity drives you have in a RAID set, the more likely you are to get bit errors and/or a drive failure that will require a RAID rebuild, and rebuild times for these drives can be very long.  I refer to my astute colleague, Tom Treadway, who has written several blog posts about how MTTDL (mean time to data loss) decreases as drive count increases, mostly due to BER (bit error rate). So as not to be redundant in this blog entry, his articles can be found on our blog page under these titles: NOT everybody loves SATA; Is RAID-6 made of wood?; Real-life RAID reliability; and RAID reliability calculations.</p>
	<p>From a performance perspective, my testing has shown that a stripe width of 8 to 12 drives is usually the practical maximum for most controllers or storage arrays. Beyond that, you are usually maxing out the interface to the host or the backend of the storage array.  Moreover, the more RAID sets you have, the more isolated your failures will be: a failure in one RAID set won’t affect data on the other RAID sets.  If you do go larger, higher RAID levels like RAID 6, 50 or 60 are more &#8220;data resilient&#8221; choices.</p>
	<p>There really aren’t many practical reasons to go with a much larger stripe width, unless you have a hugely performance-dependent application, such as real-time data capture, that holds temporary or transitory data. In other words, you don’t plan on keeping the data there very long, at least not without backing it up or having a copy somewhere else.</p>
	<p>SR</p>
]]></content:encoded>
			<wfw:commentRSS>http://storageadvisors.adaptec.com/2007/09/07/raid-stripe-width/feed/</wfw:commentRSS>
	</item>
		<item>
		<title>Effect of drive count on RAID-5</title>
		<link>http://storageadvisors.adaptec.com/2007/07/10/effect-of-drive-count-on-raid-5/</link>
		<comments>http://storageadvisors.adaptec.com/2007/07/10/effect-of-drive-count-on-raid-5/#comments</comments>
		<pubDate>Tue, 10 Jul 2007 19:30:12 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
	<category>Storage Interconnects &#038; RAID</category>
	<category>Advisor - Tom Treadway</category>
		<guid>http://storageadvisors.adaptec.com/2007/07/10/effect-of-drive-count-on-raid-5/</guid>
		<description><![CDATA[Question to the Storage Advisors:  What are the practical limitations on the number of disks in a RAID-5 set?  I understand that a larger number of disks worsens the probabilities of bad things like a failure or paying the worst possible seek/delay cost in random accesses.  Do you have any mathematical models for evaluating these penalties? ...]]></description>
			<content:encoded><![CDATA[	<p><strong>Question to the Storage Advisors, from Dean:  What are the practical limitations on the number of disks in a RAID-5 set?  I understand that a larger number of disks worsens the probabilities of bad things like a failure or paying the worst possible seek/delay cost in random accesses.  Do you have any mathematical models for evaluating these penalties?</strong></p>
	<p>Dean, there are several ways to look at this issue.  Some of the areas I point out below are pretty obvious, but for the sake of completeness I’ll cover it all:</p>
	<p><strong>Storage efficiency:</strong>  As the disk count increases so does the storage efficiency.  This is because there is one disk’s worth of redundancy (parity) per array.  For example, a 3-disk RAID-5 has one disk’s worth of parity and two disks’ worth of usable space, so the efficiency is 67%, i.e., 67% of the total disk space is available for user data.  Likewise, a 10-disk RAID-5 has an efficiency of 90%.  As a formula, it looks like this:</p>
	<p><em>Efficiency = (DiskCount-1) / DiskCount</em></p>
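	<p>As a quick sanity check, here is a minimal Python sketch of that formula. The function name and the three-disk minimum check are illustrative assumptions, not something taken from a RAID tool:</p>

```python
def raid5_efficiency(disk_count: int) -> float:
    """Fraction of raw capacity available for user data in a RAID-5 array.

    Hypothetical helper: one disk's worth of capacity goes to parity.
    """
    if disk_count < 3:
        raise ValueError("RAID-5 needs at least three disks")
    return (disk_count - 1) / disk_count


# The two examples from the text: a 3-disk and a 10-disk array.
for n in (3, 10):
    print(f"{n} disks: {raid5_efficiency(n):.0%}")
```

	<p>For the 3-disk and 10-disk arrays this prints 67% and 90%, matching the figures above.</p>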
	<p><strong>Degraded performance: </strong> A degraded RAID-5 is an array with a failed disk.  If the user tries to read a block on the failed disk the RAID software will have to access all the other disks in the array to reconstruct that missing data.  However if the user tries to read a block on one of the remaining good disks then nothing special happens.  The data is simply read from the disk.</p>
	<p>So let’s go back to the 3-disk example and assume a single failed disk.  And let’s assume that the user is reading just one block, and let’s see how that read transforms to one or more disk accesses.  If the user reads the two good disks then each read will be converted to just one disk read.  However if the user reads the bad disk then that read will turn into two disk reads (from the good disks).  So on average, three random disk reads (one per disk) will result in four disk IOs, or an increase in IOs of 33% from the optimal array case.  Now let’s look at the 10-disk array.  A read from nine of the disks will result in just one IO, however a read to the bad disk will result in nine IOs, i.e., one read from each of the remaining good disks.  So ten random reads will result in 18 disk IOs, an increase of 80%.  As a formula, it looks like this:</p>
	<p><em>IOIncrease = ((DiskCount-1) + (DiskCount-1) - DiskCount) / DiskCount</em></p>
	<p>which reduces to:</p>
	<p><em>IOIncrease = (DiskCount-2) / DiskCount</em></p>
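	<p>The same counting can be sketched in Python. This assumes, as the example does, exactly one host read per disk and a single failed disk; the function names are made up for illustration:</p>

```python
def degraded_read_ios(disk_count: int) -> int:
    """Disk IOs for one host read per disk when one disk has failed."""
    good_disk_reads = disk_count - 1     # each costs a single disk IO
    reconstructed_read = disk_count - 1  # must read every surviving disk
    return good_disk_reads + reconstructed_read


def read_io_increase(disk_count: int) -> float:
    """Relative increase in disk IOs versus the optimal (non-degraded) array."""
    optimal_ios = disk_count  # one IO per host read
    return (degraded_read_ios(disk_count) - optimal_ios) / optimal_ios


for n in (3, 10):
    print(f"{n} disks: {degraded_read_ios(n)} IOs, increase {read_io_increase(n):.0%}")
```

	<p>Counting the IOs this way gives the same answer as the reduced formula <em>(DiskCount-2) / DiskCount</em>.</p>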
	<p>Now let’s look at writes - they’re a little more complicated.  Each host write will typically result in four disk IOs – two reads and two writes.  One read and write will be on a data disk while the other read and write will be on a parity disk.  We’ll need to see what happens if any of those IOs go to a bad disk.  If a read touches a bad data disk then all the other disks will have to be read, just as in the previous example.  However the data disk won’t be written, obviously, because the disk is bad.  Therefore the read becomes <em>(DiskCount - 1)</em> reads, and the write just goes away.  The parity disk still has both the read and write.  So the total number of IOs is <em>(DiskCount - 1) + 2</em>.</p>
	<p>If a read touches a bad parity disk, then nothing special happens.  There is no parity to update and therefore the write to the data disk is just a plain ol’ write.  The total number of IOs will go from four to just one.</p>
	<p>OK, to summarize, normally a write causes 4 IOs.  However if the data disk is bad the total IOs will increase to <em>(DiskCount + 1)</em>.  Likewise, if the parity disk is bad the total IOs will decrease to 1.</p>
	<p>If we go back to our example of ten IOs spread evenly (one per disk) across ten disks, you’d see that 8 host writes will result in 4 IOs each, one host write will result in 11 IOs and one will result in 1 IO, for a total of 44 IOs.  In an optimal array all 10 host writes would result in 4 IOs each, for a total of 40 IOs.  That’s only an increase of 10%.  The formula looks like this:</p>
	<p><em>Increase = ((DiskCount-2)*4 + (DiskCount+1) + 1 - (DiskCount*4)) / (DiskCount*4)</em></p>
	<p>which reduces to:</p>
	<p><em>Increase = (DiskCount-6) / (DiskCount*4)</em></p>
	<p>It’s interesting to note that if the disk count is exactly 6 then the increase is zero and the total IOs don’t change.  If the disk count is less than 6 then the total IOs actually drop!</p>
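	<p>Here is a small Python sketch of the write-side arithmetic, using the same IO counts as above (4 IOs for a normal write, DiskCount+1 when the data disk is bad, 1 when the parity disk is bad). The names are hypothetical and it assumes one host write per disk:</p>

```python
OPTIMAL_WRITE_IOS = 4  # read data, read parity, write data, write parity


def degraded_write_ios(disk_count: int) -> int:
    """Total disk IOs for one host write per disk with one failed disk."""
    normal_writes = (disk_count - 2) * OPTIMAL_WRITE_IOS  # neither disk is bad
    bad_data_disk = disk_count + 1                         # (DiskCount - 1) reads + 2 parity IOs
    bad_parity_disk = 1                                    # plain data write only
    return normal_writes + bad_data_disk + bad_parity_disk


def write_io_increase(disk_count: int) -> float:
    """Relative change in disk IOs versus the optimal array."""
    optimal_ios = disk_count * OPTIMAL_WRITE_IOS
    return (degraded_write_ios(disk_count) - optimal_ios) / optimal_ios


for n in (3, 6, 10):
    print(f"{n} disks: {degraded_write_ios(n)} IOs, change {write_io_increase(n):+.0%}")
```

	<p>Running it for 3, 6 and 10 disks shows the drop, the break-even point at 6 disks, and the 10% increase from the example.</p>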
	<p>Also, you may have noticed that I conveniently left out the XOR time.  In general, assuming that the controller doesn’t have a memory bottleneck or the XOR isn’t done in software, then the XOR time is relatively small compared to the disk time, so it can be just left out of the equation.</p>
	<p><strong>Rebuild time:</strong>  <em>[Note that this section was reworded on July 18, 2007.  An observant reader had noticed that I was in the weeds.  <img src='http://storageadvisors.adaptec.com/wp-images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> ]</em>  When a bad disk is replaced it is re-created by reading from all the other disks in the array.  Luckily these reads can be issued in parallel and therefore the rebuild time does NOT increase linearly as the disk count increases.  In other words, to rebuild a 3-disk array will require reading two entire disks and writing a third disk.  Likewise rebuilding a 10-disk array will require reading nine entire disks and writing a tenth disk.  The time to read two disks should be roughly the same as the time to read nine disks.  There are some additional complications regarding XOR and potential limitations in the hardware, but they don&#8217;t have too much effect on the rebuild time and probably aren&#8217;t interesting enough to repeat here.</p>
	<p><strong>Second disk failure:</strong>  There is a small chance that another disk will fail before the first one is replaced.  To a first approximation, the chance of an array failing is simply the chance of a disk failing multiplied by the number of disks in the array.  Therefore the more disks in the array, the higher the likelihood of at least one disk in the array failing.  BTW, the chance of a disk failing is inversely proportional to the MTBF (Mean Time Between Failures) of the disk.  MTBF is a common way of indicating disk reliability.  Here’s the simple formula:</p>
	<p><em>ChanceOfArrayFailure = ChanceOfDiskFailure * DiskCount</em></p>
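	<p>To see how good that linear formula is, here is a minimal Python sketch comparing it with the chance of at least one failure among independent disks. The independence assumption, the function names and the 3% per-disk figure are all illustrative assumptions, not measured values:</p>

```python
def linear_array_failure_chance(disk_failure_chance: float, disk_count: int) -> float:
    """The simple formula from the text: per-disk chance times disk count."""
    return disk_failure_chance * disk_count


def independent_array_failure_chance(disk_failure_chance: float, disk_count: int) -> float:
    """Chance that at least one disk fails, assuming independent failures."""
    return 1 - (1 - disk_failure_chance) ** disk_count


p = 0.03  # hypothetical per-disk failure chance over some fixed period
for n in (3, 10):
    linear = linear_array_failure_chance(p, n)
    exact = independent_array_failure_chance(p, n)
    print(f"{n} disks: linear {linear:.3f}, independent-disk model {exact:.3f}")
```

	<p>For small per-disk chances the two numbers stay close, which is why the simple multiplication is usually good enough.</p>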
	<p>Some folks like to correlate multiple failures, meaning that if one disk fails then there is a higher chance (such as 10X) that a second disk will fail – possibly due to power supply problems, overheating, or shared cable issues (as with parallel SCSI).  But really these chances are extremely small and it’s really not worth going into the detail again.  A more thorough review can be found <a href="http://storageadvisors.adaptec.com/2005/11/02/actual-reliability-calculations-for-raid/">here</a>.</p>
	<p><strong>Bit error during rebuild:</strong>  This is probably the biggest negative with extremely large RAID-5 arrays.  Every sector on the disk has a very small chance of being unreadable, even using error correction codes (ECC).  This is referred to as the Bit Error Rate (BER).  A typical low-end SATA disk will have an uncorrectable bit error for every 10^14 bits read and a typical high-end SAS disk will have an uncorrectable bit error every 10^15 bits.  These seem like big numbers, but keep in mind that there are almost 10^13 bits in a 1TB disk.  That means that you should expect roughly one bit error if you read ten SATA disks from end to end.</p>
	<p>So what does it mean if you get a bit error?  Basically it means that one entire 512 byte sector will be unreadable, further meaning that the corresponding failed sector can’t be rebuilt.  The bottom line is that you’ve just lost data.  A good RAID controller will mark that sector “offline”, allowing the OS to get an error which will cause the user to restore the corresponding file from backup.  A bad RAID controller will ignore the error (causing hidden data corruption) or will abort the rebuild, leaving the user one disk failure away from total loss of all data.</p>
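	<p>Going back to our 10-disk example, let’s assume that we’re using 1TB SATA disks with a total of 8.8&#215;10^12 bits each and an error rate of one uncorrectable bit per 10^14 bits read.  We’ll have to read nine disks to rebuild the bad disk, resulting in 7.9&#215;10^13 total read bits.  Divide that by 10^14 and you get about 0.79 expected bit errors, which this simple estimate treats as a 79% chance of getting a bit error.  (Strictly, if the errors are independent the chance of at least one error is closer to 55%, but that’s still worse than a coin flip.)  In other words, you probably won’t be able to rebuild the array!</p>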
	<p>(Note that the situation is often not that horrific.  A good RAID controller will perform a continuous background media check looking for bit errors before the disk fails.  If one is found then it is repaired while the array is still optimal.  It’s difficult to say how much that improves your chances of rebuilding, but it’s generally accepted that background media checks are a “good thing”.)</p>
	<p>Here’s the resultant formula:</p>
	<p><em>ChanceOfDataLoss = (DiskCount-1) * DiskCapacityInBits / BER</em></p>
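	<p>The formula is a linear estimate: it really gives the expected number of bit errors during the rebuild.  The Python sketch below computes that number and, as an added assumption not made in the text, also converts it to a probability using a Poisson model of independent bit errors.  BER is expressed as bits read per uncorrectable error, and the function names are hypothetical:</p>

```python
import math


def expected_rebuild_bit_errors(disk_count: int, disk_capacity_bits: float,
                                bits_per_error: float) -> float:
    """Linear estimate from the text: bits read during rebuild / bits per error."""
    return (disk_count - 1) * disk_capacity_bits / bits_per_error


def chance_of_rebuild_bit_error(disk_count: int, disk_capacity_bits: float,
                                bits_per_error: float) -> float:
    """Chance of at least one bit error, assuming independent (Poisson) errors."""
    expected = expected_rebuild_bit_errors(disk_count, disk_capacity_bits, bits_per_error)
    return 1 - math.exp(-expected)


# The 10-disk example: 1TB SATA disks (8.8e12 bits each), one error per 1e14 bits.
expected = expected_rebuild_bit_errors(10, 8.8e12, 1e14)
chance = chance_of_rebuild_bit_error(10, 8.8e12, 1e14)
print(f"expected bit errors: {expected:.2f}, chance of at least one: {chance:.0%}")
```

	<p>Either way you look at it, a clean rebuild of a 10-disk array of 1TB SATA drives is less likely than not, and with 10^15-class SAS drives both numbers drop by roughly a factor of ten.</p>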
	<p>In addition to minimizing the chance of a bit error by performing the background scan, it’s common to divide the RAID-5 into multiple RAID-5 arrays combined with striping – more commonly referred to as RAID-50.  All of the equations above can be easily adapted to a RAID-50 configuration.</p>
	<p>Dean, I hope that helps answer your question.  I realize it was WAY more information than you were looking for, but I was on a roll.  <img src='http://storageadvisors.adaptec.com/wp-images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
	<p>TT
</p>
]]></content:encoded>
			<wfw:commentRSS>http://storageadvisors.adaptec.com/2007/07/10/effect-of-drive-count-on-raid-5/feed/</wfw:commentRSS>
	</item>
		<item>
		<title>A lovely montage of RAID webinars</title>
		<link>http://storageadvisors.adaptec.com/2007/07/02/raid-webinars/</link>
		<comments>http://storageadvisors.adaptec.com/2007/07/02/raid-webinars/#comments</comments>
		<pubDate>Mon, 02 Jul 2007 14:04:37 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
	<category>Storage Interconnects &#038; RAID</category>
	<category>Advisor - Tom Treadway</category>
		<guid>http://storageadvisors.adaptec.com/2007/07/02/raid-webinars/</guid>
		<description><![CDATA[I wanted to let you fine folks know that I’ve recorded a few webinars on many of the topics that I routinely discuss here in this blog.  Each one is about 15 minutes long and comes with a smorgasbord of tasty slides.  The biggest downside is that my delivery is only slightly better than Ben Stein discussing the Hawley-Smoot Tariff Act.  (Anyone?  Anyone?) ...]]></description>
			<content:encoded><![CDATA[	<p>I wanted to let you fine folks know that I’ve recorded a few webinars on many of the topics that I routinely discuss here in this blog.  Each one is about 15 minutes long and comes with a smorgasbord of tasty slides.  The biggest downside is that my delivery is only slightly better than Ben Stein discussing the <a href="http://en.wikipedia.org/wiki/Hawley-Smoot_Tariff_Act">Hawley-Smoot Tariff Act</a>.  (Anyone?  Anyone?)</p>
	<p>Here’s a list of the topics that can be found at the <a href="http://www.adaptec.com/dataprotectiondays/">webinar website</a>.</p>
	<p><strong>How to Map Applications to RAID Configurations </strong>- Learn how to match the read and write access patterns in your daily applications to the best RAID configuration for your business. </p>
	<p><strong>Maximize Data Protection for Cost-Effective SATA Drives </strong>- You don&#8217;t have to compromise reliability when using SATA. We&#8217;ll show you how to deploy RAID 6 to provide maximum fault tolerance.</p>
	<p><strong>RAID Error Handling </strong>- Learn how comprehensive error handling capabilities built into Adaptec RAID increase the effectiveness of data protection.</p>
	<p><strong>Success with Snapshot Backups </strong>- Gain a thorough understanding of how you can enhance data protection with the fast backup and instant data access of virtual point-in-time snapshots.</p>
	<p><strong>Growing Data Protection as Your Server Grows </strong>- Get tips on how to manage RAID and remove the complexity of capacity management as your storage grows.</p>
	<p><strong>Software vs. Hardware RAID </strong>- Learn the advantages and disadvantages of software RAID and hardware RAID.</p>
	<p>If anyone has any questions just drop me a line and I&#8217;ll start a blog post.</p>
	<p>Enjoy,<br />
TT
</p>
]]></content:encoded>
			<wfw:commentRSS>http://storageadvisors.adaptec.com/2007/07/02/raid-webinars/feed/</wfw:commentRSS>
	</item>
		<item>
		<title>Write-back cache:  Battery vs Disk</title>
		<link>http://storageadvisors.adaptec.com/2007/06/28/write-back-cache-battery-vs-disk/</link>
		<comments>http://storageadvisors.adaptec.com/2007/06/28/write-back-cache-battery-vs-disk/#comments</comments>
		<pubDate>Thu, 28 Jun 2007 13:13:51 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
	<category>Storage Interconnects &#038; RAID</category>
	<category>Advisor - Tom Treadway</category>
		<guid>http://storageadvisors.adaptec.com/2007/06/28/write-back-cache-battery-vs-disk/</guid>
		<description><![CDATA[Question to the Storage Advisors:  Which is better: (a) backup battery for cache as found on OEM RAID controllers or (b) writing cache content to one or more disk drives?]]></description>
			<content:encoded><![CDATA[	<p><strong>Question to the Storage Advisors, from anonymous:  Which is better: (a) backup battery for cache as found on OEM RAID controllers or (b) writing cache content to one or more disk drives?</strong></p>
	<p>Good question.  For those not versed in the dark arts of cache write-back strategies, we’re talking about methods for protecting user data that has been written by the OS but hasn’t actually made it to the media yet.  It’s common for a disk controller to improve write performance by accepting the data from the OS and saying that it’s been written to disk, when in reality it’s still in memory (or on disk in a log, as suggested by the poster’s question).  This technique is referred to as “write-back” because the data is written in the background.  The opposite of write-back is “write-through” where the controller really does write the data to disk before telling the OS that it’s finished.</p>
	<p>[Note that controllers aren’t the only things that have a write-back cache - the OS and drives also have one.  But to avoid complicating a somewhat simple question, let’s just ignore those other caches for now.  The OS has ways of protecting itself, and drive caches should be disabled if a write-back controller is being used.]</p>
	<p>It’s very important that this un-written controller data is protected because it’s the only copy of that data.  The OS thinks that the data is written to disk and therefore purges it from memory or wherever it came from.  If a power failure occurs before this write-back data is written to disk, then it’s permanently lost.  With large caches we’re talking about 100’s of MBs of data.  And it can even be worse than that because the missing writes could be to a file structure or database, resulting in massive corruption and loss of files that aren’t even being accessed.  It can be a real mess.  And the user won’t know about it until they read back corrupted data – which isn’t always obvious.</p>
	<p>So, on to the question:  What’s the best way to protect this unwritten data?  The most common approach is to simply put a battery on the disk controller.  If power is lost to the system, including to the drives, the controller memory will transition to battery-backed mode and preserve any write-back data that hadn’t made it to disk yet.  The battery is typically selected to provide at least 72 hours of backup time – protecting data across a weekend.</p>
	<p>An alternative method, as suggested by the poster, is to save this write-back data to disk.  There are different ways to implement this, but the most common is to “simply” store the data in a transaction log on the disk.  Now, note that the data is typically stored on the disk (in either method – battery or log) using some form of RAID, protecting against data loss due to a drive failure.  RAID-5 is a pretty commonly selected RAID level, but has very poor random write performance – a problem which just happens to be greatly alleviated by some form of write-back cache.  So for this example, let’s assume that RAID is being used.  This means that the write-back data being logged to disk should also be protected from disk failure.  The easiest way to do this is to simply write the log file to two disks.  (Some users prefer RAID-6 which protects against two drive failures, in which case the transaction log should be written to three disks!)</p>
	<p>OK, now let’s look at the pros and cons of the controller-based battery and disk-based log approaches.</p>
	<p><strong>Backup Protection Time:</strong>  A battery has a limited storage time – around 72 hours as previously pointed out.  However a transaction log on disk can last almost indefinitely, i.e., the lifetime of the drive.  So here the advantage clearly goes to disk-based logs.  (BTW, some folks are looking at ways to automatically move the controller cache data to a more permanent storage device, like CompactFlash, allowing controller backup times similar to transaction log backup times.  So this will eventually become moot.)</p>
	<p><strong>Life Expectancy:</strong>  Another issue with batteries is that they don’t last forever.  They eventually degrade and fail, lasting maybe a few years before they need to be replaced.  Drives obviously don’t have this issue.</p>
	<p><strong>Capacity:</strong>  This one is really a nit, but I figured I’d list it to be complete.  If a controller has 256MB of memory, for example, then the transaction log will require 2&#215;256MB of disk space, or 512MB.  With 1TB drives, this one is a big fat “don’t care”.</p>
	<p><strong>Cost: </strong> Batteries and the associated circuitry probably add about $100 to the user-cost of a controller, while 512MB of disk space for the log is practically free.  $100 might be a big deal for a home user (who probably doesn’t need RAID or write-back cache anyway), but it’s just another nit for serious IT folks.  Once you add up the price of the motherboard, OS, drives, etc., $100 is in the noise.</p>
	<p><strong>Performance:</strong>  So far the advantage has clearly gone to the disk log, but performance is probably the most important factor when choosing a cache backup protection method.  With a battery-backed controller there are no additional steps to protect the data in cache.  “It just works.”  Of course there is a lot of magic in the hardware design to make it “just work”, but that has no effect on the performance.  On the other hand, with disk-based logs the data has to be written to two different disks.  This will probably entail two seeks, assuming that the drives had been servicing requests in some other section of the media.  And eventually, that logged data will have to be read back from disk and moved to the permanent location – causing two more seeks and reads.  So now a single OS write will cause four additional IOs to the disks.</p>
	<p>So how the heck do we figure out the performance hit due to these four IOs?  Let’s try this crude method:</p>
	<p>Assume that RAID-5 is being used.  Therefore each random OS write will cause four disk IOs - two reads and two writes.  With disk-based logging there are four additional IOs to log and “un-log” the data for a total of eight IOs.  Using this approach we can see that disk logging has twice as many IOs as controller battery-protected cache, therefore you get about a 2X difference in performance.  Of course real performance modeling will be more complex than this, so just squint at the numbers and figure that the difference is anywhere from 50% to 150%.  That’s a big dang difference.</p>
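	<p>That crude IO count can be written down as a tiny Python sketch.  The constants simply restate the numbers above (a two-copy log written once and read back once); the names are illustrative, and real controllers will not behave this neatly:</p>

```python
RAID5_WRITE_IOS = 4    # two reads and two writes per random host write
LOG_WRITE_IOS = 2      # log entry written to two disks
LOG_READBACK_IOS = 2   # logged data read back later, per the count above


def ios_per_host_write(disk_logging: bool) -> int:
    """Disk IOs caused by one random host write under the crude model."""
    extra = LOG_WRITE_IOS + LOG_READBACK_IOS if disk_logging else 0
    return RAID5_WRITE_IOS + extra


battery = ios_per_host_write(disk_logging=False)
logged = ios_per_host_write(disk_logging=True)
print(f"battery-backed: {battery} IOs, disk log: {logged} IOs, ratio {logged / battery:.1f}x")
```

	<p>The ratio of 2x is only as good as the model, which is why the range quoted above is so wide.</p>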
	<p>The bottom line is that most users that are concerned with performance aren’t concerned with saving $100, therefore battery-backed cache is clearly the winner.</p>
	<p>Enjoy,<br />
TT
</p>
]]></content:encoded>
			<wfw:commentRSS>http://storageadvisors.adaptec.com/2007/06/28/write-back-cache-battery-vs-disk/feed/</wfw:commentRSS>
	</item>
		<item>
		<title>Combining multiple arrays</title>
		<link>http://storageadvisors.adaptec.com/2007/05/18/combining-multiple-arrays/</link>
		<comments>http://storageadvisors.adaptec.com/2007/05/18/combining-multiple-arrays/#comments</comments>
		<pubDate>Fri, 18 May 2007 12:44:29 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
	<category>Storage Interconnects &#038; RAID</category>
	<category>Advisor - Tom Treadway</category>
		<guid>http://storageadvisors.adaptec.com/2007/05/18/combining-multiple-arrays/</guid>
		<description><![CDATA[Question to the Storage Advisors:  I have several large SCSI arrays ranging in size from 600 GB to almost 2 TB.  They add up to about 4.5 TB in total.  They are currently seen as separate drives in Windows 2003 server R2.  I'd like to find an adapter card that will let me span them so they appear as one large array.  I'd rather not span them using disk manager in Windows 2003.]]></description>
			<content:encoded><![CDATA[	<p><strong>Question to the Storage Advisors, from John:  I have several large SCSI arrays ranging in size from 600 GB to almost 2 TB.  They add up to about 4.5 TB in total.  They are currently seen as separate drives in Windows 2003 server R2.  I&#8217;d like to find an adapter card that will let me span them so they appear as one large array.  I&#8217;d rather not span them using disk manager in Windows 2003.</strong></p>
	<p>John, first, are you willing to temporarily dump your data to tape in order to reconfigure the storage?  This is probably a requirement because I&#8217;m not aware of any RAID controller that can combine multiple logical drives without losing data.</p>
	<p>Next, are these SCSI arrays created by an external SCSI-to-SCSI controller that you plan to keep?  If so, you need to make sure the logical drives are presented in a way that makes the drives look enough like a physical drive to be striped by your new PCI RAID card.  You’re probably ok on this point.  Also, are these logical drives presented by the SCSI-to-SCSI controller all the same size?  If not, you’re going to have trouble striping them and using all of your capacity since each component of the RAID-0 has to be the same size.  (In other words, the minor stripe size has to be the same on all drives.)  You might end up creating one RAID-0 across all the logical drives, and then another across the “extra” sections of the logical drives, and so on.  And of course you’ll have to have a RAID card that supports this; most of them don’t.  (Hopefully I’m clear about this – if not, let me know and I can draw a picture.)</p>
	<p>The easiest way to fix this is to simply attach all your drives to a single PCI RAID card and create a single new array.  Don’t bother trying to stripe the arrays that you currently have.  Just start from scratch.</p>
	<p>Feel free to drop more details about drive count and sizes and I can make a more concrete recommendation on how it can be configured.  Also, have you already figured out what RAID type you need to use?</p>
	<p>TT
</p>
]]></content:encoded>
			<wfw:commentRSS>http://storageadvisors.adaptec.com/2007/05/18/combining-multiple-arrays/feed/</wfw:commentRSS>
	</item>
		<item>
		<title>Yet another RAID-10 vs RAID-5 question</title>
		<link>http://storageadvisors.adaptec.com/2007/04/17/yet-another-raid-10-vs-raid-5-question/</link>
		<comments>http://storageadvisors.adaptec.com/2007/04/17/yet-another-raid-10-vs-raid-5-question/#comments</comments>
		<pubDate>Tue, 17 Apr 2007 12:28:20 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
	<category>Storage Interconnects &#038; RAID</category>
	<category>Advisor - Tom Treadway</category>
		<guid>http://storageadvisors.adaptec.com/2007/04/17/yet-another-raid-10-vs-raid-5-question/</guid>
		<description><![CDATA[Question to the Storage Advisors:  The age old question...RAID 5 vs. RAID 0+1 with a twist on spindles.  Here's the deal:  Multiple Progress databases.  Much more read intensive that write intensive.  Which is faster:  (2) RAID0 sets of 4 disks that are then mirrored (8 disks total)  OR all 8 disks in a RAID5 config?  I'm trying to figure out if more disk spindles outperforms less spindles without the RAID5 overhead.]]></description>
			<content:encoded><![CDATA[	<p><strong>Question to the Storage Advisors, from John:  The age old question&#8230;RAID 5 vs. RAID 0+1 with a twist on spindles.  Here&#8217;s the deal:  Multiple Progress databases.  Much more read intensive that write intensive.  Which is faster:  (2) RAID0 sets of 4 disks that are then mirrored (8 disks total)  OR all 8 disks in a RAID5 config?  I&#8217;m trying to figure out if more disk spindles outperforms less spindles without the RAID5 overhead.</strong></p>
	<p>John, good question.  First, let&#8217;s talk about the RAID-0 and RAID-1 combination.  You mention mirroring two 4-drive RAID-0&#8217;s.  But the more typical approach is to stripe four 2-drive RAID-1&#8217;s.  The end result as far as performance and capacity is the same, but the reliability and rebuild time is better with the latter - striped RAID-1&#8217;s.  The reason is that if a single drive fails in the former mirrored RAID-0&#8217;s case then the entire set of 4 drives in the RAID-0 is typically taken off-line.  This means that the array can tolerate at most one drive failure, but all four drives will have to be rebuilt.  However with the latter striped RAID-1&#8217;s case, a single failed drive causes just one mirror pair to be rebuilt.  In fact, you could actually have one drive in each RAID-1 fail and still have the array on-line.</p>
	<p>So as far as your original question, it doesn&#8217;t actually matter which version of RAID-10 is being used.  Both versions will provide the same performance assuming that no drives are failed, and also assuming that the queue depth from the host is large enough and random enough to cause all the drives to be accessed.</p>
	<p>In your example the access pattern is mostly reads and mostly random (because it&#8217;s a database).  Just for the sake of comparison, let&#8217;s say that it is 100% reads and 100% random.  The end result is that all eight of the drives will see a command, or an IO.  That means that if each drive can do 100 IOs per second, then the RAID-10 can do 800 IOPS total.</p>
	<p>On a RAID-5 with 100% reads there are no RAID-5 calculations other than the block redirection due to the striping, which is almost identical to the redirection in RAID-10.  So the end result is that all eight drives are used (since parity is distributed across all drives), and therefore RAID-5 will do the same 800 IOPS as RAID-10.</p>
	<p>So, if you&#8217;re doing 100% reads, the RAID levels are identical in performance.</p>
	<p>But your database isn&#8217;t 100% reads, so let&#8217;s look at what happens with 100% writes.  With RAID-10, each host access will be written to two drives, so the array performance will drop to one-half, or 400 IOPS.  However with RAID-5 each host access is converted to four IOs (two reads and two writes), so the array performance will drop to one-quarter, or about 200 IOPS.  (It&#8217;s a little more complicated than that, but this is a pretty good estimate.)</p>
	<p>Now you&#8217;ll have to determine the real read/write ratio and calculate the weighted harmonic mean of the read and write rates to estimate the performance impact of these write commands.  For example, if the ratio was 50:50, then the RAID-10 would get around 533 IOPS and the RAID-5 would get around 320 IOPS.</p>
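	<p>As a rough sketch of that arithmetic, here is the same estimate in a few lines of Python.  The function name <code>mixed_iops</code> and the constant names are made up for illustration; the 100 IOPS per drive and the write penalties of 2 and 4 are the simplified figures used above, not measurements.</p>

```python
def mixed_iops(read_fraction, read_iops, write_iops):
    """Weighted harmonic mean of the read-only and write-only IOPS."""
    write_fraction = 1.0 - read_fraction
    return 1.0 / (read_fraction / read_iops + write_fraction / write_iops)


DRIVES = 8          # drives in the array (from the example)
DRIVE_IOPS = 100    # assumed random IOPS per drive

read_iops = DRIVES * DRIVE_IOPS        # 800 for both RAID levels
raid10_write_iops = read_iops / 2      # each write goes to two drives
raid5_write_iops = read_iops / 4       # each write costs four drive IOs

raid10_mixed = mixed_iops(0.5, read_iops, raid10_write_iops)
raid5_mixed = mixed_iops(0.5, read_iops, raid5_write_iops)
print(round(raid10_mixed), round(raid5_mixed))  # 533 320
```

	<p>Change the first argument to your real read fraction (for example 0.9 for a 90:10 mix) to see how quickly the gap between the two RAID levels closes as writes become rarer.</p>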
	<p>The bottom line is that the more writes you have, the more RAID-5 will hurt performance.  But if writes are rare, then the improved capacity of RAID-5 may warrant the slight performance hit.</p>
	<p>Good luck!</p>
	<p>TT
</p>
]]></content:encoded>
			<wfw:commentRSS>http://storageadvisors.adaptec.com/2007/04/17/yet-another-raid-10-vs-raid-5-question/feed/</wfw:commentRSS>
	</item>
		<item>
		<title>Databases, cages and RAID-10, oh my!</title>
		<link>http://storageadvisors.adaptec.com/2007/03/21/databases-cages-and-raid-10-oh-my/</link>
		<comments>http://storageadvisors.adaptec.com/2007/03/21/databases-cages-and-raid-10-oh-my/#comments</comments>
		<pubDate>Wed, 21 Mar 2007 15:01:33 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
	<category>Storage Interconnects &#038; RAID</category>
	<category>Advisor - Tom Treadway</category>
		<guid>http://storageadvisors.adaptec.com/2007/03/21/databases-cages-and-raid-10-oh-my/</guid>
		<description><![CDATA[Question to the Storage Advisors:  We're buying 2 cages that use a PCIe interface and 24 146G 15K drives. We're going to use RAID 10.  Each cage only holds 12 drives.  We'll be using 2 spares in each cage.  So that's a stripe of 5 disk groups.  At 175 IOPs that's approx 875 IOPs across the stripe.  If we go higher than 875 IOPs we begin seeing higher queue lengths correct?  Both cages are hooked to the same server thru the PCIe controller that supports up to 8Gb/s.  This is for an Oracle db, and will host 4 dbs.  2 dbs are IO intensive, 2 are not.  Right now we plan on separating the 2 IO intensive dbs to 1 cage each. Would it be better to use both cages per db? I don't think you can stripe disks across cages(?), so would it be better to create two stripes and put datafiles from both dbs on both cages so that we effectively use 2 stripes of 5 disk groups?]]></description>
			<content:encoded><![CDATA[	<p><strong>Question to the Storage Advisors, from Chris:  Need some quick insight if you have a moment.  This is for an Oracle db server btw.  We&#8217;re buying 2 cages that use a PCIe interface and 24 146G 15K drives. We&#8217;re going to use RAID 10.  Each cage only holds 12 drives.  We&#8217;ll be using 2 spares in each cage (since we have to have 1 spare and can&#8217;t use RAID 10 on 11 drives).  So that&#8217;s a stripe of 5 disk groups.  At 175 IOPs that&#8217;s approx 875 IOPs across the stripe (if I&#8217;m using the correct terminology).  If we go higher than 875 IOPs we begin seeing higher queue lengths correct?  Both cages are hooked to the same server thru the PCIe controller that supports up to 8Gb/s.  This is for an Oracle db, and will host 4 dbs.  2 dbs are IO intensive, 2 are not.  Right now we plan on separating the 2 IO intensive dbs to 1 cage each. So here is my question:  Would it be better to use both cages per db? I don&#8217;t think you can stripe disks across cages(?), so would it be better to create two stripes and put datafiles from both dbs on both cages so that we effectively use 2 stripes of 5 disk groups?  Does that make sense?  (Love your blogs by the way!)</strong></p>
	<p>Chris, first, thanks for the love.  <img src='http://storageadvisors.adaptec.com/wp-images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' />   We&#8217;re feeling you, man.</p>
	<p>You had a lot of good questions, so let&#8217;s try to handle them one at a time.</p>
	<p><strong>RAID-10 drive count:</strong>  It&#8217;s true that a typical RAID-10 has to use an even number of drives since the array is basically a bunch of two-drive RAID-1s striped together.  Obviously you realize this, but here’s a picture for the noobs (I say affectionately):</p>
	<p><img src="http://graphics.adaptec.com/us/TT_SA/RAID-10.jpg" alt="RAID-10" /></p>
	<p>However there is a different form of RAID-10 that does support an odd drive count, and the Adaptec name for that RAID level is RAID-1E.  (I guess “E” means “extended” or “extra”, not “even”.  RAID-1o for “odd” probably would have been a better name, but it’s hard to tell the difference between a zero and an “o”.)  Here’s a picture of a RAID-1E disk layout.</p>
	<p><img src="http://graphics.adaptec.com/us/TT_SA/RAID-1E.jpg" alt="RAID-1E" /></p>
	<p>So you should probably consider RAID-1E with one hot spare instead of RAID-10 with two hot spares.  RAID-1E would give you one more drive’s worth of performance, with no real drop in reliability.</p>
	<p><strong>RAID-10 IOPS calculation:</strong>  I assume you’re talking about each drive getting 175 IOPS, right?  That’s about twice what I’d expect for a purely random workload, but if the access pattern has a mix of short seeks and sequential IO, then I suppose I could squint enough to see 175 IOPS.  So with reads to a RAID-10, every drive in the array should be able to process a command – assuming that the queue depth from the host is large enough.  For a 10-drive array you’d probably need around 20 concurrent host commands to be assured that each drive had a command to process.  This means that a read-only access pattern to a 10-drive RAID-10 should be able to give you 1750 IOPS.  But on writes, since each command has to go to two drives, that number is cut in half to 875 IOPS.  If you’ve got a mixture of reads and writes, then the real IOPS is somewhere in between.</p>
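	<p>To make the two bounds explicit, here is a minimal sketch in Python.  The function name <code>raid10_iops_range</code> is hypothetical, and it assumes the host keeps every drive busy, which is the queue-depth caveat mentioned above.</p>

```python
def raid10_iops_range(drive_count, drive_iops):
    """Return (read-only IOPS, write-only IOPS) for a RAID-10 array."""
    read_only = drive_count * drive_iops   # every drive can service a read
    write_only = read_only / 2             # each write lands on two drives
    return read_only, write_only


# 10 data drives per cage at the 175 IOPS per drive quoted in the question
print(raid10_iops_range(10, 175))  # (1750, 875.0)
```

	<p>A real workload with both reads and writes will fall between those two numbers.</p>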
	<p><strong>Database distribution:</strong>  If I understand you correctly, you plan to create two separate RAID-10 arrays, one per cage.  And you plan to let each RAID-10 array support one intensive and one not-intensive database.  In general, that sounds like a good idea.  That should keep the workload evenly distributed across the drives.  However, that gets me to the next question:</p>
	<p><strong>RAID-10 spanning cages: </strong> The RAID controller should definitely support spanning arrays across cages.  Some people like to keep an array within a cage simply for the sake of portability.  For example, they might want to move one entire cage (along with the databases) to a new server.  You obviously can’t do that if the array spans both cages.  However, having one array has a lot of advantages.  For one, you have only one drive letter to deal with.  If you have two logical drives, then you’re guaranteed to someday have one drive almost full while the other has plenty of space.  And at that point you’re basically screwed.  So having just one large logical drive is always more flexible.</p>
	<p>Another benefit of a larger RAID-10 is that the performance of a single database can be twice as great (assuming that there are enough host commands to keep it busy).  That may not be clear, so look at it this way.  Imagine having two databases.  One is on one RAID-10 while the other is on the other RAID-10.  If both databases are busy, then all the drives are busy, and performance is maxed out.  But now imagine just one of those databases being busy.  The other is completely idle.  Now you have half of your disks just sitting idle, contributing nothing to the performance of the system.  At this point you’ll be wishing that the active database spanned all the drives, allowing performance to basically double.  So when you said that you have four separate databases, think about when they’re actually being used.  For example, are the intensive databases always intensive at the same time?</p>
	<p>I hope this helps.  If you have any questions, you know where to find me.</p>
	<p>TT</p>
	<p>P.S.  With all of these drives, you should also consider RAID-6.  There are PLENTY of posts in this blog about RAID-6.  Also, you didn’t mention whether you’re using SATA, SAS, or SCSI.  From the drive sizes, I assume you’re using SCSI or SAS.  You should take a serious look at SATA with RAID-6.  You’ll get a LOT more capacity for a fraction of the cost, and even higher reliability than SCSI.  The downside is that short random writes will suffer.  I can’t tell from your post what your read/write ratio looks like, but if it’s mostly reads then RAID-6 is your friend.
</p>
]]></content:encoded>
			<wfw:commentRSS>http://storageadvisors.adaptec.com/2007/03/21/databases-cages-and-raid-10-oh-my/feed/</wfw:commentRSS>
	</item>
		<item>
		<title>Fun with Disk Monitor</title>
		<link>http://storageadvisors.adaptec.com/2007/03/20/fun-with-disk-monitor/</link>
		<comments>http://storageadvisors.adaptec.com/2007/03/20/fun-with-disk-monitor/#comments</comments>
		<pubDate>Tue, 20 Mar 2007 14:18:43 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
	<category>Storage Interconnects &#038; RAID</category>
	<category>Advisor - Tom Treadway</category>
		<guid>http://storageadvisors.adaptec.com/2007/03/20/fun-with-disk-monitor/</guid>
		<description><![CDATA[Question to the Storage Advisors:  How many disk IOs does it take to write one file? I have a RAID10 array writing a 120KB file.  Is this one IO for the entire file, or is it one IO per sector on the disk/write cache (which would then be approx 240 IOs)?...]]></description>
			<content:encoded><![CDATA[	<p><strong>Question to the Storage Advisors, from Neal G.:  How many disk IOs does it take to write one file? I have a RAID10 array writing a 120KB file.  Is this one IO for the entire file, or is it one IO per sector on the disk/write cache (which would then be approx 240 IOs)?</strong></p>
	<p>Neal, it really depends on the operating system and how the application is writing the file.  Plus, besides the file write, there are plenty of other disk accesses such as directory, FAT or inode reads and writes.</p>
	<p>But your question made me curious, so I did a quick experiment with SysInternals&#8217; Disk Monitor utility which traps IO type (read vs write), transfer length, and disk number.  Then I simply copied a 120KB file from one drive to another, and captured the IOs to the destination drive.</p>
	<p>Using Windows XP Pro it was nice to see that the OS issued large writes - 64KB when possible.  So the 120KB file was written using just two IOs.  But I also saw about a dozen other short (16KB or less) reads and writes to other sections of the disk, which I assume are directory and FAT updates.</p>
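	<p>The two extremes in the question can be worked out with a quick sketch.  The helper names below are made up, the 64KB maximum transfer is simply what I observed in this one experiment, and the dozen or so metadata IOs are not counted.</p>

```python
import math


def data_ios(file_kb, max_transfer_kb):
    """Number of write IOs needed if each IO carries up to max_transfer_kb."""
    return math.ceil(file_kb / max_transfer_kb)


def sector_count(file_kb, sector_bytes=512):
    """Number of sectors the file data occupies (one IO per sector at worst)."""
    return file_kb * 1024 // sector_bytes


print(data_ios(120, 64), sector_count(120))  # 2 240
```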
	<p>Do a search for SysInternals and you will find Disk Monitor.  Microsoft bought them last year, but I believe their tools are still free.</p>
	<p>Enjoy,<br />
TT
</p>
]]></content:encoded>
			<wfw:commentRSS>http://storageadvisors.adaptec.com/2007/03/20/fun-with-disk-monitor/feed/</wfw:commentRSS>
	</item>
		<item>
		<title>Bottlenecks:  From disk to backbone</title>
		<link>http://storageadvisors.adaptec.com/2007/03/20/bottlenecks-from-disk-to-backbone/</link>
		<comments>http://storageadvisors.adaptec.com/2007/03/20/bottlenecks-from-disk-to-backbone/#comments</comments>
		<pubDate>Tue, 20 Mar 2007 13:43:15 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
	<category>Storage Interconnects &#038; RAID</category>
	<category>Advisor - Tom Treadway</category>
		<guid>http://storageadvisors.adaptec.com/2007/03/20/bottlenecks-from-disk-to-backbone/</guid>
		<description><![CDATA[Question to the Storage Advisors:  I just wonder, what kind of storage solution that can saturate 10Gb Ethernet backbone. If I assume I have a file server (of course equipped with 10GbE NIC on PCI-E 8x), can an external SAS/SATA-II JBOD box (using 4xwide SAS link) produce enough throughput (taking into account the hard disk speed and the link bandwidth) to saturate the network? If not, where will be the bottleneck?...]]></description>
			<content:encoded><![CDATA[	<p><strong>Question to the Storage Advisors, from Darwin:  I just wonder, what kind of storage solution that can saturate 10Gb Ethernet backbone. If I assume I have a file server (of course equipped with 10GbE NIC on PCI-E 8x), can an external SAS/SATA-II JBOD box (using 4xwide SAS link) produce enough throughput (taking into account the hard disk speed and the link bandwidth) to saturate the network? If not, where will be the bottleneck? Thanks.</strong></p>
	<p>Darwin, let&#8217;s break down the bottlenecks in this picture.</p>
	<p>First, each drive is capable of about 100MB/s from the media.  Some drives are faster and some are slower, but this is a good average number.</p>
	<p>Next, each SAS 3Gb link is capable of 300MB/s bit rate.  But once you apply overhead and encoding conversion, you&#8217;re down around 270MB/s - let&#8217;s say 250MB/s just to be safe.  So with a 4x link, you can get close to 1000MB/s.</p>
	<p>Now let&#8217;s look at your PCIe 8x bus.  Each 2.5GHz PCIe link can do 250MB/s.  I&#8217;ve seen efficiency estimates as low as 70%, so let&#8217;s say that each link can do 175MB/s.  Therefore an 8x bus can do 1400MB/s.  In fact PCIe is full duplex so each bus can actually be transferring in both directions for a total of 2800MB/s.  But this is all moot.  The SAS 4x bus is limited to 1000MB/s, so the PCIe 8x bus is overkill.</p>
	<p>Lastly, let&#8217;s look at your 10GbE link.  That link can probably burst at around 1GB/s, or 1000MB/s.  Coincidentally, that almost exactly matches the SAS 4x throughput that we calculated earlier.</p>
	<p>So it would seem that your system is nicely balanced with 10 drives feeding a 10GbE link.  But of course it&#8217;s never that easy.</p>
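	<p>Before getting to the complications, here is the back-of-the-envelope math so far lined up in a short Python sketch.  The stage names and the dictionary layout are just for illustration, and every figure is one of the rounded estimates from the paragraphs above.</p>

```python
# Estimated sequential throughput of each stage, in MB/s
stages = {
    "10 drives at 100MB/s each": 10 * 100,
    "SAS 4x at 250MB/s per link": 4 * 250,
    "PCIe 8x at 175MB/s per link": 8 * 175,
    "10GbE link": 1000,
}

# The slowest stage caps the whole pipeline
limit = min(stages.values())
bottlenecks = [name for name, mb_s in stages.items() if mb_s == limit]

print(limit)  # 1000
for name in bottlenecks:
    print("limited by:", name)
```

	<p>With these estimates three stages tie at 1000MB/s, and only the PCIe bus has headroom.</p>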
	<p>First, can your disk controller hit the 1000MB/s mark?  That answer will depend a lot on whether you&#8217;re using RAID for protection against drive failure, and furthermore what type of RAID and what access pattern you&#8217;re using.  For example, if you&#8217;re doing random IO to the disks, then each drive is spending the majority of its time waiting for the head to seek or the media to rotate.  Using the 71 IOPS from the <a href="http://storageadvisors.adaptec.com/2007/03/20/sata-iops-measurement/">previous post</a> on SATA IOPS measurement, and assuming a 4KB transfer size, each drive is supplying less than 300KB/s.  Therefore it would take over 3000 drives to hit the magic 1000MB/s mark!  That&#8217;s not going to happen.</p>
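	<p>Here is that drive-count estimate as a small Python sketch.  The function name <code>drives_needed</code> is hypothetical, and it assumes 1MB = 1000KB and perfectly even load across the drives.</p>

```python
import math


def drives_needed(target_mb_s, drive_iops, transfer_kb):
    """Return (KB/s per drive, drives required) for a random workload."""
    per_drive_kb_s = drive_iops * transfer_kb
    drives = math.ceil(target_mb_s * 1000 / per_drive_kb_s)
    return per_drive_kb_s, drives


per_drive, drives = drives_needed(1000, 71, 4)
print(per_drive, drives)  # 284 3522
```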
	<p>Next, what is the overhead of the drivers and firmware sitting between the OS and the drives?  This number is probably around 10%, so it&#8217;s not too interesting.  We can move on.</p>
	<p>The next biggest problem, after drive access pattern, is &#8220;what is driving the 10GbE link&#8221;?  Is it iSCSI with a TOE to offload the TCP/IP processing?  You mention that this is a file server, so I assume that you&#8217;re just running NAS.  If so, you&#8217;ve got a ton of overhead in processing filesystem metadata.  You&#8217;re probably lucky to get 100-200MB/s.</p>
	<p>So you can probably see that you asked a very complex question with lots of unknown variables.  But hopefully all the simple factoids that I presented will help you find the bottlenecks in your system and tune the performance accordingly.</p>
	<p>Good luck.</p>
	<p>TT
</p>
]]></content:encoded>
			<wfw:commentRSS>http://storageadvisors.adaptec.com/2007/03/20/bottlenecks-from-disk-to-backbone/feed/</wfw:commentRSS>
	</item>
		<item>
		<title>SATA IOPS Measurement</title>
		<link>http://storageadvisors.adaptec.com/2007/03/20/sata-iops-measurement/</link>
		<comments>http://storageadvisors.adaptec.com/2007/03/20/sata-iops-measurement/#comments</comments>
		<pubDate>Tue, 20 Mar 2007 13:13:56 +0000</pubDate>
		<dc:creator>Tom</dc:creator>
		
	<category>Storage Interconnects &#038; RAID</category>
	<category>Advisor - Tom Treadway</category>
		<guid>http://storageadvisors.adaptec.com/2007/03/20/sata-iops-measurement/</guid>
		<description><![CDATA[Question to the Storage Advisors:  I've heard great things about SATA based DASD and JBOD devices.  I've found tons of information about the data transfer rate, but haven't been able to find any hard data on how many sustained and bursted IOPS such systems can handle...]]></description>
			<content:encoded><![CDATA[	<p><strong>Question to the Storage Advisors, from Michael K.:  I&#8217;ve heard great things about SATA based DASD and JBOD devices.  I&#8217;ve found tons of information about the data transfer rate, but haven&#8217;t been able to find any hard data on how many sustained and bursted IOPS such systems can handle.</strong></p>
	<p>Michael, IOPS are very dependent on the access pattern.  For this reason it&#8217;s normal to measure the &#8220;worst case&#8221; IOPS, which are achieved on short random IOs.  The IOPS can be calculated by simply adding the time to do an average seek to the time for a half rotation, and then inverting the sum.  Note that the transfer time doesn&#8217;t come into play because it&#8217;s so insignificant compared to the seek and rotation time.</p>
	<p>For example, the 500GB Western Digital SE16 SATA drive (WD5000KS) has a rotational rate of 7200 RPM.  By inverting that number you get one rotation per 8.3ms.  Therefore a half rotation is about 4.2ms.  This is also noted on the spec sheet as &#8220;Average Latency&#8221;.</p>
	<p>Next, the average seek times for reads are listed as 8.9ms, while writes are 10.9ms.  That difference may seem odd at first, but drives eke out a little extra performance on reads by attempting to read the data before the head has settled on the track (and that settling for writes takes ~2ms).  So, let&#8217;s say that you have an even mix of reads and writes, and therefore we&#8217;ll assume the average seek time is 9.9ms.</p>
	<p>Adding 4.2ms and 9.9ms gives an average random IO latency of 14.1ms.  Since the drive can process (i.e., access the media for) just one command at a time, you can invert 14.1ms to get 71 IOPS.</p>
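	<p>The whole calculation fits in a few lines of Python.  This is only a sketch: the function name <code>random_iops</code> and its parameters are made up for illustration, and it ignores transfer time and any command queuing in the drive.</p>

```python
def random_iops(rpm, read_seek_ms, write_seek_ms, read_fraction=0.5):
    """Worst-case random IOPS from rotational speed and average seek times."""
    rotation_ms = 60_000 / rpm                  # time for one full rotation
    half_rotation_ms = rotation_ms / 2          # average rotational latency
    seek_ms = (read_fraction * read_seek_ms
               + (1 - read_fraction) * write_seek_ms)
    latency_ms = half_rotation_ms + seek_ms     # one command at a time
    return 1000 / latency_ms


# 7200 RPM, 8.9ms read seek, 10.9ms write seek, even read/write mix
print(round(random_iops(7200, 8.9, 10.9)))  # 71
```

	<p>Plug in the numbers from any drive&#8217;s spec sheet to get a comparable worst-case figure.</p>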
	<p>You can start playing games to increase the IOPS - such as shortening the seeks by accessing a smaller section of the disk, or making the IO sequential so that requests are serviced from the drive cache.  With these tricks you can get the IOPS to exceed 100,000!</p>
	<p>Now you can see that it&#8217;s difficult to measure and specify IOPS.  The number is somewhere between 71 and 100K.  For this reason it&#8217;s normal to use the worst-case, random access pattern value of ~71.</p>
	<p>TT
</p>
]]></content:encoded>
			<wfw:commentRSS>http://storageadvisors.adaptec.com/2007/03/20/sata-iops-measurement/feed/</wfw:commentRSS>
	</item>
	</channel>
</rss>
