Thread: Contemplating SSD Hardware RAID
I'm looking for advice from the I/O gurus who have been in the SSD game for a while now.

I understand that the majority of consumer-grade SSD drives lack the required capacitor to complete a write on a sudden power loss. But what about pairing up with a hardware controller with BBU write cache? Can the write cache be disabled at the drive level to make this a safe setup?

I'm exploring the combination of an Areca 1880ix-12 controller with 6x OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10. Has anyone tried this combination? What nasty surprise am I overlooking here?

Thanks
-Dan
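On SATA disks, the drive-level cache toggle Dan asks about is normally flipped with hdparm. A minimal sketch, with the device name as a placeholder; note that some consumer SSDs ignore the setting or quietly re-enable it after a power cycle, so it should be verified on every boot:

    import subprocess

    # Disable the drive's own volatile write cache (SATA); /dev/sda is a placeholder.
    subprocess.run(["hdparm", "-W", "0", "/dev/sda"], check=True)
    # Read the write-caching flag back to confirm the setting stuck.
    subprocess.run(["hdparm", "-W", "/dev/sda"], check=True)

Whether doing this actually makes such a drive safe is exactly the question the rest of the thread answers.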
On 06/20/2011 11:54 PM, Dan Harris wrote:
> I understand that the majority of consumer-grade SSD drives lack the
> required capacitor to complete a write on a sudden power loss. But
> what about pairing up with a hardware controller with BBU write
> cache? Can the write cache be disabled at the drive level to make
> this a safe setup?

Sometimes, but not always, and you'll be playing a risky and unpredictable game to try it. See http://wiki.postgresql.org/wiki/Reliable_Writes for some anecdotes. And even if the reliability works out, you'll kill the expected longevity and performance of the drive.

> I'm exploring the combination of an Areca 1880ix-12 controller with 6x
> OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10. Has
> anyone tried this combination? What nasty surprise am I overlooking
> here?

You can expect database corruption the first time something unexpected interrupts the power to the server. That's nasty, but it's not surprising--that's well documented as what happens when you run PostgreSQL on hardware with this feature set. You have to get a Vertex 3 Pro to get one of the reliable 3rd-gen designs from them with a supercap. (I don't think those are even out yet, though.) We've had reports here of the earlier Vertex 2 Pro being fully stress tested and working out well. I wouldn't even bother with a regular Vertex 3, because I don't see any reason to believe it could be reliable for database use, just like the Vertex 2 failed to work in that role.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
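One way to collect your own anecdote of the kind that wiki page gathers is to time small synchronized writes on the drive under test. A minimal Python sketch, with the probe path as a placeholder (run it on a filesystem backed by the drive in question): if the per-write latency comes back implausibly low, tens of microseconds rather than the milliseconds a real media flush takes, the drive is almost certainly acknowledging writes out of a volatile cache.

    import os, time

    def fsync_write_latency(path, writes=200, size=512):
        """Average latency of small writes, each followed by fsync()."""
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        buf = b"\0" * size
        start = time.time()
        for _ in range(writes):
            os.write(fd, buf)
            os.fsync(fd)           # request durability on every write
        os.close(fd)
        return (time.time() - start) / writes

    lat = fsync_write_latency("/mnt/ssd/fsync_probe.dat")   # placeholder path
    print("average fsync'd write: %.0f microseconds" % (lat * 1e6))

This is the same idea behind tools like diskchecker.pl, minus the actual pull-the-plug verification step.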
On 2011-06-21 08:33, Greg Smith wrote:
> On 06/20/2011 11:54 PM, Dan Harris wrote:
>
>> I'm exploring the combination of an Areca 1880ix-12 controller with
>> 6x OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10.
>> Has anyone tried this combination? What nasty surprise am I
>> overlooking here?
>
> You can expect database corruption the first time something unexpected
> interrupts the power to the server. That's nasty, but it's not
> surprising--that's well documented as what happens when you run
> PostgreSQL on hardware with this feature set. You have to get a Vertex
> 3 Pro to get one of the reliable 3rd-gen designs from them with a
> supercap. (I don't think those are even out yet, though.) We've had
> reports here of the earlier Vertex 2 Pro being fully stress tested and
> working out well. I wouldn't even bother with a regular Vertex 3,
> because I don't see any reason to believe it could be reliable for
> database use, just like the Vertex 2 failed to work in that role.

I've tested the Vertex 2, Vertex 2 Pro and Vertex 3. The Vertex 3 Pro is not yet available. The Vertex 3 I tested with pgbench didn't outperform the Vertex 2 (yes, it was attached to a SATA III port). Also, the Vertex 3 didn't work in my designated system until a firmware upgrade became available ~2.5 months after I purchased it. The support call I had with OCZ failed to mention it; by pure coincidence, when I did some more testing at a later time, I ran the firmware upgrade tool (which rather hides which firmware versions are available, if any) and it did an update. After that, the drive was compatible with the designated motherboard. Another disappointment was that after I had purchased the Vertex 3, OCZ announced a max-IOPS Vertex 3. Did that mean I had bought an inferior version? Talk about a bad out-of-the-box experience. -1 OCZ fanboy.

When putting such an SSD up for database use, I'd only consider a Vertex 2 Pro (for the supercap), paired with another supercapped SSD of a different brand (e.g. the recent Intels). When this is done on a motherboard with more than one SATA controller, you get controller redundancy and can also survive a single drive failure when a drive wears out. Having two different SSD models decreases the chance of both wearing out at the same time, and makes you a bit more resilient against firmware bugs.

It would be great if there were yet another supercapped SSD brand, combined with a modified md software RAID that reads all three drives at once and compares the results on every read, instead of the occasional check: if at least two drives agree on the contents, return the data.

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
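The 2-of-3 voting read Yeb describes is easy to make concrete in user space. A toy Python sketch, with the replica device names as placeholders (a real implementation would live in the md layer and work per stripe):

    def voting_read(replicas, offset, length):
        """Read one range from all replicas; return the contents at least
        two of them agree on, else fail rather than guess."""
        contents = []
        for path in replicas:
            with open(path, "rb") as f:
                f.seek(offset)
                contents.append(f.read(length))
        for candidate in contents:
            if sum(c == candidate for c in contents) >= 2:
                return candidate
        raise IOError("no two replicas agree on this block")

    # Placeholder devices; any three equally sized image files work for testing.
    block = voting_read(["/dev/sdb", "/dev/sdc", "/dev/sdd"], 4096, 4096)

The appeal over ordinary mirroring is that a drive returning silently corrupted data gets outvoted on the read path itself, rather than being caught only by a periodic scrub.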
On 2011-06-21 09:51, Yeb Havinga wrote:
> On 2011-06-21 08:33, Greg Smith wrote:
>> On 06/20/2011 11:54 PM, Dan Harris wrote:
>>
>>> I'm exploring the combination of an Areca 1880ix-12 controller with
>>> 6x OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10.
>>> Has anyone tried this combination? What nasty surprise am I
>>> overlooking here?

I forgot to mention that with an SSD it's important to watch the remaining lifetime. These values can be read with smartctl. When putting the disk behind a hardware RAID controller, you might not be able to read them from the OS, and the hardware RAID firmware might be too old to know about the SSD lifetime indicator, or might not show it at all.

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
* Yeb Havinga:

> I forgot to mention that with an SSD it's important to watch the
> remaining lifetime. These values can be read with smartctl. When
> putting the disk behind a hardware RAID controller, you might not be
> able to read them from the OS, and the hardware RAID firmware might be
> too old to know about the SSD lifetime indicator, or might not show it
> at all.

3ware controllers offer SMART pass-through, and smartctl supports it. I'm sure there's something similar for Areca controllers.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99
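To make the pass-through concrete: smartmontools selects the controller-specific method with the -d flag. A sketch of the two invocations, where the port/slot numbers and device nodes are assumptions that differ per system:

    import subprocess

    # 3ware: query the disk on controller port 0 behind /dev/twa0
    # (older 3ware cards use /dev/twe0 instead).
    subprocess.run(["smartctl", "-a", "-d", "3ware,0", "/dev/twa0"])
    # Areca: query the disk in slot 1 behind the controller's SCSI generic node.
    subprocess.run(["smartctl", "-a", "-d", "areca,1", "/dev/sg0"])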
On 06/21/2011 07:19 AM, Florian Weimer wrote:
> 3ware controllers offer SMART pass-through, and smartctl supports it.
> I'm sure there's something similar for Areca controllers.

Depends on the model, drives, and how you access the management interface. For both manufacturers, actually. Check out http://notemagnet.blogspot.com/2008/08/linux-disk-failures-areca-is-not-so.html for example. There I talk about problems with a specific Areca controller, as well as noting in a comment at the end that there are limitations with 3ware not supporting SMART reports against SAS drives.

Part of the whole evaluation chain for new server hardware, especially for SSD, needs to be a look at what SMART data you can get. Yeb, I'd be curious to get more details about what you've been seeing here if you can share it. You have more different models around than I have access to, especially the OCZ ones, which I still can't get my clients to consider. (Their concerns about compatibility and support from a relatively small vendor are not completely unfounded.)

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Tuesday, 21 June 2011 05:54:26, Dan Harris wrote:
> I'm looking for advice from the I/O gurus who have been in the SSD game
> for a while now.
>
> I understand that the majority of consumer-grade SSD drives lack the
> required capacitor to complete a write on a sudden power loss. But
> what about pairing up with a hardware controller with BBU write cache?
> Can the write cache be disabled at the drive level to make this a safe setup?
>
> I'm exploring the combination of an Areca 1880ix-12 controller with 6x
> OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10. Has
> anyone tried this combination? What nasty surprise am I overlooking here?
>
> Thanks
> -Dan

Won't work, period.

The long story: the loss of the writes sitting in the SSD cache is substantial; you may lose the whole system. I have been testing SSDs since 2006 (an Adtron 2 GB for 1200 euro at first), and I can only advise using an enterprise-ready SSD. Candidates: the new Intel series, SandForce Pro discs.

I tried to submit a request to APC to construct a device similar to a buffered drive frame (a capacitor holds up the 5 V until the cache has been written back), but they never answered. So there is no luck in using a mainstream SSD for this job: loss of the cache - or, for mainstream SandForce drives, loss of the connection - will result in the loss of changed frames (i.e. 16 MB of data per frame) on the SSD. If that holds the root of your filesystem, forget the disk.

BTW: over the past two years I have tested 16 discs, for speed only (I sell each disc after the test). I got 6 returns for failure within those two years - it really is happening to the mainstream discs.

--
Kind regards,
Anton Rommerskirchen
On 2011-06-21 17:11, Greg Smith wrote:
> On 06/21/2011 07:19 AM, Florian Weimer wrote:
>> 3ware controllers offer SMART pass-through, and smartctl supports it.
>> I'm sure there's something similar for Areca controllers.
>
> Depends on the model, drives, and how you access the management
> interface. For both manufacturers, actually. Check out
> http://notemagnet.blogspot.com/2008/08/linux-disk-failures-areca-is-not-so.html
> for example. There I talk about problems with a specific Areca
> controller, as well as noting in a comment at the end that there are
> limitations with 3ware not supporting SMART reports against SAS drives.
>
> Part of the whole evaluation chain for new server hardware, especially
> for SSD, needs to be a look at what SMART data you can get. Yeb, I'd
> be curious to get more details about what you've been seeing here if
> you can share it.

This is what a Windows OCZ tool reports for the various SMART values (excuse the lack of markup) of a Vertex 2 Pro:

SMART READ DATA, Revision: 10. Attributes list:
  1: SSD Raw Read Error Rate - Normalized Rate: 120 (total ECC and RAISE errors)
  5: SSD Retired Block Count - Reserve blocks remaining: 100%
  9: SSD Power-On Hours - Total hours power on: 451
 12: SSD Power Cycle Count - Count of power on/off cycles: 61
 13: SSD Soft Read Error Rate - Normalized Rate: 120
100: SSD GBytes Erased - Flash memory erases across the entire drive: 128 GB
170: SSD Number of Remaining Spares - Number of reserve Flash memory blocks: 17417
171: SSD Program Fail Count - Total number of Flash program operation failures: 0
172: SSD Erase Fail Count - Total number of Flash erase operation failures: 0
174: SSD Unexpected Power Loss Count - Total number of unexpected power losses: 13
177: SSD Wear Range Delta - Delta between most-worn and least-worn Flash blocks: 0
181: SSD Program Fail Count - Total number of Flash program operation failures: 0
182: SSD Erase Fail Count - Total number of Flash erase operation failures: 0
184: SSD End-to-End Error Detection - I/O errors detected during reads from flash memory: 0
187: SSD Reported Uncorrectable Errors - Uncorrectable RAISE errors reported to the host for all data access: 0
194: SSD Temperature Monitoring - Current: 26, High: 37, Low: 0
195: SSD ECC On-the-fly Count - Normalized Rate: 120
196: SSD Reallocation Event Count - Total number of reallocated Flash blocks: 0
198: SSD Uncorrectable Sector Count - Total number of uncorrectable errors when reading/writing a sector: 0
199: SSD SATA R-Errors Error Count - Current SATA R-Error count: 0
201: SSD Uncorrectable Soft Read Error Rate - Normalized Rate: 120
204: SSD Soft ECC Correction Rate (RAISE) - Normalized Rate: 120
230: SSD Life Curve Status - Current state of drive operation based upon the Life Curve: 100
231: SSD Life Left - Approximate SSD life remaining: 99%
232: SSD Available Reserved Space - Amount of Flash memory space in reserve: 17 GB
235: SSD Supercap Health - Condition of the external supercapacitor, in msec: 0
241: SSD Lifetime Writes from Host - Number of bytes written to SSD: 448 GB
242: SSD Lifetime Reads from Host - Number of bytes read from SSD: 192 GB

The same tool for a Vertex 3 (not Pro):

SMART READ DATA, Revision: 10. Attributes list:
  1: SSD Raw Read Error Rate - Normalized Rate: 120 (total ECC and RAISE errors)
  5: SSD Retired Block Count - Reserve blocks remaining: 100%
  9: SSD Power-On Hours - Total hours power on: 7
 12: SSD Power Cycle Count - Count of power on/off cycles: 13
171: SSD Program Fail Count - Total number of Flash program operation failures: 0
172: SSD Erase Fail Count - Total number of Flash erase operation failures: 0
174: SSD Unexpected Power Loss Count - Total number of unexpected power losses: 10
177: SSD Wear Range Delta - Delta between most-worn and least-worn Flash blocks: 0
181: SSD Program Fail Count - Total number of Flash program operation failures: 0
182: SSD Erase Fail Count - Total number of Flash erase operation failures: 0
187: SSD Reported Uncorrectable Errors - Uncorrectable RAISE errors reported to the host for all data access: 0
194: SSD Temperature Monitoring - Current: 128, High: 129, Low: 127
195: SSD ECC On-the-fly Count - Normalized Rate: 100
196: SSD Reallocation Event Count - Total number of reallocated Flash blocks: 0
201: SSD Uncorrectable Soft Read Error Rate - Normalized Rate: 100
204: SSD Soft ECC Correction Rate (RAISE) - Normalized Rate: 100
230: SSD Life Curve Status - Current state of drive operation based upon the Life Curve: 100
231: SSD Life Left - Approximate SSD life remaining: 100%
241: SSD Lifetime Writes from Host - Number of bytes written to SSD: 162 GB
242: SSD Lifetime Reads from Host - Number of bytes read from SSD: 236 GB

There's some info buried in http://archives.postgresql.org/pgsql-performance/2011-03/msg00350.php where two Vertex 2 Pros are compared; the first has been really hammered with pgbench, the second had a few months' duty in a workstation. The raw value of SSD Available Reserved Space seems to be a good candidate to watch as it goes toward 0, since the pgbench-hammered drive has 16 GB left and the workstation disk 17 GB. Would be cool to graph with e.g. symon (http://i.imgur.com/T4NAq.png)

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
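A sketch of what feeding such a value to a grapher could look like: pull one attribute's raw value out of smartctl -A output and print it for the collector. The device name and attribute ID are placeholders, the attribute table layout assumed is the standard smartmontools one, and raw values that aren't plain integers (e.g. the temperature triplets above) would need extra parsing:

    import subprocess

    def smart_raw_value(device, attr_id):
        """Return the raw value (last column) of one SMART attribute."""
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            fields = line.split()
            if fields and fields[0] == str(attr_id):
                return int(fields[-1])
        return None

    # 232 = Available Reserved Space on these OCZ drives; watch it fall toward 0.
    print(smart_raw_value("/dev/sda", 232))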
On 2011-06-21 22:10, Yeb Havinga wrote:
> There's some info buried in
> http://archives.postgresql.org/pgsql-performance/2011-03/msg00350.php
> where two Vertex 2 Pros are compared; the first has been really
> hammered with pgbench, the second had a few months' duty in a
> workstation. The raw value of SSD Available Reserved Space seems to be
> a good candidate to watch as it goes toward 0, since the pgbench-hammered
> drive has 16 GB left and the workstation disk 17 GB. Would be cool to
> graph with e.g. symon (http://i.imgur.com/T4NAq.png)

I forgot to mention that both the newest firmware for the drives and an svn version of smartmontools are advisable before trying to figure out what all those strange values mean. It's too bad, however, that OCZ doesn't let the user choose which firmware to run (the tool always picks the newest), so after every upgrade it'll be a surprise which values are supported, and whether any of the values have been reset or are interpreted differently. Even though disks in production might not be upgraded eagerly, replacing a faulty drive means the new one probably needs an upgrade first, and it would be nice to have a uniform SMART value readout for the monitoring tools.

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
On Tue, Jun 21, 2011 at 2:25 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> strange values mean. It's too bad, however, that OCZ doesn't let the user
> choose which firmware to run (the tool always picks the newest), so after
> every upgrade it'll be a surprise which values are supported, and whether any of the

That right there pretty much eliminates them from consideration for enterprise applications.
On Tue, Jun 21, 2011 at 3:32 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Tue, Jun 21, 2011 at 2:25 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>
>> strange values mean. It's too bad, however, that OCZ doesn't let the user
>> choose which firmware to run (the tool always picks the newest), so after
>> every upgrade it'll be a surprise which values are supported, and whether any of the
>
> That right there pretty much eliminates them from consideration for
> enterprise applications.

As much as I've been irritated with Intel for being intentionally oblique on the write caching issue -- I think they remain more or less the only game in town for enterprise use. The X25-E has been, up until recently, the only drive to seriously consider for write-heavy applications (and Greg is pretty skeptical even about that). I have directly observed Vertex Pro drives burning out in ~18 months in constant-duty applications (which, if you do the math, is about right on schedule) -- not good enough IMO. ISTM Intel is clearly positioning the 710 Lyndonville as the main drive to go with in database environments for most cases. At 3300 IOPS (see http://www.anandtech.com/show/4452/intel-710-and-720-ssd-specifications) and with some tinkering that results in 65 times greater longevity than standard MLC, I expect the drive will be a huge hit, as long as it can sustain those numbers writing durably and comes in at under the $10/GB price point.

merlin
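Merlin's "do the math" is just multiplication. A back-of-envelope sketch in which every input is an assumption picked for illustration (usable capacity, MLC program/erase rating, write amplification, and daily write volume all vary widely between drives and workloads):

    # Rough MLC SSD endurance estimate; all inputs are illustrative assumptions.
    capacity_gb = 100             # usable flash capacity
    pe_cycles = 5000              # consumer MLC program/erase rating
    write_amplification = 2.0     # flash writes per host write
    host_writes_gb_per_day = 450  # constant-duty database load

    total_host_writes_gb = capacity_gb * pe_cycles / write_amplification
    lifetime_days = total_host_writes_gb / host_writes_gb_per_day
    print("~%.0f days (~%.1f months) to flash exhaustion"
          % (lifetime_days, lifetime_days / 30))

With these made-up inputs the estimate lands around 18 months, which is the point: under constant duty, consumer MLC wears out on a schedule you can predict in advance.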
On 06/21/2011 05:35 PM, Merlin Moncure wrote:
> On Tue, Jun 21, 2011 at 3:32 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
>> On Tue, Jun 21, 2011 at 2:25 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>>
>>> It's too bad, however, that OCZ doesn't let the user
>>> choose which firmware to run (the tool always picks the newest), so after
>>> every upgrade it'll be a surprise which values are supported, and whether any of the
>>>
>> That right there pretty much eliminates them from consideration for
>> enterprise applications.
>>
> As much as I've been irritated with Intel for being intentionally
> oblique on the write caching issue -- I think they remain more or less
> the only game in town for enterprise use.

That's at the core of why I have been so consistently cranky about them. The sort of customers I deal with who are willing to spend money on banks of SSD will buy Intel, and the "Enterprise" feature set seems complete enough that it doesn't set off any alarms for them. The same is not true of OCZ, which unfortunately means I never even get them onto the vendor grid in the first place. Everybody runs out to buy the Intel units instead, they get burned by the write cache issues, lose data, and sometimes they even blame PostgreSQL for it.

I have a customer who has around 50 X25-E drives, a little stack of them in six servers running two similar databases. They each run about a terabyte, and refill about every four months (old data eventually ages out, replaced by new). At the point I started working with them, they had lost the entire recent history twice--terabyte gone, whoosh!--because the power reliability is poor in their area. And network connectivity is bad enough that they can't ship this volume of updates elsewhere either. It happened again last month, and for the first time the database was recoverable. I had converted one server into a cold spare that just archives the WAL files, and that's the only one that lived through the nasty power spike+outage that corrupted the active databases on both the master and the warm standby of each set. All four of the servers where PostgreSQL was writing data and expected proper fsync guarantees, all gone from one power issue. At the point I got involved, they were about to cancel this entire PostgreSQL experiment, because they assumed the database had to be garbage for this to keep happening; until I told them about this known issue, they never considered that the drives were the problem. That's what I think of when people ask me about the Intel X25-E.

I'm very happy with the little 3rd-generation consumer-grade SSD I bought from Intel though (320 series). If they just do the same style of write cache and reliability rework to the enterprise line, but using better flash, I agree that the first really serious yet affordable product for the database market may finally come out of that. We're just not there yet, and unfortunately for the person who started this round of discussion, throwing hardware RAID at the problem doesn't make this go away either.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
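The cold-spare arrangement Greg describes is ordinary WAL archiving. A minimal postgresql.conf sketch for 9.0, where the archive path is a placeholder and production setups usually replace cp with something that reports failures more robustly:

    wal_level = archive          # retain enough WAL detail for archive recovery
    archive_mode = on
    archive_command = 'test ! -f /mnt/walarchive/%f && cp %p /mnt/walarchive/%f'

Because the spare only ever replays archived segments, a power event that eats the drives under the active databases can't touch it.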
On 06/21/2011 05:17 PM, Greg Smith wrote:
> If they just do the same style of write cache and reliability rework
> to the enterprise line, but using better flash, I agree that the
> first really serious yet affordable product for the database market
> may finally come out of that.

After we started our research in this area and finally settled on FusionIO PCI cards (which survived several controlled and uncontrolled failures completely intact), a consultant tried telling us he could build us a cage of SSDs for much cheaper, and with better performance. Once I'd stopped laughing, I quickly shooed him away.

One of the reasons the PCI cards do so well is that they operate in a directly memory-addressable manner, and always include capacitors. You lose some overhead due to the CPU running the driver, and you can't boot off of them, but they're leagues ahead in terms of safety. But like you said, they're certainly not what most people would call affordable. 640GB for two orders of magnitude more than an equivalent hard drive would cost? Ouch.

Most companies are familiar---and hence comfortable---with RAIDs of various flavors, so they see SSD performance numbers and think to themselves "What if that were in a RAID?" Right now, drives aren't quite there yet, or the ones that are cost more than most want to spend. It's a shame, really. But I'm willing to wait it out for now.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________
See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email