Thread: Contemplating SSD Hardware RAID
I'm looking for advice from the I/O gurus who have been in the SSD game for a while now.

I understand that the majority of consumer-grade SSD drives lack the required capacitor to complete a write on a sudden power loss. But what about pairing up with a hardware controller with BBU write cache? Can the write cache be disabled at the drive level to make this a safe setup?

I'm exploring the combination of an Areca 1880ix-12 controller with 6x OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10. Has anyone tried this combination? What nasty surprise am I overlooking here?

Thanks
-Dan
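On SATA disks, the drive-level cache toggle Dan asks about is normally flipped with hdparm. A minimal sketch, with the device name as a placeholder; note that some consumer SSDs ignore the setting or quietly re-enable it after a power cycle, so it should be verified on every boot:

    import subprocess

    # Disable the drive's own volatile write cache (SATA); /dev/sda is a placeholder.
    subprocess.run(["hdparm", "-W", "0", "/dev/sda"], check=True)
    # Read the write-caching flag back to confirm the setting stuck.
    subprocess.run(["hdparm", "-W", "/dev/sda"], check=True)

Whether doing this actually makes such a drive safe is exactly the question the rest of the thread answers.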
On 06/20/2011 11:54 PM, Dan Harris wrote:
> I understand that the majority of consumer-grade SSD drives lack the
> required capacitor to complete a write on a sudden power loss. But
> what about pairing up with a hardware controller with BBU write
> cache? Can the write cache be disabled at the drive level to make
> this a safe setup?

Sometimes, but not always, and you'll be playing a risky and unpredictable game to try it. See http://wiki.postgresql.org/wiki/Reliable_Writes for some anecdotes. And even if the reliability works out, you'll kill the expected longevity and performance of the drive.

> I'm exploring the combination of an Areca 1880ix-12 controller with 6x
> OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10. Has
> anyone tried this combination? What nasty surprise am I overlooking
> here?

You can expect database corruption the first time something unexpected interrupts the power to the server. That's nasty, but it's not surprising--that's well documented as what happens when you run PostgreSQL on hardware with this feature set. You have to get a Vertex 3 Pro to get one of the reliable 3rd-gen designs from them with a supercap. (I don't think those are even out yet, though.) We've had reports here of the earlier Vertex 2 Pro being fully stress tested and working out well. I wouldn't even bother with a regular Vertex 3, because I don't see any reason to believe it could be reliable for database use, just like the Vertex 2 failed to work in that role.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
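One way to collect your own anecdote of the kind that wiki page gathers is to time small synchronized writes on the drive under test. A minimal Python sketch, with the probe path as a placeholder (run it on a filesystem backed by the drive in question): if the per-write latency comes back implausibly low, tens of microseconds rather than the milliseconds a real media flush takes, the drive is almost certainly acknowledging writes out of a volatile cache.

    import os, time

    def fsync_write_latency(path, writes=200, size=512):
        """Average latency of small writes, each followed by fsync()."""
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        buf = b"\0" * size
        start = time.time()
        for _ in range(writes):
            os.write(fd, buf)
            os.fsync(fd)           # request durability on every write
        os.close(fd)
        return (time.time() - start) / writes

    lat = fsync_write_latency("/mnt/ssd/fsync_probe.dat")   # placeholder path
    print("average fsync'd write: %.0f microseconds" % (lat * 1e6))

This is the same idea behind tools like diskchecker.pl, minus the actual pull-the-plug verification step.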
On 2011-06-21 08:33, Greg Smith wrote:
> On 06/20/2011 11:54 PM, Dan Harris wrote:
>
>> I'm exploring the combination of an Areca 1880ix-12 controller with
>> 6x OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10.
>> Has anyone tried this combination? What nasty surprise am I
>> overlooking here?
>
> You can expect database corruption the first time something unexpected
> interrupts the power to the server. That's nasty, but it's not
> surprising--that's well documented as what happens when you run
> PostgreSQL on hardware with this feature set. You have to get a Vertex
> 3 Pro to get one of the reliable 3rd-gen designs from them with a
> supercap. (I don't think those are even out yet, though.) We've had
> reports here of the earlier Vertex 2 Pro being fully stress tested and
> working out well. I wouldn't even bother with a regular Vertex 3,
> because I don't see any reason to believe it could be reliable for
> database use, just like the Vertex 2 failed to work in that role.

I've tested the Vertex 2, Vertex 2 Pro and Vertex 3. The Vertex 3 Pro is not yet available. The Vertex 3 I tested with pgbench didn't outperform the Vertex 2 (yes, it was attached to a SATA III port). Also, the Vertex 3 didn't work in my designated system until a firmware upgrade became available ~2.5 months after I purchased it. The support call I had with OCZ failed to mention it; by pure coincidence, when I did some more testing at a later time, I ran the firmware upgrade tool (which rather hides which firmware versions are available, if any) and it did an update. After that, the drive was compatible with the designated motherboard. Another disappointment was that after I had purchased the Vertex 3, OCZ announced a max-IOPS Vertex 3. Did that mean I had bought an inferior version? Talk about a bad out-of-the-box experience. -1 OCZ fanboy.

When putting such an SSD up for database use, I'd only consider a Vertex 2 Pro (for the supercap), paired with another supercapped SSD of a different brand (e.g. the recent Intels). When this is done on a motherboard with more than one SATA controller, you get controller redundancy and can also survive a single drive failure when a drive wears out. Having two different SSD models decreases the chance of both wearing out at the same time, and makes you a bit more resilient against firmware bugs.

It would be great if there were yet another supercapped SSD brand, combined with a modified md software RAID that reads all three drives at once and compares the results on every read, instead of the occasional check: if at least two drives agree on the contents, return the data.

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
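The 2-of-3 voting read Yeb describes is easy to make concrete in user space. A toy Python sketch, with the replica device names as placeholders (a real implementation would live in the md layer and work per stripe):

    def voting_read(replicas, offset, length):
        """Read one range from all replicas; return the contents at least
        two of them agree on, else fail rather than guess."""
        contents = []
        for path in replicas:
            with open(path, "rb") as f:
                f.seek(offset)
                contents.append(f.read(length))
        for candidate in contents:
            if sum(c == candidate for c in contents) >= 2:
                return candidate
        raise IOError("no two replicas agree on this block")

    # Placeholder devices; any three equally sized image files work for testing.
    block = voting_read(["/dev/sdb", "/dev/sdc", "/dev/sdd"], 4096, 4096)

The appeal over ordinary mirroring is that a drive returning silently corrupted data gets outvoted on the read path itself, rather than being caught only by a periodic scrub.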
On 2011-06-21 09:51, Yeb Havinga wrote:
> On 2011-06-21 08:33, Greg Smith wrote:
>> On 06/20/2011 11:54 PM, Dan Harris wrote:
>>
>>> I'm exploring the combination of an Areca 1880ix-12 controller with
>>> 6x OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10.
>>> Has anyone tried this combination? What nasty surprise am I
>>> overlooking here?

I forgot to mention that with an SSD it's important to watch the remaining lifetime. These values can be read with smartctl. When putting the disk behind a hardware RAID controller, you might not be able to read them from the OS, and the hardware RAID firmware might be too old to know about the SSD lifetime indicator, or might not show it at all.

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
* Yeb Havinga:

> I forgot to mention that with an SSD it's important to watch the
> remaining lifetime. These values can be read with smartctl. When
> putting the disk behind a hardware RAID controller, you might not be
> able to read them from the OS, and the hardware RAID firmware might be
> too old to know about the SSD lifetime indicator, or might not show it
> at all.

3ware controllers offer SMART pass-through, and smartctl supports it. I'm sure there's something similar for Areca controllers.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99
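To make the pass-through concrete: smartmontools selects the controller-specific method with the -d flag. A sketch of the two invocations, where the port/slot numbers and device nodes are assumptions that differ per system:

    import subprocess

    # 3ware: query the disk on controller port 0 behind /dev/twa0
    # (older 3ware cards use /dev/twe0 instead).
    subprocess.run(["smartctl", "-a", "-d", "3ware,0", "/dev/twa0"])
    # Areca: query the disk in slot 1 behind the controller's SCSI generic node.
    subprocess.run(["smartctl", "-a", "-d", "areca,1", "/dev/sg0"])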
On 06/21/2011 07:19 AM, Florian Weimer wrote:
> 3ware controllers offer SMART pass-through, and smartctl supports it.
> I'm sure there's something similar for Areca controllers.

Depends on the model, drives, and how you access the management interface. For both manufacturers, actually. Check out http://notemagnet.blogspot.com/2008/08/linux-disk-failures-areca-is-not-so.html for example. There I talk about problems with a specific Areca controller, as well as noting in a comment at the end that there are limitations with 3ware not supporting SMART reports against SAS drives.

Part of the whole evaluation chain for new server hardware, especially for SSD, needs to be a look at what SMART data you can get. Yeb, I'd be curious to get more details about what you've been seeing here if you can share it. You have more different models around than I have access to, especially the OCZ ones, which I still can't get my clients to consider. (Their concerns about compatibility and support from a relatively small vendor are not completely unfounded.)

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
On Tuesday, 21 June 2011 05:54:26, Dan Harris wrote:
> I'm looking for advice from the I/O gurus who have been in the SSD game
> for a while now.
>
> I understand that the majority of consumer-grade SSD drives lack the
> required capacitor to complete a write on a sudden power loss. But
> what about pairing up with a hardware controller with BBU write cache?
> Can the write cache be disabled at the drive level to make this a safe setup?
>
> I'm exploring the combination of an Areca 1880ix-12 controller with 6x
> OCZ Vertex 3 V3LT-25SAT3 2.5" 240GB SATA III drives in RAID-10. Has
> anyone tried this combination? What nasty surprise am I overlooking here?
>
> Thanks
> -Dan

Won't work, period.

The long story: the loss of the writes sitting in the SSD cache is substantial; you may lose the whole system. I have been testing SSDs since 2006 (an Adtron 2 GB for 1200 euro at first), and I can only advise using an enterprise-ready SSD. Candidates: the new Intel series, SandForce Pro discs.

I tried to submit a request to APC to construct a device similar to a buffered drive frame (a capacitor holds up the 5 V until the cache has been written back), but they never answered. So there is no luck in using a mainstream SSD for this job: loss of the cache - or, for mainstream SandForce drives, loss of the connection - will result in the loss of changed frames (i.e. 16 MB of data per frame) on the SSD. If that holds the root of your filesystem, forget the disk.

BTW: over the past two years I have tested 16 discs, for speed only (I sell each disc after the test). I got 6 returns for failure within those two years - it really is happening to the mainstream discs.

--
Kind regards,
Anton Rommerskirchen
On 2011-06-21 17:11, Greg Smith wrote:
> On 06/21/2011 07:19 AM, Florian Weimer wrote:
>> 3ware controllers offer SMART pass-through, and smartctl supports it.
>> I'm sure there's something similar for Areca controllers.
>
> Depends on the model, drives, and how you access the management
> interface. For both manufacturers, actually. Check out
> http://notemagnet.blogspot.com/2008/08/linux-disk-failures-areca-is-not-so.html
> for example. There I talk about problems with a specific Areca
> controller, as well as noting in a comment at the end that there are
> limitations with 3ware not supporting SMART reports against SAS drives.
>
> Part of the whole evaluation chain for new server hardware, especially
> for SSD, needs to be a look at what SMART data you can get. Yeb, I'd
> be curious to get more details about what you've been seeing here if
> you can share it.

This is what a Windows OCZ tool reports for the various SMART values (excuse the lack of markup) of a Vertex 2 Pro:

SMART READ DATA, Revision: 10. Attributes list:
  1: SSD Raw Read Error Rate - Normalized Rate: 120 (total ECC and RAISE errors)
  5: SSD Retired Block Count - Reserve blocks remaining: 100%
  9: SSD Power-On Hours - Total hours power on: 451
 12: SSD Power Cycle Count - Count of power on/off cycles: 61
 13: SSD Soft Read Error Rate - Normalized Rate: 120
100: SSD GBytes Erased - Flash memory erases across the entire drive: 128 GB
170: SSD Number of Remaining Spares - Number of reserve Flash memory blocks: 17417
171: SSD Program Fail Count - Total number of Flash program operation failures: 0
172: SSD Erase Fail Count - Total number of Flash erase operation failures: 0
174: SSD Unexpected Power Loss Count - Total number of unexpected power losses: 13
177: SSD Wear Range Delta - Delta between most-worn and least-worn Flash blocks: 0
181: SSD Program Fail Count - Total number of Flash program operation failures: 0
182: SSD Erase Fail Count - Total number of Flash erase operation failures: 0
184: SSD End-to-End Error Detection - I/O errors detected during reads from flash memory: 0
187: SSD Reported Uncorrectable Errors - Uncorrectable RAISE errors reported to the host for all data access: 0
194: SSD Temperature Monitoring - Current: 26, High: 37, Low: 0
195: SSD ECC On-the-fly Count - Normalized Rate: 120
196: SSD Reallocation Event Count - Total number of reallocated Flash blocks: 0
198: SSD Uncorrectable Sector Count - Total number of uncorrectable errors when reading/writing a sector: 0
199: SSD SATA R-Errors Error Count - Current SATA R-Error count: 0
201: SSD Uncorrectable Soft Read Error Rate - Normalized Rate: 120
204: SSD Soft ECC Correction Rate (RAISE) - Normalized Rate: 120
230: SSD Life Curve Status - Current state of drive operation based upon the Life Curve: 100
231: SSD Life Left - Approximate SSD life remaining: 99%
232: SSD Available Reserved Space - Amount of Flash memory space in reserve: 17 GB
235: SSD Supercap Health - Condition of the external supercapacitor, in msec: 0
241: SSD Lifetime Writes from Host - Number of bytes written to SSD: 448 GB
242: SSD Lifetime Reads from Host - Number of bytes read from SSD: 192 GB

The same tool for a Vertex 3 (not Pro):

SMART READ DATA, Revision: 10. Attributes list:
  1: SSD Raw Read Error Rate - Normalized Rate: 120 (total ECC and RAISE errors)
  5: SSD Retired Block Count - Reserve blocks remaining: 100%
  9: SSD Power-On Hours - Total hours power on: 7
 12: SSD Power Cycle Count - Count of power on/off cycles: 13
171: SSD Program Fail Count - Total number of Flash program operation failures: 0
172: SSD Erase Fail Count - Total number of Flash erase operation failures: 0
174: SSD Unexpected Power Loss Count - Total number of unexpected power losses: 10
177: SSD Wear Range Delta - Delta between most-worn and least-worn Flash blocks: 0
181: SSD Program Fail Count - Total number of Flash program operation failures: 0
182: SSD Erase Fail Count - Total number of Flash erase operation failures: 0
187: SSD Reported Uncorrectable Errors - Uncorrectable RAISE errors reported to the host for all data access: 0
194: SSD Temperature Monitoring - Current: 128, High: 129, Low: 127
195: SSD ECC On-the-fly Count - Normalized Rate: 100
196: SSD Reallocation Event Count - Total number of reallocated Flash blocks: 0
201: SSD Uncorrectable Soft Read Error Rate - Normalized Rate: 100
204: SSD Soft ECC Correction Rate (RAISE) - Normalized Rate: 100
230: SSD Life Curve Status - Current state of drive operation based upon the Life Curve: 100
231: SSD Life Left - Approximate SSD life remaining: 100%
241: SSD Lifetime Writes from Host - Number of bytes written to SSD: 162 GB
242: SSD Lifetime Reads from Host - Number of bytes read from SSD: 236 GB

There's some info buried in http://archives.postgresql.org/pgsql-performance/2011-03/msg00350.php where two Vertex 2 Pros are compared; the first has been really hammered with pgbench, the second had a few months' duty in a workstation. The raw value of SSD Available Reserved Space seems to be a good candidate to watch as it goes toward 0, since the pgbench-hammered drive has 16 GB left and the workstation disk 17 GB. Would be cool to graph with e.g. symon (http://i.imgur.com/T4NAq.png)

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
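A sketch of what feeding such a value to a grapher could look like: pull one attribute's raw value out of smartctl -A output and print it for the collector. The device name and attribute ID are placeholders, the attribute table layout assumed is the standard smartmontools one, and raw values that aren't plain integers (e.g. the temperature triplets above) would need extra parsing:

    import subprocess

    def smart_raw_value(device, attr_id):
        """Return the raw value (last column) of one SMART attribute."""
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            fields = line.split()
            if fields and fields[0] == str(attr_id):
                return int(fields[-1])
        return None

    # 232 = Available Reserved Space on these OCZ drives; watch it fall toward 0.
    print(smart_raw_value("/dev/sda", 232))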
On 2011-06-21 22:10, Yeb Havinga wrote:
> There's some info buried in
> http://archives.postgresql.org/pgsql-performance/2011-03/msg00350.php
> where two Vertex 2 Pros are compared; the first has been really
> hammered with pgbench, the second had a few months' duty in a
> workstation. The raw value of SSD Available Reserved Space seems to be
> a good candidate to watch as it goes toward 0, since the pgbench-hammered
> drive has 16 GB left and the workstation disk 17 GB. Would be cool to
> graph with e.g. symon (http://i.imgur.com/T4NAq.png)

I forgot to mention that both the newest firmware for the drives and an svn version of smartmontools are advisable before trying to figure out what all those strange values mean. It's too bad, however, that OCZ doesn't let the user choose which firmware to run (the tool always picks the newest), so after every upgrade it'll be a surprise which values are supported, and whether any of the values have been reset or are interpreted differently. Even though disks in production might not be upgraded eagerly, replacing a faulty drive means the new one probably needs an upgrade first, and it would be nice to have a uniform SMART value readout for the monitoring tools.

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data
On Tue, Jun 21, 2011 at 2:25 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
> strange values mean. It's too bad, however, that OCZ doesn't let the user
> choose which firmware to run (the tool always picks the newest), so after
> every upgrade it'll be a surprise which values are supported, and whether any of the

That right there pretty much eliminates them from consideration for enterprise applications.
On Tue, Jun 21, 2011 at 3:32 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Tue, Jun 21, 2011 at 2:25 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>
>> strange values mean. It's too bad, however, that OCZ doesn't let the user
>> choose which firmware to run (the tool always picks the newest), so after
>> every upgrade it'll be a surprise which values are supported, and whether any of the
>
> That right there pretty much eliminates them from consideration for
> enterprise applications.

As much as I've been irritated with Intel for being intentionally oblique on the write caching issue -- I think they remain more or less the only game in town for enterprise use. The X25-E has been, up until recently, the only drive to seriously consider for write-heavy applications (and Greg is pretty skeptical even about that). I have directly observed Vertex Pro drives burning out in ~18 months in constant-duty applications (which, if you do the math, is about right on schedule) -- not good enough IMO. ISTM Intel is clearly positioning the 710 Lyndonville as the main drive to go with in database environments for most cases. At 3300 IOPS (see http://www.anandtech.com/show/4452/intel-710-and-720-ssd-specifications) and with some tinkering that results in 65 times greater longevity than standard MLC, I expect the drive will be a huge hit, as long as it can sustain those numbers writing durably and comes in at under the $10/GB price point.

merlin
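Merlin's "do the math" is just multiplication. A back-of-envelope sketch in which every input is an assumption picked for illustration (usable capacity, MLC program/erase rating, write amplification, and daily write volume all vary widely between drives and workloads):

    # Rough MLC SSD endurance estimate; all inputs are illustrative assumptions.
    capacity_gb = 100             # usable flash capacity
    pe_cycles = 5000              # consumer MLC program/erase rating
    write_amplification = 2.0     # flash writes per host write
    host_writes_gb_per_day = 450  # constant-duty database load

    total_host_writes_gb = capacity_gb * pe_cycles / write_amplification
    lifetime_days = total_host_writes_gb / host_writes_gb_per_day
    print("~%.0f days (~%.1f months) to flash exhaustion"
          % (lifetime_days, lifetime_days / 30))

With these made-up inputs the estimate lands around 18 months, which is the point: under constant duty, consumer MLC wears out on a schedule you can predict in advance.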
On 06/21/2011 05:35 PM, Merlin Moncure wrote:
> On Tue, Jun 21, 2011 at 3:32 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>
>> On Tue, Jun 21, 2011 at 2:25 PM, Yeb Havinga <yebhavinga@gmail.com> wrote:
>>
>>> It's too bad, however, that OCZ doesn't let the user
>>> choose which firmware to run (the tool always picks the newest), so after
>>> every upgrade it'll be a surprise which values are supported, and whether any of the
>>>
>> That right there pretty much eliminates them from consideration for
>> enterprise applications.
>>
> As much as I've been irritated with Intel for being intentionally
> oblique on the write caching issue -- I think they remain more or less
> the only game in town for enterprise use.

That's at the core of why I have been so consistently cranky about them. The sort of customers I deal with who are willing to spend money on banks of SSD will buy Intel, and the "Enterprise" feature set seems complete enough that it doesn't set off any alarms for them. The same is not true of OCZ, which unfortunately means I never even get them onto the vendor grid in the first place. Everybody runs out to buy the Intel units instead, they get burned by the write cache issues, lose data, and sometimes they even blame PostgreSQL for it.

I have a customer who has around 50 X25-E drives, a little stack of them in six servers running two similar databases. They each run about a terabyte, and refill about every four months (old data eventually ages out, replaced by new). At the point I started working with them, they had lost the entire recent history twice--terabyte gone, whoosh!--because the power reliability is poor in their area. And network connectivity is bad enough that they can't ship this volume of updates elsewhere either. It happened again last month, and for the first time the database was recoverable. I had converted one server into a cold spare that just archives the WAL files, and that's the only one that lived through the nasty power spike+outage that corrupted the active databases on both the master and the warm standby of each set. All four of the servers where PostgreSQL was writing data and expected proper fsync guarantees, all gone from one power issue. At the point I got involved, they were about to cancel this entire PostgreSQL experiment, because they assumed the database had to be garbage for this to keep happening; until I told them about this known issue, they never considered that the drives were the problem. That's what I think of when people ask me about the Intel X25-E.

I'm very happy with the little 3rd-generation consumer-grade SSD I bought from Intel though (320 series). If they just do the same style of write cache and reliability rework to the enterprise line, but using better flash, I agree that the first really serious yet affordable product for the database market may finally come out of that. We're just not there yet, and unfortunately for the person who started this round of discussion, throwing hardware RAID at the problem doesn't make this go away either.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
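The cold-spare arrangement Greg describes is ordinary WAL archiving. A minimal postgresql.conf sketch for 9.0, where the archive path is a placeholder and production setups usually replace cp with something that reports failures more robustly:

    wal_level = archive          # retain enough WAL detail for archive recovery
    archive_mode = on
    archive_command = 'test ! -f /mnt/walarchive/%f && cp %p /mnt/walarchive/%f'

Because the spare only ever replays archived segments, a power event that eats the drives under the active databases can't touch it.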
On 06/21/2011 05:17 PM, Greg Smith wrote:
> If they just do the same style of write cache and reliability rework
> to the enterprise line, but using better flash, I agree that the
> first really serious yet affordable product for the database market
> may finally come out of that.

After we started our research in this area and finally settled on FusionIO PCI cards (which survived several controlled and uncontrolled failures completely intact), a consultant tried telling us he could build us a cage of SSDs for much cheaper, and with better performance. Once I'd stopped laughing, I quickly shooed him away.

One of the reasons the PCI cards do so well is that they operate in a directly memory-addressable manner, and always include capacitors. You lose some overhead due to the CPU running the driver, and you can't boot off of them, but they're leagues ahead in terms of safety. But like you said, they're certainly not what most people would call affordable. 640GB for two orders of magnitude more than an equivalent hard drive would cost? Ouch.

Most companies are familiar---and hence comfortable---with RAIDs of various flavors, so they see SSD performance numbers and think to themselves "What if that were in a RAID?" Right now, drives aren't quite there yet, or the ones that are cost more than most want to spend. It's a shame, really. But I'm willing to wait it out for now.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________
See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email