Re: Intel SSDs that may not suck - Mailing list pgsql-performance

From Greg Smith
Subject Re: Intel SSDs that may not suck
Date
Msg-id 4D9222B9.9020109@2ndQuadrant.com
In response to Re: Intel SSDs that may not suck  (Yeb Havinga <yebhavinga@gmail.com>)
List pgsql-performance
On 03/29/2011 06:34 AM, Yeb Havinga wrote:
> While I appreciate the heads up about these new drives, your posting
> suggests (though you formulated in a way that you do not actually say
> it) that OCZ products do not have a long term reliability. No factual
> data. If you have knowledge of sandforce based OCZ drives fail, that'd
> be interesting because that's the product line what the new Intel SSD
> ought to be compared with.

I didn't want to say anything too strong until I got to the bottom of
the reports I'd been sorting through.  It turns out that there is a
widespread incompatibility between OCZ drives and some popular Gigabyte
motherboards:

http://www.ocztechnologyforum.com/forum/showthread.php?76177-do-you-own-a-Gigabyte-motherboard-and-have-the-SMART-error-with-FW1.11...look-inside

(I'm typing this message on a system with one of the impacted
combinations, one reason why I don't own a Vertex 2 Pro yet.  That I
would have to run a "Beta BIOS" does not inspire confidence.)

What happens on the models impacted is that you can't get SMART data
from the drive.  That means no monitoring for the sort of expected
failures we all know can happen with any drive.  So far that looks to be
at the bottom of all the anecdotal failure reports I'd found:  the
drives may have been throwing bad sectors or some other early failure,
and the owners had no idea because they thought SMART would warn
them--but it wasn't working at all.  Thus, they don't find out there's a
problem until the drive just dies altogether one day.

More popular doesn't always mean more reliable, but for stuff like this
it helps.  Intel ships so many more drives than OCZ that I'd be shocked
if Gigabyte themselves didn't have reference samples of them for
testing.  This really looks like more of a warning about why you should
be particularly aggressive with checking SMART when running recently
introduced drives, which it sounds like you are already doing.
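
As a rough illustration of that kind of checking, here is a minimal sketch in
Python.  It assumes smartmontools is installed and that "smartctl -A" prints
its usual ten-column attribute table.  The function names and the two
attribute names checked are assumptions for illustration; attribute names
vary by vendor, and SSDs often report different ones.  The key point is that
getting no attributes back at all is treated as a problem, not as a clean
bill of health.

```python
import subprocess


def parse_smart_attributes(report):
    """Map attribute name -> raw value from `smartctl -A` style text."""
    attrs = {}
    for line in report.splitlines():
        fields = line.split(None, 9)
        # Attribute rows have ten columns and start with a numeric ID.
        if len(fields) == 10 and fields[0].isdigit():
            raw = fields[9].split()[0]  # drop trailing notes like "(Min/Max ...)"
            if raw.isdigit():
                attrs[fields[1]] = int(raw)
    return attrs


def check_drive(report):
    """Return a list of warnings; an empty list means nothing was flagged."""
    attrs = parse_smart_attributes(report)
    if not attrs:
        # The failure mode described above: SMART silently unavailable.
        return ["no SMART attributes returned; monitoring is blind"]
    problems = []
    # Assumed attribute names; adjust for what the drive actually reports.
    for name in ("Reallocated_Sector_Ct", "Current_Pending_Sector"):
        if attrs.get(name, 0) > 0:
            problems.append(f"{name} = {attrs[name]}")
    return problems


def read_report(device):
    """Run smartctl against a device such as /dev/sda (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return result.stdout
```

Running check_drive(read_report("/dev/sda")) from cron and mailing any
non-empty result would be one way to use it; the device path is only an
example.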

Reliability in this area is so strange...a diversion to older drives
gives an idea how annoyed I am about all this.  Last year, I gave up on
Western Digital's consumer drives (again).  Not because the failure
rates were bad, but because the one failure I did run into was so
terrible from a SMART perspective.  The drive just lied about the whole
problem so aggressively I couldn't manage the process.  I couldn't get
the drive to admit it had a problem such that it could turn into an RMA
candidate, despite failing every time I ran an aggressive SMART error
check.  It would reallocate a few sectors, say "good as new!", and then
fail at the next block when I re-tested.  Did that at least a dozen
times before throwing it in the "pathological drives" pile I keep around
for torture testing.

Meanwhile, the Seagate drives I switched back to are terrible from a
failure percentage perspective.  I just had two start to go bad last
week, both halves of an array, which is always fun.  But the failure
started with very clearly labeled increases in reallocated sectors, and
the drive that eventually went really bad (making the bad noises) was
kicked back for RMA.  If you've got redundancy, I'll take components
that fail cleanly over ones that hide what's going on, even if the one
that fails cleanly is actually more likely to fail.  With a rebuild
always a drive swap away, having accurate data makes even a higher
failure rate manageable.
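
The "clearly labeled increases in reallocated sectors" signal can be watched
for mechanically.  The sketch below is only an illustration: it assumes you
already record the reallocated sector count for a drive at regular intervals,
and the function name and the sample labels are hypothetical.  It flags any
reading that is higher than the one before it.

```python
def reallocation_alerts(history):
    """history: chronological list of (label, reallocated_sector_count).

    Returns one message per interval in which the count went up.
    """
    alerts = []
    for (_, prev), (label, cur) in zip(history, history[1:]):
        if cur > prev:
            alerts.append(f"{label}: reallocated sectors rose {prev} -> {cur}")
    return alerts


# Made-up readings, purely to show the shape of the input.
readings = [("mon", 0), ("tue", 0), ("wed", 8), ("thu", 8), ("fri", 24)]
for message in reallocation_alerts(readings):
    print(message)
```

A drive that honestly reports a rising count gives you time to swap it; one
that keeps reporting no change while failing tests gives this check nothing
to work with, which is the distinction drawn above.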

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

