Re: Fwd: Re: SSDD reliability - Mailing list pgsql-general

From Toby Corkindale
Subject Re: Fwd: Re: SSDD reliability
Msg-id 4DC356A4.8020004@strategicdata.com.au
In response to Re: Fwd: Re: SSDD reliability  (Florian Weimer <fweimer@bfk.de>)
Responses Re: Fwd: Re: SSDD reliability  (Toby Corkindale <toby.corkindale@strategicdata.com.au>)
Re: Fwd: Re: SSDD reliability  ("mark" <dvlhntr@gmail.com>)
List pgsql-general
On 05/05/11 18:36, Florian Weimer wrote:
> * Greg Smith:
>
>> Intel claims their Annual Failure Rate (AFR) on their SSDs in IT
>> deployments (not OEM ones) is 0.6%.  Typical measured AFRs for
>> mechanical drives are around 2% during their first year, spiking to 5%
>> afterwards.  I suspect that Intel's numbers are actually much better
>> than the other manufacturers' here, so an SSD from anyone else can
>> easily be less reliable than a regular hard drive still.
>
> I'm a bit concerned with usage-dependent failures.  Presumably, two SSDs
> in a RAID-1 configuration are worn down in the same way, and it would
> be rather inconvenient if they failed at the same point.  With hard
> disks, this doesn't seem to happen; even bad batches fail pretty much
> randomly.

Actually, I think it'll be the same as with hard disks, i.e. a batch of
drives with sequential serial numbers will have a fairly similar average
lifetime, but they won't pop their clogs all on the same day. (Unless
there is an outside influence; see Note 1.)

The wearing-out of SSDs is not as exact as people seem to think. If the
drive is rated for 10,000 erase cycles, that is meant to be a MINIMUM.
Most blocks will survive more cycles than that, and maybe a small number
will die sooner. I guess it's a probability curve, engineered such that
95% (or some other high percentage) of blocks will outlast the rated
count. SSDs also keep reserved blocks that take over from failing
blocks, invisibly to the end user, since the drive can still read the
data from a block that is failing to erase.
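
As a rough illustration of the point above, here is a minimal Python
sketch of that "probability curve" idea. Everything in it is an
assumption made for illustration: the log-normal shape, the spread
(sigma), the block and spare counts, perfect wear levelling, and the
function names simulate_block_endurance and drive_failure_cycle. None of
it comes from a vendor datasheet.

```python
import math
import random
from statistics import NormalDist


def simulate_block_endurance(n_blocks, rated_cycles=10_000,
                             survive_fraction=0.95, sigma=0.2, seed=None):
    """Draw a hypothetical erase-cycle lifetime for each flash block.

    Lifetimes are log-normal (an assumption), positioned so that
    `survive_fraction` of blocks outlast `rated_cycles`.
    """
    rng = random.Random(seed)
    # z-score of the lower tail, roughly -1.645 for a 5% tail
    z = NormalDist().inv_cdf(1.0 - survive_fraction)
    # Place the curve so that the rated count sits at that lower quantile
    mu = math.log(rated_cycles) - sigma * z
    return [rng.lognormvariate(mu, sigma) for _ in range(n_blocks)]


def drive_failure_cycle(lifetimes, spare_blocks):
    """Erase-cycle count at which the drive runs out of spare blocks.

    Assumes perfect wear levelling, so every block has seen the same
    number of erase cycles at any moment. The drive is counted as failed
    when the (spare_blocks + 1)-th weakest block dies.
    """
    if spare_blocks >= len(lifetimes):
        raise ValueError("spare_blocks must be smaller than the block count")
    return sorted(lifetimes)[spare_blocks]


if __name__ == "__main__":
    blocks = simulate_block_endurance(20_000, seed=1)
    outlast = sum(life > 10_000 for life in blocks) / len(blocks)
    print(f"fraction of blocks outlasting the rated count: {outlast:.3f}")

    # Two "identical" drives in a mirror, differing only in random draws
    drive_a = drive_failure_cycle(
        simulate_block_endurance(5_000, seed=1), spare_blocks=100)
    drive_b = drive_failure_cycle(
        simulate_block_endurance(5_000, seed=2), spare_blocks=100)
    print(f"drive A exhausts its spares at ~{drive_a:.0f} cycles")
    print(f"drive B exhausts its spares at ~{drive_b:.0f} cycles")
```

Under these assumptions the two mirrored drives wear out at similar but
not identical cycle counts. How far apart that lands in calendar time
depends on the write rate and on the real spread between blocks, which
this sketch only guesses at.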

Note 1:
I have seen an array that was powered on continuously for about six
years, which killed half the disks when it was finally powered down,
left to cool for a few hours, then started up again.
