Re: Recommendations for SSDs in production? - Mailing list pgsql-general

From Yeb Havinga
Subject Re: Recommendations for SSDs in production?
Msg-id 4ECE44BB.9060102@gmail.com
In response to Re: Recommendations for SSDs in production?  (David Boreham <david_list@boreham.org>)
Responses Re: Recommendations for SSDs in production?  (Yeb Havinga <yebhavinga@gmail.com>)
List pgsql-general
On 2011-11-04 16:24, David Boreham wrote:
> On 11/4/2011 8:26 AM, Yeb Havinga wrote:
>>
>> First, if you're interested in doing a test like this yourself: I'm
>> testing on Ubuntu 11.10, and even though this is a brand new
>> distribution, its SMART drive database was a few months old.
>> Running 'update-smart-drivedb' turned the attribute names into
>> something useful: instead of #LBA's written, it now shows
>> #32MiB's written. There are also now three 'workload' related
>> parameters.
>>
> I submitted the patch for these to smartmontools a few weeks ago and
> it is now in the current db but not yet in any of the distro update
> packages. I probably forgot to mention in my post here that you need
> the latest db for the 710. Also, if you pull the trunk source code and
> build it yourself it has the ability to decode the drive stats log
> data (example pasted below). I haven't yet found a use for this
> myself, but it does seem to have a little more information than the
> SMART attributes. (Thanks to Christian Franke of the smartmontools
> project for implementing this feature)
>
> Your figures from the workload wear roughly match mine. In production
> we don't expect to subject the drives to anything close to 100% of the
> pgbench workload (probably around 1/10 of that on average), so the
> predicted wear life of the drive is 10+ years in our estimates, under
> production loads.
>
> The big question, of course, is whether the drive's wearout estimate
> can be trusted. A little more information from Intel about how it is
> calculated would help allay concerns in this area.

TLDR: some numbers after three weeks of media wear testing on a software
mirror with an Intel 710 and an OCZ Vertex 2 Pro.

For the last couple of weeks I've been running pgbench for an hour, then
sleeping for 10 minutes, in an infinite loop, just to see how the values
would grow.
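
A minimal sketch of such a run/sleep loop, for anyone who wants to repeat the test. The post does not show the actual script, so the database name, client count and helper names here are assumptions; only pgbench's -T (duration in seconds) and -c (client count) options are used.

```python
import subprocess
import time


def build_pgbench_command(dbname, duration_s=3600, clients=8):
    """Return the argv for one timed pgbench run.

    -T is the run duration in seconds, -c the number of clients.
    dbname and clients are illustrative values, not taken from the post.
    """
    return ["pgbench", "-T", str(duration_s), "-c", str(clients), dbname]


def wear_loop(dbname, cycles=None, pause_s=600,
              run=subprocess.run, sleep=time.sleep):
    """Alternate one pgbench run with a pause; cycles=None loops forever.

    run and sleep are injectable so the loop can be dry-run without
    touching a database.
    """
    done = 0
    while cycles is None or done < cycles:
        run(build_pgbench_command(dbname), check=False)
        sleep(pause_s)
        done += 1
    return done


# To start the real loop against a hypothetical database named "pgbench":
#     wear_loop("pgbench")
```

Passing cycles=2 together with stub run and sleep functions exercises the loop without starting pgbench.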

This is the Intel 710 mirror leg:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
225 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       3020093
226 Workld_Media_Wear_Indic 0x0032   100   100   000    Old_age   Always       -       2803
227 Workld_Host_Reads_Perc  0x0032   100   100   000    Old_age   Always       -       0
228 Workload_Minutes        0x0032   100   100   000    Old_age   Always       -       21444
232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
233 Media_Wearout_Indicator 0x0032   098   098   000    Old_age   Always       -       0
241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       3020093
242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       22259

Note: the raw value of 226 (E2) is 2803. According to
http://www.tomshardware.com/reviews/ssd-710-enterprise-x25-e,3038-4.html
you have to divide it by 1024 to get a percentage: 2803 / 1024 is about
2.7%, so a bit over 2% worn. This matches the normalized (not raw) value
of 098 at 233 (E9).
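
The arithmetic can be written out as a small sketch. The divide-by-1024 rule comes from the article linked above. The extrapolation to 100% wear is my own assumption that wear grows linearly with workload minutes at constant load, which nothing in these numbers proves. The function names are made up for illustration.

```python
def wear_percent(raw_e2):
    """Convert the raw Workld_Media_Wear_Indic (226/E2) value to percent."""
    return raw_e2 / 1024.0


def minutes_to_full_wear(raw_e2, workload_minutes, load_fraction=1.0):
    """Linearly extrapolate the workload minutes needed to reach 100% wear.

    Assumes wear is proportional to time at constant load (an assumption).
    load_fraction scales the test load, e.g. 0.1 for a production load at
    one tenth of the pgbench load.
    """
    pct = wear_percent(raw_e2)
    return workload_minutes * 100.0 / pct / load_fraction


MINUTES_PER_YEAR = 60 * 24 * 365

pct = wear_percent(2803)                      # raw value of 226
full_years = minutes_to_full_wear(2803, 21444) / MINUTES_PER_YEAR
tenth_years = minutes_to_full_wear(2803, 21444, 0.1) / MINUTES_PER_YEAR
print(f"wear so far:          {pct:.2f}%")
print(f"at full test load:    {full_years:.1f} years to 100%")
print(f"at 1/10 of test load: {tenth_years:.1f} years to 100%")
```

Under that linear assumption this comes to roughly 1.5 years at the test load and roughly 15 years at a tenth of it, which is in line with the 10+ years David estimates for production.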

This is the OCZ Vertex 2 Pro mirror leg:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       22
100 Gigabytes_Erased        0x0032   000   000   000    Old_age   Always       -       21120
170 Reserve_Block_Count     0x0032   000   000   000    Old_age   Always       -       34688
177 Wear_Range_Delta        0x0000   000   000   000    Old_age   Offline      -       3
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
232 Available_Reservd_Space 0x0000   000   000   000    Old_age   Offline      -       33
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       21184
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       94656
235 SuperCap_Health         0x0033   100   100   002    Pre-fail  Always       -       0
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       94656
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       960
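
The two drives report host writes in different units (32 MiB units on the Intel, GiB on the Vertex), so a quick conversion helps when comparing the mirror legs. This is only a sketch: it takes the units from the attribute names as shown by smartctl and assumes they are accurate.

```python
MIB_PER_GIB = 1024
GIB_PER_TIB = 1024


def intel_host_writes_gib(raw_units_32mib):
    """Convert a Host_Writes_32MiB raw value (units of 32 MiB) to GiB."""
    return raw_units_32mib * 32 / MIB_PER_GIB


intel_gib = intel_host_writes_gib(3020093)  # Intel 710, attribute 225/241
ocz_gib = 94656                             # Vertex 2 Pro, attribute 241
diff_pct = abs(intel_gib - ocz_gib) / ocz_gib * 100

print(f"Intel 710:    {intel_gib:.0f} GiB ({intel_gib / GIB_PER_TIB:.1f} TiB)")
print(f"Vertex 2 Pro: {ocz_gib} GiB ({ocz_gib / GIB_PER_TIB:.1f} TiB)")
print(f"difference:   {diff_pct:.2f}%")
```

Both legs come out at a little over 92 TiB written, within a fraction of a percent of each other, which is what you would expect from two halves of a mirror.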

Here the 177 (B1) wear range delta has a raw value of 3. This isn't
SSD life left, but the delta between the most-worn and least-worn flash
blocks. I really wonder at which point SSD life left will change to 99
on this drive.

regards,
Yeb Havinga


