Thread: Intel SSDs that may not suck

Intel SSDs that may not suck

From
Greg Smith
Date:
Today is the launch of Intel's 3rd generation SSD line, the 320 series.
And they've finally produced a cheap consumer product that may be useful
for databases, too!  They've put 6 small capacitors onto the board and
added logic to flush the write cache if the power drops.  The cache on
these was never very big, so they were able to avoid needing one of the
big super-capacitors.  Having 6 little ones is probably a net
reliability win over the single point of failure, too.

Performance is only a little better than earlier generation designs,
which means they're still behind the OCZ Vertex controllers that have
been recommended on this list.  I haven't really been hearing good
things about long-term reliability of OCZ's designs anyway, so glad to
have an alternative.  *Important*:  don't buy SSDs for important data
without also having a good redundancy/backup plan.  As relatively new
technology they do still have a pretty high failure rate.  Make sure you
budget for two drives and make multiple copies of your data.

Anyway, the new Intel drives are fast enough for most things, though, and
are going to be very inexpensive.  See
http://www.storagereview.com/intel_ssd_320_review_300gb for some
simulated database tests.  There's more about the internals at
http://www.anandtech.com/show/4244/intel-ssd-320-review and the white
paper about the capacitors is at

http://newsroom.intel.com/servlet/JiveServlet/download/38-4324/Intel_SSD_320_Series_Enhance_Power_Loss_Technology_Brief.pdf

Some may still find these too cheap for enterprise use, given that the use of
MLC limits how much activity these drives can handle.  But it's great to
have a new option for lower-budget systems that can tolerate some risk there.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Intel SSDs that may not suck

From
Andy
Date:
This might be a bit too little too late though. As you mentioned there really isn't any real performance improvement
for the Intel SSD. Meanwhile, SandForce (the controller that OCZ Vertex is based on) is releasing its next generation
controller at a reportedly huge performance increase.

Is there any benchmark measuring the performance of these SSD's (the new Intel vs. the new SandForce) running database
workloads? The benchmarks I've seen so far are for desktop applications.

Andy


Re: Intel SSDs that may not suck

From
Merlin Moncure
Date:
On Mon, Mar 28, 2011 at 7:54 PM, Andy <angelflow@yahoo.com> wrote:
> This might be a bit too little too late though. As you mentioned there really isn't any real performance improvement
> for the Intel SSD. Meanwhile, SandForce (the controller that OCZ Vertex is based on) is releasing its next generation
> controller at a reportedly huge performance increase.
>
> Is there any benchmark measuring the performance of these SSD's (the new Intel vs. the new SandForce) running
> database workloads? The benchmarks I've seen so far are for desktop applications.

The random performance data is usually a rough benchmark.  The
sequential numbers are mostly useless and always have been.  The
performance of either the OCZ or Intel drive is so disgustingly fast
compared to hard drives that the main stumbling block is life span
and write endurance, now that they are starting to get capacitors.

My own experience with MLC drives is that write cycle expectations are
more or less as advertised. They do go down (hard), and have to be
monitored. If you are writing a lot of data this can get pretty
expensive although the cost dynamics are getting better and better for
flash. I have no idea what would be precisely prudent, but maybe some
good monitoring tools and phased obsolescence at around 80% duty cycle
might not be a bad starting point.  With hard drives, you can kinda
wait for em to pop and swap em in -- this is NOT a good idea for flash
raid volumes.
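
A minimal sketch of the "phased obsolescence" idea above, in Python. It
reads "80% duty cycle" as 80% of a drive's rated write endurance being
used up, which is an interpretation and not something stated in the
message. The DriveWear class, the drives_to_retire helper, and the
endurance figures are all hypothetical; real numbers would have to come
from the vendor's rating and the drive's own write counters.

```python
from dataclasses import dataclass


@dataclass
class DriveWear:
    name: str
    lifetime_writes_gib: float  # e.g. taken from a SMART lifetime-writes counter
    rated_endurance_gib: float  # vendor-rated total writes (hypothetical input)

    @property
    def used_fraction(self) -> float:
        """Fraction of the rated write endurance already consumed."""
        return self.lifetime_writes_gib / self.rated_endurance_gib


def drives_to_retire(drives, threshold=0.80):
    """Return the drives whose consumed endurance is at or above threshold."""
    return [d for d in drives if d.used_fraction >= threshold]


if __name__ == "__main__":
    # Made-up fleet; the endurance ratings are illustrative only.
    fleet = [
        DriveWear("ssd0", 6592, 30000),
        DriveWear("ssd1", 25000, 30000),
    ]
    for d in drives_to_retire(fleet):
        print(f"{d.name}: {d.used_fraction:.0%} of rated writes used, plan replacement")
```

In a RAID volume the drives tend to wear at the same rate, so staggering
replacements before the threshold is reached on all members at once is
probably the point of doing this in phases.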

merlin

Re: Intel SSDs that may not suck

From
Jesper Krogh
Date:
On 2011-03-29 06:13, Merlin Moncure wrote:
> My own experience with MLC drives is that write cycle expectations are
> more or less as advertised. They do go down (hard), and have to be
> monitored. If you are writing a lot of data this can get pretty
> expensive although the cost dynamics are getting better and better for
> flash. I have no idea what would be precisely prudent, but maybe some
> good monitoring tools and phased obsolescence at around 80% duty cycle
> might not be a bad starting point.  With hard drives, you can kinda
> wait for em to pop and swap em in -- this is NOT a good idea for flash
> raid volumes.
What do you mean by "hard"? I have some in our setup, but
haven't seen anything "hard" just yet. Based on reports on the net,
they seem to slow down writes to "next to nothing" when they
get used, but that seems to be more graceful than old
rotating drives. Can you elaborate a bit more?

Jesper

--
Jesper

Re: Intel SSDs that may not suck

From
Scott Marlowe
Date:
On Mon, Mar 28, 2011 at 10:55 PM, Jesper Krogh <jesper@krogh.cc> wrote:
> On 2011-03-29 06:13, Merlin Moncure wrote:
>>
>> My own experience with MLC drives is that write cycle expectations are
>> more or less as advertised. They do go down (hard), and have to be
>> monitored. If you are writing a lot of data this can get pretty
>> expensive although the cost dynamics are getting better and better for
>> flash. I have no idea what would be precisely prudent, but maybe some
>> good monitoring tools and phased obsolescence at around 80% duty cycle
>> might not be a bad starting point.  With hard drives, you can kinda
>> wait for em to pop and swap em in -- this is NOT a good idea for flash
>> raid volumes.
>
> What do you mean by "hard"? I have some in our setup, but
> haven't seen anything "hard" just yet. Based on reports on the net,
> they seem to slow down writes to "next to nothing" when they
> get used, but that seems to be more graceful than old
> rotating drives. Can you elaborate a bit more?

My understanding is that without running trim commands and such, they
become fragmented and slower.  But, when they start running out of
write cycles they just die.  I.e. they go down hard.

Re: Intel SSDs that may not suck

From
Justin Pitts
Date:
The potential breakthrough here with the 320 is consumer grade SSD
performance and price paired with high reliability.

On Mon, Mar 28, 2011 at 7:54 PM, Andy <angelflow@yahoo.com> wrote:
> This might be a bit too little too late though. As you mentioned there really isn't any real performance improvement
> for the Intel SSD. Meanwhile, SandForce (the controller that OCZ Vertex is based on) is releasing its next generation
> controller at a reportedly huge performance increase.
>
> Is there any benchmark measuring the performance of these SSD's (the new Intel vs. the new SandForce) running
> database workloads? The benchmarks I've seen so far are for desktop applications.

Re: Intel SSDs that may not suck

From
Yeb Havinga
Date:
Hello Greg, list,

On 2011-03-28 22:21, Greg Smith wrote:
> Today is the launch of Intel's 3rd generation SSD line, the 320
> series.  And they've finally produced a cheap consumer product that
> may be useful for databases, too!  They've put 6 small capacitors onto
> the board and added logic to flush the write cache if the power
> drops.  The cache on these was never very big, so they were able to
> avoid needing one of the big super-capacitors instead.  Having 6
> little ones is probably a net reliability win over the single point of
> failure, too.
>
> Performance is only a little better than earlier generation designs,
> which means they're still behind the OCZ Vertex controllers that have
> been recommended on this list.  I haven't really been hearing good
> things about long-term reliability of OCZ's designs anyway, so glad to
> have an alternative.  *Important*:  don't buy SSD for important data
> without also having a good redundancy/backup plan.  As relatively new
> technology they do still have a pretty high failure rate.  Make sure
> you budget for two drives and make multiple copies of your data.
>
> Anyway, the new Intel drives are fast enough for most things, though, and
> are going to be very inexpensive.  See
> http://www.storagereview.com/intel_ssd_320_review_300gb for some
> simulated database tests.  There's more about the internals at
> http://www.anandtech.com/show/4244/intel-ssd-320-review and the white
> paper about the capacitors is at
>
> http://newsroom.intel.com/servlet/JiveServlet/download/38-4324/Intel_SSD_320_Series_Enhance_Power_Loss_Technology_Brief.pdf
>
> Some may still find these too cheap for enterprise use, given the use
> of MLC limits how much activity these drives can handle.  But it's
> great to have a new option for lower budget system that can tolerate
> some risk there.
>
While I appreciate the heads-up about these new drives, your posting
suggests (though you worded it so that you do not actually say so)
that OCZ products lack long-term reliability, without giving any factual
data. If you know of SandForce-based OCZ drives failing, that would
be interesting, because that is the product line the new Intel SSD
ought to be compared with. From my POV, I've verified that the
SandForce-based OCZ drives operate as they should (w.r.t. barriers/write
through), and I've reported what testing was done and how (I really
appreciated your help with that) -
http://archives.postgresql.org/pgsql-performance/2010-07/msg00449.php.

The three drives we're using in a development environment right now
report (with recent SSD firmware and smartmontools) their health status,
including the supercap status, reserved blocks, and a lot more
info that can be used to monitor when a drive is about to die. Since none
of the drives have failed yet, or are anywhere near their end-of-life
predictions, it is currently unknown whether this health status is
reliable. It may be, but it may just as well not be. Therefore I'm very
interested in hearing hard facts about failures and the SMART readings
right before them.

Below are SMART readings from two Vertex 2 Pros; the first is the same
drive I did the earlier testing with. You can see that its lifetime
reads/writes and unexpected power loss count are larger than those of the
other, newer one. The FAILING_NOW on available reserved space is an
artefact of the smartmontools db having the wrong threshold: the value
should be read as GB of reserved space, and I suspect that for a new
drive it might be on the order of 18 or 20.

It's hard to compare with spindles: I've seen them fail in all sorts of
ways, but I've seen no SSD failure yet. I'm inclined to start
a perpetual pgbench on one SSD while monitoring its SMART stats, to see if
what they report is really a good indicator of its lifetime. If that
is so, I'm beginning to believe that this technology is better in failure
predictability than spindles, whose failures pretty much seem random when
you have large arrays.

Model I tested with earlier:

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     OCZ VERTEX2-PRO
Serial Number:    OCZ-BVW101PBN8Q8H8M5
LU WWN Device Id: 5 e83a97 f88e46007
Firmware Version: 1.32
User Capacity:    50,020,540,416 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Tue Mar 29 11:25:04 2011 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7f) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   5) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail  Always       -       0/0
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       965h+05m+20.870s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       234
 13 Soft_Read_Error_Rate    0x000a   120   120   000    Old_age   Always       -       752/0
100 Gigabytes_Erased        0x0032   000   000   000    Old_age   Always       -       1152
170 Reserve_Block_Count     0x0032   000   000   000    Old_age   Always       -       17024
171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       50
177 Wear_Range_Delta        0x0000   000   000   ---    Old_age   Offline      -       0
181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
184 IO_Error_Detect_Code_Ct 0x0032   100   100   090    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   032   031   000    Old_age   Always       -       32 (0 0 0 31)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/0
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
198 Uncorrectable_Sector_Ct 0x0010   120   120   000    Old_age   Offline      -       0x000000000000
199 SATA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/0
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/0
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
232 Available_Reservd_Space 0x0000   000   000   010    Old_age   Offline  FAILING_NOW 16
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       1088
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       6592
235 SuperCap_Health         0x0033   100   100   001    Pre-fail  Always       -       0
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       6592
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       3200

SMART Error Log not supported
SMART Self-test Log not supported
SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Relatively new model:

=== START OF INFORMATION SECTION ===
Model Family:     SandForce Driven SSDs
Device Model:     OCZ-VERTEX2 PRO
Serial Number:    OCZ-7AVL07UM37FP45U1
LU WWN Device Id: 5 e83a97 f83e6388d
Firmware Version: 1.32
User Capacity:    50,020,540,416 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 6
Local Time is:    Tue Mar 29 11:34:28 2011 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x7f) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Abort Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (   5) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   120   120   050    Pre-fail  Always       -       0/0
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
  9 Power_On_Hours_and_Msec 0x0032   100   100   000    Old_age   Always       -       452h+19m+31.020s
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       64
 13 Soft_Read_Error_Rate    0x000a   120   120   000    Old_age   Always       -       3067/0
100 Gigabytes_Erased        0x0032   000   000   000    Old_age   Always       -       128
170 Reserve_Block_Count     0x0032   000   000   000    Old_age   Always       -       17440
171 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
174 Unexpect_Power_Loss_Ct  0x0030   000   000   000    Old_age   Offline      -       16
177 Wear_Range_Delta        0x0000   000   000   ---    Old_age   Offline      -       0
181 Program_Fail_Count      0x0032   000   000   000    Old_age   Always       -       0
182 Erase_Fail_Count        0x0032   000   000   000    Old_age   Always       -       0
184 IO_Error_Detect_Code_Ct 0x0032   100   100   090    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   032   032   000    Old_age   Always       -       32 (Min/Max 0/32)
195 ECC_Uncorr_Error_Count  0x001c   120   120   000    Old_age   Offline      -       0/0
196 Reallocated_Event_Count 0x0033   100   100   003    Pre-fail  Always       -       0
198 Uncorrectable_Sector_Ct 0x0010   120   120   000    Old_age   Offline      -       0x000000000000
199 SATA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
201 Unc_Soft_Read_Err_Rate  0x001c   120   120   000    Old_age   Offline      -       0/0
204 Soft_ECC_Correct_Rate   0x001c   120   120   000    Old_age   Offline      -       0/0
230 Life_Curve_Status       0x0013   100   100   000    Pre-fail  Always       -       100
231 SSD_Life_Left           0x0013   100   100   010    Pre-fail  Always       -       0
232 Available_Reservd_Space 0x0000   000   000   010    Old_age   Offline  FAILING_NOW 17
233 SandForce_Internal      0x0000   000   000   000    Old_age   Offline      -       128
234 SandForce_Internal      0x0032   000   000   000    Old_age   Always       -       448
235 SuperCap_Health         0x0033   100   100   010    Pre-fail  Always       -       0
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       448
242 Lifetime_Reads_GiB      0x0032   000   000   000    Old_age   Always       -       192

SMART Error Log not supported
SMART Self-test Log not supported
SMART Selective self-test log data structure revision number 1
  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
     1        0        0  Not_testing
     2        0        0  Not_testing
     3        0        0  Not_testing
     4        0        0  Not_testing
     5        0        0  Not_testing
Selective self-test flags (0x0):
   After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
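
For anyone who wants to log readings like the ones above over time, here
is a rough Python sketch of a parser for the attribute table. It assumes
each attribute row is on a single line (as smartctl prints it before a
mail client wraps it) with the ten columns shown in the header. The
function name parse_smart_attributes and the SAMPLE text are only for
illustration; the sample rows are copied from the first drive's readings.
Rows are keyed by ID because some names (Program_Fail_Count,
SandForce_Internal) appear twice.

```python
def parse_smart_attributes(text):
    """Parse one-row-per-line SMART attribute output into {id: row dict}."""
    attrs = {}
    for line in text.splitlines():
        # Split into at most 10 columns so a raw value containing
        # spaces, such as "32 (0 0 0 31)", stays in one piece.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # header, blank line, or a wrapped fragment
        attrs[int(fields[0])] = {
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": fields[5],       # kept as text; can be "---"
            "when_failed": fields[8],  # "-" when not failing
            "raw": fields[9].strip(),
        }
    return attrs


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Retired_Block_Count     0x0033   100   100   003    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   032   031   000    Old_age   Always       -       32 (0 0 0 31)
232 Available_Reservd_Space 0x0000   000   000   010    Old_age   Offline  FAILING_NOW 16
241 Lifetime_Writes_GiB     0x0032   000   000   000    Old_age   Always       -       6592
"""

if __name__ == "__main__":
    for attr_id, row in sorted(parse_smart_attributes(SAMPLE).items()):
        print(attr_id, row["name"], row["raw"], row["when_failed"])
```

Storing the raw values with a timestamp on each run would give the
history needed to compare the readings against an eventual failure.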

--
Yeb Havinga
http://www.mgrid.net/
Mastering Medical Data


Re: Intel SSDs that may not suck

From
Jeff
Date:
On Mar 29, 2011, at 12:13 AM, Merlin Moncure wrote:

>
> My own experience with MLC drives is that write cycle expectations are
> more or less as advertised. They do go down (hard), and have to be
> monitored. If you are writing a lot of data this can get pretty
> expensive although the cost dynamics are getting better and better for
> flash. I have no idea what would be precisely prudent, but maybe some
> good monitoring tools and phased obsolescence at around 80% duty cycle
> might not be a bad starting point.  With hard drives, you can kinda
> wait for em to pop and swap em in -- this is NOT a good idea for flash
> raid volumes.



We've been running some of our DBs on SSDs (x25m's; we also have a
pair of x25e's in another box that we use for some super hot tables).  They
have been in production for well over a year (in some cases, nearly a
couple of years) under heavy load.

We're currently being bit in the ass by performance degradation and
we're working out plans to remedy the situation.  One box has 8 x25m's
in a R10 behind a P400 controller.  First, the P400 is not that
powerful, and we've run experiments with newer (P812) controllers that
have been generally positive.   The main symptom we've been seeing is
write stalls.  Writing will go, then come to a complete halt for 0.5-2
seconds, then resume.   The fix we're going to do is replace each
drive in order, with the rebuild occurring between each.  Then we do a
security erase to reset the drive back to completely empty (including
the "spare" blocks kept around for writes).
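
A rough Python sketch of one way to catch write stalls like these: time a
series of small synced writes and flag any that take 0.5 seconds or more.
This is not the method used on the boxes described here. The helper names
(timed_sync_write, probe, find_stalls), the 8 kB block size, and the
probe file path are all assumptions for illustration.

```python
import os
import time


def timed_sync_write(fd, payload):
    """Write payload to fd, fsync it, and return the elapsed seconds."""
    start = time.monotonic()
    os.write(fd, payload)
    os.fsync(fd)
    return time.monotonic() - start


def probe(path, count=100, block=8192):
    """Do `count` synced writes of `block` bytes; return their latencies."""
    payload = b"\0" * block
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        return [timed_sync_write(fd, payload) for _ in range(count)]
    finally:
        os.close(fd)


def find_stalls(latencies, threshold=0.5):
    """Return (index, seconds) for each write at or above the threshold."""
    return [(i, s) for i, s in enumerate(latencies) if s >= threshold]


if __name__ == "__main__":
    # Hypothetical path; point it at a file on the volume under test.
    for index, seconds in find_stalls(probe("ssd_probe.dat")):
        print(f"write {index} stalled for {seconds:.2f}s")
```

A probe like this adds its own write load, so it would make sense to run
it sparingly on a production volume.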

Now that all sounds awful and horrible until you get to overall
performance, especially with reads - you are looking at 20k random
reads per second with a few disks.  Adding in writes does kick it down
a notch, but you're still looking at 10k+ iops. That is the current
trade-off.

In general, I wouldn't recommend the cciss stuff with SSDs at this
time because it makes some things, such as security erase, SMART, and
other things, near impossible. (Performance seems OK though.) We've got
some tests planned to see what we can do with an Areca controller and
some SSDs.

Also note that there is a funky interaction between an MSA70 and SSDs:
they do not work together. (I'm not sure if HP's official branded
SSDs have the same issue.)

The write degradation could probably be monitored by looking at svctime
from sar. We may be implementing that in the near future to detect
when this creeps up again.


--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/




Re: Intel SSDs that may not suck

From
Cédric Villemain
Date:
2011/3/29 Jeff <threshar@torgo.978.org>:
>
> On Mar 29, 2011, at 12:13 AM, Merlin Moncure wrote:
>
>>
>> My own experience with MLC drives is that write cycle expectations are
>> more or less as advertised. They do go down (hard), and have to be
>> monitored. If you are writing a lot of data this can get pretty
>> expensive although the cost dynamics are getting better and better for
>> flash. I have no idea what would be precisely prudent, but maybe some
>> good monitoring tools and phased obsolescence at around 80% duty cycle
>> might not be a bad starting point.  With hard drives, you can kinda
>> wait for em to pop and swap em in -- this is NOT a good idea for flash
>> raid volumes.
>
>
>
> we've been running some of our DB's on SSD's (x25m's, we also have a pair of
> x25e's in another box we use for some super hot tables).  They have been in
> production for well over a year (in some cases, nearly a couple years) under
> heavy load.
>
> We're currently being bit in the ass by performance degradation and we're
> working out plans to remedy the situation.  One box has 8 x25m's in a R10
> behind a P400 controller.  First, the p400 is not that powerful and we've
> run experiments with newer (p812) controllers that have been generally
> positive.   The main symptom we've been seeing is write stalls.  Writing
> will go, then come to a complete halt for 0.5-2 seconds, then resume.   The
> fix we're going to do is replace each drive in order with the rebuild
> occurring between each.  Then we do a security erase to reset the drive back
> to completely empty (including the "spare" blocks kept around for writes).
>
> Now that all sounds awful and horrible until you get to overall performance,
> especially with reads - you are looking at 20k random reads per second with
> a few disks.  Adding in writes does kick it down a notch, but you're still
> looking at 10k+ iops. That is the current trade off.
>
> In general, I wouldn't recommend the cciss stuff with SSD's at this time
> because it makes some things such as security erase, SMART and other things
> near impossible. (performance seems ok though) We've got some tests planned
> seeing what we can do with an Areca controller and some ssds to see how it
> goes.
>
> Also note that there is a funky interaction with an MSA70 and SSDs. they do
> not work together. (I'm not sure if HP's official branded ssd's have the
> same issue).
>
> The write degradation could probably be monitored looking at svctime from
> sar. We may be implementing that in the near future to detect when this
> creeps up again.

svctime is untrustworthy. According to the sysstat author, this field
will be removed in a future version.


--
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

Re: Intel SSDs that may not suck

From
Jeff
Date:
On Mar 29, 2011, at 10:16 AM, Jeff wrote:

> Now that all sounds awful and horrible until you get to overall
> performance, especially with reads - you are looking at 20k random
> reads per second with a few disks.  Adding in writes does kick it
> down a notch, but you're still looking at 10k+ iops. That is the
> current trade off.
>

We've been doing a burn in for about 4 days now on an array of 8
x25m's behind a p812 controller: here's a sample of what it is
currently doing (I have 10 threads randomly seeking, reading, and 10%
of the time writing (then fsync'ing) out, using my pgiosim tool which
I need to update on pgfoundry)

10:25:24 AM  dev104-2   7652.21 109734.51  12375.22     15.96      8.22      1.07      0.12     88.32
10:25:25 AM  dev104-2   7318.52 104948.15  11696.30     15.94      8.62      1.17      0.13     92.50
10:25:26 AM  dev104-2   7871.56 112572.48  13034.86     15.96      8.60      1.09      0.12     91.38
10:25:27 AM  dev104-2   7869.72 111955.96  13592.66     15.95      8.65      1.10      0.12     91.65
10:25:28 AM  dev104-2   7859.41 111920.79  13560.40     15.97      9.32      1.19      0.13     98.91
10:25:29 AM  dev104-2   7285.19 104133.33  12000.00     15.94      8.08      1.11      0.13     92.59
10:25:30 AM  dev104-2   8017.27 114581.82  13250.91     15.94      8.48      1.06      0.11     90.36
10:25:31 AM  dev104-2   8392.45 120030.19  13924.53     15.96      8.90      1.06      0.11     94.34
10:25:32 AM  dev104-2  10173.86 145836.36  16409.09     15.95     10.72      1.05      0.11    113.52
10:25:33 AM  dev104-2   7007.14 100107.94  11688.89     15.95      7.39      1.06      0.11     79.29
10:25:34 AM  dev104-2   8043.27 115076.92  13192.31     15.95      9.09      1.13      0.12     96.15
10:25:35 AM  dev104-2   7409.09 104290.91  13774.55     15.94      8.62      1.16      0.12     90.55

The 2nd to last column is svctime; the first column after dev104-2 is
TPS.  If I kill the writes off, TPS rises quite a bit:
10:26:34 AM  dev104-2  22659.41 361528.71      0.00     15.95     10.57      0.42      0.04     99.01
10:26:35 AM  dev104-2  22479.41 359184.31      7.84     15.98      9.61      0.52      0.04     98.04
10:26:36 AM  dev104-2  21734.29 347230.48      0.00     15.98      9.30      0.43      0.04     95.33
10:26:37 AM  dev104-2  21551.46 344023.30    116.50     15.97      9.56      0.44      0.05     97.09
10:26:38 AM  dev104-2  21964.42 350592.31      0.00     15.96     10.25      0.42      0.04     96.15
10:26:39 AM  dev104-2  22512.75 359294.12      7.84     15.96     10.23      0.50      0.04     98.04
10:26:40 AM  dev104-2  22373.53 357725.49      0.00     15.99      9.52      0.43      0.04     98.04
10:26:41 AM  dev104-2  21436.79 342596.23      0.00     15.98      9.17      0.43      0.04     94.34
10:26:42 AM  dev104-2  22525.49 359749.02     39.22     15.97     10.18      0.45      0.04     98.04


Now to demonstrate "write stalls" on the problematic box:
10:30:49 AM  dev104-3      0.00      0.00      0.00      0.00      0.38      0.00      0.00     35.85
10:30:50 AM  dev104-3      3.03      8.08    258.59     88.00      2.43    635.00    333.33    101.01
10:30:51 AM  dev104-3      4.00      0.00    128.00     32.00      0.67    391.75     92.75     37.10
10:30:52 AM  dev104-3     10.89      0.00     95.05      8.73      1.45    133.55     12.27     13.37
10:30:53 AM  dev104-3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
10:30:54 AM  dev104-3    155.00      0.00   1488.00      9.60     10.88     70.23      2.92     45.20
10:30:55 AM  dev104-3     10.00      0.00    536.00     53.60      1.66    100.20     45.80     45.80
10:30:56 AM  dev104-3     46.53      0.00    411.88      8.85      3.01     78.51      4.30     20.00
10:30:57 AM  dev104-3     11.00      0.00     96.00      8.73      0.79     72.91     27.00     29.70
10:30:58 AM  dev104-3     12.00      0.00     96.00      8.00      0.79     65.42     11.17     13.40
10:30:59 AM  dev104-3      7.84      7.84     62.75      9.00      0.67     85.38     32.00     25.10
10:31:00 AM  dev104-3      8.00      0.00    224.00     28.00      0.82    102.00     47.12     37.70
10:31:01 AM  dev104-3     20.00      0.00    184.00      9.20      0.24     11.80      1.10      2.20
10:31:02 AM  dev104-3      4.95      0.00     39.60      8.00      0.23     46.00     13.00      6.44
10:31:03 AM  dev104-3      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00

That was from a simple dd, not random writes (since it is in
production, I can't really do the random write test as easily).

Theoretically, a nice rotation of disks would remove that problem.
Annoying, but it is the price you need to pay.
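
For anyone wanting to watch for these stalls automatically, here is a minimal Python sketch that flags suspicious samples in sar -d output like the above. It keys on await rather than svctime, since an earlier reply in this thread notes that svctime is not trustworthy. The full column order, the threshold values, and the function names are my assumptions for illustration, not anything from Jeff's setup.

```python
# Assumed sar -d column order after the device name; the thread only confirms
# that TPS comes first and svctime is second to last.
FIELDS = ("tps", "rd_sec", "wr_sec", "avgrq_sz", "avgqu_sz",
          "await_ms", "svctm_ms", "util")


def parse_sar_line(line):
    """Parse one unwrapped 'sar -d' sample line into a dict."""
    parts = line.split()
    # parts[0:2] is the timestamp ("10:30:50", "AM"); parts[2] is the device.
    values = [float(p) for p in parts[3:3 + len(FIELDS)]]
    if len(values) != len(FIELDS):
        raise ValueError(f"unexpected column count in: {line!r}")
    row = dict(zip(FIELDS, values))
    row["time"] = " ".join(parts[:2])
    row["dev"] = parts[2]
    return row


def find_stalls(rows, await_ms=100.0, max_tps=50.0):
    """Return samples where requests wait a long time while throughput is low.

    Both thresholds are illustrative guesses, not tuned values.
    """
    return [r for r in rows
            if r["await_ms"] >= await_ms and r["tps"] <= max_tps]


# Three samples copied from the output above.
SAMPLES = [
    "10:25:24 AM  dev104-2   7652.21 109734.51  12375.22     15.96      8.22      1.07      0.12     88.32",
    "10:30:50 AM  dev104-3      3.03      8.08    258.59     88.00      2.43    635.00    333.33    101.01",
    "10:30:54 AM  dev104-3    155.00      0.00   1488.00      9.60     10.88     70.23      2.92     45.20",
]

if __name__ == "__main__":
    for stall in find_stalls([parse_sar_line(s) for s in SAMPLES]):
        print(stall["time"], stall["dev"], "await", stall["await_ms"], "ms")
```

With these thresholds only the 10:30:50 sample is flagged. A low-TPS filter matters here because an idle device also shows near-zero throughput, but without the long waits.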

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/




Re: Intel SSDs that may not suck

From
"Strange, John W"
Date:
This can be resolved by partitioning the disk with a larger write spare
area so that the cells don't have to be recycled so often. There is a
lot of "misinformation" about SSDs; there are some great articles on
AnandTech that really explain how the technology works and some of the
differences between the controllers as well.  If you do the reading you
can find a solution that will work for you. SSDs are probably one of
the best technologies to come along for us in a long time, giving us
such a performance jump in the IO world.  We have gone from completely
IO bound to CPU bound; it's really worth spending the time to
investigate and understand how this can impact your system.
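
As a rough illustration of the spare-area idea, the arithmetic is just "partition less than the whole drive". The sketch below assumes the unpartitioned space has never been written (for example, right after a secure erase) so the controller can actually use it; the 160 GB size and 20% figure are hypothetical, not recommendations.

```python
def partition_size_gb(drive_gb, extra_spare_percent):
    """Size of the partition to create if a percentage of the drive is left unused."""
    if not 0 <= extra_spare_percent < 100:
        raise ValueError("extra_spare_percent must be in [0, 100)")
    return drive_gb * (100 - extra_spare_percent) / 100


if __name__ == "__main__":
    # Hypothetical example: 160 GB drive, leaving 20% unpartitioned.
    print(partition_size_gb(160, 20), "GB partition")
```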
 

http://www.anandtech.com/show/2614
http://www.anandtech.com/show/2738
http://www.anandtech.com/show/4244/intel-ssd-320-review
http://www.anandtech.com/tag/storage
http://www.anandtech.com/show/3849/micron-announces-realssd-p300-slc-ssd-for-enterprise


-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Jeff
Sent: Tuesday, March 29, 2011 9:33 AM
To: Jeff
Cc: Merlin Moncure; Andy; pgsql-performance@postgresql.org; Greg Smith; Brian Ristuccia
Subject: Re: [PERFORM] Intel SSDs that may not suck



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Intel SSDs that may not suck

From
Jesper Krogh
Date:
On 2011-03-29 16:16, Jeff wrote:
> halt for 0.5-2 seconds, then resume. The fix we're going to do is
> replace each drive in order with the rebuild occurring between each.
> Then we do a security erase to reset the drive back to completely
> empty (including the "spare" blocks kept around for writes).

Are you replacing the drives with new ones, or just secure-erasing them
and putting them back in?
What kind of numbers are you drawing out of smartmontools in usage figures?
(Also seeing some write stalls here, on 24 Raid50 volumes of x25m's, and
have been planning to cycle drives for quite some time, without actually
getting to it.)

> Now that all sounds awful and horrible until you get to overall
> performance, especially with reads - you are looking at 20k random
> reads per second with a few disks. Adding in writes does kick it
> down a notch, but you're still looking at 10k+ iops. That is the
> current trade off.

That's also my experience.
--
Jesper

Re: Intel SSDs that may not suck

From
Jeff
Date:
On Mar 29, 2011, at 12:12 PM, Jesper Krogh wrote:

>
> Are you replacing the drives with new ones, or just secure-erase and
> back in?
> What kind of numbers are you drawing out of smartmontools in usage
> figures?
> (Also seeing some write-stalls here, on 24 Raid50 volumes of x25m's,
> and have been planning to cycle drives for quite some time, without
> actually getting to it.
>

We have some new drives that we are going to use initially, but
eventually it'll be a secure-erased one we replace it with (which
should perform identically to a new one).

What enclosure & controller are you using on the 24 disk beast?

--
Jeff Trout <jeff@jefftrout.com>
http://www.stuarthamm.net/
http://www.dellsmartexitin.com/




Re: Intel SSDs that may not suck

From
Date:
Both the X25-M and the parts that AnandTech reviews (and a pretty
thorough one they do) are, on a good day, prosumer. Review material for
truly enterprise parts, the kind that STEC, Violin, and Texas Memory
will spend a year getting qualified at HP or IBM or Oracle, is really
hard to come by.

Zsolt does keep track of what's going on in the space, although he
doesn't test himself, that I've seen.  Still, a useful site to visit on
occasion:

http://www.storagesearch.com/

regards

---- Original message ----
>Date: Tue, 29 Mar 2011 11:32:16 -0400
>From: pgsql-performance-owner@postgresql.org (on behalf of "Strange, John W" <john.w.strange@jpmchase.com>)
>Subject: Re: [PERFORM] Intel SSDs that may not suck
>To: Jeff <threshar@torgo.dyndns-server.com>
>Cc: Merlin Moncure <mmoncure@gmail.com>, Andy <angelflow@yahoo.com>, "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>, Greg Smith <greg@2ndquadrant.com>, Brian Ristuccia <brian@ristuccia.com>
>
>This can be resolved by partitioning the disk with a larger write spare
>area so that the cells don't have to be recycled so often. There is a
>lot of "misinformation" about SSD's, there are some great articles on
>anandtech that really explain how the technology works and some of the
>differences between the controllers as well.  If you do the reading you
>can find a solution that will work for you, SSD's are probably one of
>the best technologies to come along for us in a long time that gives us
>such a performance jump in the IO world.  We have gone from completely
>IO bound to CPU bound, it's really worth spending the time to
>investigate and understand how this can impact your system.


Re: Intel SSDs that may not suck

From
Greg Smith
Date:
On 03/29/2011 06:34 AM, Yeb Havinga wrote:
> While I appreciate the heads up about these new drives, your posting
> suggests (though you formulated in a way that you do not actually say
> it) that OCZ products do not have a long term reliability. No factual
> data. If you have knowledge of sandforce based OCZ drives fail, that'd
> be interesting because that's the product line what the new Intel SSD
> ought to be compared with.

I didn't want to say anything too strong until I got to the bottom of
the reports I'd been sorting through.  It turns out that there is a very
wide incompatibility between OCZ drives and some popular Gigabyte
motherboards:

http://www.ocztechnologyforum.com/forum/showthread.php?76177-do-you-own-a-Gigabyte-motherboard-and-have-the-SMART-error-with-FW1.11...look-inside

(I'm typing this message on a system with one of the impacted
combinations, one reason why I don't own a Vertex 2 Pro yet.  That I
would have to run a "Beta BIOS" does not inspire confidence.)

What happens on the models impacted is that you can't get SMART data
from the drive.  That means no monitoring for the sort of expected
failures we all know can happen with any drive.  So far that looks to be
at the bottom of all the anecdotal failure reports I'd found:  the
drives may have been throwing bad sectors or some other early failure,
and the owners had no idea because they thought SMART would warn
them--but it wasn't working at all.  Thus, they don't find out there's a
problem until the drive just dies altogether one day.

More popular doesn't always mean more reliable, but for stuff like this
it helps.  Intel ships so many more drives than OCZ that I'd be shocked
if Gigabyte themselves didn't have reference samples of them for
testing.  This really looks like more of a warning about why you should
be particularly aggressive with checking SMART when running recently
introduced drives, which it sounds like you are already doing.

Reliability in this area is so strange...a diversion to older drives
gives an idea how annoyed I am about all this.  Last year, I gave up on
Western Digital's consumer drives (again).  Not because the failure
rates were bad, but because the one failure I did run into was so
terrible from a SMART perspective.  The drive just lied about the whole
problem so aggressively I couldn't manage the process.  I couldn't get
the drive to admit it had a problem such that it could turn into an RMA
candidate, despite failing every time I ran an aggressive SMART error
check.  It would reallocate a few sectors, say "good as new!", and then
fail at the next block when I re-tested.  Did that at least a dozen
times before throwing it in the "pathological drives" pile I keep around
for torture testing.

Meanwhile, the Seagate drives I switched back to are terrible, from a
failure percentage perspective.  I just had two start to go bad last
week, both halves of an array, which is always fun.  But, the failure
started with very clearly labeled increases in reallocated sectors, and
the drive that eventually went really bad (making the bad noises) was
kicked back for RMA.  If you've got redundancy, I'll take components
that fail cleanly over ones that hide what's going on, even if the one
that fails cleanly is actually more likely to fail.  With a rebuild
always a drive swap away, having accurate data makes even a higher
failure rate manageable.
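
To make the "watch reallocated sectors, and treat missing SMART data as a problem" point concrete, here is a small Python sketch that pulls Reallocated_Sector_Ct out of `smartctl -A` text and compares it with the last reading. The sample text is made up for illustration, the function names are hypothetical, and some drives report raw values in other formats, so treat this as a starting point only.

```python
def reallocated_sectors(smartctl_output):
    """Return the raw Reallocated_Sector_Ct value from 'smartctl -A' text, or None."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows look like:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Reallocated_Sector_Ct":
            return int(fields[9])
    return None


def needs_attention(previous, current):
    """True when the count rose since the last check, or SMART data is missing."""
    if current is None:
        return True  # no SMART data at all is itself worth an alert
    return previous is not None and current > previous


# Made-up sample in the usual smartctl -A layout (not from a real drive).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       12
"""

if __name__ == "__main__":
    count = reallocated_sectors(SAMPLE)
    print("reallocated:", count, "alert:", needs_attention(10, count))
```

Returning an alert when the attribute cannot be read is deliberate: a drive that silently stops reporting is the case that hides failures.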

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Intel SSDs that may not suck

From
Jesper Krogh
Date:
On 2011-03-29 18:50, Jeff wrote:
>
> we have some new drives that we are going to use initially, but
> eventually it'll be a secure-erase'd one we replace it with (which
> should perform identical to a new one)
>
> What enclosure & controller are you using on the 24 disk beast?
>
LSI 8888ELP and a HP D2700 enclosure.

Works flawlessly, the only bad thing (which actually is pretty grave)
is that the controller mis-numbers the slots in the enclosure, so
you'll have to have the "mapping" drawn on paper next to the
enclosure to replace the correct disk.

--
Jesper

Re: Intel SSDs that may not suck

From
Greg Smith
Date:
On 03/28/2011 04:21 PM, Greg Smith wrote:
> Today is the launch of Intel's 3rd generation SSD line, the 320
> series.  And they've finally produced a cheap consumer product that
> may be useful for databases, too!  They've put 6 small capacitors onto
> the board and added logic to flush the write cache if the power drops.

I decided a while ago that I wasn't going to buy a personal SSD until I
could get one without a volatile write cache for less than what a
battery-backed caching controller costs.  That seemed the really
disruptive technology point for the sort of database use I worry about.
According to
http://www.newegg.com/Product/Product.aspx?Item=N82E16820167050 that
point was today, with the new 120GB drives now selling for $240.  UPS
willing, later this week I should have one of those here for testing.

A pair of those mirrored with software RAID-1 runs $480 for 120GB.  LSI
MegaRAID 9260-4i with 512MB cache is $330, ditto 3ware 9750-4i.  Battery
backup runs $135 to $180 depending on model; let's call it $150.  Decent
"enterprise" hard drive without RAID-incompatible firmware, $90 for
500GB, need two of them.  That's $660 total for 500GB of storage.

If you really don't need more than 120GB of storage, but do care about
random I/O speed, this is a pretty easy decision now--presuming the
drive holds up to claims.  As the claims are reasonable relative to the
engineering that went into the drive now, that may actually be the case.
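
The comparison above works out as follows; this is only the arithmetic from the prices quoted in this message, with variable names of my choosing.

```python
def cost_per_gb(total_dollars, usable_gb):
    """Dollars per usable gigabyte."""
    return total_dollars / usable_gb


# Two 120GB SSDs at $240 each, mirrored with software RAID-1.
ssd_total = 2 * 240
ssd_usable_gb = 120

# RAID controller ($330) + battery backup (~$150) + two 500GB drives at $90.
hdd_total = 330 + 150 + 2 * 90
hdd_usable_gb = 500

if __name__ == "__main__":
    print(f"SSD pair:   ${ssd_total} for {ssd_usable_gb}GB "
          f"= ${cost_per_gb(ssd_total, ssd_usable_gb):.2f}/GB")
    print(f"HDD + BBWC: ${hdd_total} for {hdd_usable_gb}GB "
          f"= ${cost_per_gb(hdd_total, hdd_usable_gb):.2f}/GB")
```

So the SSD pair is cheaper in absolute terms ($480 vs. $660) but roughly three times the price per gigabyte ($4.00 vs. $1.32).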

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Intel SSDs that may not suck

From
Merlin Moncure
Date:
On Mon, Apr 4, 2011 at 8:26 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> On 03/28/2011 04:21 PM, Greg Smith wrote:
>>
>> Today is the launch of Intel's 3rd generation SSD line, the 320 series.
>>  And they've finally produced a cheap consumer product that may be useful
>> for databases, too!  They've put 6 small capacitors onto the board and added
>> logic to flush the write cache if the power drops.
>
> I decided a while ago that I wasn't going to buy a personal SSD until I
> could get one without a volatile write cache for less than what a
> battery-backed caching controller costs.  That seemed the really disruptive
> technology point for the sort of database use I worry about.  According to
> http://www.newegg.com/Product/Product.aspx?Item=N82E16820167050 that point
> was today, with the new 120GB drives now selling for $240.  UPS willing,
> later this week I should have one of those here for testing.
>
> A pair of those mirrored with software RAID-1 runs $480 for 120GB.  LSI
> MegaRAID 9260-4i with 512MB cache is $330, ditto 3ware 9750-4i.  Battery
> backup runs $135 to $180 depending on model; let's call it $150.  Decent
> "enterprise" hard drive without RAID-incompatible firmware, $90 for 500GB,
> need two of them.  That's $660 total for 500GB of storage.
>
> If you really don't need more than 120GB of storage, but do care about
> random I/O speed, this is a pretty easy decision now--presuming the drive
> holds up to claims.  As the claims are reasonable relative to the
> engineering that went into the drive now, that may actually be the case.

One thing about MLC flash drives (which the industry seems to be
moving towards) is that you have to factor drive lifespan into the
total system balance of costs. Data point: had an ocz vertex 2 that
burned out in ~ 18 months.  In the post mortem, it was determined that
the drive met and exceeded its 10k write limit -- this was a busy
production box.
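
A rough way to factor lifespan into the cost picture is sketched below.  It assumes perfect wear leveling and a known write amplification, neither of which holds exactly on a real drive; the 10k cycle figure is the one mentioned above, while the capacity, daily write volume, and amplification in the example are hypothetical numbers chosen only to show the arithmetic.

```python
def days_until_worn(capacity_gb, pe_cycles, host_gb_per_day,
                    write_amplification=1.0):
    """Rough days until every cell has used its program/erase budget.

    Assumes perfect wear leveling; real drives will differ.
    """
    total_nand_writes_gb = capacity_gb * pe_cycles
    nand_gb_per_day = host_gb_per_day * write_amplification
    return total_nand_writes_gb / nand_gb_per_day


# Hypothetical example: 100GB drive, 10k cycles, 500GB/day of host writes,
# write amplification of 4 (an assumed figure, not a measurement).
print(days_until_worn(100, 10_000, 500, 4.0), "days")
```

With those made-up inputs the drive would last about 500 days, so a busy box wearing one out in well under two years is plausible.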

merlin

Re: Intel SSDs that may not suck

From
Scott Carey
Date:
I have generation 1 and 2 Intel MLC drives in production (~150+).  Some
have been around for 2 years.

None have died.  None have hit the write cycle limit.  We do ~ 75GB of
writes a day.

The data and writes on these are not transactional (if one dies, we have
copies).  But the reliability has been excellent.  We had the performance
degradation issues in the G1's that required a firmware update, and have
had to do a secure erase on a few to get write performance back to
acceptable levels.

I could care less about the 'fast' sandforce drives.  They fail at a high
rate and the performance improvement is BECAUSE they are using a large,
volatile write cache.  If I need higher sequential transfer rate, I'll
RAID some of these together.  A RAID-10 of 6 of these will make a simple
select count(1) query be CPU bound anyway.

I have some G3 SSD's I'll be doing power-fail testing on soon for database
use (currently, we only use the old ones for indexes in databases or
unimportant clone db's).

I have had more raid cards fail in the last 3 years (out of a couple
dozen) than Intel SSD's fail (out of ~150).  I do not trust the Intel 510
series yet -- it's based on a non-Intel controller and has worse
random-write performance anyway.



On 3/28/11 9:13 PM, "Merlin Moncure" <mmoncure@gmail.com> wrote:

>On Mon, Mar 28, 2011 at 7:54 PM, Andy <angelflow@yahoo.com> wrote:
>> This might be a bit too little too late though. As you mentioned there
>>really isn't any real performance improvement for the Intel SSD.
>>Meanwhile, SandForce (the controller that OCZ Vertex is based on) is
>>releasing its next generation controller at a reportedly huge
>>performance increase.
>>
>> Is there any benchmark measuring the performance of these SSD's (the
>>new Intel vs. the new SandForce) running database workloads? The
>>benchmarks I've seen so far are for desktop applications.
>
>The random performance data is usually a rough benchmark.  The
>sequential numbers are mostly useless and always have been.  The
>performance of either the ocz or intel drive is so disgustingly fast
>compared to hard drives that the main stumbling block is life span
>and write endurance now that they are starting to get capacitors.
>
>My own experience with MLC drives is that write cycle expectations are
>more or less as advertised. They do go down (hard), and have to be
>monitored. If you are writing a lot of data this can get pretty
>expensive although the cost dynamics are getting better and better for
>flash. I have no idea what would be precisely prudent, but maybe some
>good monitoring tools and phased obsolescence at around 80% duty cycle
>might not be a bad starting point.  With hard drives, you can kinda
>wait for em to pop and swap em in -- this is NOT a good idea for flash
>raid volumes.
>
>merlin
>
>--
>Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-performance


Re: Intel SSDs that may not suck

From
Andy
Date:
--- On Wed, 4/6/11, Scott Carey <scott@richrelevance.com> wrote:


> I could care less about the 'fast' sandforce drives. 
> They fail at a high
> rate and the performance improvement is BECAUSE they are
> using a large,
> volatile write cache. 

The G1 and G2 Intel MLC also use volatile write cache, just like most SandForce drives do.

Re: Intel SSDs that may not suck

From
gnuoytr@rcn.com
Date:
Not for user data, only controller data.



---- Original message ----
>Date: Wed, 6 Apr 2011 14:11:10 -0700 (PDT)
>From: pgsql-performance-owner@postgresql.org (on behalf of Andy <angelflow@yahoo.com>)
>Subject: Re: [PERFORM] Intel SSDs that may not suck
>To: Merlin Moncure <mmoncure@gmail.com>,Scott Carey <scott@richrelevance.com>
>Cc: "pgsql-performance@postgresql.org" <pgsql-performance@postgresql.org>,Greg Smith <greg@2ndquadrant.com>
>
>
>--- On Wed, 4/6/11, Scott Carey <scott@richrelevance.com> wrote:
>
>
>> I could care less about the 'fast' sandforce drives. 
>> They fail at a high
>> rate and the performance improvement is BECAUSE they are
>> using a large,
>> volatile write cache. 
>
>The G1 and G2 Intel MLC also use volatile write cache, just like most SandForce drives do.

Re: Intel SSDs that may not suck

From
Scott Carey
Date:

On 3/29/11 7:16 AM, "Jeff" <threshar@torgo.978.org> wrote:

>
>The write degradation could probably be monitored looking at svctime
>from sar. We may be implementing that in the near future to detect
>when this creeps up again.


For the X25-M's, overcommit.  Do a secure erase, then only partition and
use 85% or so of the drive (~7% is already hidden).  This helps a lot with
the write performance over time.  The Intel rep claimed that the new G3's
are much better at limiting the occasional write latency, by splitting
longer delays into slightly more frequent smaller delays.
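
As a small sketch of what that overcommit works out to: the 85% and ~7% figures are the ones above, the 160GB drive size is just an example, and treating the hidden 7% as relative to the user-visible capacity is an interpretation rather than something stated here.

```python
def overprovision(user_gb, use_fraction=0.85, hidden_fraction=0.07):
    """Return (partition size in GB, spare area as a fraction of raw flash).

    hidden_fraction is the factory spare area expressed relative to the
    user-visible capacity (an interpretation of "~7% is already hidden").
    """
    raw_gb = user_gb * (1 + hidden_fraction)
    partition_gb = user_gb * use_fraction
    spare_fraction = (raw_gb - partition_gb) / raw_gb
    return partition_gb, spare_fraction


size, spare = overprovision(160)
print(f"partition {size:.0f}GB, about {spare:.0%} of flash left as spare")
```

The unpartitioned space only helps if it has never been written since the secure erase, which is why the erase comes first.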

Some of the benchmark reviews have histograms that demonstrate this
(although the authors of the review only note average latency or
throughput, the deviations have clearly gone down in this generation).

I'll know more for sure after some benchmarking myself.


>
>
>--
>Jeff Trout <jeff@jefftrout.com>
>http://www.stuarthamm.net/
>http://www.dellsmartexitin.com/


Re: Intel SSDs that may not suck

From
Scott Carey
Date:

On 3/29/11 7:32 AM, "Jeff" <threshar@torgo.978.org> wrote:

>
>On Mar 29, 2011, at 10:16 AM, Jeff wrote:
>
>> Now that all sounds awful and horrible until you get to overall
>> performance, especially with reads - you are looking at 20k random
>> reads per second with a few disks.  Adding in writes does kick it
>> down a notch, but you're still looking at 10k+ iops. That is the
>> current trade off.
>>
>
>We've been doing a burn in for about 4 days now on an array of 8
>x25m's behind a p812 controller: here's a sample of what it is
>currently doing (I have 10 threads randomly seeking, reading, and 10%
>of the time writing (then fsync'ing) out, using my pgiosim tool which
>I need to update on pgfoundry)

Your RAID card is probably disabling the write cache on those.  If not, it
isn't power failure safe.

When the write cache is disabled, the negative effects of random writes on
longevity and performance are significantly amplified.

For the G3 drives, you can force the write caches on and remain power
failure safe.  This will significantly decrease the effects of the below.
You can also use a newer linux version with a file system that supports
TRIM/DISCARD which will help as long as your raid controller passes that
through.  It might end up that for many workloads with these drives, it is
faster to use software raid than hardware raid + raid controller.
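
One way to check what state the drive's write cache is actually in is to ask the drive with `hdparm -W`.  The sketch below only parses that output; the sample text and the `/dev/sdX` name are placeholders, and behind a hardware RAID controller the drive may not be directly reachable this way at all.

```python
import re
import subprocess


def parse_write_cache(hdparm_output):
    """Return True/False for write cache on/off, or None if not reported."""
    match = re.search(r"write-caching\s*=\s*(\d)", hdparm_output)
    return None if match is None else match.group(1) == "1"


def drive_write_cache(device):
    """Query a drive with `hdparm -W <device>` (needs root; not called here)."""
    result = subprocess.run(["hdparm", "-W", device],
                            capture_output=True, text=True, check=True)
    return parse_write_cache(result.stdout)


# Sample text in the shape hdparm prints; the device name is a placeholder.
sample = "\n/dev/sdX:\n write-caching =  1 (on)\n"
print(parse_write_cache(sample))
```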


>
>that was from a simple dd, not random writes. (since it is in
>production, I can't really do the random write test as easily)
>
>theoretically, a nice rotation of disks would remove that problem.
>annoying, but it is the price you need to pay
>
>--
>Jeff Trout <jeff@jefftrout.com>
>http://www.stuarthamm.net/
>http://www.dellsmartexitin.com/


Re: Intel SSDs that may not suck

From
Scott Carey
Date:

On 4/6/11 2:11 PM, "Andy" <angelflow@yahoo.com> wrote:

>
>--- On Wed, 4/6/11, Scott Carey <scott@richrelevance.com> wrote:
>
>
>> I could care less about the 'fast' sandforce drives.
>> They fail at a high
>> rate and the performance improvement is BECAUSE they are
>> using a large,
>> volatile write cache.
>
>The G1 and G2 Intel MLC also use volatile write cache, just like most
>SandForce drives do.

1. People are complaining that the Intel G3's aren't as fast as the
SandForce drives (they are faster than the 1st gen SandForce, but not the
yet-to-be-released ones like Vertex 3).  From a database perspective, this
is complete BS.

2. 256K versus 64MB write cache.   Power + time to flush a cache matters.

3. None of the performance benchmarks of drives are comparing the
performance with the cache _disabled_ which is required when not power
safe.  If the SandForce drives are still that much faster with it
disabled, I'd be shocked.  Disabling a 256K write cache will affect
performance less than disabling a 64MB one.
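
To put point 2 in numbers, here is a minimal sketch of how long a full cache takes to drain.  The cache sizes are the two quoted above; the 100MB/s sustained write rate is a hypothetical figure picked for illustration, not a spec for either drive.

```python
KIB = 1024
MIB = 1024 * KIB


def flush_seconds(cache_bytes, write_bytes_per_sec):
    """Time to drain a full write cache at a sustained write rate."""
    return cache_bytes / write_bytes_per_sec


ASSUMED_RATE = 100 * MIB  # hypothetical sustained flash write rate, bytes/s

for label, size in [("256K", 256 * KIB), ("64MB", 64 * MIB)]:
    ms = flush_seconds(size, ASSUMED_RATE) * 1000
    print(f"{label} cache: {ms:.1f} ms to flush")
```

Whatever the real write rate is, the larger cache needs 256 times as long on stored power, which is the difference between a few small capacitors and a supercap.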


Re: Intel SSDs that may not suck

From
Scott Carey
Date:

On 4/6/11 4:03 PM, "gnuoytr@rcn.com" <gnuoytr@rcn.com> wrote:

>Not for user data, only controller data.
>

False.  I used to think so, but there is volatile write cache for user
data -- it's on the 256K chip SRAM, not the DRAM, though.

Simple power failure tests demonstrate that you lose data with these
drives unless you disable the cache.  Disabling the cache roughly drops
write performance by a factor of 3 to 4 on G1 drives and significantly
hurts wear-leveling and longevity (I haven't tried G2's).

>
>
>---- Original message ----
>>Date: Wed, 6 Apr 2011 14:11:10 -0700 (PDT)
>>From: pgsql-performance-owner@postgresql.org (on behalf of Andy
>><angelflow@yahoo.com>)
>>Subject: Re: [PERFORM] Intel SSDs that may not suck
>>To: Merlin Moncure <mmoncure@gmail.com>,Scott Carey
>><scott@richrelevance.com>
>>Cc: "pgsql-performance@postgresql.org"
>><pgsql-performance@postgresql.org>,Greg Smith <greg@2ndquadrant.com>
>>
>>
>>--- On Wed, 4/6/11, Scott Carey <scott@richrelevance.com> wrote:
>>
>>
>>> I could care less about the 'fast' sandforce drives.
>>> They fail at a high
>>> rate and the performance improvement is BECAUSE they are
>>> using a large,
>>> volatile write cache.
>>
>>The G1 and G2 Intel MLC also use volatile write cache, just like most
>>SandForce drives do.


Re: Intel SSDs that may not suck

From
Scott Carey
Date:
On 4/5/11 7:07 AM, "Merlin Moncure" <mmoncure@gmail.com> wrote:

>On Mon, Apr 4, 2011 at 8:26 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>>
>> If you really don't need more than 120GB of storage, but do care about
>> random I/O speed, this is a pretty easy decision now--presuming the
>>drive
>> holds up to claims.  As the claims are reasonable relative to the
>> engineering that went into the drive now, that may actually be the case.
>
>One thing about MLC flash drives (which the industry seems to be
>moving towards) is that you have to factor drive lifespan into the
>total system balance of costs. Data point: had an ocz vertex 2 that
>burned out in ~ 18 months.  In the post mortem, it was determined that
>the drive met and exceeded its 10k write limit -- this was a busy
>production box.

What OCZ Drive?  What controller?  Indilinx? SandForce?  Wear-leveling on
these varies quite a bit.

Intel claims write lifetimes in the single digit PB sizes for these 320's.
 They are due to have an update to the X25-E line too at some point.
Public roadmaps say this will be using "enterprise" MLC.  This stuff
trades off write endurance for data longevity -- if left without power for
too long the data will be lost.  This is a tradeoff for all flash -- but
the stuff that is optimized for USB sticks is quite different than the
stuff optimized for servers.

>
>merlin


Re: Intel SSDs that may not suck

From
David Rees
Date:
On Wed, Apr 6, 2011 at 5:42 PM, Scott Carey <scott@richrelevance.com> wrote:
> On 4/5/11 7:07 AM, "Merlin Moncure" <mmoncure@gmail.com> wrote:
>>One thing about MLC flash drives (which the industry seems to be
>>moving towards) is that you have to factor drive lifespan into the
>>total system balance of costs. Data point: had an ocz vertex 2 that
>>burned out in ~ 18 months.  In the post mortem, it was determined that
>>the drive met and exceeded its 10k write limit -- this was a busy
>>production box.
>
> What OCZ Drive?  What controller?  Indilinx? SandForce?  Wear-leveling on
> these varies quite a bit.

SandForce SF-1200

-Dave

Re: Intel SSDs that may not suck

From
Greg Smith
Date:
On 04/06/2011 08:22 PM, Scott Carey wrote:
> Simple power failure tests demonstrate that you lose data with these
> drives unless you disable the cache.  Disabling the cache roughly drops
> write performance by a factor of 3 to 4 on G1 drives and significantly
> hurts wear-leveling and longevity (I haven't tried G2's).
>

Yup.  I have a customer running a busy system with Intel X25-Es, and
another with X25-Ms, and every time there is a power failure at either
place their database gets corrupted.  That those drives are worthless
for a reliable database setup has been clear for two years now:
http://www.mysqlperformanceblog.com/2009/03/02/ssd-xfs-lvm-fsync-write-cache-barrier-and-lost-transactions/
Sometimes I even hear reports about those drives getting corrupted
even when the write cache is turned off.  If you aggressively replicate
the data to another location on a different power grid, you can survive
with Intel's older drives.  But odds are you're going to lose at least
some transactions no matter what you do, and the risk of "database won't
start" levels of corruption is always lingering.

The fact that Intel is making so much noise over the improved write
integrity features on the new drives gives you an idea how much these
problems have hurt their reputation in the enterprise storage space.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Intel SSDs that may not suck

From
Greg Smith
Date:
Here's the new Intel 3rd generation 320 series drive:

$ sudo smartctl -i /dev/sdc
Device Model:     INTEL SSDSA2CW120G3
Firmware Version: 4PC10302
User Capacity:    120,034,123,776 bytes
ATA Version is:   8
ATA Standard is:  ATA-8-ACS revision 4

Since I have to go chant at the unbelievers next week (MySQL Con), I don't
have time for a really thorough look here.  But I made a first pass
through my usual benchmarks without any surprises.

bonnie++ meets expectations with 253MB/s reads, 147MB/s writes, and 3935
seeks/second:

Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
toy          32144M           147180   7 77644   3           253893   5  3935  15

Using sysbench to generate a 100GB file and randomly seek around it
gives a similar figure:

Extra file open flags: 0
100 files, 1Gb each
100Gb total file size
Block size 8Kb
Number of random requests for random IO: 10000
Read/Write ratio for combined random IO test: 1.50
Using synchronous I/O mode
Doing random read test
Threads started!
Done.

Operations performed:  10000 reads, 0 writes, 0 Other = 10000 Total
Read 78.125Mb  Written 0b  Total transferred 78.125Mb  (26.698Mb/sec)
  3417.37 Requests/sec executed

So that's the basic range of performance:  up to 250MB/s on reads, but
potentially as low as 3400 IOPS = 27MB/s on really random workloads.  I
can make it do worse than that as you'll see in a minute.
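
The IOPS-to-throughput conversion used there is just request rate times block size.  A minimal sketch, with a function name that is only illustrative, using 1MB = 1024KB as sysbench does:

```python
def iops_to_mb_per_sec(requests_per_sec, block_kb):
    """Throughput in MB/s (1 MB = 1024 KB, as sysbench reports it)."""
    return requests_per_sec * block_kb / 1024


# sysbench run above: 3417.37 requests/sec at 8KB per request
print(round(iops_to_mb_per_sec(3417.37, 8), 3))  # 26.698
```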

At a database scale of 500, I can get 2357 TPS:

postgres@toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -c 64 -T 300 pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 500
query mode: simple
number of clients: 64
duration: 300 s
number of transactions actually processed: 707793
tps = 2357.497195 (including connections establishing)
tps = 2357.943894 (excluding connections establishing)

This is basically the same performance as the 4-disk setup with 256MB
battery-backed write controller I profiled at
http://www.2ndquadrant.us/pgbench-results/index.htm ; there XFS got as
high as 2332 TPS, albeit with a PostgreSQL patched for better
performance than I used here.  This system has 16GB of RAM, so this is
exercising write speed only without needing to read anything from disk;
not too hard for regular drives to do.  Performance holds at a scale of
1000 however:

postgres@toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -c 64 -T 300 -l pgbench
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 1000
query mode: simple
number of clients: 64
duration: 300 s
number of transactions actually processed: 586043
tps = 1953.006031 (including connections establishing)
tps = 1953.399065 (excluding connections establishing)

Whereas my regular drives are lucky to hit 350 TPS here.  So this is the
typical sweet spot for SSD:  workload is bigger than RAM, but not so
much bigger than RAM that reads & writes become completely random.

If I crank the scale way up, to 4000 = 58GB, now I'm solidly in
seek-bound behavior, which runs about twice as fast as my regular drive
array here (that's around 200 TPS on this test):

postgres@toy:~$ /usr/lib/postgresql/8.4/bin/pgbench -T 1800 -c 64 -l pgbench
starting vacuum...end.

transaction type: TPC-B (sort of)
scaling factor: 4000
query mode: simple
number of clients: 64
duration: 1800 s
number of transactions actually processed: 731568
tps = 406.417254 (including connections establishing)
tps = 406.430713 (excluding connections establishing)
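
For a sense of where the three scales sit relative to memory, here is a rough sketch.  It derives a per-scale-unit size from the "4000 = 58GB" figure above and assumes the database grows linearly with scale; it approximates on-disk size only and says nothing exact about how much of the data stays cached.

```python
GB_PER_SCALE_UNIT = 58 / 4000   # derived from "4000 = 58GB"; assumed linear
RAM_GB = 16                     # RAM in the test system, per this message


def pgbench_size_gb(scale):
    """Approximate database size for a pgbench scaling factor."""
    return scale * GB_PER_SCALE_UNIT


for scale in (500, 1000, 4000):
    size = pgbench_size_gb(scale)
    print(f"scale {scale}: ~{size:.1f}GB, {size / RAM_GB:.1f}x RAM")
```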

Here's a snapshot of typical drive activity when running this:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
            2.29    0.00    1.30   54.80    0.00   41.61

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sdc               0.00   676.67  443.63  884.00     7.90    12.25    31.09    41.77   31.45   0.75  99.93

So we're down to around 20MB/s, just as sysbench predicted a seek-bound
workload would be on these drives.
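
The 20MB/s figure, and the average request size iostat reports, both fall out of the numbers on the sdc line.  A quick sketch of that arithmetic, assuming avgrq-sz is in 512-byte sectors as the iostat documentation describes:

```python
SECTORS_PER_MB = 2048  # 512-byte sectors, the unit iostat uses for avgrq-sz

# Values copied from the sdc line above.
reads_per_sec, writes_per_sec = 443.63, 884.00
read_mb, write_mb = 7.90, 12.25

total_mb = read_mb + write_mb
avg_request_sectors = (total_mb * SECTORS_PER_MB
                       / (reads_per_sec + writes_per_sec))

print(f"total throughput: {total_mb:.2f} MB/s")
print(f"average request: {avg_request_sectors:.1f} sectors "
      f"(~{avg_request_sectors / 2:.1f} KB)")
```

That comes out within rounding of the 31.09 sectors iostat shows, so requests average roughly two 8KB database pages each.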

I can still see checkpoint spikes here where sync times go upward:

2011-04-06 20:40:58.969 EDT: LOG:  checkpoint complete: wrote 2959 buffers (9.0%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=147.300 s, sync=32.885 s, total=181.758 s
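
Pulling the timings out of lines like that one is easy to script if you want to watch for spikes over a longer run.  This is only a sketch: the regular expression is written against the log line shown here and may need adjusting for other PostgreSQL versions or log prefixes.

```python
import re

CHECKPOINT_RE = re.compile(
    r"wrote (?P<buffers>\d+) buffers \((?P<pct>[\d.]+)%\).*"
    r"write=(?P<write>[\d.]+) s, sync=(?P<sync>[\d.]+) s, "
    r"total=(?P<total>[\d.]+) s"
)


def parse_checkpoint(line):
    """Extract buffer count and timings from a 'checkpoint complete' line."""
    m = CHECKPOINT_RE.search(line)
    if m is None:
        return None
    return {k: float(v) for k, v in m.groupdict().items()}


line = ("2011-04-06 20:40:58.969 EDT: LOG:  checkpoint complete: wrote 2959 "
        "buffers (9.0%); 0 transaction log file(s) added, 0 removed, "
        "0 recycled; write=147.300 s, sync=32.885 s, total=181.758 s")
info = parse_checkpoint(line)
print(f"sync was {info['sync'] / info['total']:.0%} of the checkpoint")
```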

But the drive seems to never become unresponsive for longer than a second:

postgres@toy:~$ cat pgbench_log.4585 | cut -d" " -f 6 | sort -n | tail
999941
999952
999956
999959
999960
999970
999977
999984
999992
999994
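
If you want to repeat this check, it's worth confirming which column holds the latency.  The sketch below assumes the per-transaction log layout `client_id transaction_no time file_no time_epoch time_us`, in which the third field is the transaction latency in microseconds and the sixth is only the sub-second part of the timestamp.  The sample lines are made up purely to show the format; they are not data from this test.

```python
def max_latency_us(log_lines):
    """Largest per-transaction latency (microseconds) in pgbench -l output.

    Assumes the layout:
        client_id transaction_no time file_no time_epoch time_us
    where field 3 (`time`) is the latency and field 6 is only the
    sub-second part of the timestamp.
    """
    return max(int(line.split()[2]) for line in log_lines if line.strip())


# Made-up sample lines purely to show the format; not data from this test.
sample = [
    "0 199 2241 0 1302136858 999941",
    "3 200 1533012 0 1302136859 125",
]
print(max_latency_us(sample) / 1_000_000, "seconds")
```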

Power-plug pull tests with diskchecker.pl and a write-heavy database
load didn't notice anything funny about the write cache:

[witness]
$ wget http://code.sixapart.com/svn/tools/trunk/diskchecker.pl
$ chmod +x ./diskchecker.pl
$ ./diskchecker.pl -l

[server with SSD]
$ wget http://code.sixapart.com/svn/tools/trunk/diskchecker.pl
$ chmod +x ./diskchecker.pl
$ ./diskchecker.pl -s grace create test_file 500

   diskchecker: running 20 sec, 69.67% coverage of 500 MB (38456 writes; 1922/s)
   diskchecker: running 21 sec, 71.59% coverage of 500 MB (40551 writes; 1931/s)
   diskchecker: running 22 sec, 73.52% coverage of 500 MB (42771 writes; 1944/s)
   diskchecker: running 23 sec, 75.17% coverage of 500 MB (44925 writes; 1953/s)
[pull plug]

/home/gsmith/diskchecker.pl -s grace verify test_file
  verifying: 0.00%
  verifying: 0.73%
  verifying: 7.83%
  verifying: 14.98%
  verifying: 22.10%
  verifying: 29.23%
  verifying: 36.39%
  verifying: 43.50%
  verifying: 50.65%
  verifying: 57.70%
  verifying: 64.81%
  verifying: 71.86%
  verifying: 79.02%
  verifying: 86.11%
  verifying: 93.15%
  verifying: 100.00%
Total errors: 0

2011-04-06 21:43:09.377 EDT: LOG:  database system was interrupted; last known up at 2011-04-06 21:30:27 EDT
2011-04-06 21:43:09.392 EDT: LOG:  database system was not properly shut down; automatic recovery in progress
2011-04-06 21:43:09.394 EDT: LOG:  redo starts at 6/BF7B2880
2011-04-06 21:43:10.687 EDT: LOG:  unexpected pageaddr 5/C2786000 in log file 6, segment 205, offset 7888896
2011-04-06 21:43:10.687 EDT: LOG:  redo done at 6/CD784400
2011-04-06 21:43:10.687 EDT: LOG:  last completed transaction was at log time 2011-04-06 21:39:00.551065-04
2011-04-06 21:43:10.705 EDT: LOG:  checkpoint starting: end-of-recovery immediate
2011-04-06 21:43:14.766 EDT: LOG:  checkpoint complete: wrote 29915 buffers (91.3%); 0 transaction log file(s) added, 0 removed, 106 recycled; write=0.146 s, sync=3.904 s, total=4.078 s
2011-04-06 21:43:14.777 EDT: LOG:  database system is ready to accept connections

So far, this drive is living up to expectations, without doing anything
unexpectedly good or bad.  When doing the things where SSD has the biggest
advantage over mechanical drives, it's more than 5X as fast as a 4-disk
array (3 disk DB + wal) with a BBWC.  But on really huge workloads,
where the worst-case behavior of the drive is being hit, that falls to
closer to a 2X advantage.  And if you're doing work that isn't very
random at all, the drive only matches regular disk.
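
Those ratios come straight from the TPS figures earlier in this message.  A short sketch of the arithmetic; note the baselines are approximate ("lucky to hit 350", "around 200"), and the 2332 TPS figure came from a different setup with a patched PostgreSQL, so treat the output as rough.

```python
# TPS figures quoted in this message: (SSD, mechanical-drive baseline).
results = {
    "scale 500 (write-bound)": (2357, 2332),
    "scale 1000": (1953, 350),
    "scale 4000 (seek-bound)": (406, 200),
}


def speedup(ssd_tps, disk_tps):
    """How many times faster the SSD ran than the mechanical baseline."""
    return ssd_tps / disk_tps


for label, (ssd, disk) in results.items():
    print(f"{label}: {speedup(ssd, disk):.1f}X")
```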

I like not having surprises in this sort of thing though.  Intel 320
series gets a preliminary thumbs-up from me.  I'll be happy when these
are mainstream enough that I can finally exit the anti-Intel SSD pulpit
I've been standing on the last two years.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Intel SSDs that may not suck

From
David Boreham
Date:
Had to say a quick thanks to Greg and the others who have posted
detailed test results on SSDs here.
For those of us watching for the inflection point where we can begin the
transition from mechanical to solid state storage, this data and
experience is invaluable. Thanks for sharing it.

A short story while I'm posting : my Dad taught electronics engineering
and would often visit the local factories with groups of students. I
remember in particular after a visit to a disk drive manufacturer
(Burroughs), in 1977 he came home telling me that he'd asked the plant
manager what their plan was once solid state storage made their products
obsolete. The manager looked at him like he was from another planet...

So I've been waiting patiently 34 years for this hopefully
soon-to-arrive moment ;)



Re: Intel SSDs that may not suck

From
gnuoytr@rcn.com
Date:
SSDs have been around for quite some time.  The first that I've found is Texas Memory.  Not quite 1977, but not flash
either, although they've been doing flash for a couple of years.

http://www.ramsan.com/company/history

---- Original message ----
>Date: Wed, 06 Apr 2011 20:56:16 -0600
>From: pgsql-performance-owner@postgresql.org (on behalf of David Boreham <david_list@boreham.org>)
>Subject: Re: [PERFORM] Intel SSDs that may not suck
>To: pgsql-performance@postgresql.org
>
>Had to say a quick thanks to Greg and the others who have posted
>detailed test results on SSDs here.
>For those of us watching for the inflection point where we can begin the
>transition from mechanical to solid state storage, this data and
>experience is invaluable. Thanks for sharing it.
>
>A short story while I'm posting : my Dad taught electronics engineering
>and would often visit the local factories with groups of students. I
>remember in particular after a visit to a disk drive manufacturer
>(Burroughs), in 1977 he came home telling me that he'd asked the plant
>manager what their plan was once solid state storage made their products
>obsolete. The manager looked at him like he was from another planet...
>
>So I've been waiting patiently 34 years for this hopefully
>soon-to-arrive moment ;)

Re: Intel SSDs that may not suck

From
David Boreham
Date:
On 4/6/2011 9:19 PM, gnuoytr@rcn.com wrote:
> SSDs have been around for quite some time.  The first that I've found is Texas Memory.  Not quite 1977, but not flash
> either, although they've been doing so for a couple of years.
Well, I built my first ram disk (which of course I thought I had
invented, at the time) in 1982.
But today we're seeing solid state storage seriously challenging
rotating media across all applications, except at the TB and beyond
scale. That's what's new.



Re: Intel SSDs that may not suck

From
Jesper Krogh
Date:
On 2011-03-28 22:21, Greg Smith wrote:
> Some may still find these too cheap for enterprise use, given the use
> of MLC limits how much activity these drives can handle.  But it's
> great to have a new option for lower budget system that can tolerate
> some risk there.
>
Drifting off the topic slightly..  Has anyone opinions/experience with:
http://www.ocztechnology.com/ocz-z-drive-r2-p88-pci-express-ssd.html

They seem to be "like" the FusionIO drives just quite a lot cheaper,
wonder what the state of those 512MB is in case of a power-loss.


--
Jesper

Re: Intel SSDs that may not suck

From
Greg Smith
Date:
On 04/07/2011 12:27 AM, Jesper Krogh wrote:
> On 2011-03-28 22:21, Greg Smith wrote:
>> Some may still find these too cheap for enterprise use, given the use
>> of MLC limits how much activity these drives can handle.  But it's
>> great to have a new option for lower budget system that can tolerate
>> some risk there.
>>
> Drifting off the topic slightly..  Has anyone opinions/experience with:
> http://www.ocztechnology.com/ocz-z-drive-r2-p88-pci-express-ssd.html
>
> They seem to be "like" the FusionIO drives just quite a lot cheaper,
> wonder what the state of those 512MB is in case of a power-loss.

What I do is assume that if the vendor doesn't say outright how the
cache is preserved, that means it isn't, and the card is garbage for
database use.  That rule is rarely wrong.  The soon-to-be-available Z-Drive R3
includes a Sandforce controller and supercap for preserving writes:
http://hothardware.com/News/OCZ-Unveils-RevoDrive-X3-Vertex-3-and-Other-SSD-Goodness/

Since they're bragging about it there, the safe bet is that the older R2
unit had no such facility.

I note that the Z-Drive R2 is basically some flash packed on top of an
LSI 1068e controller, mapped as a RAID0 volume.  It's possible they left
the battery-backup unit on that card exposed, so it may be possible to
do better with it.  The way they just stack those card layers together,
the thing is practically held together with duct tape though.  That's
not a confidence inspiring design to me.  The R3 drives are much more
cleanly integrated.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Intel SSDs that may not suck

From
Scott Carey
Date:

On 4/6/11 10:48 PM, "Greg Smith" <greg@2ndQuadrant.com> wrote:
>Since they're bragging about it there, the safe bet is that the older R2
>unit had no such facility.
>
>I note that the Z-Drive R2 is basically some flash packed on top of an
>LSI 1068e controller, mapped as a RAID0 volume.

In Linux, you can expose it as a set of 4 JBOD drives, use software RAID
of any kind on that, and have access to TRIM.  Still useless for (most)
databases but may be useful for other applications, if the reliability
level is OK otherwise.

I wonder if the R3 will also be configurable as direct JBOD.


>It's possible they left
>the battery-backup unit on that card exposed, so it may be possible to
>do better with it.  The way they just stack those card layers together,
>the thing is practically held together with duct tape though.  That's
>not a confidence inspiring design to me.  The R3 drives are much more
>cleanly integrated.
>
>--
>Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
>PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
>"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books