Re: SSD + RAID - Mailing list pgsql-performance

From Bruce Momjian
Subject Re: SSD + RAID
Msg-id 201002270140.o1R1eIm24724@momjian.us
In response to Re: SSD + RAID  (Greg Smith <greg@2ndquadrant.com>)
Responses Re: SSD + RAID  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-performance
I have added documentation about the ATAPI drive flush command, and the
typical SSD behavior.

---------------------------------------------------------------------------

Greg Smith wrote:
> Ron Mayer wrote:
> > Bruce Momjian wrote:
> >
> >> Agreed, though I thought the problem was that SSDs lie about their
> >> cache flush like SATA drives do, or is there something I am missing?
> >>
> >
> > There's exactly one case I can find[1] where this century's IDE
> > drives lied more than any other drive with a cache:
>
> Ron is correct that the problem of mainstream SATA drives accepting the
> cache flush command but not actually doing anything with it is long gone
> at this point.  If you have a regular SATA drive, it almost certainly
> supports proper cache flushing.  And if your whole software/storage
> stack understands all that, you should not end up with corrupted data
> just because there's a volatile write cache in there.
>
> But the point of this whole testing exercise coming back into vogue
> is that SSDs have returned this negligent behavior to the
> mainstream.  See
> http://opensolaris.org/jive/thread.jspa?threadID=121424 for a discussion
> of this in a ZFS context just last month.  There are many documented
> cases of Intel SSDs that will fake a cache flush, such that the only way
> to get reliable writes is to totally disable their write
> caches--at which point performance is so bad you might as well have
> gotten a RAID10 setup instead (and longevity is toast too).
>
> This whole area remains a disaster, and extreme distrust of all the
> SSD storage vendors is advisable at this point.  Basically, if I don't
> see the capacitor responsible for flushing outstanding writes, and get a
> clear description from the manufacturer how the cached writes are going
> to be handled in the event of a power failure, at this point I have to
> assume the answer is "badly and your data will be eaten".  And the
> prices for SSDs that meet that requirement are still quite steep.  I
> keep hoping somebody will address this market at something lower than
> the standard "enterprise" prices.  The upcoming SandForce designs seem
> to have thought this through correctly:
> http://www.anandtech.com/storage/showdoc.aspx?i=3702&p=6  But the
> product's not out to the general public yet (just like the Seagate units
> that claim to have capacitor backups--I heard a rumor those are also
> SandForce designs actually, so they may be the only ones doing this
> right and aiming at a lower price).
>
> --
> Greg Smith  2ndQuadrant US  Baltimore, MD
> PostgreSQL Training, Services and Support
> greg@2ndQuadrant.com   www.2ndQuadrant.us
>

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com
  PG East:  http://www.enterprisedb.com/community/nav-pg-east-2010.do
  + If your life is a hard drive, Christ can be your backup. +
Index: doc/src/sgml/wal.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/wal.sgml,v
retrieving revision 1.62
diff -c -c -r1.62 wal.sgml
*** doc/src/sgml/wal.sgml    20 Feb 2010 18:28:37 -0000    1.62
--- doc/src/sgml/wal.sgml    27 Feb 2010 01:37:03 -0000
***************
*** 59,66 ****
     same concerns about data loss exist for write-back drive caches as
     exist for disk controller caches.  Consumer-grade IDE and SATA drives are
     particularly likely to have write-back caches that will not survive a
!    power failure.  Many solid-state drives also have volatile write-back
!    caches.  To check write caching on <productname>Linux</> use
     <command>hdparm -I</>;  it is enabled if there is a <literal>*</> next
     to <literal>Write cache</>; <command>hdparm -W</> to turn off
     write caching.  On <productname>FreeBSD</> use
--- 59,69 ----
     same concerns about data loss exist for write-back drive caches as
     exist for disk controller caches.  Consumer-grade IDE and SATA drives are
     particularly likely to have write-back caches that will not survive a
!    power failure, though <acronym>ATAPI-6</> introduced a drive cache
!    flush command that some file systems use, e.g. <acronym>ZFS</>.
!    Many solid-state drives also have volatile write-back
!    caches, and many do not honor cache flush commands by default.
!    To check write caching on <productname>Linux</> use
     <command>hdparm -I</>;  it is enabled if there is a <literal>*</> next
     to <literal>Write cache</>; <command>hdparm -W</> to turn off
     write caching.  On <productname>FreeBSD</> use
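
The hdparm check described in the patch above (a "*" next to "Write cache"
in the output of hdparm -I) could be scripted roughly as follows.  This is
only a sketch: the function names write_cache_enabled and check_device are
made up for illustration, the parser assumes the feature line reads exactly
"Write cache" with an optional leading "*", and running hdparm normally
requires root.  It reports only whether the cache is enabled; it cannot
tell you whether the drive honors cache flush commands.

```python
import subprocess
from typing import Optional


def write_cache_enabled(hdparm_output: str) -> Optional[bool]:
    """Parse `hdparm -I` text; True/False if a "Write cache" line is
    found (with/without a leading "*"), None if no such line exists."""
    for line in hdparm_output.splitlines():
        stripped = line.strip()
        # Drop the optional "*" marker, then compare the feature name.
        if stripped.lstrip("*").strip() == "Write cache":
            return stripped.startswith("*")
    return None


def check_device(device: str) -> Optional[bool]:
    """Hypothetical helper: run `hdparm -I <device>` and parse the result.
    Assumes hdparm is installed and the caller has sufficient privileges."""
    result = subprocess.run(
        ["hdparm", "-I", device],
        capture_output=True,
        text=True,
        check=True,
    )
    return write_cache_enabled(result.stdout)
```

For example, check_device("/dev/sda") would return True on a drive whose
write cache is currently enabled; the device path here is just a
placeholder.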
