Re: Plug-pull testing worked, diskchecker.pl failed - Mailing list pgsql-general

From Greg Smith
Subject Re: Plug-pull testing worked, diskchecker.pl failed
Msg-id 508B7089.6080107@2ndQuadrant.com
In response to Re: Plug-pull testing worked, diskchecker.pl failed  (Chris Angelico <rosuav@gmail.com>)
Responses Re: Plug-pull testing worked, diskchecker.pl failed  (Chris Angelico <rosuav@gmail.com>)
List pgsql-general
On 10/24/12 4:04 PM, Chris Angelico wrote:

> Is this a useful and plausible testing methodology? It's definitely
> showed up some failures. On a hard-disk, all is well as long as the
> write-back cache is disabled; on the SSDs, I can't make them reliable.

On Linux systems, you can tell when Postgres is busy writing data out
during a checkpoint because the "Dirty:" amount in /proc/meminfo will be
dropping rapidly.  At most other times, that number goes up.  You can
increase the odds of finding database level corruption during a
pull-the-plug test by trying to yank during that most sensitive moment.
Combine a reasonably write-heavy test like you've devised with that
"optimization", and systems that don't write reliably will usually
corrupt within a few tries.
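
To make that timing less of a guess, here is a rough Python sketch of
watching the "Dirty:" line in /proc/meminfo and reporting when it falls
quickly.  The function names, the one-second polling interval, and the
10 MB drop threshold are all illustrative assumptions, not tuned values,
and a falling number only means the kernel is flushing something; it
does not prove that a Postgres checkpoint is the cause.

```python
import re
import time


def parse_dirty_kb(meminfo_text):
    """Return the 'Dirty:' value in kB from /proc/meminfo-style text, or None."""
    match = re.search(r"^Dirty:\s+(\d+)\s*kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def is_dropping_fast(previous_kb, current_kb, min_drop_kb=10240):
    """True when Dirty fell by at least min_drop_kb between two samples.

    min_drop_kb is an arbitrary illustrative threshold (10 MB).
    """
    return previous_kb - current_kb >= min_drop_kb


def watch(interval_s=1.0, min_drop_kb=10240, path="/proc/meminfo"):
    """Poll the Dirty counter forever and report rapid drops (Linux only)."""
    previous = None
    while True:
        with open(path) as f:
            current = parse_dirty_kb(f.read())
        if current is not None:
            if previous is not None and is_dropping_fast(
                previous, current, min_drop_kb
            ):
                print(f"Dirty fell by {previous - current} kB: heavy writeback")
            previous = current
        time.sleep(interval_s)
```

Calling watch() in a second terminal while the write-heavy test runs
gives a cue for when to pull the plug; the threshold will need adjusting
to the amount of dirty data the test workload actually generates.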

In general, though, diskchecker.pl is the more sensitive test.  If it
fails, storage is unreliable for PostgreSQL, period.  It's good that
you've followed up by confirming that the real database corruption
implied by that is also visible, but that's not normally needed.
If diskchecker says the drive is bad, you're done--don't put a database on
it.  Doing the database level tests is more for finding false positives,
where diskchecker says the drive is OK, but perhaps there is a
filesystem problem that makes it unreliable, one that it doesn't test for.

What SSD are you using?  The Intel 320 and 710 series models are the
only SATA-connected drives still on the market I know of that pass a
serious test.  The other good models are direct PCI-E storage units,
like the FusionIO drives.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

