Thread: Plug-pull testing worked, diskchecker.pl failed

Plug-pull testing worked, diskchecker.pl failed

From
Chris Angelico
Date:
After reading the comments last week about SSDs, I did some testing of
the ones we have at work - each of my test-boxes (three with SSDs, one
with HDD) subjected to multiple stand-alone plug-pull tests, using
pgbench to provide load. So far, there've been no instances of
PostgreSQL data corruption, but diskchecker.pl reported huge numbers
of errors.

What exactly does this mean? Is Postgres doing something that
diskchecker isn't, and is thus safe? Could data corruption occur but
I've just never pulled the power out at the precise microsecond when
it would cause problems? Or is it that we would lose entire
transactions, but never experience corruption that the postmaster
can't repair?

Interestingly, disabling write-caching with 'hdparm -W 0 /dev/sda' (as
per the livejournal blog[1]) reduced the SSD's error rates without
eliminating failures entirely, while on the HDD, there were no
problems at all with write caching off.

ChrisA


Re: Plug-pull testing worked, diskchecker.pl failed

From
Jeff Janes
Date:
On Mon, Oct 22, 2012 at 6:17 AM, Chris Angelico <rosuav@gmail.com> wrote:
> After reading the comments last week about SSDs, I did some testing of
> the ones we have at work - each of my test-boxes (three with SSDs, one
> with HDD) subjected to multiple stand-alone plug-pull tests, using
> pgbench to provide load. So far, there've been no instances of
> PostgreSQL data corruption, but diskchecker.pl reported huge numbers
> of errors.

What did you do to look for corruption?  That PostgreSQL succeeds at
going through crash-recovery and then starting up is not a good
indicator that there is no corruption.

Did you do something like compute the aggregates on pgbench_history
and compare those aggregates to the balances in the other 3 tables?

Cheers,

Jeff


Re: Plug-pull testing worked, diskchecker.pl failed

From
Chris Angelico
Date:
On Tue, Oct 23, 2012 at 6:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
> What did you do to look for corruption?  That PostgreSQL succeeds at
> going through crash-recovery and then starting up is not a good
> indicator that there is no corruption.

I fired up Postgres and looked at the logs for any signs of failure.

> Did you do something like compute the aggregates on pgbench_history
> and compare those aggregates to the balances in the other 3 tables?

No, didn't do that. My next check will be done over the network
(similar to diskchecker), with a script that fires off requests, waits
for them to be confirmed committed, and then records a local copy, and
will check that local copy once the server's back up again. That'll
tell me if transactions are being lost.

I'm kinda feeling my way in the dark here. Will check out the
aggregates on pgbench_history when I get to work today; thanks for the
tip!

ChrisA


Re: Plug-pull testing worked, diskchecker.pl failed

From
Jeff Janes
Date:
On Mon, Oct 22, 2012 at 12:31 PM, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, Oct 23, 2012 at 6:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> What did you do to look for corruption?  That PostgreSQL succeeds at
>> going through crash-recovery and then starting up is not a good
>> indicator that there is no corruption.
>
> I fired up Postgres and looked at the logs for any signs of failure.
>
>> Did you do something like compute the aggregates on pgbench_history
>> and compare those aggregates to the balances in the other 3 tables?
>
> No, didn't do that. My next check will be done over the network
> (similar to diskchecker), with a script that fires off requests, waits
> for them to be confirmed committed, and then records a local copy, and
> will check that local copy once the server's back up again. That'll
> tell me if transactions are being lost.

If you like Perl, the count.pl from this message might be a useful
starting point:

http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php

It was designed to check consistency after postmaster crashes, not OS
crashes, so the checker runs on the same host as postgres does.
Obviously for a pull-the-plug test, you need to run it on a different host;
so all the
DBI->connect(....)
calls need to be changed to do that.

> I'm kinda feeling my way in the dark here. Will check out the
> aggregates on pgbench_history when I get to work today; thanks for the
> tip!

Here's an example with pgbench_accounts, the other 2 should look analogous.

select aid, abalance, count(*)
from (select aid, abalance from pgbench_accounts
      union all
      select aid, sum(delta) from pgbench_history group by aid) as foo
group by aid, abalance
having abalance != 0 and count(*) != 2;

This should return zero rows.  Any other result indicates corruption.

pgbench truncates pgbench_history, but does not reset the balances to
zero on the other tables.  So if you want to run the test repeatedly,
you have to do pgbench -i between runs, or manually reset the balance
columns.
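
For the other two tables, the same query can be generated rather than
retyped. The following is a minimal Python sketch, not part of the
original message: the helper name consistency_query and the
PGBENCH_TABLES mapping are made up for illustration, and the column
names assume the standard pgbench schema (tid/tbalance in
pgbench_tellers, bid/bbalance in pgbench_branches).

```python
# Hypothetical helper: rebuilds the consistency query above for each of the
# three pgbench balance tables. Column names assume the standard pgbench schema.
PGBENCH_TABLES = {
    "pgbench_accounts": ("aid", "abalance"),
    "pgbench_tellers": ("tid", "tbalance"),
    "pgbench_branches": ("bid", "bbalance"),
}


def consistency_query(table: str) -> str:
    """Return SQL that lists rows whose balance disagrees with pgbench_history."""
    key, balance = PGBENCH_TABLES[table]
    return (
        f"select {key}, {balance}, count(*) from ("
        f"select {key}, {balance} from {table} "
        f"union all "
        f"select {key}, sum(delta) from pgbench_history group by {key}"
        f") as foo group by {key}, {balance} "
        f"having {balance} != 0 and count(*) != 2"
    )


if __name__ == "__main__":
    # Print one statement per table, ready to paste into psql.
    for name in PGBENCH_TABLES:
        print(consistency_query(name) + ";")
```

As with the accounts query, each generated statement should return
zero rows on a consistent database.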

Cheers,

Jeff


Re: Plug-pull testing worked, diskchecker.pl failed

From
Scott Marlowe
Date:
On Mon, Oct 22, 2012 at 7:17 AM, Chris Angelico <rosuav@gmail.com> wrote:
> After reading the comments last week about SSDs, I did some testing of
> the ones we have at work - each of my test-boxes (three with SSDs, one
> with HDD) subjected to multiple stand-alone plug-pull tests, using
> pgbench to provide load. So far, there've been no instances of
> PostgreSQL data corruption, but diskchecker.pl reported huge numbers
> of errors.

Try starting pgbench, and then, halfway through the checkpoint timeout
interval, issue a checkpoint and, WHILE the checkpoint is still
running, pull the plug.

Then after bringing the server up (assuming pg starts up) see if
pg_dump generates any errors.


Re: Plug-pull testing worked, diskchecker.pl failed

From
Chris Angelico
Date:
On Tue, Oct 23, 2012 at 9:51 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Mon, Oct 22, 2012 at 7:17 AM, Chris Angelico <rosuav@gmail.com> wrote:
>> After reading the comments last week about SSDs, I did some testing of
>> the ones we have at work - each of my test-boxes (three with SSDs, one
>> with HDD) subjected to multiple stand-alone plug-pull tests, using
>> pgbench to provide load. So far, there've been no instances of
>> PostgreSQL data corruption, but diskchecker.pl reported huge numbers
>> of errors.
>
> Try starting pgbench, and then, halfway through the checkpoint timeout
> interval, issue a checkpoint and, WHILE the checkpoint is still
> running, pull the plug.
>
> Then after bringing the server up (assuming pg starts up) see if
> pg_dump generates any errors.

Thanks for the tip. I've been flat-out at work these past few days and
haven't gotten around to testing in the middle of a checkpoint, but I
have done something that might also be of interest. It's inspired by a
combination of diskchecker and pgbench; a harness that puts the
database under load and retains a record of what's been done.

In brief: Create a table with N (eg 100) rows, then spin as fast as
possible, incrementing a counter against one random row and also
incrementing the "Total" counter. When the database goes down, wait
for it to come up again; when it does, check against the local copy of
the counters and report any discrepancies.

The code's written in Pike, using the same database connection logic
that we use in our actual application (well, some of our code is C++
and some is PHP, so this corresponds to one part of our app), so this
is roughly representative of real usage.

It's about a page or two of code: http://pastebin.com/UNTj642Y
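
The pastebin script is not reproduced here. Purely as an illustration
of the bookkeeping described above, the client-side record could look
something like the Python sketch below. The class and method names are
hypothetical, the database calls are left out, and it assumes the
caller records an increment only after the server has acknowledged the
commit.

```python
from collections import Counter


class LocalLedger:
    """Client-side record of increments the server confirmed as committed."""

    def __init__(self):
        self.confirmed = Counter()

    def record_commit(self, row_id):
        # Call this only AFTER the server has acknowledged the COMMIT.
        self.confirmed[row_id] += 1
        self.confirmed["Total"] += 1

    def discrepancies(self, server_counters):
        """Return {row_id: (confirmed, on_server)} where confirmed work is missing."""
        problems = {}
        for row_id, confirmed in self.confirmed.items():
            on_server = server_counters.get(row_id, 0)
            # A higher server value is tolerated: a commit may have reached
            # disk without its acknowledgement reaching the client.
            if on_server < confirmed:
                problems[row_id] = (confirmed, on_server)
        return problems
```

After the server comes back up, the counters read from the table are
passed to discrepancies(); any entry it returns is a transaction that
was confirmed to the client and then lost.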

Currently, all the key parameters (database connection info (which has
been censored for the pastebin version), pool size, thread count, etc)
are just variables visible in the script, simpler than parsing
command-line arguments.

Is this a useful and plausible testing methodology? It's definitely
shown up some failures. On a hard disk, all is well as long as the
write-back cache is disabled; on the SSDs, I can't make them reliable.

Is a single table enough to test for corruption with?

Chris Angelico


Re: Plug-pull testing worked, diskchecker.pl failed

From
Scott Marlowe
Date:
On Wed, Oct 24, 2012 at 8:04 AM, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, Oct 23, 2012 at 9:51 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
>> On Mon, Oct 22, 2012 at 7:17 AM, Chris Angelico <rosuav@gmail.com> wrote:
>>> After reading the comments last week about SSDs, I did some testing of
>>> the ones we have at work - each of my test-boxes (three with SSDs, one
>>> with HDD) subjected to multiple stand-alone plug-pull tests, using
>>> pgbench to provide load. So far, there've been no instances of
>>> PostgreSQL data corruption, but diskchecker.pl reported huge numbers
>>> of errors.
>>
>> Try starting pgbench, and then, halfway through the checkpoint timeout
>> interval, issue a checkpoint and, WHILE the checkpoint is still
>> running, pull the plug.
>>
>> Then after bringing the server up (assuming pg starts up) see if
>> pg_dump generates any errors.
>
> Thanks for the tip. I've been flat-out at work these past few days and
> haven't gotten around to testing in the middle of a checkpoint, but I
> have done something that might also be of interest. It's inspired by a
> combination of diskchecker and pgbench; a harness that puts the
> database under load and retains a record of what's been done.
>
> In brief: Create a table with N (eg 100) rows, then spin as fast as
> possible, incrementing a counter against one random row and also
> incrementing the "Total" counter. When the database goes down, wait
> for it to come up again; when it does, check against the local copy of
> the counters and report any discrepancies.
>
> The code's written in Pike, using the same database connection logic
> that we use in our actual application (well, some of our code is C++
> and some is PHP, so this corresponds to one part of our app), so this
> is roughly representative of real usage.
>
> It's about a page or two of code: http://pastebin.com/UNTj642Y

Very cool.  Nice little project.

> Currently, all the key parameters (database connection info (which has
> been censored for the pastebin version), pool size, thread count, etc)
> are just variables visible in the script, simpler than parsing
> command-line arguments.
>
> Is this a useful and plausible testing methodology? It's definitely
> shown up some failures. On a hard disk, all is well as long as the
> write-back cache is disabled; on the SSDs, I can't make them reliable.

Yes it seems to be quite a good idea actually.

> Is a single table enough to test for corruption with?

If it fails, definitely; if it passes, maybe.


Re: Plug-pull testing worked, diskchecker.pl failed

From
Greg Smith
Date:
On 10/24/12 4:04 PM, Chris Angelico wrote:

> Is this a useful and plausible testing methodology? It's definitely
> shown up some failures. On a hard disk, all is well as long as the
> write-back cache is disabled; on the SSDs, I can't make them reliable.

On Linux systems, you can tell when Postgres is busy writing data out
during a checkpoint because the "Dirty:" amount will be dropping
rapidly.  At most other times, that number goes up.  You can try to
increase the odds of finding database level corruption during a pull the
plug test by trying to yank during that most sensitive moment.  Combine
a reasonable write-heavy test like you've devised with that
"optimization", and systems that don't write reliably will usually
corrupt within a few tries.
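
One way to watch for that moment is to poll /proc/meminfo and flag when
the "Dirty:" figure falls between samples. The Python sketch below is
only an illustration: the function names, the one-second polling
interval, and the 1024 kB drop threshold are assumptions, not tuned
values.

```python
import re
import time


def dirty_kb(meminfo_text):
    """Extract the Dirty value (in kB) from the contents of /proc/meminfo."""
    match = re.search(r"^Dirty:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("no Dirty: line found")
    return int(match.group(1))


def is_dropping(samples, min_drop_kb=1024):
    """True if the newest sample is at least min_drop_kb below the previous one."""
    return len(samples) >= 2 and samples[-2] - samples[-1] >= min_drop_kb


if __name__ == "__main__":
    # Linux only: poll once per second and mark samples where Dirty is falling.
    samples = []
    while True:
        with open("/proc/meminfo") as f:
            samples = samples[-1:] + [dirty_kb(f.read())]
        flag = "  <-- dropping" if is_dropping(samples) else ""
        print(f"Dirty: {samples[-1]} kB{flag}")
        time.sleep(1)
```

A falling value only suggests that writeback is in progress; it does
not by itself prove that a PostgreSQL checkpoint is running.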

In general, though, diskchecker.pl is the more sensitive test.  If it
fails, storage is unreliable for PostgreSQL, period.   It's good that
you've followed up by confirming the real database corruption implied by
that is also visible.  In general, though, that's not needed.
Diskchecker says the drive is bad, you're done--don't put a database on
it.  Doing the database level tests is more for finding false positives:
where diskchecker says the drive is OK, but perhaps there is a
filesystem problem that makes it unreliable, one that it doesn't test for.

What SSD are you using?  The Intel 320 and 710 series models are the
only SATA-connected drives still on the market I know of that pass a
serious test.  The other good models are direct PCI-E storage units,
like the FusionIO drives.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com


Re: Plug-pull testing worked, diskchecker.pl failed

From
Chris Angelico
Date:
On Sat, Oct 27, 2012 at 4:26 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> In general, though, diskchecker.pl is the more sensitive test.  If it
> fails, storage is unreliable for PostgreSQL, period.   It's good that you've
> followed up by confirming the real database corruption implied by that is
> also visible.  In general, though, that's not needed. Diskchecker says the
> drive is bad, you're done--don't put a database on it.  Doing the database
> level tests is more for finding false positives:  where diskchecker says the
> drive is OK, but perhaps there is a filesystem problem that makes it
> unreliable, one that it doesn't test for.

Thanks. That's the conclusion we were coming to too, though all I've
seen is lost transactions and not any other form of damage.

> What SSD are you using?  The Intel 320 and 710 series models are the only
> SATA-connected drives still on the market I know of that pass a serious
> test.  The other good models are direct PCI-E storage units, like the
> FusionIO drives.

I don't have the specs to hand, but one of them is a Kingston drive.
Our local supplier is out of 320 series drives, so we were looking for
others; will check out the 710s. It's crazy that so few drives can
actually be trusted.

ChrisA


Re: Plug-pull testing worked, diskchecker.pl failed

From
Bruce Momjian
Date:
On Sat, Oct 27, 2012 at 05:41:02PM +1100, Chris Angelico wrote:
> On Sat, Oct 27, 2012 at 4:26 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> > In general, though, diskchecker.pl is the more sensitive test.  If it
> > fails, storage is unreliable for PostgreSQL, period.   It's good that you've
> > followed up by confirming the real database corruption implied by that is
> > also visible.  In general, though, that's not needed. Diskchecker says the
> > drive is bad, you're done--don't put a database on it.  Doing the database
> > level tests is more for finding false positives:  where diskchecker says the
> > drive is OK, but perhaps there is a filesystem problem that makes it
> > unreliable, one that it doesn't test for.
>
> Thanks. That's the conclusion we were coming to too, though all I've
> seen is lost transactions and not any other form of damage.
>
> > What SSD are you using?  The Intel 320 and 710 series models are the only
> > SATA-connected drives still on the market I know of that pass a serious
> > test.  The other good models are direct PCI-E storage units, like the
> > FusionIO drives.
>
> I don't have the specs to hand, but one of them is a Kingston drive.
> Our local supplier is out of 320 series drives, so we were looking for
> others; will check out the 710s. It's crazy that so few drives can
> actually be trusted.

Yes.  Welcome to our craziness!

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Plug-pull testing worked, diskchecker.pl failed

From
Scott Marlowe
Date:
On Wed, Nov 7, 2012 at 11:59 AM, Bruce Momjian <bruce@momjian.us> wrote:
> On Sat, Oct 27, 2012 at 05:41:02PM +1100, Chris Angelico wrote:
>> On Sat, Oct 27, 2012 at 4:26 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> > In general, though, diskchecker.pl is the more sensitive test.  If it
>> > fails, storage is unreliable for PostgreSQL, period.   It's good that you've
>> > followed up by confirming the real database corruption implied by that is
>> > also visible.  In general, though, that's not needed. Diskchecker says the
>> > drive is bad, you're done--don't put a database on it.  Doing the database
>> > level tests is more for finding false positives:  where diskchecker says the
>> > drive is OK, but perhaps there is a filesystem problem that makes it
>> > unreliable, one that it doesn't test for.
>>
>> Thanks. That's the conclusion we were coming to too, though all I've
>> seen is lost transactions and not any other form of damage.
>>
>> > What SSD are you using?  The Intel 320 and 710 series models are the only
>> > SATA-connected drives still on the market I know of that pass a serious
>> > test.  The other good models are direct PCI-E storage units, like the
>> > FusionIO drives.
>>
>> I don't have the specs to hand, but one of them is a Kingston drive.
>> Our local supplier is out of 320 series drives, so we were looking for
>> others; will check out the 710s. It's crazy that so few drives can
>> actually be trusted.
>
> Yes.  Welcome to our craziness!

Is there a comprehensive list of drives that have been tested on the
wiki somewhere?  Our current choices seem to be the Intel 3xx series
which STILL suffer from the "whoops I'm now an 8MB drive" bug and the
very expensive SLC 7xx series Intel drives, the Hitachi Ultrastar
SSD400M, and the OCZ Vertex 2 Pro.  Any particular recommendations
from those or other series from anyone would be greatly appreciated.


Re: Plug-pull testing worked, diskchecker.pl failed

From
Bruce Momjian
Date:
On Wed, Nov  7, 2012 at 01:53:47PM -0700, Scott Marlowe wrote:
> On Wed, Nov 7, 2012 at 11:59 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > On Sat, Oct 27, 2012 at 05:41:02PM +1100, Chris Angelico wrote:
> >> On Sat, Oct 27, 2012 at 4:26 PM, Greg Smith <greg@2ndquadrant.com> wrote:
> >> > In general, though, diskchecker.pl is the more sensitive test.  If it
> >> > fails, storage is unreliable for PostgreSQL, period.   It's good that you've
> >> > followed up by confirming the real database corruption implied by that is
> >> > also visible.  In general, though, that's not needed. Diskchecker says the
> >> > drive is bad, you're done--don't put a database on it.  Doing the database
> >> > level tests is more for finding false positives:  where diskchecker says the
> >> > drive is OK, but perhaps there is a filesystem problem that makes it
> >> > unreliable, one that it doesn't test for.
> >>
> >> Thanks. That's the conclusion we were coming to too, though all I've
> >> seen is lost transactions and not any other form of damage.
> >>
> >> > What SSD are you using?  The Intel 320 and 710 series models are the only
> >> > SATA-connected drives still on the market I know of that pass a serious
> >> > test.  The other good models are direct PCI-E storage units, like the
> >> > FusionIO drives.
> >>
> >> I don't have the specs to hand, but one of them is a Kingston drive.
> >> Our local supplier is out of 320 series drives, so we were looking for
> >> others; will check out the 710s. It's crazy that so few drives can
> >> actually be trusted.
> >
> > Yes.  Welcome to our craziness!
>
> Is there a comprehensive list of drives that have been tested on the
> wiki somewhere?  Our current choices seem to be the Intel 3xx series
> which STILL suffer from the "whoops I'm now an 8MB drive" bug and the
> very expensive SLC 7xx series Intel drives, the Hitachi Ultrastar
> SSD400M, and the OCZ Vertex 2 Pro.  Any particular recommendations
> from those or other series from anyone would be greatly appreciated.

No, I know of no official list.  Greg Smith and I have tried to document
some of this on the wiki:

    http://wiki.postgresql.org/wiki/Reliable_Writes

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Plug-pull testing worked, diskchecker.pl failed

From
Scott Marlowe
Date:
On Wed, Nov 7, 2012 at 2:01 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Wed, Nov  7, 2012 at 01:53:47PM -0700, Scott Marlowe wrote:
>> On Wed, Nov 7, 2012 at 11:59 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> > On Sat, Oct 27, 2012 at 05:41:02PM +1100, Chris Angelico wrote:
>> >> On Sat, Oct 27, 2012 at 4:26 PM, Greg Smith <greg@2ndquadrant.com> wrote:
>> >> > In general, though, diskchecker.pl is the more sensitive test.  If it
>> >> > fails, storage is unreliable for PostgreSQL, period.   It's good that you've
>> >> > followed up by confirming the real database corruption implied by that is
>> >> > also visible.  In general, though, that's not needed. Diskchecker says the
>> >> > drive is bad, you're done--don't put a database on it.  Doing the database
>> >> > level tests is more for finding false positives:  where diskchecker says the
>> >> > drive is OK, but perhaps there is a filesystem problem that makes it
>> >> > unreliable, one that it doesn't test for.
>> >>
>> >> Thanks. That's the conclusion we were coming to too, though all I've
>> >> seen is lost transactions and not any other form of damage.
>> >>
>> >> > What SSD are you using?  The Intel 320 and 710 series models are the only
>> >> > SATA-connected drives still on the market I know of that pass a serious
>> >> > test.  The other good models are direct PCI-E storage units, like the
>> >> > FusionIO drives.
>> >>
>> >> I don't have the specs to hand, but one of them is a Kingston drive.
>> >> Our local supplier is out of 320 series drives, so we were looking for
>> >> others; will check out the 710s. It's crazy that so few drives can
>> >> actually be trusted.
>> >
>> > Yes.  Welcome to our craziness!
>>
>> Is there a comprehensive list of drives that have been tested on the
>> wiki somewhere?  Our current choices seem to be the Intel 3xx series
>> which STILL suffer from the "whoops I'm now an 8MB drive" bug and the
>> very expensive SLC 7xx series Intel drives, the Hitachi Ultrastar
>> SSD400M, and the OCZ Vertex 2 Pro.  Any particular recommendations
>> from those or other series from anyone would be greatly appreciated.
>
> No, I know of no official list.  Greg Smith and I have tried to document
> some of this on the wiki:
>
>         http://wiki.postgresql.org/wiki/Reliable_Writes

Well I may get a budget at work to do some testing so I'll update that
list etc.  This has been a good thread to get me motivated to get
started.


Re: Plug-pull testing worked, diskchecker.pl failed

From
Bruce Momjian
Date:
On Wed, Nov  7, 2012 at 02:12:39PM -0700, Scott Marlowe wrote:
> >> >> I don't have the specs to hand, but one of them is a Kingston drive.
> >> >> Our local supplier is out of 320 series drives, so we were looking for
> >> >> others; will check out the 710s. It's crazy that so few drives can
> >> >> actually be trusted.
> >> >
> >> > Yes.  Welcome to our craziness!
> >>
> >> Is there a comprehensive list of drives that have been tested on the
> >> wiki somewhere?  Our current choices seem to be the Intel 3xx series
> >> which STILL suffer from the "whoops I'm now an 8MB drive" bug and the
> >> very expensive SLC 7xx series Intel drives, the Hitachi Ultrastar
> >> SSD400M, and the OCZ Vertex 2 Pro.  Any particular recommendations
> >> from those or other series from anyone would be greatly appreciated.
> >
> > No, I know of no official list.  Greg Smith and I have tried to document
> > some of this on the wiki:
> >
> >         http://wiki.postgresql.org/wiki/Reliable_Writes
>
> Well I may get a budget at work to do some testing so I'll update that
> list etc.  This has been a good thread to get me motivated to get
> started.

Yes, it seems database people are the few who care about device sync
reliability (or know to care).

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +


Re: Plug-pull testing worked, diskchecker.pl failed

From
Vick Khera
Date:

On Wed, Nov 7, 2012 at 3:53 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Is there a comprehensive list of drives that have been tested on the
> wiki somewhere?  Our current choices seem to be the Intel 3xx series
> which STILL suffer from the "whoops I'm now an 8MB drive" bug and the
> very expensive SLC 7xx series Intel drives, the Hitachi Ultrastar
> SSD400M, and the OCZ Vertex 2 Pro.  Any particular recommendations
> from those or other series from anyone would be greatly appreciated.

My most recent big box(es) are built using all Intel 3xx series drives. Like you said, the 7xx series was way too expensive.  The 5xx series looks totally right on paper, until you find out they don't have a durable cache.  That just doesn't make sense in any universe... but that's the way they are.

They seem to be doing really well so far.  I connected them to LSI RAID controllers, with the Fastpath option.  I think they are pretty speedy.

On my general purpose boxes, I now spec the 3xx drives for boot (software RAID) and use other drives such as Seagate Constellation for data with ZFS. Sometimes I think that the ZFS volumes are faster than the SSD RAID volumes, but it is not a fair comparison because the RAID systems are CentOS 6 and the ZFS systems are FreeBSD 9.

Re: Plug-pull testing worked, diskchecker.pl failed

From
David Boreham
Date:
On 11/7/2012 3:17 PM, Vick Khera wrote:
> My most recent big box(es) are built using all Intel 3xx series
> drives. Like you said, the 7xx series was way too expensive.

I have to raise my hand to say that for us 710 series drives are an
unbelievable bargain and we buy nothing else now for production servers.
When you compare vs the setup you'd need to achieve the same tps using
rotating media, and especially considering the power and cooling saved,
they're really cheap. YMMV of course..