Re: OT (slightly) testing for data loss on an SSD drive due to power failure - Mailing list pgsql-performance

From Greg Smith
Subject Re: OT (slightly) testing for data loss on an SSD drive due to power failure
Date
Msg-id 4DB23DE4.7000108@2ndQuadrant.com
In response to OT (slightly) testing for data loss on an SSD drive due to power failure  (John Rouillard <rouilj@renesys.com>)
List pgsql-performance
On 04/22/2011 10:04 AM, John Rouillard wrote:
> We have a couple of ssd's 2 x 160GB Intel X25-M MLC SATA
> acting as the zil (write journal) and are trying to see if it is safe
> to use for a power fail situation.
>

Well, the quick answer is "no".  I've lost several weekends of my life
to recovering information from databases stored on those drives, after
they were corrupted in a crash.

> The testing method is to copy a bunch of files over NFS to the server
> with the zil. When the copy is running along, pull the power to the
> server. The NFS client will stop and if the client got a message that
> block X was written safely to the zil, it will continue writing with
> block x+1. After the server comes back up and the copies
> resume/finish the files are checksummed. If block X went missing, the
> checksums will fail and we will have our proof.
>

Interestingly, you have reinvented parts of the standard script for
testing for data loss, diskchecker.pl:
http://brad.livejournal.com/2116715.html

You can get a few thousand commits per second using that program, which
is enough to fill the drive buffer such that a power pull should
sometimes lose something.  I don't think you can do a proper test here
using NFS; you really need something that is executing fsync calls
directly in the same pattern a database server will.
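
To illustrate what "executing fsync calls directly" means, here is a
minimal Python sketch of the idea.  It is not diskchecker.pl itself
(that is a Perl client/server pair); the record layout and the names
write_commits and last_intact_record are made up for this example, and
it assumes the acknowledgements get recorded on a different machine
that keeps running when the power is pulled.

```python
import os
import struct
import zlib

# Hypothetical record layout: 8-byte sequence number + 4-byte CRC32 of it.
RECORD = struct.Struct("<QI")


def write_commits(path, count, on_ack=print):
    """Append `count` records, calling fsync after each one, database-style."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for seq in range(count):
            payload = struct.pack("<Q", seq)
            os.write(fd, RECORD.pack(seq, zlib.crc32(payload)))
            os.fsync(fd)  # returns once the OS reports the data as flushed
            on_ack(seq)   # in a real test, send this to another machine
    finally:
        os.close(fd)


def last_intact_record(path):
    """Return the highest in-order sequence number with a valid CRC, or -1."""
    last = -1
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size)
            if len(chunk) < RECORD.size:
                break  # end of file or a torn final record
            seq, crc = RECORD.unpack(chunk)
            if seq != last + 1 or crc != zlib.crc32(struct.pack("<Q", seq)):
                break  # gap or corruption: stop at the last good record
            last = seq
    return last
```

After the power pull, compare the last sequence number the other
machine saw acknowledged against what last_intact_record() finds on
disk.  If the file stops short of an acknowledged record, something in
the stack reported a write as durable when it was not.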

ZFS is more resilient than most filesystems as far as avoiding file
corruption in this case.  But you should still be able to find some
missing transactions that were sitting in the drive cache when the
power went out.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

