Re: 8.3.5 broken after power fail SOLVED - Mailing list pgsql-admin

From Scott Marlowe
Subject Re: 8.3.5 broken after power fail SOLVED
Date
Msg-id dcc563d10902211843k54b38175m615b9d83d30927f4@mail.gmail.com
In response to Re: 8.3.5 broken after power fail SOLVED  (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
List pgsql-admin
On Sat, Feb 21, 2009 at 3:41 PM, Ron Mayer
<rm_pg@cheapcomplexdevices.com> wrote:
> Naomi Walker wrote:
>> Other than disaster tests, how would I know if I have a system that
>> lies about fsync?
>
> Well, the Linux kernel tries to detect it at boot and
> will print messages like these:
>  %dmesg | grep 'disabling barriers'
>  JBD: barrier-based sync failed on md1 - disabling barriers
>  JBD: barrier-based sync failed on hda3 - disabling barriers
> when it detects certain types of unreliable fsyncs. The command
>  %hdparm -I /dev/hdf | grep FLUSH_CACHE_EXT
> will give you a clue as to whether the hard drive itself can even
> support a non-lying fsync when its internal cache is enabled.
>
>
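A minimal sketch of automating the dmesg check above: it scans kernel log text for the "disabling barriers" message and returns the affected device names. The regular expression assumes the exact wording quoted here; other kernel versions may phrase it differently, so an empty result means "no match found", not proof that fsync is honest. The function names are hypothetical.

```python
import re
import subprocess

# Matches the kernel message format quoted above, e.g.
#   JBD: barrier-based sync failed on md1 - disabling barriers
# (assumed wording; other kernel versions may differ)
BARRIER_RE = re.compile(r"barrier-based sync failed on (\S+) - disabling barriers")


def devices_with_barriers_disabled(dmesg_text: str) -> list[str]:
    """Return each device the kernel reported as having barriers disabled,
    in order of first appearance, without duplicates."""
    devices: list[str] = []
    for line in dmesg_text.splitlines():
        match = BARRIER_RE.search(line)
        if match and match.group(1) not in devices:
            devices.append(match.group(1))
    return devices


def read_dmesg() -> str:
    """Return the kernel ring buffer; may need root if dmesg is restricted."""
    return subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
```

On the machine quoted above, devices_with_barriers_disabled(read_dmesg()) would be expected to report md1 and hda3.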
> Sadly, some filesystems (ext3) lie even beyond what
> Linux does, by using write barriers correctly only
> when the inode itself is modified, not when just the data is modified.
> A test program here:
> http://archives.postgresql.org/pgsql-performance/2008-08/msg00159.php
> can detect cases where the kernel and drive don't lie
> about fsync but ext3 lies in spite of them. More background
> info here:
>  http://article.gmane.org/gmane.linux.file-systems/21373
>  http://thread.gmane.org/gmane.linux.kernel/646040
>
>
> Elsewhere in the archives you can find programs that measure
> how fast fsyncs happen on your hardware; you can then check
> whether those numbers roughly match how fast your disks spin.
> But you still need to make sure the test program uses the same
> sync method that your postgres configuration (wal_sync_method)
> is choosing.
>
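The rotation-speed comparison can be made concrete. If every fsync rewrites the same sector, an honest drive has to wait for that sector to come around again, so it manages at most about one such fsync per revolution: 7200 RPM / 60 = 120 fsyncs per second, and 15000 RPM gives 250. The sketch below (hypothetical function names) overwrites one 8 kB block in place and times it. It uses Python's os.fsync, which may not match the wal_sync_method PostgreSQL is configured with; swapping in os.fdatasync is one option if that is what your server uses.

```python
import os
import time

BLOCK = b"\0" * 8192  # one 8 kB block, PostgreSQL's default page size


def max_honest_fsync_rate(rpm: int) -> float:
    """Rough ceiling on same-sector rewrites per second for a drive that
    really waits for the platter: about one per revolution."""
    return rpm / 60.0


def measure_fsync_rate(path: str, iterations: int = 1000) -> float:
    """Overwrite the first block of `path` and fsync, `iterations` times;
    return the observed fsyncs per second."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.lseek(fd, 0, os.SEEK_SET)  # always rewrite the same offset
            os.write(fd, BLOCK)
            os.fsync(fd)  # ask the kernel (and drive) to make it durable
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return iterations / elapsed
```

Point it at a file on the disk you actually care about, not /tmp, which may be memory-backed. A plain 7200 RPM drive reporting thousands of fsyncs per second suggests a write cache is acknowledging the flush; a battery-backed RAID controller cache can legitimately exceed the rotational ceiling.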
> I wonder if the only really safe way is to run a very
> write-intensive database script, kill your system in a
> number of ways (including yanking power to the server, to
> the disk arrays, etc.), and see whether your database died.

Well, you can't prove it's 100% safe, but you can usually find most of
the unsafe systems this way.  I usually set up a big pgbench db, run
500 or so concurrent clients, wait 5 or 10 minutes, issue a checkpoint,
and pull the plug halfway through it.  It's still possible for a system
to fail after passing this test, but I feel a lot better knowing I've
done it a couple of times and the db came back up without problems.
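The pull-the-plug procedure above can be sketched as follows. This is an illustration under assumptions, not a tested harness. The database name and numbers are placeholders. pgbench -T (run for a fixed number of seconds) only exists from 8.4 on; on 8.3 use -t (transactions per client). The table names use the pgbench_ prefix introduced in 8.4, and 500 clients needs max_connections of at least that. The post-recovery check relies on the default pgbench transaction adding the same delta to one account, one teller and one branch and logging it to history, all in one transaction, so the four sums should agree after a clean recovery. A mismatch points to damage; agreement does not prove the system is safe.

```python
# All values below are hypothetical placeholders.
DB = "crashtest"
SCALE = 100         # pgbench scale factor: 100,000 accounts per unit
CLIENTS = 500       # requires max_connections >= CLIENTS
RUN_SECONDS = 3600  # keep the load running well past the checkpoint

# Assumes the pgbench_* table names used since 8.4; 8.3 has no prefix.
CONSISTENCY_SQL = """
SELECT (SELECT coalesce(sum(abalance), 0) FROM pgbench_accounts),
       (SELECT coalesce(sum(tbalance), 0) FROM pgbench_tellers),
       (SELECT coalesce(sum(bbalance), 0) FROM pgbench_branches),
       (SELECT coalesce(sum(delta), 0) FROM pgbench_history);
"""


def crash_test_commands(db: str = DB, scale: int = SCALE, clients: int = CLIENTS,
                        run_seconds: int = RUN_SECONDS) -> list[list[str]]:
    """Command lines for the three phases, in order."""
    return [
        # 1. build a big pgbench database
        ["pgbench", "-i", "-s", str(scale), db],
        # 2. start the load in the background (-T needs 8.4+; use -t on 8.3)
        ["pgbench", "-c", str(clients), "-T", str(run_seconds), db],
        # 3. after 5-10 minutes of load, force a checkpoint and cut power mid-way
        ["psql", "-d", db, "-c", "CHECKPOINT"],
    ]


def balances_consistent(accounts: int, tellers: int, branches: int, history: int) -> bool:
    """Each default pgbench transaction adds one delta to an account, a teller
    and a branch and records it in history, so the sums must all match."""
    return accounts == tellers == branches == history
```

After the server comes back up, run CONSISTENCY_SQL with psql and pass the four numbers it returns to balances_consistent.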
