Re: 8.3.5 broken after power fail SOLVED - Mailing list pgsql-admin

From	Scott Marlowe
Subject	Re: 8.3.5 broken after power fail SOLVED
Date	February 22, 2009 01:43:17
Msg-id	dcc563d10902211843k54b38175m615b9d83d30927f4@mail.gmail.com
In response to	Re: 8.3.5 broken after power fail SOLVED (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
List	pgsql-admin

On Sat, Feb 21, 2009 at 3:41 PM, Ron Mayer
<rm_pg@cheapcomplexdevices.com> wrote:
> Naomi Walker wrote:
>> Other than disaster tests, how would I know if I have a system that
>> lies about fsync?
>
> Well, the linux kernel tries to detect it on bootup and
> will give messages like this:
>  %dmesg | grep 'disabling barriers'
>  JBD: barrier-based sync failed on md1 - disabling barriers
>  JBD: barrier-based sync failed on hda3 - disabling barriers
> when it detects certain types of unreliable fsyncs. The command
>  %hdparm -I /dev/hdf | grep FLUSH_CACHE_EXT
> will give you clues if a hard drive itself even can support
> a non-lying fsync when its internal cache is enabled.
>
>
> Sadly some filesystems (ext3) lie even above and beyond what
> Linux does - by only using the write barriers correctly
> when the inode itself is modified; not when the data is modified.
> A test program here:
> http://archives.postgresql.org/pgsql-performance/2008-08/msg00159.php
> can detect those cases where the kernel & drive don't lie
> about fsync but ext3 lies in spite of them; with more background
> info here:
>  http://article.gmane.org/gmane.linux.file-systems/21373
>  http://thread.gmane.org/gmane.linux.kernel/646040
>
>
> Elsewhere in the archives you can find programs that measure
> how fast fsyncs happen on your hardware, and you can check
> whether those numbers approximately match how fast your
> disks spin.  But then you still need to make sure the test
> program uses the same method for syncing the drive that your
> postgres configuration files are choosing.
>
> I wonder if the only really safe way is to run a very
> write-intensive database script and pull and kill your
> system in a number of ways, including yanking power to
> the system, to disk arrays, etc., and see if your database died.
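
The fsync-rate check described in the quoted text can be sketched roughly as follows. This is not one of the programs from the archives, only a minimal illustration: the function names, the 8192-byte block size, and the default of 200 writes are assumptions. It rewrites the same block and calls fsync() after each write, then compares the rate with the rotation speed of the disk, since a spinning disk that honors fsync cannot commit the same block much more often than once per revolution.

```python
import os
import tempfile
import time


def measure_fsync_rate(directory, writes=200):
    """Return fsync'd writes per second for a scratch file in `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        block = b"x" * 8192  # block size is an assumption for this sketch
        start = time.perf_counter()
        for _ in range(writes):
            os.lseek(fd, 0, os.SEEK_SET)  # rewrite the same block every time
            os.write(fd, block)
            os.fsync(fd)  # ask the OS to push the data to stable storage
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.unlink(path)
    return writes / elapsed


def max_plausible_rate(rpm):
    """Rough ceiling on same-block commits per second: one per revolution."""
    return rpm / 60.0
```

Pointing measure_fsync_rate() at a directory on the same filesystem as the database and comparing the result with max_plausible_rate(7200), which is 120, gives a first hint: a rate far above that ceiling suggests some cache is acknowledging writes early. The sketch only exercises fsync(), so it says little if the server's wal_sync_method setting selects a different method.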

Well, you can't prove it's 100% safe but you can usually find most of
the unsafe systems this way.  I usually set up a big pgbench db, run
500 or so concurrent clients, wait 5 or 10 minutes, run a checkpoint,
and halfway through it pull the plug.  It's still possible for a system
to fail after passing this test, but I feel a lot better knowing I've
done it a couple of times and the db came back up without problems.
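
The procedure above might be laid out as the following command sequence. This is only a sketch that prints the commands rather than running them; the database name, scale factor, and transaction count are assumptions, not values from this thread, and the helper name crash_test_commands is hypothetical.

```python
import shlex


def crash_test_commands(dbname="crashtest", scale=500, clients=500,
                        transactions=100000):
    """Return the shell commands for one pull-the-plug run, in order.

    Nothing is executed here; every default value is an assumption.
    """
    db = shlex.quote(dbname)
    return [
        f"createdb {db}",
        # -i initializes the pgbench tables, -s sets the scale factor
        f"pgbench -i -s {scale} {db}",
        # -c is the number of concurrent clients, -t the transactions per
        # client; run it in the background and wait 5 to 10 minutes
        f"pgbench -c {clients} -t {transactions} {db} &",
        # force a checkpoint, then cut power while it is still in progress
        f"psql -d {db} -c 'CHECKPOINT;'",
    ]


for command in crash_test_commands():
    print(command)
```

With 500 clients the server's max_connections has to be raised above its default of 100 before the pgbench run will start. After power is restored, the test passes only if the server completes crash recovery and the data is readable.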
