Re: initdb and fsync - Mailing list pgsql-hackers

From Andres Freund
Subject Re: initdb and fsync
Msg-id 201206181805.29195.andres@2ndquadrant.com
In response to Re: initdb and fsync  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Wednesday, June 13, 2012 06:53:17 PM Jeff Davis wrote:
> On Wed, 2012-06-13 at 13:53 +0300, Peter Eisentraut wrote:
> > The --help output for the -N option was copy-and-pasted wrongly.
> > 
> > The message issued when using -N is also a bit content-free.  Maybe
> > something like
> > 
> > "Running in nosync mode.  The data directory might become corrupt if the
> > operating system crashes.\n"
> 
> Thank you, fixed.
> 
> > Which leads to the question, how does one get out of this state?  Is
> > running sync(1) enough?  Is starting the postgres server enough?
> 
> sync(1) calls sync(2), and the man page says:
> 
> "According to the standard specification  (e.g.,  POSIX.1-2001),  sync()
> schedules the writes, but may return before the actual writing is done.
> However, since version 1.3.20 Linux does actually  wait.   (This  still
> does not guarantee data integrity: modern disks have large caches.)"
> 
> So it looks like sync is enough if you are on linux *and* you have any
> unprotected write cache disabled.
Protection can include write barriers; it doesn't need to be a BBU...

> I don't think starting the postgres server is enough.
Agreed.

> Before, I think we were safe because we could assume that the OS would
> flush the buffers before you had time to store any important data. But
> now, that window can be much larger.
I think to a large degree we didn't see any problem because of the old ext3 
"sync the whole world" type of behaviour, which was common for a very long 
time.
So on ext3 (with data=ordered, the default) any fsync (checkpoint, commit, 
...) would lead to all the other files being synced as well, which means 
starting the server once would have been enough on such a system...


Quick review:
- defaulting to initdb -N in the regression suite is not a good idea imo, because 
that way the buildfarm won't catch problems in that area...
- could the copydir.c and initdb.c versions of walkdir/sync_fname et al be 
unified?
- I personally would find it way nicer to put USE_PRE_SYNC into pre_sync_fname 
instead of cluttering the main function with it
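
A rough sketch of what the last point could look like, with the conditional compilation confined to pre_sync_fname so that callers can invoke it unconditionally. Only the names USE_PRE_SYNC, pre_sync_fname and sync_fname come from the patch under review; the bodies, the way USE_PRE_SYNC gets defined, and the choice of posix_fadvise() as the writeback hint are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Assumption: enable the hint wherever posix_fadvise advice is available. */
#if defined(POSIX_FADV_DONTNEED)
#define USE_PRE_SYNC 1
#endif

/*
 * Hint to the OS that the file should be written out soon.  The #ifdef
 * lives here, so the caller needs no conditional code.  Returns -1 only
 * if the file cannot be opened; the advice itself is advisory, so its
 * result is ignored.
 */
int
pre_sync_fname(const char *fname)
{
#ifdef USE_PRE_SYNC
    int fd = open(fname, O_RDONLY);

    if (fd < 0)
        return -1;
    (void) posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return 0;
#else
    (void) fname;               /* no-op on platforms without the hint */
    return 0;
#endif
}

/* Force the file to stable storage.  Returns 0 on success, -1 on error. */
int
sync_fname(const char *fname)
{
    int fd = open(fname, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}
```

With that shape the main function would simply do one pass calling pre_sync_fname on every file and a second pass calling sync_fname, without any #ifdef of its own.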

Looks good otherwise!

Thanks,

Andres
--
Andres Freund                       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

