Re: fsync reliability - Mailing list pgsql-hackers

From Daniel Farina
Subject Re: fsync reliability
Date
Msg-id BANLkTimahmfL+Hefeti_Do0Kv0CMh+dCiw@mail.gmail.com
In response to fsync reliability  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Thu, Apr 21, 2011 at 1:26 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Daniel Farina points out to me that the Linux man page for fsync() says
> "Calling fsync() does not necessarily ensure that the entry in the directory
> containing the file has also reached disk.  For that an explicit fsync() on a
> file descriptor for the directory is also needed."
> http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html

I'd also like to point out that even on ext(2|3) there is a special
mount option, 'dirsync', and a corresponding directory attribute (see
'chattr'), that makes all directory metadata manipulation synchronous.
It exists mostly for the benefit of the authors of MTAs, which perform
a lot of metadata operations, as a way around non-durable metadata
changes.  Even if you use fsync(), a crash between the rename() and the
fsync() will leave you in either the pre-move or the post-move state:
the rename is atomic but not durable.  Synchronous directory
modification ensures that the return of rename() coincides with the
durability of the rename itself, or so I would think.
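
To make the rename()-then-fsync() pattern concrete, here is a minimal
sketch in Python, assuming Linux and assuming that the source and
destination live in the same directory.  The helper name
durable_rename is made up for illustration; this is not what postgres
does, only the sequence the man page describes.

```python
import os


def durable_rename(src, dst):
    """Rename src to dst, then fsync the containing directory.

    Hypothetical helper for illustration. Assumes Linux semantics and
    that src and dst are in the same directory.
    """
    # Flush the file's own contents first, so the renamed entry never
    # points at data that has not reached disk.
    fd = os.open(src, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)

    # Atomic, but not yet durable: a crash here leaves either the old
    # name or the new name in place.
    os.rename(src, dst)

    # fsync a descriptor for the directory so the new entry itself is
    # written out, as the fsync(2) man page says is needed.
    dir_fd = os.open(os.path.dirname(os.path.abspath(dst)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

With 'dirsync' (or the directory attribute) in effect, the final
directory fsync() should in principle be redundant, since rename()
would not return until the change is on disk; the sketch does not rely
on that.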

I only found this while researching how to perform a two-phase commit
between postgres and the file system, and while reading the kernel
source.  I admit, it's a dusty and obscure corner, but it still seems
to be in use by said MTAs.

Would a reading and exploration of the kernel code at hand perhaps
help resolve this discussion, one way or another?

--
fdr

