Re: fsync reliability - Mailing list pgsql-hackers

From Tom Lane
Subject Re: fsync reliability
Msg-id 24001.1303401355@sss.pgh.pa.us
In response to fsync reliability  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: fsync reliability  (Robert Haas <robertmhaas@gmail.com>)
Re: fsync reliability  (Simon Riggs <simon@2ndQuadrant.com>)
Re: fsync reliability  (Greg Stark <gsstark@mit.edu>)
List pgsql-hackers
Simon Riggs <simon@2ndQuadrant.com> writes:
> Daniel Farina points out to me that the Linux man page for fsync() says
> "Calling fsync() does not necessarily ensure that the entry in the directory
> containing the file has also reached disk.  For that an explicit fsync() on a
> file descriptor for the directory is also needed."
> http://www.kernel.org/doc/man-pages/online/pages/man2/fsync.2.html

> This point appears to have been discussed before

Yes ...

> Tom said
> "We don't try to "fsync the
> directory" after a normal table create for instance"
> which is fine because we don't need to. In the event of a crash a
> missing table would be recreated during crash recovery.

Nonsense.  Once a checkpoint occurs after the WAL record that says to
create the table, we won't replay that action.  Or are you proposing
to have checkpoints run around and fsync every directory in the data
tree?

The traditional standard is that the filesystem is supposed to take
care of its own metadata, and even Linux filesystems have pretty much
figured that out.  I don't really see a need for us to be nursemaiding
the filesystem.  At most there's a documentation issue here, i.e., we
ought to be more explicit about which filesystems and which mount
options we recommend.
        regards, tom lane
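
For readers unfamiliar with the "fsync the directory" step that the quoted
man page text describes, here is a minimal C sketch. It is an illustration
only, not PostgreSQL code and not something the message above proposes
adopting: the helper names fsync_dir() and create_file_durably() are
hypothetical, and the sketch assumes a POSIX system on which a directory
can be opened read-only and passed to fsync().

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: flush a directory's own metadata (its entries)
 * by opening the directory read-only and calling fsync() on it.
 * Returns 0 on success, -1 on any error.
 */
static int fsync_dir(const char *dir)
{
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;

    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/*
 * Hypothetical helper: create dir/name, write len bytes from buf, then
 * fsync the file and afterwards its containing directory, so that both
 * the contents and the directory entry have been flushed.
 * Fails if the file already exists (O_EXCL).
 * Returns 0 on success, -1 on any error.
 */
static int create_file_durably(const char *dir, const char *name,
                               const void *buf, size_t len)
{
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;              /* path would be truncated */

    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    /* Step 1: write the data and flush the file itself. */
    if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;

    /* Step 2: flush the directory entry that names the new file. */
    return fsync_dir(dir);
}
```

A call such as create_file_durably("/tmp", "example", "hello", 5) performs
both flushes. Whether step 2 is needed in practice depends on the filesystem
and its mount options, which is the point under discussion in this thread.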

