Re: sync() - Mailing list pgsql-hackers

From Kevin Brown
Subject Re: sync()
Msg-id 20030113053102.GG20180@filer
In response to Re: sync()  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: sync()
List pgsql-hackers
Tom Lane wrote:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I'm just wondering why we do not use fsync() to flush data/index
> > pages.
> 
> There isn't any efficient way to do that AFAICS.  The process that wants
> to do the checkpoint hasn't got any way to know just which files need to
> be sync'd.

So the backends have to keep a common list of all the files they
touch.  Admittedly, that could be a problem if it means using a bunch
of shared memory, and it may have additional performance implications
depending on the implementation ...
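
A minimal sketch of that idea, under loud assumptions: the list here is
a plain fixed-size array in one process, standing in for whatever
would really live in shared memory, and the names remember_dirty_file()
and checkpoint_fsync_all() are hypothetical, not anything in the
Postgres source.  It also assumes that an fsync() on a freshly opened
descriptor flushes writes made through other descriptors, which is
exactly the portability question raised below.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define MAX_DIRTY_FILES 64      /* hypothetical fixed capacity */
#define MAX_PATH_LEN    256     /* hypothetical path length limit */

/* Stand-in for a structure that would have to live in shared memory. */
static char dirty_files[MAX_DIRTY_FILES][MAX_PATH_LEN];
static int  n_dirty_files = 0;

/*
 * Record that a file has been written to.  Returns 0 if the path is now
 * (or was already) on the list, -1 if it is too long or the list is full.
 */
int
remember_dirty_file(const char *path)
{
    size_t len;
    int    i;

    assert(path != NULL);
    len = strlen(path);
    if (len >= MAX_PATH_LEN)
        return -1;

    /* Each file only needs to be listed once. */
    for (i = 0; i < n_dirty_files; i++)
        if (strcmp(dirty_files[i], path) == 0)
            return 0;

    if (n_dirty_files >= MAX_DIRTY_FILES)
        return -1;

    memcpy(dirty_files[n_dirty_files], path, len + 1);
    n_dirty_files++;
    return 0;
}

/*
 * Checkpoint side: open and fsync() every listed file, then empty the
 * list.  Returns the number of files that could not be synced.
 */
int
checkpoint_fsync_all(void)
{
    int failures = 0;
    int i;

    for (i = 0; i < n_dirty_files; i++)
    {
        int fd = open(dirty_files[i], O_RDWR);

        if (fd < 0)
        {
            failures++;
            continue;
        }
        if (fsync(fd) != 0)
            failures++;
        close(fd);
    }
    n_dirty_files = 0;
    return failures;
}
```

A real version would need locking around the list and some answer for
what happens when it fills up; falling back to sync() in that case
would be one option.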

>  Even if it did know, it's not clear to me that we can
> portably assume that process A issuing an fsync on a file descriptor F
> it's opened for file X will force to disk previous writes issued against
> the same physical file X by a different process B using a different file
> descriptor G.

If the manpages are to be believed, then under FreeBSD, Linux, and
HP-UX, calling fsync() will force to disk *all* unwritten buffers
associated with the file pointed to by the file descriptor.

Sadly, however, the Solaris and IRIX manpages suggest that only
buffers associated with the specific file descriptor itself are
written, not necessarily all buffers associated with the file pointed
at by the file descriptor (and interestingly, the Solaris version
appears to be implemented as a library function and not a system call,
if the manpage's section is any indication).

> sync() is surely overkill, in that it writes out dirty kernel buffers
> that might have nothing at all to do with Postgres.  But I don't see how
> to do better.

It's obvious to me that sync() can have some very significant
performance implications on a system that is acting as more than just
a database server.  So it should probably be used only when there's no
good alternative.

So this is probably one of those cases where it's important to
distinguish between operating systems and use the sync() approach only
when it's uncertain that fsync() will do the job.  FreeBSD (and
probably all the other BSD derivatives) definitely should use fsync()
since they have known-good implementations.  Linux and HP-UX 11 should
use fsync() as well, if the manpage's wording can be trusted (I'm not
sure about earlier HP-UX versions).  Solaris and IRIX should use
sync() since their manpages indicate that only data associated with
the file descriptor will be written to disk.
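
One way to express that split is a compile-time switch.  The sketch
below is only an illustration: FSYNC_FLUSHES_WHOLE_FILE and
flush_file_for_checkpoint() are hypothetical names, the platform
macros are the ones compilers usually predefine, and the grouping
reflects nothing more than the manpage reading above.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Assumption: on these platforms fsync() flushes every dirty buffer of
 * the file, whichever descriptor wrote it.  This is taken from the
 * manpages only and has not been verified against the kernels.
 */
#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
    defined(__linux__) || defined(__hpux)
#define FSYNC_FLUSHES_WHOLE_FILE 1
#else
/* Solaris, IRIX, and anything not yet checked: be conservative. */
#define FSYNC_FLUSHES_WHOLE_FILE 0
#endif

/*
 * Flush one file on behalf of a checkpoint.
 * Returns 0 on success, -1 if the file could not be opened or synced.
 */
int
flush_file_for_checkpoint(const char *path)
{
    assert(path != NULL);

#if FSYNC_FLUSHES_WHOLE_FILE
    {
        int fd = open(path, O_RDWR);
        int rc;

        if (fd < 0)
            return -1;
        rc = fsync(fd);
        close(fd);
        return rc;
    }
#else
    /*
     * Fall back to flushing everything.  sync() reports no errors, and
     * POSIX allows it to return before the writes have completed.
     */
    (void) path;
    sync();
    return 0;
#endif
}
```

In the sync() branch a real checkpoint would call sync() once rather
than once per file; the per-file shape is kept here only so both
branches share one signature.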

Under Linux (and perhaps HP-UX), it may be necessary to fsync() the
directories leading to the file as well, so that the state of the
filesystem on disk is consistent and safe in the event that the files
in question are newly created.  Whether that's truly necessary
appears to be filesystem-dependent.  A quick perusal of the Linux
source shows that ext2 appears to sync only the data and metadata
associated with the inode of the specific file, not any parent
directories.  So it's probably a safe bet to fsync() any ancestor
directories that matter as well as the file, even if the system is
running on top of a journalled filesystem.  Since all the files in
question probably reside in the same set of directories, the directory
fsync()s can be deferred until the very end.
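
A rough sketch of the deferred directory sync, with assumptions
labeled: fsync_path() and fsync_files_then_dirs() are hypothetical
helpers, only the immediate parent directory of each file is handled
(not every ancestor), and it assumes the platform lets you open a
directory read-only and fsync() it, as Linux does.

```c
#include <assert.h>
#include <fcntl.h>
#include <libgen.h>
#include <string.h>
#include <unistd.h>

#define MAX_DIRS     16         /* hypothetical limit on deferred dirs */
#define MAX_PATH_LEN 256        /* hypothetical path length limit */

/*
 * Open a file or directory read-only and fsync() it.
 * Returns 0 on success, -1 on failure.
 */
static int
fsync_path(const char *path)
{
    int fd = open(path, O_RDONLY);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * fsync() each file right away, remember its parent directory, and
 * fsync() each distinct parent once at the very end.
 * Returns the number of fsync attempts that failed.
 */
int
fsync_files_then_dirs(const char *const *files, int nfiles)
{
    char dirs[MAX_DIRS][MAX_PATH_LEN];
    int  ndirs = 0;
    int  failures = 0;
    int  i, j;

    for (i = 0; i < nfiles; i++)
    {
        char        tmp[MAX_PATH_LEN];
        const char *parent;

        assert(strlen(files[i]) < MAX_PATH_LEN);
        if (fsync_path(files[i]) != 0)
            failures++;

        /* dirname() may modify its argument, so work on a copy. */
        strcpy(tmp, files[i]);
        parent = dirname(tmp);

        for (j = 0; j < ndirs; j++)
            if (strcmp(dirs[j], parent) == 0)
                break;
        if (j < ndirs)
            continue;           /* this directory is already queued */

        if (ndirs < MAX_DIRS)
        {
            strcpy(dirs[ndirs], parent);
            ndirs++;
        }
        else if (fsync_path(parent) != 0)
            failures++;         /* queue full: sync it immediately */
    }

    /* The deferred step: one fsync() per distinct directory. */
    for (j = 0; j < ndirs; j++)
        if (fsync_path(dirs[j]) != 0)
            failures++;

    return failures;
}
```

With many files in a handful of directories, this does one directory
fsync() per directory instead of one per file.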


-- 
Kevin Brown                          kevin@sysexperts.com

