Re: sync() - Mailing list pgsql-hackers
From | Kevin Brown |
---|---|
Subject | Re: sync() |
Date | |
Msg-id | 20030113053102.GG20180@filer |
In response to | Re: sync() (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: sync() |
List | pgsql-hackers |
Tom Lane wrote:
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > I'm just wondering why we do not use fsync() to flush data/index
> > pages.
>
> There isn't any efficient way to do that AFAICS.  The process that wants
> to do the checkpoint hasn't got any way to know just which files need to
> be sync'd.

So the backends have to keep a common list of all the files they
touch.  Admittedly, that could be a problem if it means using a bunch
of shared memory, and it may have additional performance implications
depending on the implementation ...

> Even if it did know, it's not clear to me that we can
> portably assume that process A issuing an fsync on a file descriptor F
> it's opened for file X will force to disk previous writes issued against
> the same physical file X by a different process B using a different file
> descriptor G.

If the manpages are to be believed, then under FreeBSD, Linux, and
HP-UX, calling fsync() will force to disk *all* unwritten buffers
associated with the file pointed to by the file descriptor.

Sadly, however, the Solaris and IRIX manpages suggest that only
buffers associated with the specific file descriptor itself are
written, not necessarily all buffers associated with the file pointed
at by the file descriptor (and interestingly, the Solaris version
appears to be implemented as a library function and not a system call,
if the manpage's section is any indication).

> sync() is surely overkill, in that it writes out dirty kernel buffers
> that might have nothing at all to do with Postgres.  But I don't see how
> to do better.

It's obvious to me that sync() can have some very significant
performance implications on a system that is acting as more than just
a database server.  So it should probably be used only when there's no
good alternative.

So: this is probably one of those cases where it's important to
distinguish between operating systems and use the sync() approach only
when it's uncertain that fsync() will do the job.
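To make the idea concrete, here is a rough sketch of what the checkpointing process might do once it has a list of touched files.  Everything named here (flush_touched_files, enum flush_strategy) is hypothetical and not anything in the Postgres tree, and the list is just an array of paths; how the backends would actually share it is left open.  The strategy value is assumed to be chosen per platform at build time from the manpage readings above.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical strategy switch: which platform gets which value is an
 * assumption based on the manpage readings, not a tested fact. */
enum flush_strategy { FLUSH_FSYNC_EACH, FLUSH_SYNC_ALL };

/* Hypothetical helper: flush every file in a list of touched paths.
 * Returns 0 on success, -1 if any open(), fsync() or close() fails. */
int
flush_touched_files(const char *const *paths, size_t npaths,
                    enum flush_strategy strategy)
{
    size_t i;
    int    result = 0;

    if (strategy == FLUSH_SYNC_ALL)
    {
        /* fsync() is not trusted across descriptors on this platform,
         * so ask the kernel to flush everything it has buffered. */
        sync();
        return 0;
    }

    for (i = 0; i < npaths; i++)
    {
        /* A fresh descriptor: this only helps where fsync() flushes
         * all dirty buffers of the file, not just this descriptor's. */
        int fd = open(paths[i], O_RDWR);

        if (fd < 0)
        {
            result = -1;
            continue;
        }
        if (fsync(fd) != 0)
            result = -1;
        if (close(fd) != 0)
            result = -1;
    }
    return result;
}
```

Note that the fsync() branch opens its own descriptor for each file, so it is only correct on systems where fsync() really does write out buffers dirtied through other descriptors, which is exactly the portability question being discussed.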

So FreeBSD (and probably all the other BSD derivatives) definitely
should use fsync() since they have known-good implementations.  Linux
and HP-UX 11 (if the manpage's wording can be trusted; I'm not sure
about earlier versions) should use fsync() as well.  Solaris and IRIX
should use sync() since their manpages indicate that only data
associated with the file descriptor will be written to disk.

Under Linux (and perhaps HP-UX), it may be necessary to fsync() the
directories leading to the file as well, so that the state of the
filesystem on disk is consistent and safe in the event that the files
in question are newly-created.  Whether that's truly necessary or not
appears to be filesystem-dependent.  A quick perusal of the Linux
source shows that ext2 appears to only sync the data and metadata
associated with the inode of the specific file and not any parent
directories, so it's probably a safe bet to fsync() any ancestor
directories that matter as well as the file even if the system is
running on top of a journalled filesystem.  Since all the files in
question probably reside in the same set of directories, the directory
fsync()s can be deferred until the very end.

-- 
Kevin Brown                                       kevin@sysexperts.com
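
For illustration, deferring the directory fsync()s could look roughly like the sketch below.  The function names and the fixed-size limits are made up for the example.  It only handles the immediate parent directory of each file, not every ancestor, and it assumes that a directory can be opened read-only and passed to fsync(), which holds on Linux but is not guaranteed everywhere.  For that reason the sketch treats EINVAL and EBADF on a directory as "not supported here" rather than as a failure.

```c
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define MAX_DIRS     64     /* arbitrary caps for this sketch */
#define MAX_PATH_LEN 1024

/* Hypothetical helper: open a path with the given flags, fsync() it,
 * and close it.  With is_dir set, EINVAL/EBADF from fsync() is taken
 * to mean the filesystem cannot sync directories and is ignored. */
static int
fsync_path(const char *path, int flags, int is_dir)
{
    int fd = open(path, flags);
    int rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    if (rc != 0 && is_dir && (errno == EINVAL || errno == EBADF))
        rc = 0;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* fsync() each file, remember each distinct parent directory, and
 * fsync() those directories once at the very end.
 * Returns 0 on success, -1 if anything failed along the way. */
int
fsync_files_then_dirs(const char *const *paths, size_t npaths)
{
    char   dirs[MAX_DIRS][MAX_PATH_LEN];
    size_t ndirs = 0, i, j;
    int    result = 0;

    for (i = 0; i < npaths; i++)
    {
        char  tmp[MAX_PATH_LEN];
        char *dir;

        if (fsync_path(paths[i], O_RDWR, 0) != 0)
            result = -1;

        if (strlen(paths[i]) >= sizeof(tmp))
        {
            result = -1;
            continue;
        }
        /* dirname() may modify its argument, so work on a copy. */
        strcpy(tmp, paths[i]);
        dir = dirname(tmp);

        for (j = 0; j < ndirs; j++)
            if (strcmp(dirs[j], dir) == 0)
                break;
        if (j == ndirs)
        {
            if (ndirs == MAX_DIRS)
            {
                result = -1;
                continue;
            }
            strcpy(dirs[ndirs++], dir);
        }
    }

    /* Deferred step: one fsync() per distinct directory. */
    for (j = 0; j < ndirs; j++)
        if (fsync_path(dirs[j], O_RDONLY, 1) != 0)
            result = -1;

    return result;
}
```

With many files in a handful of directories, this turns one directory fsync() per file into one per directory, which is the saving the deferral is meant to buy.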