Re: Experimental patch for inter-page delay in VACUUM - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Experimental patch for inter-page delay in VACUUM
Msg-id 1969.1068044941@sss.pgh.pa.us
In response to Re: Experimental patch for inter-page delay in VACUUM  (Greg Stark <gsstark@mit.edu>)
Responses Re: Experimental patch for inter-page delay in VACUUM  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> You want to find, open, and fsync() every file in the database cluster
>> for every checkpoint?  Sounds like a non-starter to me.

> Except a) this is outside any critical path, and b) only done every few
> minutes and c) the fsync calls on files with no dirty buffers ought to be
> cheap, at least as far as I/O.
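For concreteness, the per-checkpoint sweep under discussion could look roughly like the following POSIX C sketch. It is not code from any posted patch, and `fsync_tree` and `sync_one` are hypothetical names. Each regular file under the root costs a directory-entry visit, an open(2), and an fsync(2), whether or not the file has any dirty buffers. That per-file overhead is what the two sides disagree about.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* nftw() callbacks take no user-data pointer, so file-scope statics
 * carry the tally (hypothetical names). */
static long files_synced;
static int  sync_failed;

/* Called once per directory entry: open each regular file, fsync it,
 * close it. Every file pays for open(2) and fsync(2) even when it has
 * no dirty buffers. */
static int sync_one(const char *path, const struct stat *sb,
                    int typeflag, struct FTW *ftwbuf)
{
    (void) ftwbuf;
    if (typeflag != FTW_F || !S_ISREG(sb->st_mode))
        return 0;                       /* skip directories, symlinks, etc. */

    /* Open read-write: some platforms refuse fsync() on a read-only fd. */
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        perror(path);
        sync_failed = 1;
        return 0;                       /* keep walking the tree */
    }
    if (fsync(fd) != 0) {
        perror(path);
        sync_failed = 1;
    }
    close(fd);
    files_synced++;                     /* count files successfully opened */
    return 0;
}

/* Walk root and fsync every regular file beneath it. Returns 0 on
 * success and stores the number of files opened in *nfiles. */
int fsync_tree(const char *root, long *nfiles)
{
    files_synced = 0;
    sync_failed = 0;
    /* FTW_PHYS: do not follow symlinks; 64: max directory fds held open. */
    if (nftw(root, sync_one, 64, FTW_PHYS) != 0)
        return -1;
    *nfiles = files_synced;
    return sync_failed ? -1 : 0;
}
```

On a cluster with thousands of relation files, this runs thousands of open/fsync/close triples per checkpoint even if only a handful of files were written.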

The directory search and opening of the files is in itself nontrivial
overhead ... particularly on systems where open(2) isn't speedy, such
as Solaris.  I also disbelieve your assumption that fsync'ing a file
that doesn't need it will be free.  That depends entirely on what sort
of indexes the OS keeps on its buffer cache.  There are Unixen where
fsync requires a scan through the entire buffer cache because there is
no data structure that permits finding associated buffers any more
efficiently than that.  (IIRC, the HPUX system I'm typing this on is
like that.)  On those sorts of systems, we'd be way better off to use
O_SYNC or O_DSYNC on all our writes than to invoke multiple fsyncs.
Check the archives --- this was all gone into in great detail when we
were testing alternative methods for fsyncing the WAL files.
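The two write strategies contrasted above can be sketched in POSIX C as follows; this is an illustration, not PostgreSQL's code, and the helper names are hypothetical. Strategy A issues buffered write() calls and then one fsync(), whose cost depends on how the kernel finds the file's dirty buffers. Strategy B opens the file with O_DSYNC, so each write() returns only once its data is on stable storage and no fsync() is needed. Falling back to O_SYNC where O_DSYNC is not defined is an assumption of this sketch.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <unistd.h>

#ifndef O_DSYNC
#define O_DSYNC O_SYNC      /* fall back where O_DSYNC is not defined */
#endif

/* Write nblocks copies of block to fd; -1 on any short or failed write. */
static int write_blocks(int fd, const char *block, size_t blksz, int nblocks)
{
    for (int i = 0; i < nblocks; i++)
        if (write(fd, block, blksz) != (ssize_t) blksz)
            return -1;
    return 0;
}

/* Strategy A: buffered writes, then a single fsync() for the whole file. */
int write_then_fsync(const char *path, const char *block, size_t blksz, int nblocks)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    int rc = write_blocks(fd, block, blksz, nblocks);
    if (rc == 0)
        rc = fsync(fd);                 /* one flush covering all writes */
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Strategy B: every write() is synchronous; no fsync() call at all. */
int write_dsync(const char *path, const char *block, size_t blksz, int nblocks)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DSYNC, 0600);
    if (fd < 0)
        return -1;
    int rc = write_blocks(fd, block, blksz, nblocks);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Neither function says which is faster; on a kernel where fsync() scans the whole buffer cache, strategy B avoids that scan, which is the case the paragraph describes.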

> So the NetBSD and Sun developers I checked with both asserted fsync does in
> fact guarantee this. And SUSv2 seems to back them up:

>     The fsync() function can be used by an application to indicate that all
>     data for the open file description named by fildes is to be transferred to
>     the storage device associated with the file described by fildes in an
>     implementation-dependent manner.

The question here is what is meant by "data for the open file
description".  If it said "all data for the file referenced by the open
FD" then I would agree that the spec says what you claim.  As is, I
think it would be entirely within the spec for the OS to dump only
buffers that had been dirtied through that particular FD.  Notice that
the last part of the sentence is careful to respect the distinction
between the FD and the file; why isn't the first part?
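The disputed case can be sketched in POSIX C (hypothetical helper names, not PostgreSQL code): one descriptor dirties a page and is closed without fsync(), then a second, independently opened descriptor on the same file calls fsync(). Reading the data back through a new descriptor shows only that it is visible in the OS cache. Whether the second fsync() made it durable is exactly what the spec wording leaves open, and no userspace check short of a crash can settle it.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <unistd.h>

/* Writer side (hypothetical): dirty one page through this descriptor,
 * then close it WITHOUT fsync(), leaving the data in the OS cache. */
int write_page_no_sync(const char *path, off_t offset, const char *page, size_t len)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    ssize_t n = pwrite(fd, page, len, offset);
    int rc = (n == (ssize_t) len) ? 0 : -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}

/* Checkpoint side (hypothetical): open the same file through a NEW
 * descriptor and fsync() it. If fsync() covers "the file", this flushes
 * the page written above; if it covers only buffers dirtied through this
 * descriptor, it may flush nothing. The return value cannot tell which. */
int fsync_via_new_fd(const char *path)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Under the narrower reading, the portable option is for the writer to fsync() its own descriptor before closing it, or to write with O_SYNC or O_DSYNC.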
        regards, tom lane

