Re: [pgsql-hackers-win32] Sync vs. fsync during checkpoint - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [pgsql-hackers-win32] Sync vs. fsync during checkpoint
Msg-id 16010.1075485076@sss.pgh.pa.us
In response to Re: [pgsql-hackers-win32] Sync vs. fsync during checkpoint  (Bruce Momjian <pgman@candle.pha.pa.us>)
Responses Re: [pgsql-hackers-win32] Sync vs. fsync during checkpoint
List pgsql-hackers
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Any ideas on how to record the
> modified files without generating tons of output or locking contention?

What I've suggested before is that the bgwriter process can keep track
of all files that it's written to since the last checkpoint, and fsync
them during checkpoint (this would likely require giving the checkpoint
task to the bgwriter instead of launching a separate process for it,
but that doesn't seem unreasonable).  Obviously this requires only local
storage in the bgwriter process, and hence no contention.
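A minimal sketch of the bgwriter-local bookkeeping described above, written in Python for brevity. The class `BgWriterSyncTracker` and its methods are hypothetical names, not PostgreSQL code. The set of dirty files lives only in the writing process's own memory, so recording a file costs no locking. This version also keeps each file's descriptor open and fsyncs through that same descriptor, which sidesteps the cross-descriptor question raised at the end of this message.

```python
import os


class BgWriterSyncTracker:
    """Hypothetical process-local record of files written since the last checkpoint."""

    def __init__(self):
        self._fds = {}       # path -> descriptor kept open by this process
        self._dirty = set()  # paths written since the previous checkpoint

    def _fd_for(self, path):
        fd = self._fds.get(path)
        if fd is None:
            fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
            self._fds[path] = fd
        return fd

    def write_page(self, path, offset, data):
        # Ordinary (non-synchronous) write; just remember that the file is dirty.
        os.pwrite(self._fd_for(path), data, offset)
        self._dirty.add(path)

    def checkpoint(self):
        # fsync every file touched since the last checkpoint, then forget them.
        synced = sorted(self._dirty)
        for path in synced:
            os.fsync(self._fds[path])
        self._dirty.clear()
        return synced

    def close(self):
        for fd in self._fds.values():
            os.close(fd)
        self._fds.clear()
```

Writing the same file many times between checkpoints adds it to the set only once, so the checkpoint issues one fsync per file rather than one per page.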

That leaves us still needing to account for files that are written
directly by a backend process and not by the bgwriter.  However, I claim
that if the bgwriter is worth the cycles it's expending, cases in which
a backend has to write out a page for itself will be infrequent enough
that we don't need to optimize them.  Therefore it would be enough to
have backends immediately sync any write they have to do.  (They might
as well use O_SYNC.)  Note that backends need not sync writes to temp
files or temp tables, only genuine shared tables.
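A sketch of the backends-sync-their-own-writes rule, assuming a Unix platform where `os.O_SYNC` is available. `backend_write_page` and its `is_temp` flag are hypothetical names used only for illustration. Writes to shared relations open the file with `O_SYNC`, so the data should be on stable storage when the write returns; writes to temp files or temp tables skip it, since those need not survive a crash.

```python
import os


def backend_write_page(path, offset, data, is_temp):
    """Hypothetical helper: a backend writing out a dirty page itself.

    Returns True if the write was issued synchronously (O_SYNC)."""
    flags = os.O_WRONLY | os.O_CREAT
    if not is_temp:
        flags |= os.O_SYNC  # genuine shared table: write must be durable on return
    fd = os.open(path, flags, 0o600)
    try:
        written = os.pwrite(fd, data, offset)
        if written != len(data):
            raise OSError(f"short write to {path}: {written} of {len(data)} bytes")
    finally:
        os.close(fd)
    return bool(flags & os.O_SYNC)
```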

If it turns out that it's not quite *that* infrequent, a compromise
position would be to keep a small list of files-needing-fsync in shared
memory.  Backends that have to evict pages from shared buffers add those
files to the list; the bgwriter periodically removes entries from the
list and fsyncs the files.  Only if there is no room in the list does a
backend have to fsync for itself.  If the list is touched often enough
that it becomes a source of contention, then the whole bgwriter concept
is completely broken :-(
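The compromise can be sketched as a small bounded list guarded by a lock. Here a Python list and a `threading.Lock` stand in for a real shared-memory structure and its lock, and all names (`FsyncRequestQueue`, `backend_evict`, `bgwriter_pass`) are hypothetical. A backend records the file after writing its page; if the list is full it fsyncs its own descriptor instead. The bgwriter periodically drains the list and fsyncs each file.

```python
import os
import threading


class FsyncRequestQueue:
    """Stand-in for a small shared-memory list of files needing fsync."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._paths = []
        self._lock = threading.Lock()  # stands in for a shared-memory lock

    def remember(self, path):
        """Return True if the request is recorded, False if the list is full."""
        with self._lock:
            if path in self._paths:
                return True  # already pending; nothing to add
            if len(self._paths) < self._capacity:
                self._paths.append(path)
                return True
        return False

    def drain(self):
        with self._lock:
            pending, self._paths = self._paths, []
        return pending


def backend_evict(queue, path, offset, data):
    """A backend writes out a page, then hands the fsync to the bgwriter if it can."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.pwrite(fd, data, offset)
        if not queue.remember(path):
            os.fsync(fd)  # no room in the list: sync our own write
            return "self-synced"
    finally:
        os.close(fd)
    return "queued"


def bgwriter_pass(queue):
    """The bgwriter's periodic pass: fsync every file in the list."""
    done = queue.drain()
    for path in done:
        # Assumes an fsync through a new descriptor flushes other processes' writes.
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
    return done
```

Repeated requests for a file that is already pending are absorbed, so a table evicted from many times takes up only one slot in the list.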

Now this last plan does assume that an fsync applied by process X will
write pages that were dirtied by process Y through a different file
descriptor for the same file.  There's been some concern raised in the
past about whether we can assume that.  If not, though, the simpler
backends-must-sync-their-own-writes plan will still work.
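The assumption in question can be made concrete with a short Unix-only Python sketch; the function name is hypothetical. A forked child plays process Y, writing through its own descriptor and exiting without syncing, and the parent plays process X, opening the same file separately and calling `fsync`. The program only demonstrates the calling pattern and that both descriptors see the same data. It cannot show whether X's fsync made Y's write durable; that depends on the kernel and filesystem and would take a crash test to verify.

```python
import os


def write_then_fsync_elsewhere(path, data):
    """Hypothetical demo: Y writes via its own fd, X fsyncs via a different fd."""
    pid = os.fork()
    if pid == 0:
        # Child plays "process Y": write through its own descriptor, no fsync.
        code = 1
        try:
            fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
            try:
                os.write(fd, data)
            finally:
                os.close(fd)
            code = 0
        finally:
            os._exit(code)  # leave the child without running the parent's cleanup
    _, status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
        raise RuntimeError("writer process failed")
    # Parent plays "process X": a different descriptor for the same file.
    fd = os.open(path, os.O_RDWR)
    try:
        os.fsync(fd)  # whether this covers Y's dirty pages is OS-dependent
        return os.pread(fd, len(data), 0)
    finally:
        os.close(fd)
```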

            regards, tom lane
