Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Any ideas on how to record the
> modified files without generating tons of output or locking contention?
What I've suggested before is that the bgwriter process can keep track
of all files that it's written to since the last checkpoint, and fsync
them during checkpoint (this would likely require giving the checkpoint
task to the bgwriter instead of launching a separate process for it,
but that doesn't seem unreasonable). Obviously this requires only local
storage in the bgwriter process, and hence no contention.
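To make that concrete, here is roughly the sort of process-local
bookkeeping I have in mind.  This is only a sketch; the names
(pending_files, remember_written_file, etc.) and sizes are invented
for illustration, not actual backend code:

#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define MAX_PENDING 256

/* Files this bgwriter has written since the last checkpoint.
 * Purely local to the bgwriter process, so no locking at all. */
static char pending_files[MAX_PENDING][PATH_MAX];
static int  n_pending = 0;

/* Remember a file we just wrote a dirty page to. */
static void
remember_written_file(const char *path)
{
    for (int i = 0; i < n_pending; i++)
        if (strcmp(pending_files[i], path) == 0)
            return;             /* already queued; one fsync covers it */
    if (n_pending < MAX_PENDING)
        strncpy(pending_files[n_pending++], path, PATH_MAX - 1);
    /* a real version would grow the table rather than lose entries */
}

/* At checkpoint time: fsync everything we wrote, then forget it all. */
static void
fsync_pending_files(void)
{
    for (int i = 0; i < n_pending; i++)
    {
        int     fd = open(pending_files[i], O_RDWR);

        if (fd >= 0)
        {
            if (fsync(fd) != 0)
                perror("fsync");
            close(fd);
        }
    }
    n_pending = 0;
}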
That leaves us still needing to account for files that are written
directly by a backend process and not by the bgwriter. However, I claim
that if the bgwriter is worth the cycles it's expending, cases in which
a backend has to write out a page for itself will be infrequent enough
that we don't need to optimize them. Therefore it would be enough to
have backends immediately sync any write they have to do. (They might
as well use O_SYNC.) Note that backends need not sync writes to temp
files or temp tables, only genuine shared tables.
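Something like this, say (again just a sketch with made-up names; the
point is that the write is durable before the backend proceeds):

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* A backend evicting a dirty shared buffer writes it write-through:
 * with O_SYNC, pwrite() does not return until the page is on stable
 * storage, so no checkpoint-time bookkeeping is needed for it. */
static ssize_t
backend_evict_write(const char *path, const void *page,
                    size_t len, off_t offset)
{
    int     fd = open(path, O_RDWR | O_SYNC);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = pwrite(fd, page, len, offset);
    close(fd);
    return n;
}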
If it turns out that it's not quite *that* infrequent, a compromise
position would be to keep a small list of files-needing-fsync in shared
memory. Backends that have to evict pages from shared buffers add those
files to the list; the bgwriter periodically removes entries from the
list and fsyncs the files. Only if there is no room in the list does a
backend have to fsync for itself. If the list is touched often enough
that it becomes a source of contention, then the whole bgwriter concept
is completely broken :-(
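Roughly like this, using a pthread mutex as a stand-in for whatever
shared-memory spinlock we'd actually use (the names, list size, and
layout below are all invented for illustration):

#include <fcntl.h>
#include <limits.h>
#include <pthread.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

#define FSYNC_LIST_SIZE 64

typedef struct
{
    pthread_mutex_t lock;       /* would be a spinlock in shared memory,
                                 * initialized PTHREAD_PROCESS_SHARED */
    int             n;
    char            files[FSYNC_LIST_SIZE][PATH_MAX];
} FsyncRequestList;

/* Backend side: after evicting and writing a dirty page, try to hand
 * the fsync off to the bgwriter.  Only if the list is full does the
 * backend have to fsync for itself. */
static void
register_fsync_request(FsyncRequestList *list, const char *path, int fd)
{
    bool    queued = false;

    pthread_mutex_lock(&list->lock);
    if (list->n < FSYNC_LIST_SIZE)
    {
        strncpy(list->files[list->n], path, PATH_MAX - 1);
        list->files[list->n][PATH_MAX - 1] = '\0';
        list->n++;
        queued = true;
    }
    pthread_mutex_unlock(&list->lock);

    if (!queued)
        fsync(fd);              /* list full: sync it ourselves */
}

/* bgwriter side: periodically drain the list, holding the lock only
 * long enough to pop one entry, and fsync outside the lock. */
static void
absorb_fsync_requests(FsyncRequestList *list)
{
    for (;;)
    {
        char    path[PATH_MAX];
        int     fd;

        pthread_mutex_lock(&list->lock);
        if (list->n == 0)
        {
            pthread_mutex_unlock(&list->lock);
            break;
        }
        strcpy(path, list->files[--list->n]);
        pthread_mutex_unlock(&list->lock);

        fd = open(path, O_RDWR);
        if (fd >= 0)
        {
            fsync(fd);
            close(fd);
        }
    }
}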
Now this last plan does assume that an fsync applied by process X will
write pages that were dirtied by process Y through a different file
descriptor for the same file. There's been some concern raised in the
past about whether we can assume that. If not, though, the simpler
backends-must-sync-their-own-writes plan will still work.
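That assumption is easy to exercise, at least crudely.  The toy below
dirties a file through one descriptor and fsyncs it through another,
within a single process (the scratch filename is made up); a real test
of the cross-process case would need two cooperating processes, but the
per-file-versus-per-descriptor question is the same:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int
main(void)
{
    const char *path = "fsync_test.dat";    /* throwaway scratch file */
    const char  page[] = "dirty page contents";
    int         fd_writer,
                fd_syncer;

    fd_writer = open(path, O_RDWR | O_CREAT, 0644); /* plays process Y */
    fd_syncer = open(path, O_RDWR);                 /* plays process X */
    if (fd_writer < 0 || fd_syncer < 0)
        return 1;

    /* dirty the file through one descriptor ... */
    if (write(fd_writer, page, sizeof(page)) < 0)
        return 1;

    /* ... and flush it through the other.  If fsync is per-file (as on
     * Linux and most modern Unixes) this covers fd_writer's write; if
     * it were per-descriptor, it would not. */
    if (fsync(fd_syncer) != 0)
        perror("fsync");

    close(fd_writer);
    close(fd_syncer);
    unlink(path);
    return 0;
}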
regards, tom lane