Tom Lane <tgl@sss.pgh.pa.us> writes:
> I would like to see us go over to fsync, or some other technique that
> gives more certainty about when the write has occurred. There might be
> some scope that way to allow stretching out the I/O, too.
>
> The main problem with this is knowing which files need to be fsync'd.
Why couldn't the postmaster just fsync *every* file? Does any OS make it slow
to fsync a file that has no pending writes? And would we even care? The
checkpoint would take longer, but it wouldn't actually issue any extra I/O.

I'm assuming fsync also flushes writes issued by other processes to the same
file, which isn't necessarily true. If it isn't, every process would have to
fsync every file descriptor it has open.
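As a minimal sketch of the fsync-everything idea (Python for brevity, not
backend code): walk a data directory and fsync each file through a freshly
opened descriptor. The function name and directory layout are hypothetical,
and the sketch relies on the assumption questioned above, namely that fsync
on a new descriptor also flushes writes other processes made to that file.

```python
import os
import tempfile


def fsync_all_files(data_dir):
    """fsync every regular file under data_dir; return how many were synced."""
    synced = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            path = os.path.join(root, name)
            # A fresh descriptor, not the one that issued the writes.
            # O_RDWR because some systems refuse fsync on a read-only fd.
            fd = os.open(path, os.O_RDWR)
            try:
                os.fsync(fd)
                synced += 1
            finally:
                os.close(fd)
    return synced


if __name__ == "__main__":
    # Hypothetical stand-in for a data directory with three relation files.
    with tempfile.TemporaryDirectory() as d:
        for i in range(3):
            with open(os.path.join(d, f"rel{i}"), "wb") as f:
                f.write(b"x" * 8192)
        print(fsync_all_files(d))
```

The cost of this loop is one open/fsync/close per file, so the open question
is only how cheap fsync is on files that have nothing dirty.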
> The only idea I have come up with is to move all buffer write operations
> into a background writer process, which could easily keep track of
> every file it's written into since the last checkpoint.
I fear this approach. It seems to limit a lot of design flexibility later. But
I can't come up with any concrete way it limits things, so perhaps that
instinct is just FUD.

It can also become a point of contention. At least on Oracle, you often need
multiple such processes to keep up with the I/O bandwidth.
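For reference, the bookkeeping in the quoted proposal might look roughly like
the sketch below. It is an illustration only: the class and method names are
hypothetical, a thread stands in for a separate writer process, and a Unix
system is assumed for os.pwrite. All writes go through one writer, which
records every file it touches so that the checkpoint knows what to fsync.

```python
import os
import queue
import tempfile
import threading


class BackgroundWriter:
    """Single writer that remembers which files it wrote since the last checkpoint."""

    def __init__(self):
        self._requests = queue.Queue()
        self._dirty = set()
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, path, offset, data):
        # Backends hand the write over instead of doing it themselves.
        self._requests.put((path, offset, data))

    def _run(self):
        while True:
            path, offset, data = self._requests.get()
            try:
                fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
                try:
                    view = memoryview(data)
                    while view:  # loop in case of a short write
                        n = os.pwrite(fd, view, offset)
                        view = view[n:]
                        offset += n
                finally:
                    os.close(fd)
                with self._lock:
                    self._dirty.add(path)
            finally:
                self._requests.task_done()

    def checkpoint(self):
        """Wait for queued writes, fsync each touched file, return their paths."""
        self._requests.join()
        with self._lock:
            dirty, self._dirty = self._dirty, set()
        for path in sorted(dirty):
            fd = os.open(path, os.O_WRONLY)
            try:
                os.fsync(fd)
            finally:
                os.close(fd)
        return sorted(dirty)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        bw = BackgroundWriter()
        bw.submit(os.path.join(d, "rel1"), 0, b"a" * 8192)
        bw.submit(os.path.join(d, "rel2"), 8192, b"b" * 8192)
        print(bw.checkpoint())
```

The single queue is exactly where the contention would show up: every write
in the system funnels through one consumer.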
> Actually, once you build it this way, you could make all writes synchronous
> (open the files O_SYNC) so that there is never any need for explicit fsync
> at checkpoint time.
Or use aio to write ahead as much as you want, and then just make the
checkpoint block until all the writes have completed. You don't actually need
to rush them at all; you just need to know when they're done. That would
completely eliminate the I/O storm without changing the actual pattern of
writes at all.
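A rough sketch of the shape of this, with loud caveats: Python's standard
library has no binding for POSIX aio, so a thread pool stands in for
aio_write, and each queued write is followed by its own fsync so that
"completed" means "on disk" (with real aio that would need O_SYNC or
aio_fsync instead). The class and function names are hypothetical, and a Unix
system is assumed for os.pwrite. The point is only that the checkpoint waits
for completions rather than forcing a burst of I/O.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, wait


def _durable_write(path, offset, data):
    """Write data at offset and flush it; returns once it is on disk."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        view = memoryview(data)
        while view:  # loop in case of a short write
            n = os.pwrite(fd, view, offset)
            view = view[n:]
            offset += n
        os.fsync(fd)
    finally:
        os.close(fd)


class WriteAheadQueue:
    """Issue writes in the background; the checkpoint only waits for them."""

    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._pending = []

    def write(self, path, offset, data):
        # Returns immediately; the write proceeds at its own pace.
        self._pending.append(self._pool.submit(_durable_write, path, offset, data))

    def checkpoint(self):
        """Block until every write issued so far is done; return how many."""
        pending, self._pending = self._pending, []
        wait(pending)
        for fut in pending:
            fut.result()  # re-raise any write error here
        return len(pending)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        q = WriteAheadQueue()
        for block in range(8):
            q.write(os.path.join(d, "rel1"), block * 8192, b"x" * 8192)
        print(q.checkpoint())
```

Nothing in checkpoint() issues I/O of its own; it just collects completions
for writes that were already in flight.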
--
greg