Jan Wieck <JanWieck@Yahoo.com> writes:
> I didn't save any of the charts done with 7.4, but the response time
> spikes on checkpoints went up to 60 seconds without the bgwriter. If you
> look at the last chart on this page
> http://developer.postgresql.org/~wieck/vacuum_cost/
> there are no spikes at all.
I have been meaning to ask you to redo those charts with CVS tip, to see
how things work now that checkpoints use fsync() instead of sync().
There was talk earlier of providing an option to issue sync() before
starting the loop that issues fsync() against each file we've written
since the last checkpoint. The idea was that the sync() would cue the
kernel to schedule I/O for all currently dirty buffers in the most
efficient order, and then the fsync()s would merely ensure that Postgres
waits until the I/O it needs is done. This should be optional since it
would be a clear loser in systems where Postgres isn't the dominant
cause of disk write traffic (since the sync would force much unneeded
I/O). But in a system that's dedicated to one Postgres installation it
seems like it might be a win, compared to doing just fsyncs which might
cause the I/O to be done in a globally non-optimal order.
On the other hand, if the bgwriter's trickle writes are getting the job
done then there shouldn't be all that much work to do at checkpoint
time, and so this might all be just theorizing with not much real-world
effect.
So, before troubling to create this option I'd like to see some
evidence that it'd actually be worthwhile. Could you test it out?
The place to put the sync() call would be at the top of mdsync() in
storage/smgr/md.c.
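For concreteness, here is a minimal standalone sketch of the idea. It is
not the actual md.c code: flush_dirty_files(), its fds[] array of open
file descriptors, and the presync flag are all hypothetical names made up
for illustration, standing in for whatever mdsync() really iterates over
and for the proposed option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL source: fsync every descriptor in
 * fds[], optionally issuing a system-wide sync() first.
 * Returns 0 if every fsync() succeeded, -1 if any of them failed.
 */
static int
flush_dirty_files(const int *fds, int nfds, bool presync)
{
    int         i;
    int         result = 0;

    assert(fds != NULL || nfds == 0);

    /*
     * Optional step: ask the kernel to write out all dirty buffers, so it
     * can pick the I/O order itself.  POSIX allows sync() to return before
     * the writes finish, so it provides no guarantee on its own.
     */
    if (presync)
        sync();

    /*
     * The per-file fsync() calls are what actually guarantee durability;
     * after a sync() they should mostly just wait for I/O already queued.
     */
    for (i = 0; i < nfds; i++)
    {
        if (fsync(fds[i]) != 0)
        {
            perror("fsync");
            result = -1;        /* remember the failure, keep going */
        }
    }

    return result;
}
```

The test would then be to run the same load with presync off and on and
compare the response-time charts around checkpoints.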
regards, tom lane