Dan Armbrust <daniel.armbrust.list@gmail.com> writes:
> Almost all of the slow query log messages are logged within about 3
> seconds of the checkpoint starting message.
> LOG: checkpoint complete: wrote 9975 buffers (77.9%); 0 transaction
> log file(s) added, 0 removed, 15 recycled; write=156.576 s, sync=0.065
> s, total=156.662 s
Huh. And there's just about no daylight between the total checkpoint
time and the write+sync time (barely more than 20msec in both examples).
So that seems to wipe out the thought I had that maybe we'd
underestimated the work involved in one of the other steps of
checkpoint.
As Greg commented upthread, we seem to be getting forced to the
conclusion that the initial buffer scan in BufferSync() is somehow
causing this. There are a couple of things it'd be useful to try
here:
* see how the size of the hiccup varies with shared_buffers;
* try inserting a delay into that scan loop, as per attached
quick-and-dirty patch. (Numbers pulled from the air, but
we can worry about tuning after we see if this is really
where the problem is.)
regards, tom lane
*** src/backend/storage/buffer/bufmgr.c~ Tue Jan 1 14:45:51 2008
--- src/backend/storage/buffer/bufmgr.c Wed Jul 8 18:12:49 2009
***************
*** 1037,1042 ****
--- 1037,1045 ----
}
UnlockBufHdr(bufHdr);
+
+ if ((buf_id % 256) == 0)
+ pg_usleep(1000L);
}
if (num_to_write == 0)