"Simon Riggs" <simon@2ndquadrant.com> writes:
> On Mon, 2006-11-27 at 22:08 +0100, Peter Eisentraut wrote:
>> He increased the WAL segment size from 16 MB to 256 MB. Without any
>> further information about the system configuration, that seems to be
>> mostly equivalent to increasing the number of checkpoint segments.
> On a busy system you can switch WAL segments every few seconds at 16MB.
> Fsync can freeze commits for more than a second, so raising the segment
> size reduces the fsync overhead considerably.
Sorry, but that's just handwaving. The amount of data to be written for
any specific commit isn't going to change in the least if you change
XLOGSEGSZ --- it's still going to be whatever has been written since the
last commit. I agree with Peter that the quoted Sun test appears to have
failed to control the frequency of checkpoints, and that that was what
really accounted for the performance change. So he'd have gotten the
same result from increasing checkpoint_segments without bothering with
a change in XLOGSEGSZ.
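
To make that equivalence concrete, here is a rough sketch of the arithmetic. It assumes checkpoints are triggered purely by WAL volume (checkpoint_timeout is ignored), and the value checkpoint_segments = 3 used in the note below is only an illustration; the thread does not say what the Sun test actually used. The function names are hypothetical, not anything in the PostgreSQL source.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/*
 * Approximate WAL volume written between volume-triggered checkpoints:
 * number of checkpoint segments times the segment size (XLOGSEGSZ).
 */
uint64_t
wal_bytes_per_checkpoint(uint32_t checkpoint_segments, uint64_t xlog_seg_size)
{
    assert(checkpoint_segments > 0 && xlog_seg_size > 0);
    return (uint64_t) checkpoint_segments * xlog_seg_size;
}

/*
 * checkpoint_segments setting needed at new_seg_size to keep the same
 * WAL volume between checkpoints as `segments` gives at old_seg_size.
 * Assumes the volume divides evenly by new_seg_size.
 */
uint32_t
equivalent_checkpoint_segments(uint32_t segments,
                               uint64_t old_seg_size,
                               uint64_t new_seg_size)
{
    uint64_t volume = wal_bytes_per_checkpoint(segments, old_seg_size);

    assert(new_seg_size > 0 && volume % new_seg_size == 0);
    return (uint32_t) (volume / new_seg_size);
}
```

Under those assumptions, 3 segments of 256 MB and 48 segments of 16 MB both allow 768 MB of WAL between checkpoints, so the checkpoint spacing is the same either way.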
I do note that XLogWrite() does this in the foreground path of control:

	 * If we just wrote the whole last page of a logfile segment,
	 * fsync the segment immediately.  This avoids having to go back
	 * and re-open prior segments when an fsync request comes along
	 * later.  Doing it here ensures that one and only one backend will
	 * perform this fsync.

This coding predates the existence of the bgwriter; now that we have
that, it'd perhaps be interesting to try to put the burden on the
bgwriter instead. (However, if a backend is trying to fsync a commit
record just after the segment switch, it'd have to wait for the previous
segment to be fsync'd anyway. The complexity and likely performance
costs of arranging for that synchronization might outweigh any gains.)
In any case, the existence of this code isn't an argument for raising
XLOGSEGSZ, more the reverse --- the bigger the segment the more painful
the fsync is likely to be.
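
As a simplified illustration of that trade-off (not the actual XLogWrite() logic; the helper names and the byte-offset model are assumptions made for this sketch), the boundary test and the resulting fsync count can be modeled like this:

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ULL * 1024ULL)

/*
 * Returns 1 if a write ending at byte position write_end (counted from the
 * start of the log) has just filled the last page of a segment, which is
 * the condition under which the quoted comment forces an immediate fsync.
 */
int
wrote_last_page_of_segment(uint64_t write_end, uint64_t seg_size)
{
    assert(seg_size > 0);
    return write_end > 0 && write_end % seg_size == 0;
}

/*
 * Number of segment-boundary fsyncs forced while writing total_bytes of WAL.
 */
uint64_t
boundary_fsyncs(uint64_t total_bytes, uint64_t seg_size)
{
    assert(seg_size > 0);
    return total_bytes / seg_size;
}

/*
 * Upper bound on the data a single boundary fsync may have to flush: the
 * whole segment, if no commit fsync'd any part of it earlier.
 */
uint64_t
max_bytes_per_boundary_fsync(uint64_t seg_size)
{
    return seg_size;
}
```

In this model 1 GB of WAL forces 64 boundary fsyncs at 16 MB and 4 at 256 MB, but each of the 4 may have up to 16 times as much unflushed data behind it. How much is actually pending depends on how recently a commit fsync'd the segment.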
[ studies code a bit more... ] I'm also wondering whether the forced
pg_control update at each xlog seg switch is worth its keep. Offhand it
seems like the checkpoint pointer is enough; why are we maintaining
logId/logSeg in pg_control?
regards, tom lane