>> AFAICR with xlog-triggered checkpoints, the checkpointer progress is
>> measured with respect to the size of the WAL file, which does not grow
>> linearly in time for the reason you pointed above (a lot of FPW at the
>> beginning, less in the end). As the WAL file is growing quickly, the
>> checkpointer thinks that it is late and that it has some catching up to do, so
>> it will start to try writing quickly as well. There is a double whammy as
>> both are writing more, and are probably not succeeding.
>>
>> For time triggered checkpoints, the WAL file gets filled up *but* the
>> checkpointer load is balanced against time. This is a "simple" whammy, where
>> the checkpointer uses IO bandwidth which is needed for the WAL, and it could
>> wait a little bit because the WAL will need less later, but it is not trying
>> to catch up by writing even more, so the load shifting needed in this case
>> is not the same as the previous case.
>
> I see your point, but this isn't a function of what triggered the
> checkpoint. It's a function of how we measure whether the
> already-triggered checkpoint is on schedule - we may be behind either
> because of time, or because of xlog, or both.

Yes. Indeed, the current implementation takes both time and xlog into account.
My reasoning was that for time-triggered checkpoints (probably average to
low load) time is likely to drive the checkpoint schedule, while
for xlog-triggered checkpoints (probably higher load) it is more
likely to be the xlog, which is skewed.
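
To make this concrete, here is a rough sketch of the kind of test I have in
mind. It is a simplified model, not the actual checkpointer code: the
function name, the parameter names and the numbers are all made up for
illustration, and I assume that progress is scaled by the completion target
and then compared against both the elapsed fraction of the timeout and the
elapsed fraction of the xlog budget.

```python
def on_schedule(progress, elapsed_time_frac, elapsed_xlog_frac,
                completion_target=0.5):
    """Simplified model of the schedule test (hypothetical names).

    progress           fraction of the checkpoint's buffers written, 0..1
    elapsed_time_frac  time since checkpoint start / checkpoint timeout
    elapsed_xlog_frac  WAL written since checkpoint start / WAL budget
    completion_target  fraction of the interval the writes are spread over
    """
    scaled = progress * completion_target
    behind_on_xlog = scaled < elapsed_xlog_frac
    behind_on_time = scaled < elapsed_time_frac
    # Late if *either* measure says so, whatever triggered the checkpoint.
    return not (behind_on_xlog or behind_on_time)


# Illustrative numbers only: 25% of the buffers written, 10% of the
# timeout elapsed.
# WAL growing in step with time: on schedule.
steady = on_schedule(0.25, elapsed_time_frac=0.1, elapsed_xlog_frac=0.1)
# Early burst of full-page writes, 30% of the WAL budget already used:
# the same progress now counts as late, so the checkpointer speeds up.
fpw_burst = on_schedule(0.25, elapsed_time_frac=0.1, elapsed_xlog_frac=0.3)
print(steady, fpw_burst)
```

In this model the time measure alone would let the checkpointer keep its
pace, and it is the skewed xlog measure that declares it late.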

Anyway, careful thinking is needed to balance WAL and checkpointer IOs,
and only when needed, rather than applying a rough formula blindly.

--
Fabien.