On Mon, Jun 16, 2025 at 04:36:59PM +0200, Christoph Berg wrote:
> I spent some time digging through the code, but I'm still not entirely
> sure what's happening. There are several parts to it:
>
> 1) the list of buffers to flush is determined at the beginning of the
> checkpoint, so running a 2nd FLUSH_UNLOGGED checkpoint will not make
> the running checkpoint write these
>
> 2) running CHECKPOINT updates the checkpoint flags in shared memory so
> I think the currently running checkpoint picks "MODE FAST" up and
> speeds up. (But I'm not entirely sure, the call stack is quite deep
> there.)
>
> 3) running CHECKPOINT (at least when waiting for it) seems to actually
> start a new checkpoint, so FLUSH_UNLOGGED should still be effective.
> (See the code around "start_cv" in checkpointer.c)
>
> Admittedly, adding these points together raises some question marks
> about the flag handling, so I would welcome clarification by someone
> more knowledgeable in this area.

I think you've got it right. With CHECKPOINT_WAIT set, RequestCheckpoint()
waits for a new checkpoint to start, at which point we know that the
checkpointer has seen the new flags. If an immediate checkpoint is pending,
CheckpointWriteDelay() skips sleeping in the currently-running one, so the
current checkpoint is "upgraded" to immediate in some sense, but IIUC
another immediate checkpoint will still run after it completes. AFAICT the
running checkpoint doesn't pick up FLUSH_UNLOGGED, though; that only takes
effect when the next checkpoint begins.

Another thing to note is what I mentioned earlier:

+ Note that the server may consolidate concurrently requested checkpoints or
+ restartpoints. Such consolidated requests will contain a combined set of
+ options. For example, if one session requested an immediate checkpoint and
+ another session requested a non-immediate checkpoint, the server may combine
+ these requests and perform one immediate checkpoint.
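
To make the flag handling concrete, here is a minimal C sketch of how I
understand it. It is a toy model, not the actual checkpointer.c code: the
struct, the function names, and the flag names and values (REQ_IMMEDIATE,
REQ_FLUSH_UNLOGGED) are all made up for illustration. It assumes requests
are consolidated by OR-ing their flags, that a checkpoint snapshots the
pending flags when it begins, and that only the "immediate" bit is also
consulted while a checkpoint is running.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; names and values are illustrative only. */
enum {
    REQ_IMMEDIATE      = 1u << 0,
    REQ_FLUSH_UNLOGGED = 1u << 1
};

/* Toy model of the checkpointer's request state (not the real struct). */
typedef struct {
    uint32_t pending_flags;  /* OR of all requests not yet picked up */
    uint32_t running_flags;  /* snapshot taken when the checkpoint began */
    bool     running;        /* is a checkpoint currently in progress? */
} CheckpointerModel;

/* Concurrent requests are consolidated by OR-ing their flags together. */
static void
request_checkpoint(CheckpointerModel *m, uint32_t flags)
{
    m->pending_flags |= flags;
}

/* A new checkpoint takes over the pending flags and clears them. */
static void
begin_checkpoint(CheckpointerModel *m)
{
    m->running_flags = m->pending_flags;
    m->pending_flags = 0;
    m->running = true;
}

/*
 * The write delay is skipped if the running checkpoint is immediate OR an
 * immediate request is pending; this is the "upgrade" of the current one.
 */
static bool
should_skip_delay(const CheckpointerModel *m)
{
    return ((m->running_flags | m->pending_flags) & REQ_IMMEDIATE) != 0;
}

/* Only flags captured at the start decide what this checkpoint flushes. */
static bool
flushes_unlogged(const CheckpointerModel *m)
{
    return m->running && (m->running_flags & REQ_FLUSH_UNLOGGED) != 0;
}
```

In this model, a request for REQ_IMMEDIATE | REQ_FLUSH_UNLOGGED that arrives
while a plain checkpoint is running makes should_skip_delay() return true
right away, but flushes_unlogged() stays false until begin_checkpoint() is
called again for the next checkpoint.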

--
nathan