Re: Buffer locking is special (hints, checksums, AIO writes) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Buffer locking is special (hints, checksums, AIO writes)
Date
Msg-id 03041d48-1e15-4741-b365-0809f2bc75c4@iki.fi
In response to Re: Buffer locking is special (hints, checksums, AIO writes)  (Andres Freund <andres@anarazel.de>)
Responses Re: Buffer locking is special (hints, checksums, AIO writes)
List pgsql-hackers
On 03/02/2026 00:33, Andres Freund wrote:
>    - Now that we use the normal order of WAL logging, we don't need to delay
>      checkpoint starts anymore.
> 
>      I think the explanation for why that is ok is correct [1], but it needs to
>      be looked at by somebody with experience around this. Maybe Heikki?

So that's patch 0004 "bufmgr: Switch to standard order in 
MarkBufferDirtyHint()". Yes, looks correct to me.

>     /*
>      * Update RedoRecPtr so that we can make the right decision. It's possible
>      * that a new checkpoint will start just after GetRedoRecPtr(), but that
>      * is ok, as the buffer is already dirty, ensuring that any BufferSync()
>      * started after the buffer was marked dirty cannot complete without
>      * flushing this buffer.  If a checkpoint started between marking the
>      * buffer dirty and this check, we will emit an unnecessary WAL record (as
>      * the buffer will be written out as part of the checkpoint), but the
>      * window for that is small.
>      */
>     RedoRecPtr = GetRedoRecPtr();

That "small window" is actually pretty big if you think of it a little 
more loosely. Our rule is that we write the full page image if a 
checkpoint has started since the page LSN, but that's very conservative 
already. It would be sufficient to write the full page image only if the 
checkpoint has already flushed the page. This small window is just a 
special case of that conservatism.
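
To make the difference concrete, here is a rough sketch of the two rules side 
by side. It is not code from the patch: PageState and its 
flushed_by_checkpoint flag are hypothetical, and nothing tracks that 
information today. Only the first test mirrors the existing comparison of 
the page LSN against RedoRecPtr.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's WAL position type */

/* Hypothetical, simplified page state; not the real buffer header. */
typedef struct PageState
{
    XLogRecPtr  page_lsn;               /* LSN of the last WAL-logged change */
    bool        flushed_by_checkpoint;  /* assumption: has the latest
                                         * checkpoint already written this
                                         * page out? */
} PageState;

/*
 * Current, conservative rule: take a full page image if a checkpoint has
 * started since the page was last WAL-logged.
 */
static bool
needs_fpi_conservative(const PageState *page, XLogRecPtr redo_rec_ptr)
{
    return page->page_lsn <= redo_rec_ptr;
}

/*
 * Looser rule described above: additionally require that the checkpoint
 * has already flushed the page.  Until then, the checkpoint's own write of
 * the page will include this change.
 */
static bool
needs_fpi_precise(const PageState *page, XLogRecPtr redo_rec_ptr)
{
    return page->page_lsn <= redo_rec_ptr && page->flushed_by_checkpoint;
}
```

With page_lsn = 100 and RedoRecPtr = 200, the conservative rule always asks 
for a full page image, while the looser rule asks for one only once the 
checkpoint has actually written the page.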

I've been thinking of trying to track that more accurately for a long time, 
because it would smooth out the WAL spike when a checkpoint begins.

That gets off-topic, but my point is that it feels a little silly to 
mention that small window when there's the other giant panoramic window 
next to it.

- Heikki



