Re: Buffer locking is special (hints, checksums, AIO writes) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Buffer locking is special (hints, checksums, AIO writes)
Date
Msg-id 5ubipyssiju5twkb7zgqwdr7q2vhpkpmuelxfpanetlk6ofnop@hvxb4g2amb2d
Whole thread Raw
In response to Re: Buffer locking is special (hints, checksums, AIO writes)  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hi,

On 2026-01-14 16:20:58 -0500, Andres Freund wrote:
> I'm now working on cleaning up the last two commits. The most crucial bit is
> to simplify what happens in MarkSharedBufferDirtyHint(), we afaict can delete
> the use of DELAY_CHKPT_START etc and just go to marking the buffer dirty first
> and then do the WAL logging, just like normal WAL logging. The previous order
> was only required because we were dirtying the page while holding only a
> shared lock, which did not conflict with the lock held by SyncBuffers() etc.

I've been working on that.

- A lot of what was special about MarkBufferDirtyHint() isn't needed anymore:

  - The "abnormal" order of WAL logging before marking the buffer dirty was
    only needed because we marked buffers dirty. Which in turn was only needed
    because setting hint bits didn't conflict with flushing the page. With
    share-exclusive they do conflict, and we can switch to the normal order of
    operations, where marking a buffer dirty makes checkpoint wait when the
    buffer is encountered (due to wanting to flush the buffer but not getting
    the lock)


  - Now that we use the normal order of WAL logging, we don't need to delay
    checkpoint starts anymore.

    I think the explanation for why that is ok is correct [1], but it needs to
    be looked at by somebody with experience around this. Maybe Heikki?


  - Thanks to holding share-exclusive lock, nothing can concurrently dirty or
    undirty the buffer. Therefore the comments about spurious failures to mark
    the buffer dirty can be removed.


- I realized that, now that buffers cannot be dirtied while IO is ongoing, we
  don't need BM_JUST_DIRTIED anymore.


- The way MarkBufferDirtyHint() operates was copied into
  heap_inplace_update_and_unlock(). Now that MarkBufferDirtyHint() won't work
  that way anymore, it seems better to go with the alternative approach the
  comments already outlined, namely to only delay updating of the buffer
  contents.

  I've done this in a prequisite commit, as it doesn't actually depend on any
  of the other changes.  Noah, any chance you could take a look at this?


- Lots of minor polish


Greetings,

Andres Freund

[1]
    /*
     * Update RedoRecPtr so that we can make the right decision. It's possible
     * that a new checkpoint will start just after GetRedoRecPtr(), but that
     * is ok, as the buffer is already dirty, ensuring that any BufferSync()
     * started after the buffer was marked dirty cannot complete without
     * flushing this buffer.  If a checkpoint started between marking the
     * buffer dirty and this check, we will emit an unnecessary WAL record (as
     * the buffer will be written out as part of the checkpoint), but the
     * window for that is small.
     */

Attachment

pgsql-hackers by date:

Previous
From: Jelte Fennema-Nio
Date:
Subject: Re: libpq: Bump protocol version to version 3.2 at least until the first/second beta
Next
From: Melanie Plageman
Date:
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)