Re: Buffer locking is special (hints, checksums, AIO writes) - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Buffer locking is special (hints, checksums, AIO writes)
Date
Msg-id 20260215195239.ce.noahmisch@microsoft.com
Whole thread Raw
In response to Re: Buffer locking is special (hints, checksums, AIO writes)  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-hackers
On Sat, Feb 07, 2026 at 12:44:25PM +0200, Heikki Linnakangas wrote:
> On 03/02/2026 00:33, Andres Freund wrote:
> > - The way MarkBufferDirtyHint() operates was copied into
> >    heap_inplace_update_and_unlock(). Now that MarkBufferDirtyHint() won't work
> >    that way anymore, it seems better to go with the alternative approach the
> >    comments already outlined, namely to only delay updating of the buffer
> >    contents.
> > 
> >    I've done this in a prequisite commit, as it doesn't actually depend on any
> >    of the other changes.  Noah, any chance you could take a look at this?

v12-0001-heapam-Don-t-mimic-MarkBufferDirtyHint-in-inplac.patch looks good.

> Patch 0001 Looks correct to me. However:
> 
> >      * ["D" is a VACUUM (ONLY_DATABASE_STATS)]
> >      * ["R" is a VACUUM tbl]
> >      * D: vac_update_datfrozenxid() -> systable_beginscan(pg_class)
> >      * D: systable_getnext() returns pg_class tuple of tbl
> >      * R: memcpy() into pg_class tuple of tbl
> >      * D: raise pg_database.datfrozenxid, XLogInsert(), finish
> >      * [crash]
> >      * [recovery restores datfrozenxid w/o relfrozenxid]
> >      *
> >      * As we hold an exclusive lock - preventing the buffer from being written
> >      * out once dirty - we can work around this as follows: MarkBufferDirty(),
> >      * XLogInsert(), memcpy().
> 
> That last reference to 'memcpy' is a little orphaned now. The comment used
> to talk about the stack copy of the page, but now there's no mention of that
> except for this reference to memcpy(). To make things worse, the steps have
> "memcpy() into pg_class tuple of tbl", so one could think that the "memcpy"
> refers to that.

"memcpy" does refer to "memcpy() into pg_class tuple of tbl", so I don't see
that as orphaned.  Nonetheless:

> How about this:
> 
>      * We avoid that by using a temporary copy of the buffer to hide our
>      * change from other backends until it's been WAL-logged. We apply our
>      * change to the temporary copy and WAL-log it before modifying the real
>      * page. That way any action a reader of the in-place-updated value takes
>      * will be WAL logged after this change.

Either v12 or v12 w/ this edit is fine with me.  I find this proposed text
redundant with nearby comment "register block matching what buffer will look
like after changes", so I mildly prefer v12.



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Next
From: Tom Lane
Date:
Subject: Inconsistency in installation of syscache_info.h