Re: Buffer locking is special (hints, checksums, AIO writes) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Buffer locking is special (hints, checksums, AIO writes)
Msg-id cmjazttp6zz5gttyxfp3iakcaqxev33vanks4uhrwjyskdrzqz@er2mmhtobt62
In response to Re: Buffer locking is special (hints, checksums, AIO writes)  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
Hi,

On 2026-01-29 13:33:02 -0500, Peter Geoghegan wrote:
> On Thu, Jan 29, 2026 at 1:06 PM Andres Freund <andres@anarazel.de> wrote:
> > Wonder if - independent of this
> > issue - it could make sense to update the FSM during nbtree WAL recovery...
> 
> Maybe that would make sense. But I tend to think that we should have a
> fully atomic, crash-safe approach to free space management.

I agree that would be nice, but realistically (as you also say below) that
would have to be embedded into the WAL records that use the page that was
acquired from the FSM.  Maybe we could accept a dedicated WAL record for the
index case, but certainly not in the heap case.

Given that we'd need to embed the record somehow anyway, just adding, for now,
a RecordUsedIndexPage() to the redo of XLOG_BTREE_SPLIT* and
XLOG_BTREE_NEWROOT or such could make sense...

It doesn't seem like it'd be great to have a completely outdated index FSM
after a failover. If the index FSM on the newly promoted node is completely
outdated, due to having been copied at a much earlier time while there were a
lot of free pages, a _bt_allocbuf() could take quite a while...
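
To make that concern a bit more concrete, here is a toy model, not PostgreSQL
code. Everything prefixed with toy_ is hypothetical and only loosely mirrors
the idea of an allocation loop that keeps asking the FSM for candidate pages
and discards the ones that turn out to be in use. The assumption is a standby
whose FSM was copied while blocks 1..8 were free, followed by replay of
records that consumed all of them.

```c
#include <stdbool.h>

#define TOY_NBLOCKS 16

/* Toy stand-in for an index FSM: one "is free" flag per block. */
typedef struct ToyIndexFsm
{
    bool is_free[TOY_NBLOCKS];
} ToyIndexFsm;

/* Toy analogue of recording a page as used: clear its FSM entry. */
static void
toy_record_used_index_page(ToyIndexFsm *fsm, unsigned blk)
{
    fsm->is_free[blk] = false;
}

/* Return a block the FSM believes is free and mark it used, or -1. */
static int
toy_get_free_index_page(ToyIndexFsm *fsm)
{
    for (unsigned blk = 0; blk < TOY_NBLOCKS; blk++)
    {
        if (fsm->is_free[blk])
        {
            fsm->is_free[blk] = false;
            return (int) blk;
        }
    }
    return -1;
}

/*
 * Toy allocation loop: ask the FSM until it hands back a page that is
 * really unused. Each candidate that is in fact in use counts as one
 * wasted probe. Returns -1 when the FSM is exhausted (the real thing
 * would extend the relation at that point).
 */
static int
toy_allocbuf(ToyIndexFsm *fsm, const bool *in_use, int *stale_probes)
{
    int blk;

    *stale_probes = 0;
    while ((blk = toy_get_free_index_page(fsm)) >= 0)
    {
        if (!in_use[blk])
            return blk;
        (*stale_probes)++;
    }
    return -1;
}

/*
 * Simulate "old FSM copy + replay", with or without the redo routine
 * updating the FSM. Returns the number of wasted probes in the first
 * allocation after promotion.
 */
static int
toy_demo(bool update_fsm_in_redo)
{
    ToyIndexFsm fsm;
    bool        in_use[TOY_NBLOCKS] = {false};
    int         stale_probes;

    /* FSM as copied earlier: blocks 1..8 are listed as free. */
    for (unsigned blk = 0; blk < TOY_NBLOCKS; blk++)
        fsm.is_free[blk] = (blk >= 1 && blk <= 8);
    in_use[0] = true;

    /* Replay: each of blocks 1..8 is consumed by some replayed record. */
    for (unsigned blk = 1; blk <= 8; blk++)
    {
        in_use[blk] = true;
        if (update_fsm_in_redo)
            toy_record_used_index_page(&fsm, blk);
    }

    (void) toy_allocbuf(&fsm, in_use, &stale_probes);
    return stale_probes;
}
```

In this model toy_demo(false) wastes one probe per stale FSM entry (8 here),
while toy_demo(true) wastes none. The real cost per stale entry would depend
on what each probe has to do, which the toy doesn't try to capture.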

I'm somewhat surprised it doesn't cause more performance issues to keep btree
pages exclusively locked while extending the relation... If that has to write
out pages and flush the WAL...


> Particularly in index AMs, where free space can only ever come in
> BLCKSZ units -- the data structure/concurrency rules can be a lot
> simpler if it only has to accommodate index AM requirements. Maybe the
> WAL-logging could be built into existing index AM record types.

Yea, I have my doubts that it makes sense to share code between the index and
heap use cases. I doubt that having one FSM implementation support varying
"space tracking granularity" really makes sense.

Greetings,

Andres Freund
