RE: Newly created replication slot may be invalidated by checkpoint - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Newly created replication slot may be invalidated by checkpoint
Date
Msg-id TY4PR01MB16907E744589B1AB2EE89A31F94D7A@TY4PR01MB16907.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Newly created replication slot may be invalidated by checkpoint  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses RE: Newly created replication slot may be invalidated by checkpoint
List pgsql-hackers
On Monday, November 17, 2025 6:50 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> 
> 
> Attach v2 patch that improves the fix based on above analysis.
> 
> (I am still testing some other scenarios related to slotsync to ensure the
> current fix is complete)

I have been testing whether taking exclusive ReplicationSlotAllocationLock can
prevent newly synced slot from being invalidated, but found that it is
insufficient.

During slotsync, since it fetches the remote LSN (queried from the publisher) as
the initial restart_lsn for newly synced slot and we sync the slots in a
asynchronous manner, the synced restart_lsn could be behind the standby's redo
pointer. So, there could be race conditions that the checkpoint removes the
required WALs and invalidates the newly synced slot:

1. Assuming there is no slot on standby, the minimum slot LSN would be invalid,
   and the checkpoint reaches InvalidateObsoleteReplicationSlots() with
   oldestSegno =segno of redo.
2. The slotsync successfully sync the slot A that has old restart_lsn.
3. The checkpoint would then find that the newly synced slot A should be
   invalidated due to old restart_lsn.

Unlike in ReplicationSlotReserveWal(), holding only the
ReplicationSlotAllocationLock does not safeguard a newly synced slot if a
checkpoint occurs before WAL reservation during slotsync as the initial
restart_lsn provided could be prior to the redo pointer.

To address this issue, I changed the WAL reservation to compare the
remote restart_lsn with both minimum slot LSN and redo pointer. If either local
LSN is less than the remote restart_lsn, we update the local slot with the
remote value. Otherwise, we use the minimum of the two local LSNs.

Additionally, we acquire both ReplicationSlotAllocationLock and
ReplicationSlotControlLock to prevent the checkpoint from updating the redo
pointer and other backend processes from updating the minimum slot LSN.

Here is the V3 patch that fixes all the race conditions found so far.

------

Apart from addressing the fix for HEAD, I would like to inquire if anyone holds
a differing opinion regarding the revert of the origin fix 2090edc6f32f652a2c in
the back branches and applying the same fix as HEAD.

I searched for extensions that rely on the size of ReplicationSlot and did not
find any, so I think it is OK to implement the same fix in the backpatches.

It there are no alternative views, I will proceed with preparing the patches for
the back branches in upcoming versions.

Best Regards,
Hou zj

Attachment

pgsql-hackers by date:

Previous
From: Jakub Wartak
Date:
Subject: Re: pg_waldump: support decoding of WAL inside tarfile
Next
From: Peter Eisentraut
Date:
Subject: Re: gen_guc_tables.pl: Validate required GUC fields before code generation