RE: Assertion failure in SnapBuildInitialSnapshot() - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Assertion failure in SnapBuildInitialSnapshot()
Msg-id TY4PR01MB1690722DA11C85E1686F739DF94D5A@TY4PR01MB16907.jpnprd01.prod.outlook.com
In response to RE: Assertion failure in SnapBuildInitialSnapshot()  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Thursday, November 13, 2025 12:56 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> 
> While testing the patches across all branches, I noticed that an additional lock
> needs to be added in launcher.c, where
> ReplicationSlotsComputeRequiredXmin(true) was recently added for the conflict
> detection slot. I have modified the original patch accordingly.
> 
> BTW, I am not adding a test using an injection point because it does not seem
> practical to insert an injection point inside
> ReplicationSlotsComputeRequiredXmin. The reason is that the injection point
> function internally calls CHECK_FOR_INTERRUPTS(), but the key functions in
> the patch hold an LWLock, and holding an LWLock holds off interrupts.
> 
> I am sharing the patches for all branches for reference.

I have been wondering whether there is a way to avoid holding
ReplicationSlotControlLock exclusively in ReplicationSlotsComputeRequiredXmin(),
because that could cause lock contention when many slots exist and they
advance frequently.

Given that the bug arises from a race condition between slot creation and
concurrent slot xmin computation, I think another option is to acquire
ReplicationSlotControlLock exclusively only during slot creation, while doing
the initial update of the slot xmin. In ReplicationSlotsComputeRequiredXmin(),
we still take ReplicationSlotControlLock in shared mode, but hold it until the
global slot xmin is updated in ProcArraySetReplicationSlotXmin(). This approach
prevents other backends from computing and updating new xmin horizons during
the initial slot xmin update, while still permitting concurrent calls to
ReplicationSlotsComputeRequiredXmin().
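
To make the lock interaction concrete, here is a toy, single-threaded model of
the scheme described above. It is only a sketch and is not taken from the
patch: ToyLock, compute_begin(), compute_finish() and create_slot() are
hypothetical names, the lock never blocks (a failed acquisition returns false
where a real backend would wait), and xmins are compared with a plain `<`
instead of wraparound-aware comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define MAX_SLOTS 4

/* Toy stand-in for an LWLock: tracks holders and never blocks. */
typedef struct ToyLock
{
    int  shared_holders;
    bool exclusive_held;
} ToyLock;

static bool toy_try_shared(ToyLock *l)
{
    if (l->exclusive_held)
        return false;               /* a real backend would wait here */
    l->shared_holders++;
    return true;
}

static bool toy_try_exclusive(ToyLock *l)
{
    if (l->exclusive_held || l->shared_holders > 0)
        return false;               /* a real backend would wait here */
    l->exclusive_held = true;
    return true;
}

static void toy_release_shared(ToyLock *l)
{
    assert(l->shared_holders > 0);
    l->shared_holders--;
}

static void toy_release_exclusive(ToyLock *l)
{
    assert(l->exclusive_held);
    l->exclusive_held = false;
}

static ToyLock       control_lock;          /* models ReplicationSlotControlLock */
static TransactionId slot_xmin[MAX_SLOTS];  /* per-slot xmin, 0 means unset */
static TransactionId global_slot_xmin;      /* models the value published via
                                             * ProcArraySetReplicationSlotXmin() */

/* Minimum of all valid slot xmins; the caller must hold control_lock. */
static TransactionId scan_min_xmin(void)
{
    TransactionId agg = InvalidTransactionId;

    for (int i = 0; i < MAX_SLOTS; i++)
    {
        if (slot_xmin[i] != InvalidTransactionId &&
            (agg == InvalidTransactionId || slot_xmin[i] < agg))
            agg = slot_xmin[i];
    }
    return agg;
}

/* Compute, step 1: take the lock in shared mode and scan the slots. */
static bool compute_begin(TransactionId *agg)
{
    if (!toy_try_shared(&control_lock))
        return false;
    *agg = scan_min_xmin();
    return true;
}

/* Compute, step 2: publish while still holding the shared lock, then release. */
static void compute_finish(TransactionId agg)
{
    global_slot_xmin = agg;
    toy_release_shared(&control_lock);
}

/* Slot creation: the exclusive lock covers both setting the slot's initial
 * xmin and publishing the new global value. */
static bool create_slot(int idx, TransactionId xmin)
{
    assert(idx >= 0 && idx < MAX_SLOTS);
    if (!toy_try_exclusive(&control_lock))
        return false;
    slot_xmin[idx] = xmin;
    global_slot_xmin = scan_min_xmin();
    toy_release_exclusive(&control_lock);
    return true;
}
```

In this model, two compute_begin() calls can be in flight at the same time,
but create_slot() cannot proceed until both have called compute_finish(). A
computation that scanned the slots before the new slot's xmin was set can
therefore not overwrite the value published during slot creation.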

Here is an updated patch for this approach on HEAD.

Best Regards,
Hou zj

