On Thursday, November 13, 2025 12:56 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
>
> While testing the patches across all branches, I noticed that an additional lock
> needs to be added in launcher.c, where
> ReplicationSlotsComputeRequiredXmin(true) was recently added for the conflict
> detection slot. I have modified the original patch accordingly.
>
> BTW, I am not adding a test using an injection point because it does not seem
> practical to insert an injection point inside
> ReplicationSlotsComputeRequiredXmin. The reason is that the injection point
> function internally calls CHECK_FOR_INTERRUPTS(), but the key functions in
> the patch hold the LWLock, and holding an LWLock holds off interrupts.
>
> I am sharing the patches for all branches for reference.

I have been thinking about whether there is a way to avoid holding
ReplicationSlotControlLock exclusively in ReplicationSlotsComputeRequiredXmin(),
because that could cause lock contention when many slots exist and
advancements occur frequently.

Given that the bug arises from a race condition between slot creation and
concurrent slot xmin computation, I think another approach is to acquire
ReplicationSlotControlLock exclusively only during slot creation, while doing
the initial update of the slot xmin. In ReplicationSlotsComputeRequiredXmin(),
we still hold ReplicationSlotControlLock in shared mode until the global slot
xmin is updated in ProcArraySetReplicationSlotXmin(). This approach prevents
other backends from concurrently computing and updating new xmin horizons
during the initial slot xmin update, while still permitting concurrent calls
to ReplicationSlotsComputeRequiredXmin().

Here is an updated patch for this approach on HEAD.

Best Regards,
Hou zj