Dear Hayato,
On Tuesday, October 07, 2025 14:53 MSK, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote:
> > Yes, I
> > propose
> > to ensure that the protection happens when we assign restart_lsn. It seems to be
> > wrong that we invalidate slots by its restart_lsn but protect the wal for
> > slots using XLogCtl->replicationSlotsMinLSN.
>
> Seems valid. There is another corner case that another restart_lsn can be set in-between,
> but they have larger LSN than RedoRecPtr, right?
Here we talk about new creating slots. There should be no other processes that
can change restart_lsn during slot creation. Once, the slot is successfully
created it can be advanced to the greater values only. During advance, the old
restart_lsn will protect the slot, because it will be taken into account in
the checkpoint.
> > There is one subtle thing. Once, the operation of restart_lsn assignment is not
> > an atomic, the following scenario may happen theoretically:
> > 1. Read GetRedoRecPtr() in the backend (ReplicationSlotReserveWal)
> > 2. Assign a new redo LSN in the checkpointer
> > 3. Call ReplicationSlotsComputeRequiredLSN() in the checkpointer
> > 3. Assign the old redo LSN to restart_lsn
> >
> > In this scenario, the restart_lsn will point to a previous redo LSN and it will
> > be not protected by the new redo LSN. This scenario is unlikely, but it can
> > happen theoretically. I have no ideas how to deal with it, except of assigning
> > restart_lsn under XLogCtl->info_lck lock to avoid concurrent modification of
> > XLogCtl->RecoRecPtr until it is assigned to restart_lsn of a creating slot.
>
> Oh, your point is there is another race condition, right? Do you have the reproducer for it?
The attached test for the master branch demonstrates a possible but very
rare race condition, because we read and assign slot's restart_lsn from redo rec
ptr non-atomically. The proposed scenario (see above) seems to be not complete.
With best regards,
Vitaly