On Monday, December 8, 2025 5:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Dec 8, 2025 at 12:53 PM Masahiko Sawada
> <sawada.mshk@gmail.com> wrote:
> >
> > On Fri, Dec 5, 2025 at 4:10 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Dec 4, 2025 at 12:12 PM Zhijie Hou (Fujitsu)
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > Here are the updated patches for HEAD and 18. I did not add tests
> > > > since, after applying the patch and resolving the issue, the only
> > > > observable behavior is that the checkpoint will wait for another
> > > > backend to create a slot due to the lwlock lock, so it seems not
> > > > worth to test solely lwlock wait event (I could not find similar tests).
> > > >
> > >
> > > Fair enough. The patch looks mostly good to me, attached are minor
> > > comment improvements atop the HEAD patch. I'll do some more testing
> > > before push.
> > >
> > > Sawada-san/Vitaly, do you have any opinion on patch or the direction
> > > to fix? The idea is to get this fixed for HEAD and 18, then continue
> > > discussion for other back-branches and the remaining patches.
> >
> > +1
> >
>
> Thanks, Pushed. I'll continue thinking on how to fix it in branches prior to 18
> and other problems reported in this thread.

Thanks for pushing. I thought about whether a similar fix could be applied to
the back-branches. One approach would be to take ReplicationSlotAllocationLock
in two places: in exclusive mode during WAL reservation, and in shared mode
during the minimum LSN calculation at checkpoints, so that the two steps are
serialized.

The logic is similar to HEAD: if WAL reservation occurs first, the checkpoint
waits until restart_lsn is updated before calculating the minimum LSN. If the
checkpoint runs first, subsequent WAL reservations pick a position at or after
the latest checkpoint's redo pointer.
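
To illustrate the two orderings described above, here is a toy model in Python.
It is only a sketch and not PostgreSQL code: the class SlotModel, its fields,
and the integer LSNs are all made up for illustration, and a plain mutex stands
in for ReplicationSlotAllocationLock (the shared/exclusive distinction does not
matter in a model with a single checkpointer). The model also assumes that the
checkpoint advances the redo pointer before it computes the minimum LSN.

```python
import threading


class SlotModel:
    """Toy model of the proposed ordering; all names here are hypothetical."""

    def __init__(self, redo_ptr):
        # Stand-in for ReplicationSlotAllocationLock (assumption: a mutex is enough here).
        self.allocation_lock = threading.Lock()
        self.redo_ptr = redo_ptr    # latest checkpoint's redo pointer
        self.restart_lsns = []      # restart_lsn of every slot that has reserved WAL
        self.removed_before = 0     # WAL below this LSN is treated as removed

    def reserve_wal(self):
        """Slot creation: choose restart_lsn and publish it while holding the lock."""
        with self.allocation_lock:
            lsn = self.redo_ptr
            self.restart_lsns.append(lsn)
            return lsn

    def checkpoint(self, new_redo_ptr):
        """Checkpoint: advance the redo pointer, then compute the minimum LSN under the lock."""
        self.redo_ptr = new_redo_ptr
        with self.allocation_lock:
            # Any reservation that started earlier has already published its restart_lsn.
            cutoff = min(self.restart_lsns + [new_redo_ptr])
        self.removed_before = max(self.removed_before, cutoff)
        return cutoff
```

In either order, every restart_lsn in the model stays at or above removed_before:
a reservation that runs first is visible to the checkpoint's minimum, and a
reservation that runs second starts from the new redo pointer.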

Here is the patch based on PG17 for reference.

Best Regards,
Hou zj