Re: Newly created replication slot may be invalidated by checkpoint - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Newly created replication slot may be invalidated by checkpoint
Date
Msg-id CAD21AoCpSGPd1vh0wJc2FAO_1AhUb0G-XFwF0yHYCUP5uoESXA@mail.gmail.com
Whole thread Raw
In response to Re: Newly created replication slot may be invalidated by checkpoint  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Jan 5, 2026 at 1:31 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Dec 30, 2025 at 5:31 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > On Sun, Dec 14, 2025 at 8:21 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Dec 11, 2025 at 12:39 PM Zhijie Hou (Fujitsu)
> > > <houzj.fnst@fujitsu.com> wrote:
> > > >
> > > > >
> > > > > The other idea to fix this problem is suggested by Alexander in his
> > > > > email [1] which is to introduce a new ReplicationSlotReserveWALLock
> > > > > for this purpose. I think introducing LWLock in back branches could be
> > > > > questionable. Did you evaluate the pros and cons of using that
> > > > > approach?
> > > >
> > > > I reviewed that approach, and I think the main distinction lies in whether to
> > > > use a new LWLock to serialize the process or rely on an existing lock.
> > > > Introducing a new LWLock in back branches would alter the size of
> > > > MainLWLockArray and affect NUM_INDIVIDUAL_LWLOCKS/LWTRANCHE_FIRST_USER_DEFINED.
> > > > Although this may not directly impact user applications since users typically
> > > > use standard APIs like RequestNamedLWLockTranche and LWLockNewTrancheId to add
> > > > private LWLocks, it still has a slight risk. Additionally, using an existing
> > > > lock could keep code similarity with the HEAD, which can be helpful for future
> > > > bug fixes and analysis.
> > > >
> > >
> > > Fair enough. I'll wait for Sawada-san/Vitaly to see what their opinion
> > > on this matter is.
> >
> > While it's hacky that the proposed approach takes
> > ReplicationSlotAllocationLock before
> > XLogGetReplicationSlotMinimumLSN() during checkpoint, I find that it's
> > better than introducing a new LWLock in minor versions which could
> > lead unexpected compatibility issues.
> >
> > Regarding the v10 patch, do we need to take
> > ReplicationSlotAllocationLock also at the following place?
> >
> >         /*
> >          * Recalculate the current minimum LSN to be used in the WAL segment
> >          * cleanup.  Then, we must synchronize the replication slots again in
> >          * order to make this LSN safe to use.
> >          */
> >         slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN();
> >         CheckPointReplicationSlots(shutdown);
> >
>
> No, we don't need the allocation lock here because RedoRecPtr won't
> change after the previous computation so the WAL reservation during
> creation of slots won't be impacted. I mean if the slot reservation
> starts just before this computation, it should use the latest (this
> checkpoint cycle's RedoRecPtr) whereas that was not true with the case
> where the patch acquires it.

Agreed.

>
> > I think we need to add some comments regardless of taking the lwlock.
> >
>
> I have added a comment though not sure if it is required. Do you have
> something specific in mind?

I wanted to clarify when we should and should not acquire
ReplicationSlotALlocationLock in shared mode during checkpointing. The
updated comments look good to me. Thank you!

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Fix incorrect buffer lock description in pg_visibility comment
Next
From: Masahiko Sawada
Date:
Subject: Re: POC: Parallel processing of indexes in autovacuum