Re: Newly created replication slot may be invalidated by checkpoint - Mailing list pgsql-hackers
| From | Amit Kapila |
|---|---|
| Subject | Re: Newly created replication slot may be invalidated by checkpoint |
| Date | |
| Msg-id | CAA4eK1KPJWRUE+KQcCDM_v=H1BrzZJavhtzjx3qc8uxoSgQWUQ@mail.gmail.com Whole thread Raw |
| In response to | RE: Newly created replication slot may be invalidated by checkpoint ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
| Responses |
RE: Newly created replication slot may be invalidated by checkpoint
|
| List | pgsql-hackers |
On Mon, Dec 8, 2025 at 3:54 PM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote: > > On Monday, December 8, 2025 5:47 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Mon, Dec 8, 2025 at 12:53 PM Masahiko Sawada > > <sawada.mshk@gmail.com> wrote: > > > > > > On Fri, Dec 5, 2025 at 4:10 AM Amit Kapila <amit.kapila16@gmail.com> > > wrote: > > > > > > > > On Thu, Dec 4, 2025 at 12:12 PM Zhijie Hou (Fujitsu) > > > > <houzj.fnst@fujitsu.com> wrote: > > > > > > > > > > Here are the updated patches for HEAD and 18. I did not add tests > > > > > since, after applying the patch and resolving the issue, the only > > > > > observable behavior is that the checkpoint will wait for another > > > > > backend to create a slot due to the lwlock lock, so it seems not > > > > > worth to test solely lwlock wait event (I could not find similar tests). > > > > > > > > > > > > > Fair enough. The patch looks mostly good to me, attached are minor > > > > comment improvements atop the HEAD patch. I'll do some more testing > > > > before push. > > > > > > > > Sawada-san/Vitaly, do you have any opinion on patch or the direction > > > > to fix? The idea is to get this fixed for HEAD and 18, then continue > > > > discussion for other bank-branches and the remaining patches. > > > > > > +1 > > > > > > > Thanks, Pushed. I'll continue thinking on how to fix it in branches prior to 18 > > and other problems reported in this thread. > > Thanks for pushing. I thought about whether it's possible to apply a similar fix > to back-branches and one approach could be to take ReplicationSlotAllocationLock > at two places. E.g., acquire an exclusive lock WAL reservation, and a shared > lock during the minimum LSN calculation at checkpoints to serialize the process. > + * + * Additionally, acquiring the Allocation lock to serialize the minimum LSN + * calculation with concurrent slot WAL reservation. This ensures that the + * WAL position being reserved is either included in the miminum LSN or is + * beyond or equal to the redo pointer of the current checkpoint (See + * ReplicationSlotReserveWal for details). */ + LWLockAcquire(ReplicationSlotAllocationLock, LW_SHARED); slotsMinReqLSN = XLogGetReplicationSlotMinimumLSN(); + LWLockRelease(ReplicationSlotAllocationLock); Yeah, this will fix the reported issue but doesn't it look odd to take an unrelated lock here? I mean it appears that if someone has to call XLogGetReplicationSlotMinimumLSN(), they should acquire ReplicationSlotAllocationLock in LW_SHARED mode? If we want to go in this direction and don't have better ideas to fix then we should add comments suggesting this is a special case and shouldn't be used as an example for other places. The other idea to fix this problem is suggested by Alexander in his email [1] which is to introduce a new ReplicationSlotReserveWALLock for this purpose. I think introducing LWLock in back branches could be questionable. Did you evaluate the pros and cons of using that approach? Yet, another possibility is that we don't fix this in back branches prior to 18 but not sure how frequently it can impact users. Suyu, can you please tell how you found this problem in the first place? Is it via code-review or did you hit this in the production or while doing some related tests? BTW, I have asked a question regarding commit 2090edc6f32f652a2c in email [2]. Did you get a chance to look at that? [1] - https://www.postgresql.org/message-id/CAPpHfduZY7_pRCrbLdsLty4zP5x2EDmwk4CYiofiyjdt1iK%2BzA%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAA4eK1%2BwrNSee6PKQ0%2BDtUu_W0GdvewskpAEK5EiX6r3E%2B2Sxw%40mail.gmail.com -- With Regards, Amit Kapila.
pgsql-hackers by date: