Re: Implement waiting for wal lsn replay: reloaded - Mailing list pgsql-hackers
| From | Alexander Korotkov |
|---|---|
| Subject | Re: Implement waiting for wal lsn replay: reloaded |
| Date | |
| Msg-id | CAPpHfdsLkrDSEfrYe_Q2CWs1R6b_qO28jzVdh4sZxFnp5arubQ@mail.gmail.com Whole thread Raw |
| In response to | Re: Implement waiting for wal lsn replay: reloaded (Xuneng Zhou <xunengzhou@gmail.com>) |
| List | pgsql-hackers |
Hi, Xuneng! On Fri, Nov 14, 2025 at 3:50 AM Xuneng Zhou <xunengzhou@gmail.com> wrote: > > On Fri, Nov 14, 2025 at 4:32 AM Tomas Vondra <tomas@vondra.me> wrote: > > > > On 11/5/25 10:51, Alexander Korotkov wrote: > > > Hi! > > > > > > On Mon, Nov 3, 2025 at 5:13 PM Andres Freund <andres@anarazel.de> wrote: > > >> On 2025-11-03 16:06:58 +0100, Álvaro Herrera wrote: > > >>> On 2025-Nov-03, Alexander Korotkov wrote: > > >>> > > >>>> I'd like to give this subject another chance for pg19. I'm going to > > >>>> push this if no objections. > > >>> > > >>> Sure. I don't understand why patches 0002 and 0003 are separate though. > > >> > > >> FWIW, I appreciate such splits. Even if the functionality isn't usable > > >> independently, it's still different type of code that's affected. And the > > >> patches are each big enough to make that worthwhile for easier review. > > > > > > Thank you for the feedback, pushed. > > > > > > > Hi, > > > > The new TAP test 049_wait_for_lsn.pl introduced by this commit, because > > it takes a long time - about 65 seconds on my laptop. That's about 25% > > of the whole src/test/recovery, more than any other test. > > > > And most of the time there's nothing happening - these are the two log > > messages showing the 60-second wait: > > > > 2025-11-13 21:12:39.949 CET checkpointer[562597] LOG: checkpoint > > complete: wrote 9 buffers (7.0%), wrote 3 SLRU buffers; 0 WAL file(s) > > added, 0 removed, 2 recycled; write=0.906 s, sync=0.001 s, total=0.907 > > s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, > > estimate=32768 kB; lsn=0/040000B8, redo lsn=0/04000060 > > > > 2025-11-13 21:13:38.994 CET client backend[562727] 049_wait_for_lsn.pl > > ERROR: recovery is not in progress > > > > So there's a checkpoint, 60 seconds of nothing, and then a failure. I > > haven't looked into why it waits for 1 minute exactly, but adding 60 > > seconds to check-world is somewhat annoying. > > Thanks for looking into this! > > I did a quick analysis for this prolonged waiting: > > In WaitLSNWakeup() (xlogwait.c:267), the fast-path check incorrectly > handled InvalidXLogRecPtr: > /* Fast path check */ > if (pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN) > return; // Issue: Returns early when currentLSN = 0 > > When currentLSN = InvalidXLogRecPtr (0), meaning "wake all waiters", > the check compared: > - minWaitedLSN (e.g., 0x570CC048) > 0 → TRUE > - Result: function returned early without waking anyone > > When It Happened > During standby promotion, xlog.c:6246 calls: > > WaitLSNWakeup(WAIT_LSN_TYPE_REPLAY, InvalidXLogRecPtr); > > This should wake all LSN waiters, but the bug prevented it. WAIT FOR > LSN commands could wait indefinitely. Test 049_wait_for_lsn.pl took 68 > seconds instead of ~9 seconds. > > if the above analysis is sound, the fix could be like: > > Proposed fix: > Added a validity check before the comparison: > /* > * Fast path check. Skip if currentLSN is InvalidXLogRecPtr, which means > * "wake all waiters" (e.g., during promotion when recovery ends). > */ > if (XLogRecPtrIsValid(currentLSN) && > pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN) > return; > > Result: > Test time: 68s → 9s > WAIT FOR LSN exits immediately on promotion (62ms vs 60s) > > > While at it, I noticed a couple comments refer to WaitForLSNReplay, but > > but I think that got renamed simply to WaitForLSN. > > Please check the attached patch for replacing them. Thank you so much for your patches! Pushed with minor corrections. ------ Regards, Alexander Korotkov Supabase
pgsql-hackers by date: