Re: Implement waiting for wal lsn replay: reloaded - Mailing list pgsql-hackers
| From | Xuneng Zhou |
|---|---|
| Subject | Re: Implement waiting for wal lsn replay: reloaded |
| Date | |
| Msg-id | CABPTF7VRP97GPwPTiu89xYQMA5pfWknsLSSxnpq11mu+-FiDRA@mail.gmail.com |
| In response to | Re: Implement waiting for wal lsn replay: reloaded (Alexander Korotkov <aekorotkov@gmail.com>) |
| Responses | Re: Implement waiting for wal lsn replay: reloaded |
| List | pgsql-hackers |
Hi Alexander,

On Tue, Apr 7, 2026 at 8:52 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Tue, Apr 7, 2026 at 7:03 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > On Tue, Apr 7, 2026 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > Hi,
> > >
> > > On 2026-04-06 23:07:45 -0400, Andres Freund wrote:
> > > > But, leaving that aside, looking at this code I'm somewhat concerned - it
> > > > seems to not worry at all about memory ordering?
> > > >
> > > > static void
> > > > XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr, TimeLineID tli)
> > > > ...
> > > >     /* Update shared-memory status */
> > > >     pg_atomic_write_u64(&WalRcv->writtenUpto, LogstreamResult.Write);
> > > >
> > > >     /*
> > > >      * If we wrote an LSN that someone was waiting for, notify the waiters.
> > > >      */
> > > >     if (waitLSNState &&
> > > >         (LogstreamResult.Write >=
> > > >          pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_WRITE])))
> > > >         WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, LogstreamResult.Write);
> > > >
> > > > There are no memory barriers here, so the CPU would be entirely free to not
> > > > make the writtenUpto write visible to a waiter that's in the process of
> > > > registering and is checking whether it needs to wait in WaitForLSN().
> > > >
> > > > And WaitForLSN()->GetCurrentLSNForWaitType()->GetWalRcvWriteRecPtr() also has
> > > > no barriers. That MAYBE is ok, due to addLSNWaiter() providing the barrier at
> > > > loop entry, and maybe you can argue that WaitLatch() will somehow also
> > > > have barrier semantics. But if so, that would need to be very carefully
> > > > documented. And it seems completely unnecessary here; it's hard to believe
> > > > using a barrier (via pg_atomic_read_membarrier_u64() or such) would be a
> > > > performance issue.
> >
> > Thanks for pointing this out. This is indeed a store-load ordering issue.
> > > And separately from the memory ordering, how can it make sense that there's
> > > at least 5 copies of this
> > >
> > >     if (waitLSNState &&
> > >         (LogstreamResult.Flush >=
> > >          pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_FLUSH])))
> > >         WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_FLUSH, LogstreamResult.Flush);
> > >
> > > around? That needs to be encapsulated so that if you have a bug, like the
> > > memory ordering problem I describe above, it can be fixed once, not in
> > > multiple places.
> >
> > Yeah, this duplication is not ok.
> >
> > > And why do these callers even have that pre-check? Seems WaitLSNWakeup()
> > > does so itself?
> > >
> > >     /*
> > >      * Fast path check. Skip if currentLSN is InvalidXLogRecPtr, which means
> > >      * "wake all waiters" (e.g., during promotion when recovery ends).
> > >      */
> > >     if (XLogRecPtrIsValid(currentLSN) &&
> > >         pg_atomic_read_u64(&waitLSNState->minWaitedLSN[i]) > currentLSN)
> > >         return;
> > >
> > > And why is the code checking if waitLSNState is non-NULL?
> >
> > These fast checks are unnecessary copy-pastos, and the waitLSNState checks
> > also do not make sense, except for the one in WaitLSNCleanup.
> >
> > I'll prepare a patch set addressing them.
>
> Thanks to Tom and Andres for catching these issues!
> I'm planning to work on this during this evening. I'll review
> Xuneng's patches (or write my own if they haven't arrived yet).

I've posted two patches. The first fixes the duplication issue reported by
Andres and is fairly straightforward. The second turned out to be more complex
than expected, and I'm still working through possible solutions. Feedback or
alternative approaches would be very helpful.

I also spent some time drafting a patch to address the memory ordering issue
and will post it later.

--
Best,
Xuneng