Re: Implement waiting for wal lsn replay: reloaded - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: Implement waiting for wal lsn replay: reloaded
Msg-id CABPTF7W7=uK0ypteDLRTiR4PnRP-BEUEJJtit+Q17aiPPhpLrw@mail.gmail.com
In response to Re: Implement waiting for wal lsn replay: reloaded  (Andres Freund <andres@anarazel.de>)
Responses Re: Implement waiting for wal lsn replay: reloaded
List pgsql-hackers
On Fri, Apr 10, 2026 at 12:18 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2026-04-09 18:21:24 +0300, Alexander Korotkov wrote:
> > I've assembled all the pending patches together.
> > 0001 adds memory barrier to GetWalRcvWriteRecPtr() as suggested by
> > Andres off-list.
>
> I'd make it a pg_atomic_read_membarrier_u64().
>
>
> > 0002 is basically [1] by Xuneng, but revised given we have a memory
> > barrier in 0001, and my proposal to do ResetLatch() unconditionally
> > similar to our other Latch-based loops.
> > 0003 and 0004 are [2] by Xuneng.
> > 0005 is [3] by Xuneng.
> >
> > I'm going to add them to Commitfest to run CI over them, and have a
> > closer look over them tomorrow.
>
> Briefly skimming the patches, none makes the writes to writtenUpto use
> something bearing barrier semantics. I'd just make both of them a
> pg_atomic_write_membarrier_u64().
>

Makes sense to me. Done.

> I think this also needs a few more tests, e.g. for the scenario that
> 29e7dbf5e4d fixed.  I think it'd also be good to do some testing for
> off-by-one dangers. E.g. making sure that we don't stop waiting too early /
> too late.  Another one that I think might deserve more testing is waits on the
> standby while crossing timeline boundaries.
>

I'll prepare a new patch with additional test coverage for these scenarios.

>
> > From 0e5b4d1b9311a628a70218d89abf12308c9d782f Mon Sep 17 00:00:00 2001
> > From: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: Thu, 9 Apr 2026 16:49:04 +0300
> > Subject: [PATCH v3 1/5] Add a memory barrier to GetWalRcvWriteRecPtr()
> >
> > Add pg_memory_barrier() before reading writtenUpto so that callers see
> > up-to-date shared memory state.  This matches the barrier semantics that
> > GetWalRcvFlushRecPtr() and other LSN-position functions get implicitly from
> > their spinlock acquire/release, and in turn protects from bugs caused by
> > expectations of similar barrier guarantees from different LSN-position functions.
> >
> > Reported-by: Andres Freund <andres@anarazel.de>
> > Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
> > ---
> >  src/backend/replication/walreceiverfuncs.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
> > index bd5d47be964..0408ddff43e 100644
> > --- a/src/backend/replication/walreceiverfuncs.c
> > +++ b/src/backend/replication/walreceiverfuncs.c
> > @@ -363,14 +363,22 @@ GetWalRcvFlushRecPtr(XLogRecPtr *latestChunkStart, TimeLineID *receiveTLI)
> >
> >  /*
> >   * Returns the last+1 byte position that walreceiver has written.
> > - * This returns a recently written value without taking a lock.
> > + *
> > + * Use a memory barrier to ensure that callers see up-to-date shared memory
> > + * state, matching the barrier semantics provided by the spinlock in
> > + * GetWalRcvFlushRecPtr() and other LSN-position functions.
> >   */
> >  XLogRecPtr
> >  GetWalRcvWriteRecPtr(void)
> >  {
> >       WalRcvData *walrcv = WalRcv;
> > +     XLogRecPtr      recptr;
> > +
> > +     pg_memory_barrier();
> >
> > -     return pg_atomic_read_u64(&walrcv->writtenUpto);
> > +     recptr = pg_atomic_read_u64(&walrcv->writtenUpto);
> > +
> > +     return recptr;
> >  }
> >
> >  /*
> > --
> > 2.39.5 (Apple Git-154)
> >
>
> > Subject: [PATCH v3 2/5] Fix memory ordering in WAIT FOR LSN wakeup mechanism
>
> > +     /*
> > +      * Ensure the waker's prior position store (writtenUpto, flushedUpto,
> > +      * lastReplayedEndRecPtr, etc.) is globally visible before we read
> > +      * minWaitedLSN.  Without this barrier, the CPU could load minWaitedLSN
> > +      * before draining the position store, leaving the position invisible to a
> > +      * concurrently-registering waiter.
> > +      *
> > +      * This is the waker side of a Dekker-style handshake; pairs with
> > +      * pg_memory_barrier() in GetCurrentLSNForWaitType() on the waiter side.
> > +      */
> > +     pg_memory_barrier();
> > +
> >       /*
> >        * Fast path check.  Skip if currentLSN is InvalidXLogRecPtr, which means
> >        * "wake all waiters" (e.g., during promotion when recovery ends).
>
> I'd also make this a pg_atomic_read_membarrier_u64() and the write a
> pg_atomic_write_membarrier_u64().  It's a lot easier to reason about this
> stuff if you make sure that the individual reads / write pair and have
> ordering implied.
>

It does look more self-contained to me.

Here is the updated patch set based on Alexander’s earlier version.

--
Best,
Xuneng

