Re: Implement waiting for wal lsn replay: reloaded - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Implement waiting for wal lsn replay: reloaded
Date
Msg-id d2ogsp54if47w35kx4vu7o3nkfghpuzctwbaudzey5brxavomw@bgfs3p7twoib
Whole thread
In response to Re: Implement waiting for wal lsn replay: reloaded  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: Implement waiting for wal lsn replay: reloaded
List pgsql-hackers
On 2026-04-09 18:21:24 +0300, Alexander Korotkov wrote:
> I've assembled all the pending patches together.
> 0001 adds memory barrier to GetWalRcvWriteRecPtr() as suggested by
> Andres off-list.

I'd make it a pg_atomic_read_membarrier_u64().


> 0002 is basically [1] by Xuneng, but revised given we have a memory
> barrier in 0001, and my proposal to do ResetLatch() unconditionally
> similar to our other Latch-based loops.
> 0003 and 0004 are [2] by Xuneng.
> 0005 is [3] by Xuneng.
> 
> I'm going to add them to Commitfest to run CI over them, and have a
> closer look over them tomorrow.

Briefly skimming the patches, none makes the writes to writtenUpto use
something bearing barrier semantics. I'd just make both of them a
pg_atomic_write_membarrier_u64().


I think this also needs a few more tests, e.g. for the scenario that
29e7dbf5e4d fixed.  I think it'd also be good to do some testing for
off-by-one dangers. E.g. making sure that we don't stop waiting too early /
too late.  Another one that I think might deserve more testing is waits on the
standby while crossing timeline boundaries.



> From 0e5b4d1b9311a628a70218d89abf12308c9d782f Mon Sep 17 00:00:00 2001
> From: Alexander Korotkov <akorotkov@postgresql.org>
> Date: Thu, 9 Apr 2026 16:49:04 +0300
> Subject: [PATCH v3 1/5] Add a memory barrier to GetWalRcvWriteRecPtr()
> 
> Add pg_memory_barrier() before reading writtenUpto so that callers see
> up-to-date shared memory state.  This matches the barrier semantics that
> GetWalRcvFlushRecPtr() and other LSN-position functions get implicitly from
> their spinlock acquire/release, and in turn protects from bugs caused by
> expectations of similar barrier guarantees from different LSN-position functions.
> 
> Reported-by: Andres Freund <andres@anarazel.de>
> Discussion: https://postgr.es/m/zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k%40w4bdf4z3wqoz
> ---
>  src/backend/replication/walreceiverfuncs.c | 12 ++++++++++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/src/backend/replication/walreceiverfuncs.c b/src/backend/replication/walreceiverfuncs.c
> index bd5d47be964..0408ddff43e 100644
> --- a/src/backend/replication/walreceiverfuncs.c
> +++ b/src/backend/replication/walreceiverfuncs.c
> @@ -363,14 +363,22 @@ GetWalRcvFlushRecPtr(XLogRecPtr *latestChunkStart, TimeLineID *receiveTLI)
>  
>  /*
>   * Returns the last+1 byte position that walreceiver has written.
> - * This returns a recently written value without taking a lock.
> + *
> + * Use a memory barrier to ensure that callers see up-to-date shared memory
> + * state, matching the barrier semantics provided by the spinlock in
> + * GetWalRcvFlushRecPtr() and other LSN-position functions.
>   */
>  XLogRecPtr
>  GetWalRcvWriteRecPtr(void)
>  {
>      WalRcvData *walrcv = WalRcv;
> +    XLogRecPtr    recptr;
> +
> +    pg_memory_barrier();
>  
> -    return pg_atomic_read_u64(&walrcv->writtenUpto);
> +    recptr = pg_atomic_read_u64(&walrcv->writtenUpto);
> +
> +    return recptr;
>  }
>  
>  /*
> -- 
> 2.39.5 (Apple Git-154)
> 

> Subject: [PATCH v3 2/5] Fix memory ordering in WAIT FOR LSN wakeup mechanism

> +    /*
> +     * Ensure the waker's prior position store (writtenUpto, flushedUpto,
> +     * lastReplayedEndRecPtr, etc.) is globally visible before we read
> +     * minWaitedLSN.  Without this barrier, the CPU could load minWaitedLSN
> +     * before draining the position store, leaving the position invisible to a
> +     * concurrently-registering waiter.
> +     *
> +     * This is the waker side of a Dekker-style handshake; pairs with
> +     * pg_memory_barrier() in GetCurrentLSNForWaitType() on the waiter side.
> +     */
> +    pg_memory_barrier();
> +
>      /*
>       * Fast path check.  Skip if currentLSN is InvalidXLogRecPtr, which means
>       * "wake all waiters" (e.g., during promotion when recovery ends).

I'd also make this a pg_atomic_read_membarrier_u64() and the write a
pg_atomic_write_membarrier_u64().  It's a lot easier to reason about this
stuff if you make sure that the individual reads / write pair and have
ordering implied.



Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: Reduce timing overhead of EXPLAIN ANALYZE using rdtsc?
Next
From: Nathan Bossart
Date:
Subject: Re: Add pg_stat_autovacuum_priority