Re: Implement waiting for wal lsn replay: reloaded - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Implement waiting for wal lsn replay: reloaded
Date
Msg-id zqbppucpmkeqecfy4s5kscnru4tbk6khp3ozqz6ad2zijz354k@w4bdf4z3wqoz
Whole thread
In response to Re: Implement waiting for wal lsn replay: reloaded  (Xuneng Zhou <xunengzhou@gmail.com>)
Responses Re: Implement waiting for wal lsn replay: reloaded
List pgsql-hackers
Hi,

On 2026-04-07 10:28:52 +0800, Xuneng Zhou wrote:
> On Tue, Apr 7, 2026 at 10:08 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > I wrote:
> > > I wondered why my buildfarm animals got noticeably slower today.
> > > There seem to be a couple of culprits, but one of them is that
> > > 7e8aeb9e4 (Use WAIT FOR LSN) has caused the runtime of pg_rewind's
> > > t/003_extrafiles.pl to go through the roof.  On indri's host, that
> > > TAP test took about 3 seconds immediately before that commit, and
> > > about 45 seconds immediately after.
> >
> > I'm wrong: there's only one culprit.  The other big change in runtime
> > today is that src/test/recovery's t/033_replay_tsp_drops.pl went from
> > about 4 seconds to about 46, and that jump also happened at 7e8aeb9e4.
> > So we still have a mystery, but it's "what do those two tests have in
> > common that is shared by no others?".
> >
> Thanks for reporting this. I think it could be related to the read of
> not-yet-updated writtenUpto position.  I'll look into this and propose
> a fix shortly.

Yes, it's not yet initialized and thus the WAIT FOR waits, even though the
position had already been reached.


But, leaving that aside, looking at this code I'm somewhat concerned - it
seems to not worry at all about memory ordering?


static void
XLogWalRcvWrite(char *buf, Size nbytes, XLogRecPtr recptr, TimeLineID tli)
...
    /* Update shared-memory status */
    pg_atomic_write_u64(&WalRcv->writtenUpto, LogstreamResult.Write);

    /*
     * If we wrote an LSN that someone was waiting for, notify the waiters.
     */
    if (waitLSNState &&
        (LogstreamResult.Write >=
         pg_atomic_read_u64(&waitLSNState->minWaitedLSN[WAIT_LSN_TYPE_STANDBY_WRITE])))
        WaitLSNWakeup(WAIT_LSN_TYPE_STANDBY_WRITE, LogstreamResult.Write);

There are no memory barriers here, so the CPU would be entirely free to not
make the writtenUpto write visible to a waiter that's in the process of
registering and is checking whether it needs to wait in WaitForLSN().

And WaitForLSN()->GetCurrentLSNForWaitType()->GetWalRcvWriteRecPtr() also has
no barriers.  That MAYBE is ok, due addLSNWaiter() providing the barrier at
loop entry and maybe kinda you can think that WaitLatch() will somehow also
have barrier semantic.  But if so, that would need to be very carefully
documented.  And it seems completely unnecessary here, it's hard to believe
using a barrier (via pg_atomic_read_membarrier_u64() or such) would be a
performance issue

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Shmem allocated wrong for custom cumulative stats
Next
From: Haibo Yan
Date:
Subject: Re: Extract numeric filed in JSONB more effectively