Re: Implement waiting for wal lsn replay: reloaded - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: Implement waiting for wal lsn replay: reloaded
Msg-id CABPTF7X0iV=kGC4gjsTj4NvK_NNEJGM3YTc7Obxs5GOiYoMhEw@mail.gmail.com
In response to Re: Implement waiting for wal lsn replay: reloaded  (Alexander Korotkov <aekorotkov@gmail.com>)
Responses Re: Implement waiting for wal lsn replay: reloaded
List pgsql-hackers
On Wed, Apr 8, 2026 at 7:23 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> Hi, Xuneng!
>
> > Here is some analysis of the issue reported by Tom:
> >
> > 1) The problem
> >
> > WAIT FOR LSN with standby_write or standby_flush mode can block
> > indefinitely on an idle primary even when the target LSN is already
> > satisfied by WAL on disk.
> >
> > The walreceiver initializes its process-local LogstreamResult.Write
> > and LogstreamResult.Flush from GetXLogReplayRecPtr() at connect time,
> > reflecting all WAL already present on the standby (from a base backup,
> > archive restore, or prior streaming). The shared-memory positions used
> > by WAIT FOR LSN, however, are not seeded from this value:
> >
> > WalRcv->writtenUpto is zero-initialized by ShmemInitStruct and remains
> > 0 until XLogWalRcvWrite() processes incoming streaming data.
> > WalRcv->flushedUpto is initialized to the segment-aligned streaming
> > start point by RequestXLogStreaming(), which may be significantly
> > behind the replay position. It advances only when XLogWalRcvFlush()
> > processes new data — which itself requires LogstreamResult.Flush <
> > LogstreamResult.Write, a condition that never holds at startup since
> > both fields are initialized to the same value.
> >
> > When the primary is idle and sends no new WAL, both positions stay at
> > their initial stale values indefinitely.
> >
> > 2) The fix
> > Seed writtenUpto and flushedUpto from LogstreamResult immediately
> > after the walreceiver initializes those process-local fields, then
> > call WaitLSNWakeup() to wake any already-blocked waiters.
> >
> > This broadens the semantics of these fields. writtenUpto and
> > flushedUpto used to track only WAL written or flushed by the current
> > walreceiver session — WAL received from the primary since the most
> > recent connect. After this change, they are initialized to the replay
> > position, so they also cover WAL that was already on disk before
> > streaming began. This affects pg_stat_wal_receiver.written_lsn and
> > flushed_lsn, which will now report the replay position immediately at
> > walreceiver startup rather than 0 and the segment boundary
> > respectively. I am still considering whether this semantic change is
> > acceptable though it does shorten the runtime of the tap tests
> > reported by Tom in my test. Another approach is to modify the logic of
> > GetCurrentLSNForWaitType to cope with this special case and leave the
> > publisher side alone without changing the semantics. But this seems to
> > be more subtle.
>
> Patch 0001 looks OK for me.
> Regarding patch 0002.  Changes made for GetCurrentLSNForWaitType()
> looks reliable for me.  PerformWalRecovery() sets replayed positions
> before starting recovery, and in turn before standby can accept
> connections.  So, changes to WalReceiverMain() don't look necessary to
> me.

Yeah, GetCurrentLSNForWaitType seems to be the right place for the
fix. Please see the attached patch 2.
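
To make the idea concrete, here is a minimal standalone sketch. It is
not the attached patch: it assumes the fix amounts to never reporting a
write or flush position that is behind the replay position. The
StandbyPositions struct, the WaitMode enum, and current_lsn_for_wait()
are hypothetical stand-ins for the real shared-memory state and
GetCurrentLSNForWaitType().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in; the real type lives in PostgreSQL's headers. */
typedef uint64_t XLogRecPtr;

/* Hypothetical enum modelling the three standby wait modes. */
typedef enum
{
    WAIT_STANDBY_REPLAY,
    WAIT_STANDBY_WRITE,
    WAIT_STANDBY_FLUSH
} WaitMode;

/* Hypothetical snapshot of the positions a waiter can be compared to. */
typedef struct
{
    XLogRecPtr replayed;     /* set before the standby accepts connections */
    XLogRecPtr writtenUpto;  /* stays 0 until streamed WAL is written */
    XLogRecPtr flushedUpto;  /* starts at the segment-aligned start point */
} StandbyPositions;

XLogRecPtr
lsn_max(XLogRecPtr a, XLogRecPtr b)
{
    return (a > b) ? a : b;
}

/*
 * Assumption: WAL that has been replayed is necessarily on disk, so the
 * replay position is a valid lower bound for the write and flush modes.
 */
XLogRecPtr
current_lsn_for_wait(const StandbyPositions *pos, WaitMode mode)
{
    assert(pos != NULL);

    switch (mode)
    {
        case WAIT_STANDBY_WRITE:
            return lsn_max(pos->writtenUpto, pos->replayed);
        case WAIT_STANDBY_FLUSH:
            return lsn_max(pos->flushedUpto, pos->replayed);
        case WAIT_STANDBY_REPLAY:
            break;
    }
    return pos->replayed;
}
```

With this shape the walreceiver's shared fields keep their current
meaning, so pg_stat_wal_receiver output would be unaffected; only the
value a waiter is compared against changes.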

I also noticed another related problem:

During pure archive recovery (no walreceiver), a backend that issues
WAIT FOR LSN ... MODE 'standby_write' with a target ahead of the
current replay position will sleep forever: the startup process
replays past the target but wakes only 'STANDBY_REPLAY' waiters.

This also affects mixed scenarios: the walreceiver may lag behind
replay (e.g., archive restore has delivered WAL faster than
streaming), so a 'standby_write' waiter could be waiting on WAL that
replay has already consumed.
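
A small standalone model of the missed wakeup, for illustration only.
The Waiter struct, wake_waiters(), and on_replay_advance() are
hypothetical and do not correspond to the real wait-LSN code. The
wake_all_modes flag models one conceivable remedy and is an assumption,
not a description of the patch to come.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

/* Hypothetical enum modelling the three standby wait modes. */
typedef enum
{
    WAIT_STANDBY_REPLAY,
    WAIT_STANDBY_WRITE,
    WAIT_STANDBY_FLUSH
} WaitMode;

/* Hypothetical waiter record: one sleeping backend. */
typedef struct
{
    WaitMode   mode;
    XLogRecPtr target;
    bool       woken;
} Waiter;

/* Wake sleeping waiters of one mode whose target has been reached. */
int
wake_waiters(Waiter *waiters, size_t n, WaitMode mode, XLogRecPtr reached)
{
    int woken = 0;

    assert(waiters != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (!waiters[i].woken &&
            waiters[i].mode == mode &&
            waiters[i].target <= reached)
        {
            waiters[i].woken = true;
            woken++;
        }
    }
    return woken;
}

/*
 * Model of the startup process after replaying up to 'replayed'.
 * wake_all_modes = false mirrors the behaviour described above;
 * wake_all_modes = true is an assumed remedy that treats replayed WAL
 * as also written and flushed.
 */
int
on_replay_advance(Waiter *waiters, size_t n, XLogRecPtr replayed,
                  bool wake_all_modes)
{
    int woken = wake_waiters(waiters, n, WAIT_STANDBY_REPLAY, replayed);

    if (wake_all_modes)
    {
        woken += wake_waiters(waiters, n, WAIT_STANDBY_WRITE, replayed);
        woken += wake_waiters(waiters, n, WAIT_STANDBY_FLUSH, replayed);
    }
    return woken;
}
```

With wake_all_modes set to false, a standby_write waiter whose target
is below the replay position stays asleep, which is the hang described
above.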

I will write a patch to address this soon.

--
Best,
Xuneng
