Re: Add WALRCV_CONNECTING state to walreceiver - Mailing list pgsql-hackers

From Xuneng Zhou
Subject Re: Add WALRCV_CONNECTING state to walreceiver
Msg-id CABPTF7UkUUxy6z8a2fcOkkxG=OgG1Ae0fJxnr7syz3wX5KjO6g@mail.gmail.com
In response to Re: Add WALRCV_CONNECTING state to walreceiver  (Xuneng Zhou <xunengzhou@gmail.com>)
Responses Re: Add WALRCV_CONNECTING state to walreceiver
List pgsql-hackers
Hi,

On Fri, Dec 12, 2025 at 9:52 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Fri, Dec 12, 2025 at 4:45 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi Noah,
> >
> > On Fri, Dec 12, 2025 at 1:05 PM Noah Misch <noah@leadboat.com> wrote:
> > >
> > > On Fri, Dec 12, 2025 at 12:51:00PM +0800, Xuneng Zhou wrote:
> > > > Bug #19093 [1] reported that pg_stat_wal_receiver.status = 'streaming'
> > > > does not accurately reflect streaming health.  In that discussion,
> > > > Noah noted that even before the reported regression, status =
> > > > 'streaming' was unreliable because walreceiver sets it during early
> > > > startup, before attempting a connection. He suggested:
> > > >
> > > > "Long-term, in master only, perhaps we should introduce another status
> > > > like 'connecting'. Perhaps enact the connecting->streaming status
> > > > transition just before tendering the first byte of streamed WAL to the
> > > > startup process. Alternatively, enact that transition when the startup
> > > > process accepts the first streamed byte."
> > >
> > > > == Proposal ==
> > > >
> > > > Introduce WALRCV_CONNECTING as an intermediate state between STARTING
> > > > and STREAMING:
> > > >
> > > > - When walreceiver starts, it enters CONNECTING (instead of going
> > > > directly to STREAMING).
> > > > - The transition to STREAMING occurs in XLogWalRcvFlush(), inside the
> > > > existing spinlock-protected block that updates flushedUpto.
> > >
> > > I think this has the drawback that if the primary's WAL is incompatible,
> > > e.g. unacceptable timeline, the walreceiver will still briefly enter
> > > STREAMING.  That could trick monitoring.
> >
> > Thanks for pointing this out.
> >
> > > Waiting for applyPtr to advance
> > > would avoid the short-lived STREAMING.  What's the feasibility of that?
> >
> > I think this could work, but with complications. If replay latency is
> > high or replay is paused with pg_wal_replay_pause(), the walreceiver
> > would stay in the CONNECTING state longer than expected. Whether that
> > is acceptable depends on how we define the 'connecting' state. On the
> > implementation side, deciding where and when to compare applyPtr
> > against LSNs like receiveStart is harder: the walreceiver can read
> > applyPtr from shared memory, but it is not notified when that pointer
> > advances. If the check lives in the walreceiver, there will be a lag
> > between replay advancing and the status changing, unless we let the
> > startup process set the state, which would couple the two components.
> > Am I missing something here?
> >
>
> After some thought, one possible approach is to expose a new
> function in the WAL receiver that transitions the state from
> CONNECTING to STREAMING. The startup process can then call this
> function directly from WaitForWALToBecomeAvailable, so that the
> state change coincides with the actual acceptance of the WAL stream.
>

V2 makes the transition from WALRCV_CONNECTING to WALRCV_STREAMING
only when the startup process has processed the first valid WAL
record. A new function, WalRcvSetStreaming, performs the transition.
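
To make the intended semantics concrete, here is a standalone toy model
of the transition, not the code from the patch. ToyWalRcvData and
ToyWalRcvSetStreaming are hypothetical names used only for this sketch,
and the enum is a simplified subset of the states mentioned in this
thread. The sketch assumes the transition is a no-op unless the current
state is CONNECTING, and that the real function would do this while
holding the walreceiver's lock, which is omitted here.

```c
#include <stdbool.h>

/* Simplified subset of walreceiver states, for illustration only. */
typedef enum
{
    WALRCV_STOPPED,
    WALRCV_STARTING,
    WALRCV_CONNECTING,          /* proposed intermediate state */
    WALRCV_STREAMING
} WalRcvState;

/* Toy stand-in for the shared control struct; hypothetical, not WalRcvData. */
typedef struct
{
    WalRcvState state;
} ToyWalRcvData;

/*
 * Move CONNECTING -> STREAMING and report whether the state changed.
 * Any other state is left untouched, so repeated calls are harmless.
 * Assumption: real code would take the walreceiver's lock around this.
 */
static bool
ToyWalRcvSetStreaming(ToyWalRcvData *walrcv)
{
    if (walrcv->state != WALRCV_CONNECTING)
        return false;

    walrcv->state = WALRCV_STREAMING;
    return true;
}
```

With this shape, the caller in the startup process can invoke the
function each time it accepts streamed WAL without tracking whether the
transition has already happened.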

--
Best,
Xuneng
