Re: Add WALRCV_CONNECTING state to walreceiver - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Add WALRCV_CONNECTING state to walreceiver
Date
Msg-id 20251214051422.b2.nmisch@google.com
In response to Re: Add WALRCV_CONNECTING state to walreceiver  (Xuneng Zhou <xunengzhou@gmail.com>)
Responses Re: Add WALRCV_CONNECTING state to walreceiver
List pgsql-hackers
On Sun, Dec 14, 2025 at 12:45:46PM +0800, Xuneng Zhou wrote:
> On Fri, Dec 12, 2025 at 9:52 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > On Fri, Dec 12, 2025 at 4:45 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > On Fri, Dec 12, 2025 at 1:05 PM Noah Misch <noah@leadboat.com> wrote:
> > > > Waiting for applyPtr to advance
> > > > would avoid the short-lived STREAMING.  What's the feasibility of that?
> > >
> > > I think this could work, but with complications. If replay latency is
> > > high or replay is paused with pg_wal_replay_pause, the WalReceiver
> > > would stay in the CONNECTING state longer than expected. Whether this
> > > is ok depends on the definition of the 'connecting' state. For the
> > > implementation, deciding where and when to check applyPtr against LSNs
> > > like receiveStart is more difficult—the WalReceiver doesn't know when
> > > applyPtr advances. While the WalReceiver can read applyPtr from shared
> > > memory, it isn't automatically notified when that pointer advances.
> > > This leads to latency between checking and replay if this is done in
> > > the WalReceiver part unless we let the startup process set the state,
> > > which would couple the two components. Am I missing something here?
> >
> > After some thoughts, a potential approach could be to expose a new
> > function in the WAL receiver that transitions the state from
> > CONNECTING to STREAMING. This function can then be invoked directly
> > from WaitForWALToBecomeAvailable in the startup process, ensuring the
> > state change aligns with the actual acceptance of the WAL stream.
> 
> V2 makes the transition from WALRCV_CONNECTING to STREAMING only when
> the first valid WAL record is processed by the startup process. A new
> function WalRcvSetStreaming is introduced to enable the transition.

The original patch set STREAMING in XLogWalRcvFlush().  Its callee,
XLogWalRcvSendReply(), already fetches applyPtr to send a status message.
So I would try the following before involving the startup process like v2
does:

1. store the applyPtr when we enter CONNECTING
2. force a status message as long as we remain in CONNECTING
3. become STREAMING when applyPtr differs from the one stored at (1)

A possible issue with all patch versions: when the primary is writing no WAL
and the standby was caught up before this walreceiver started, CONNECTING
could persist for an unbounded amount of time.  Only actual primary WAL
generation would move the walreceiver to STREAMING.  This relates to your
above point about high latency.  If that's a concern, perhaps this change
deserves a total of two new states, CONNECTING and a state that represents
"connection exists, no WAL yet applied"?
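
A rough sketch of what the two-new-states variant might look like, purely
for discussion.  The intermediate state's name is invented here (nothing in
this thread names it), and the two boolean inputs stand in for whatever
checks the walreceiver would actually perform:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states; SKETCH_CONNECTED_NO_APPLY is an invented name. */
typedef enum
{
	SKETCH_CONNECTING,			/* no connection to the primary yet */
	SKETCH_CONNECTED_NO_APPLY,	/* connection exists, no WAL yet applied */
	SKETCH_STREAMING			/* applyPtr has advanced since connecting */
} SketchState;

/*
 * Compute the next state.  "connected" means the connection has been
 * established; "applyAdvanced" means applyPtr differs from the value
 * stored on entry to CONNECTING.  States only move forward.
 */
static SketchState
sketch_next_state(SketchState cur, bool connected, bool applyAdvanced)
{
	assert(cur >= SKETCH_CONNECTING && cur <= SKETCH_STREAMING);

	switch (cur)
	{
		case SKETCH_CONNECTING:
			return connected ? SKETCH_CONNECTED_NO_APPLY : SKETCH_CONNECTING;
		case SKETCH_CONNECTED_NO_APPLY:
			return applyAdvanced ? SKETCH_STREAMING : SKETCH_CONNECTED_NO_APPLY;
		case SKETCH_STREAMING:
			break;
	}
	return SKETCH_STREAMING;
}
```

With an idle primary and a caught-up standby, this model leaves CONNECTING as
soon as the connection exists and then sits in the intermediate state, rather
than reporting CONNECTING for an unbounded time.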


