Hi,
On Fri, Dec 12, 2025 at 9:52 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>
> Hi,
>
> On Fri, Dec 12, 2025 at 4:45 PM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> >
> > Hi Noah,
> >
> > On Fri, Dec 12, 2025 at 1:05 PM Noah Misch <noah@leadboat.com> wrote:
> > >
> > > On Fri, Dec 12, 2025 at 12:51:00PM +0800, Xuneng Zhou wrote:
> > > > Bug #19093 [1] reported that pg_stat_wal_receiver.status = 'streaming'
> > > > does not accurately reflect streaming health. In that discussion,
> > > > Noah noted that even before the reported regression, status =
> > > > 'streaming' was unreliable because walreceiver sets it during early
> > > > startup, before attempting a connection. He suggested:
> > > >
> > > > "Long-term, in master only, perhaps we should introduce another status
> > > > like 'connecting'. Perhaps enact the connecting->streaming status
> > > > transition just before tendering the first byte of streamed WAL to the
> > > > startup process. Alternatively, enact that transition when the startup
> > > > process accepts the
> > > > first streamed byte."
> > >
> > > > == Proposal ==
> > > >
> > > > Introduce WALRCV_CONNECTING as an intermediate state between STARTING
> > > > and STREAMING:
> > > >
> > > > - When walreceiver starts, it enters CONNECTING (instead of going
> > > > directly to STREAMING).
> > > > - The transition to STREAMING occurs in XLogWalRcvFlush(), inside the
> > > > existing spinlock-protected block that updates flushedUpto.
> > >
> > > I think this has the drawback that if the primary's WAL is incompatible,
> > > e.g. unacceptable timeline, the walreceiver will still briefly enter
> > > STREAMING. That could trick monitoring.
> >
> > Thanks for pointing this out.
> >
> > Waiting for applyPtr to advance
> > > would avoid the short-lived STREAMING. What's the feasibility of that?
> >
> > I think this could work, but with complications. If replay latency is
> > high or replay is paused with pg_wal_replay_pause, the WalReceiver
> > would stay in the CONNECTING state longer than expected. Whether this
> > is ok depends on the definition of the 'connecting' state. For the
> > implementation, deciding where and when to check applyPtr against LSNs
> > like receiveStart is more difficult—the WalReceiver doesn't know when
> > applyPtr advances. While the WalReceiver can read applyPtr from shared
> > memory, it isn't automatically notified when that pointer advances.
> > This leads to latency between checking and replay if this is done in
> > the WalReceiver part unless we let the startup process set the state,
> > which would couple the two components. Am I missing something here?
> >
>
> After some thoughts, a potential approach could be to expose a new
> function in the WAL receiver that transitions the state from
> CONNECTING to STREAMING. This function can then be invoked directly
> from WaitForWALToBecomeAvailable in the startup process, ensuring the
> state change aligns with the actual acceptance of the WAL stream.
>
V2 makes the transition from WALRCV_CONNECTING to STREAMING only when
the first valid WAL record is processed by the startup process. A new
function WalRcvSetStreaming is introduced to enable the transition.
--
Best,
Xuneng