Re: Add WALRCV_CONNECTING state to walreceiver - Mailing list pgsql-hackers
| From | Xuneng Zhou |
|---|---|
| Subject | Re: Add WALRCV_CONNECTING state to walreceiver |
| Date | |
| Msg-id | CABPTF7Wk+hSpdE1crJ6yiB=WLJvYducJjY9ZKJ8a0VdW78ZuKg@mail.gmail.com Whole thread Raw |
| In response to | Re: Add WALRCV_CONNECTING state to walreceiver (Xuneng Zhou <xunengzhou@gmail.com>) |
| Responses |
Re: Add WALRCV_CONNECTING state to walreceiver
|
| List | pgsql-hackers |
Hi, On Mon, Jan 19, 2026 at 11:35 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > > Hi Michael, > > On Mon, Jan 19, 2026 at 8:13 AM Michael Paquier <michael@paquier.xyz> wrote: > > > > On Sun, Jan 11, 2026 at 08:56:57PM +0800, Xuneng Zhou wrote: > > > After some thoughts, I’m more inclined toward a startup-process–driven > > > approach. Marking the status as streaming immediately after the > > > connection is established seems not provide sufficient accuracy for > > > monitoring purposes. Introducing an intermediate state, such as > > > connected, would help reduce confusion when the startup process is > > > stalled and would make it easier for users to detect and diagnose > > > anomalies. > > > > > > V4 whitelisted CONNECTED and CONNECTING in WalRcvWaitForStartPosition > > > to handle valid stream termination scenarios without triggering a > > > FATAL error. > > > > > > Specifically, the walreceiver may need to transition to WAITING (idle) if: > > > 1. 'CONNECTED': The handshake succeeded (COPY_BOTH started), but > > > the stream ended before any WAL was applied (e.g., timeline divergence > > > detected mid-stream). > > > 2. 'CONNECTING': The handshake completed (START_REPLICATION > > > acknowledged), but the primary declined to stream (e.g., no WAL > > > available on the requested timeline). > > > > > > In both cases, the receiver should pause and await a new timeline or > > > restart position from the startup process. > > > > This stuff depends on the philosophical difference you want to put > > behind "connecting" and "streaming". My own opinion is that it is a > > non-starter to introduce more states that can be set by the startup > > process, and that a new state should reflect what we do in the code. > > We already have some of that for in the start and stop phases because > > we want some ordering when the WAL receiver process is spawned and at > > shutdown. That's just a simple way to say that we should not rely on > > more static variables to control how to set one or more states, and I > > don't see why that's actually required here? initialApplyPtr and > > force_reply are what I see as potential recipes for more bugs in the > > long term, as showed in the first approach. The second patch, > > introducing a similar new complexity with walrcv_streaming_set, is no > > better in terms of complexity added. > > The main take that I can retrieve from this thread is that it may take > > time between the moment we begin a WAL receiver in WalReceiverMain(), > > where walRcvState is switched to WALRCV_STREAMING, and the moment we > > actually have established a connection, location where "first_stream = > > false" (which is just to track if a WAL receiver is restarting, > > actually) after walrcv_startstreaming() has returned true, so as far > > as I can see you would be happy enough with the addition of a single > > state called CONNECTING, set at the beginning of WalReceiverMain() > > instead of where STREAMING is set now. The same would sound kind of > > true for WalRcvWaitForStartPosition(), because we are not actively > > streaming yet, still we are marking the WAL receiver as streaming, so > > the current code feels like we are cheating as if we define > > "streaming" as a WAL receiver that has already done an active > > connection. We also want the WAL receiver to be killable by the > > startup process while in "connecting" or "streaming" more. > > > > Hence I would suggest something like the following guidelines: > > - Add only a CONNECTING state. Set this state where we switch the > > state to "streaming" now, aka the two locations in the tree now. > > - Switch to STREAMING once the connection has been established, as > > returned by walrcv_startstreaming(), because we are acknowledging *in > > the code* that we have started streaming successfully. > > - Update the docs to reflect the new state, because this state can > > show up in the system view pg_stat_wal_receiver. > > - I am not convinved by what we gain with a CONNECTED state, either. > > Drop it. > > - The fact that we'd want to switch the state once the startup process > > has acknowleged the reception of the first byte from the stream is > > already something we track in the WAL receiver, AFAIK. > > Thank you for the detailed feedback. I agree with your analysis — the > simpler approach seems preferable and should be sufficient in most > cases. Tightly coupling the startup process with the WAL receiver to > set state is not very ideal. I'll post v5 with the simplified > walreceiver changes as you suggested shortly. Please see v5 of the updated patch. -- Best, Xuneng
Attachment
pgsql-hackers by date: