Hi Chao,
Thanks for looking into this.
>
> Hi Xuneng,
>
> I just reviewed the patch. It splits the current STREAMING state into CONNECTING and STREAMING, where CONNECTING
representsthe connection establishing phase, which makes sense. The patch is solid, I have just a few small comments:
>
> 1 - 0001
> ```
> + <para>
> + <literal>stopped</literal>: WAL receiver process is not running.
> + This state is normally not visible because the view returns no row
> + when the WAL receiver is not running.
> + </para>
> ```
>
> “Is not running” appears twice in this short paragraph, looks redundant. Suggestion:
> ```
> <literal>stopped</literal>: WAL receiver process is not running.
> This state is normally not visible because the view returns no row in this case.
> ```
>
> 2 - 0002
> ```
> + WALRCV_CONNECTING, /* connecting to primary, replication not yet
> + * started */
> ```
>
> I think “primary” is inaccurate. A connection destination could be a primary, or the other standby, so maybe change
“primary”to “upstream server”.
Yeah, "upstream server" is more precise. There are multiple
user-facing log messages in this file that also use “primary”. I’m
wondering whether we should update those as well.
>
>
> 3 - 0002
> ```
> * Mark walreceiver as running in shared memory.
> ```
>
> I think we should update this comment, changing “running” to “connecting”. There is not a “running” state, before
thispatch, “streaming” can roughly equal to “running”, after this patch, “running” is kinda unclear.
>
It has been changed.
> 4 - 0002
> ```
> + /* Connection established, switch to streaming state */
> + SpinLockAcquire(&walrcv->mutex);
> + Assert(walrcv->walRcvState == WALRCV_CONNECTING);
> + walrcv->walRcvState = WALRCV_STREAMING;
> + SpinLockRelease(&walrcv->mutex);
> ```
>
> I don’t feel good with this Assert(). Looking at ShutdownWalRcv(), the startup process might set the state to
STOPPING,so there is a race.
>
> Before this patch, this is no state change here, thus rest logic can handle STOPPING. After this patch, if the race
occurs,in dev mode, the Assert is fired; in production mode, STOPPING is overwritten by STREAMING, which is wrong.
>
> So, instead of Assert(), I think we should check if current state is CONNECTING, if not, it should not proceed.
It makes sense to me. Please check the updated patch.
--
Best,
Xuneng