Thread: walsender.c fileheader comment

walsender.c fileheader comment

From
Peter Smith
Date:
Hi,

I was reading the walsender.c fileheader comment while studying
another thread. I think if there is logical replication in progress
then the PROCSIG_WALSND_INIT_STOPPING handler will *always* switch to
a "stopping" state: e.g.,

/*
 * Handle PROCSIG_WALSND_INIT_STOPPING signal.
 */
void
HandleWalSndInitStopping(void)
{
Assert(am_walsender);

/*
* If replication has not yet started, die like with SIGTERM. If
* replication is active, only set a flag and wake up the main loop. It
* will send any outstanding WAL, wait for it to be replicated to the
* standby, and then exit gracefully.
*/
if (!replication_active)
kill(MyProcPid, SIGTERM);
else
got_STOPPING = true;
}

~~~

But the walsender.c fileheader comment seems to be saying something
slightly different. IIUC, some minor rewording of the comment is
needed so it describes the code better.

HEAD
...
 * shutdown, if logical replication is in progress all existing WAL records
 * are processed followed by a shutdown.  Otherwise this causes the walsender
 * to switch to the "stopping" state. In this state, the walsender will reject
 * any further replication commands. The checkpointer begins the shutdown
 ...

SUGGESTION
.. shutdown. If logical replication is in progress, the walsender
switches to a "stopping" state. In this state, the walsender will
reject any further replication commands - but all existing WAL records
are processed - followed by a shutdown.

~~~

I attached a patch for the above-suggested change.

Thoughts?

======
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: walsender.c fileheader comment

From
Tomas Vondra
Date:

On 6/11/24 04:35, Peter Smith wrote:
> Hi,
> 
> I was reading the walsender.c fileheader comment while studying
> another thread. I think if there is logical replication in progress
> then the PROCSIG_WALSND_INIT_STOPPING handler will *always* switch to
> a "stopping" state: e.g.,
> 
> /*
>  * Handle PROCSIG_WALSND_INIT_STOPPING signal.
>  */
> void
> HandleWalSndInitStopping(void)
> {
> Assert(am_walsender);
> 
> /*
> * If replication has not yet started, die like with SIGTERM. If
> * replication is active, only set a flag and wake up the main loop. It
> * will send any outstanding WAL, wait for it to be replicated to the
> * standby, and then exit gracefully.
> */
> if (!replication_active)
> kill(MyProcPid, SIGTERM);
> else
> got_STOPPING = true;
> }
> 
> ~~~
> 
> But the walsender.c fileheader comment seems to be saying something
> slightly different. IIUC, some minor rewording of the comment is
> needed so it describes the code better.
> 
> HEAD
> ...
>  * shutdown, if logical replication is in progress all existing WAL records
>  * are processed followed by a shutdown.  Otherwise this causes the walsender
>  * to switch to the "stopping" state. In this state, the walsender will reject
>  * any further replication commands. The checkpointer begins the shutdown
>  ...
> 
> SUGGESTION
> .. shutdown. If logical replication is in progress, the walsender
> switches to a "stopping" state. In this state, the walsender will
> reject any further replication commands - but all existing WAL records
> are processed - followed by a shutdown.
> 
> ~~~
>
> I attached a patch for the above-suggested change.
>
> Thoughts?

I did look at this, and while the explanation in the current comment may
seem a bit confusing, I'm not sure the suggested changes improve the
situation very much.

This suggests the two comments somehow disagree, but it does not say in
what exactly, so perhaps I just missed it :-(

ISTM there's a bit of confusion what is meant by "stopping" state - you
seem to be interpreting it as a general concept, where the walsender is
requested to stop (through the signal), and starts doing stuff to exit.
But the comments actually talk about WalSnd->state, where "stopping"
means it needs to be set to WALSNDSTATE_STOPPING.

And we only ever switch to that state in two places - in WalSndPhysical
and exec_replication_command. And that does not happen in regular
logical replication (which is what "logical replication is in progress"
refers to) - if you have a walsender just replicating DML, it will never
see the WALSNDSTATE_STOPPING state. It will simply do the cleanup while
still in WALSNDSTATE_STREAMING state, and then just exit.

So from this point of view, the suggestion is actually wrong.

To conclude, I think this probably makes the comments more confusing. If
we want to make it clearer, I'd probably start by clarifying what the
"stopping" state means. Also, it's a bit surprising we may not actually
go through the "stopping" state during shutdown.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: walsender.c fileheader comment

From
Peter Smith
Date:
Hi, Thankyou for taking the time to look at this and reply.

>
> I did look at this, and while the explanation in the current comment may
> seem a bit confusing, I'm not sure the suggested changes improve the
> situation very much.
>
> This suggests the two comments somehow disagree, but it does not say in
> what exactly, so perhaps I just missed it :-(
>
> ISTM there's a bit of confusion what is meant by "stopping" state - you
> seem to be interpreting it as a general concept, where the walsender is
> requested to stop (through the signal), and starts doing stuff to exit.
> But the comments actually talk about WalSnd->state, where "stopping"
> means it needs to be set to WALSNDSTATE_STOPPING.

Yes, I interpreted the "stopping" state meaning as when the boolean
flag 'got_STOPPING' is assigned true.

>
> And we only ever switch to that state in two places - in WalSndPhysical
> and exec_replication_command. And that does not happen in regular
> logical replication (which is what "logical replication is in progress"
> refers to) - if you have a walsender just replicating DML, it will never
> see the WALSNDSTATE_STOPPING state. It will simply do the cleanup while
> still in WALSNDSTATE_STREAMING state, and then just exit.
>
> So from this point of view, the suggestion is actually wrong.

OK.

>
> To conclude, I think this probably makes the comments more confusing. If
> we want to make it clearer, I'd probably start by clarifying what the
> "stopping" state means. Also, it's a bit surprising we may not actually
> go through the "stopping" state during shutdown.
>

I agree. My interpretation of the (ambiguous) "stopping" state led me
to believe the comment was quite wrong. So, this thread was only
intended as a trivial comment fix in passing but clearly there is more
to this than I anticipated. I would be happy if someone with more
knowledge about the WALSNDSTATE_STOPPING versus got_STOPPING could
disambiguate the file header comment, but that's not me, so I have
withdrawn this from the Commitfest.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: walsender.c fileheader comment

From
Tomas Vondra
Date:
On 7/19/24 07:02, Peter Smith wrote:
> ...
>>
>> To conclude, I think this probably makes the comments more confusing. If
>> we want to make it clearer, I'd probably start by clarifying what the
>> "stopping" state means. Also, it's a bit surprising we may not actually
>> go through the "stopping" state during shutdown.
>>
> 
> I agree. My interpretation of the (ambiguous) "stopping" state led me
> to believe the comment was quite wrong. So, this thread was only
> intended as a trivial comment fix in passing but clearly there is more
> to this than I anticipated. I would be happy if someone with more
> knowledge about the WALSNDSTATE_STOPPING versus got_STOPPING could
> disambiguate the file header comment, but that's not me, so I have
> withdrawn this from the Commitfest.
> 

Understood. Thanks for the patch anyway, I appreciate you took the time
to try to improve the comments!

I agree the state transitions in walsender are not very clear, and the
fact that it may shutdown without ever going through STOPPING state is
quite confusing. That being said, I personally don't have ambition to
improve this.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company