Re: walsender.c fileheader comment - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: walsender.c fileheader comment |
Date | |
Msg-id | cdf6841c-a283-4f41-b0b0-a46819c827fa@enterprisedb.com Whole thread Raw |
In response to | walsender.c fileheader comment (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: walsender.c fileheader comment
|
List | pgsql-hackers |
On 6/11/24 04:35, Peter Smith wrote: > Hi, > > I was reading the walsender.c fileheader comment while studying > another thread. I think if there is logical replication in progress > then the PROCSIG_WALSND_INIT_STOPPING handler will *always* switch to > a "stopping" state: e.g., > > /* > * Handle PROCSIG_WALSND_INIT_STOPPING signal. > */ > void > HandleWalSndInitStopping(void) > { > Assert(am_walsender); > > /* > * If replication has not yet started, die like with SIGTERM. If > * replication is active, only set a flag and wake up the main loop. It > * will send any outstanding WAL, wait for it to be replicated to the > * standby, and then exit gracefully. > */ > if (!replication_active) > kill(MyProcPid, SIGTERM); > else > got_STOPPING = true; > } > > ~~~ > > But the walsender.c fileheader comment seems to be saying something > slightly different. IIUC, some minor rewording of the comment is > needed so it describes the code better. > > HEAD > ... > * shutdown, if logical replication is in progress all existing WAL records > * are processed followed by a shutdown. Otherwise this causes the walsender > * to switch to the "stopping" state. In this state, the walsender will reject > * any further replication commands. The checkpointer begins the shutdown > ... > > SUGGESTION > .. shutdown. If logical replication is in progress, the walsender > switches to a "stopping" state. In this state, the walsender will > reject any further replication commands - but all existing WAL records > are processed - followed by a shutdown. > > ~~~ > > I attached a patch for the above-suggested change. > > Thoughts? I did look at this, and while the explanation in the current comment may seem a bit confusing, I'm not sure the suggested changes improve the situation very much. This suggests the two comments somehow disagree, but it does not say in what exactly, so perhaps I just missed it :-( ISTM there's a bit of confusion what is meant by "stopping" state - you seem to be interpreting it as a general concept, where the walsender is requested to stop (through the signal), and starts doing stuff to exit. But the comments actually talk about WalSnd->state, where "stopping" means it needs to be set to WALSNDSTATE_STOPPING. And we only ever switch to that state in two places - in WalSndPhysical and exec_replication_command. And that does not happen in regular logical replication (which is what "logical replication is in progress" refers to) - if you have a walsender just replicating DML, it will never see the WALSNDSTATE_STOPPING state. It will simply do the cleanup while still in WALSNDSTATE_STREAMING state, and then just exit. So from this point of view, the suggestion is actually wrong. To conclude, I think this probably makes the comments more confusing. If we want to make it clearer, I'd probably start by clarifying what the "stopping" state means. Also, it's a bit surprising we may not actually go through the "stopping" state during shutdown. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: