Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Time delayed LR (WAS Re: logical replication restrictions)
Msg-id CAA4eK1KcJQCyX=sVLNDj=opU=8VbnxFdEiEvAV_OGEzBravUYw@mail.gmail.com
In response to RE: Time delayed LR (WAS Re: logical replication restrictions)  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Fri, Dec 9, 2022 at 10:49 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> Hi Vignesh,
>
> > In the case of physical replication by setting
> > recovery_min_apply_delay, I noticed that both primary and standby
> > nodes were getting stopped successfully immediately after the stop
> > server command. In case of logical replication, stop server fails:
> > pg_ctl -D publisher -l publisher.log stop -c
> > waiting for server to shut
> > down...............................................................
> > failed
> > pg_ctl: server does not shut down
> >
> > In case of logical replication, the server does not get stopped
> > because the walsender process is not able to exit:
> > ps ux | grep walsender
> > vignesh  1950789 75.3  0.0 8695216 22284 ?       Rs   11:51   1:08
> > postgres: walsender vignesh [local] START_REPLICATION
>
> Thanks for reporting the issue. I analyzed it.
>
>
> This issue has occurred because the apply worker cannot reply during the delay.
> I think we may have to modify the mechanism that delays applying transactions.
>
> When a walsender process is requested to shut down, it can do so only after
> all the WAL it has sent has been replicated on the subscriber. This check is
> done in WalSndDone(), and the replicated position is updated when the process
> handles reply messages from the subscriber, in ProcessStandbyReplyMessage().
>
> In the case of physical replication, the walreceiver can receive WAL and reply
> even if applying it is delayed. This means the replicated position is
> reported to the publisher side immediately, so the walsender can exit.
>

I think WalSndDone() checks not only the replicated position but also
whether any send is still pending. Why must all pending WAL be sent and
confirmed as flushed on the standby before shutdown in the physical
replication case? Is it because we might otherwise lose required WAL?
I am asking because we should first see whether those conditions apply
to logical replication as well.
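
To make the two conditions concrete, here is a rough model of the exit
check as described above. This is only a sketch and not the code in
walsender.c: the struct, its fields, and the function name are made up
for illustration, and the real check reads the walsender's shared state
rather than a struct like this.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for a WAL position; the name is hypothetical. */
typedef uint64_t LsnModel;

/*
 * Hypothetical snapshot of what the shutdown check looks at.
 * These fields are illustrative, not the actual walsender state.
 */
typedef struct WalSenderModel
{
    LsnModel sent;          /* last position sent to the receiver */
    LsnModel replicated;    /* last position confirmed by a reply message */
    bool     send_pending;  /* output still queued but not yet sent */
} WalSenderModel;

/*
 * Returns true only when both conditions hold:
 *   1. everything that was sent has been confirmed by the receiver, and
 *   2. nothing is left waiting in the output buffer.
 */
static bool
walsender_model_can_exit(const WalSenderModel *ws)
{
    return ws->sent == ws->replicated && !ws->send_pending;
}
```

In this model, a subscriber whose apply worker is sleeping through the
delay never advances "replicated", so the function keeps returning false,
which matches the hang in the report. A physical standby keeps replying
while apply is delayed, so "replicated" catches up to "sent" and the
function returns true once the output buffer is empty.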

-- 
With Regards,
Amit Kapila.


