Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Time delayed LR (WAS Re: logical replication restrictions)
Date
Msg-id CAA4eK1Lq+h8qo+rqGU-E+hwJKAHYocV54y4pvou4rLysCgYD-g@mail.gmail.com
In response to Re: Time delayed LR (WAS Re: logical replication restrictions)  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Time delayed LR (WAS Re: logical replication restrictions)
RE: Time delayed LR (WAS Re: logical replication restrictions)
List pgsql-hackers
On Thu, Dec 15, 2022 at 7:16 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Wed, 14 Dec 2022 10:46:17 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in
> > I have implemented and tested that workers wake up per wal_receiver_timeout/2
> > and send keepalive. Basically it works well, but I found two problems.
> > Do you have any good suggestions about them?
> >
> > 1)
> >
> > With this PoC at present, workers calculate the sending interval based on their
> > wal_receiver_timeout, and sending is suppressed when the parameter is set to zero.
> >
> > This means that the walsender may time out when wal_sender_timeout
> > on the publisher and wal_receiver_timeout on the subscriber are different.
> > Supposing that wal_sender_timeout is 2min, wal_receiver_timeout is 5min,
>
> It seems to me wal_receiver_status_interval is better for this use.
> It's enough for us to document that "wal_r_s_interval should be
> shorter than wal_sender_timeout/2, especially when a logical replication
> connection is using min_apply_delay. Otherwise you will suffer
> repeated termination of walsender".
>

This sounds reasonable to me.
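
To make the proposed documentation rule concrete, here is a minimal sketch in Python. It is only an illustration and not part of any patch: the function name is hypothetical, both values are assumed to be in seconds, and it assumes that zero disables wal_sender_timeout on the publisher and disables periodic status updates on the subscriber.

```python
def status_interval_is_safe(status_interval_s: float, sender_timeout_s: float) -> bool:
    """Check the rule: wal_receiver_status_interval < wal_sender_timeout / 2.

    Hypothetical helper for illustration only; both arguments are in seconds.
    """
    if sender_timeout_s == 0:
        # Assumption: zero disables the timeout on the publisher,
        # so the walsender never terminates for lack of feedback.
        return True
    if status_interval_s == 0:
        # Assumption: zero disables periodic status updates, so nothing
        # is guaranteed to reach the walsender while apply is delayed.
        return False
    return status_interval_s < sender_timeout_s / 2


# A worker that reports every 2.5min against a 2min wal_sender_timeout.
print(status_interval_is_safe(150, 120))
# A 10s interval against a 60s wal_sender_timeout.
print(status_interval_is_safe(10, 60))
```

The first call returns False and the second returns True, which matches the rule as worded above.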

> > and min_apply_delay is 10min. The worker on the subscriber will wake up every 2.5min and
> > send keepalives, but the walsender exits before the message arrives at the publisher.
> >
> > One idea to avoid that is to send the min_apply_delay subscriber option to the publisher
> > and compare them, but that may not be sufficient, because the XXX_timeout GUC parameters
> > could be modified later.
>
> # Anyway, I don't think such asymmetric setup is preferable.
>
> > 2)
> >
> > The issue reported by Vignesh-san[1] still remains. I have already analyzed it [2];
> > the root cause is that the flushed WAL position is not updated and sent to the publisher. Even
> > if workers send keepalive messages to the publisher during the delay, the flushed position
> > cannot be modified.
>
> I didn't look closer but the cause I guess is walsender doesn't die
> until all WAL has been sent, while logical delay chokes replication
> stream.
>

Right, I also think so.

> Allowing walsender to finish ignoring replication status
> wouldn't be great.
>

Yes, that would be ideal. But do you know why that is a must?

>  One idea is to let logical workers send delaying
> status.
>

How can that help?

-- 
With Regards,
Amit Kapila.
