Re: Time delayed LR (WAS Re: logical replication restrictions) - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Time delayed LR (WAS Re: logical replication restrictions)
Date
Msg-id 20221215.104611.330470611359597283.horikyota.ntt@gmail.com
In response to RE: Time delayed LR (WAS Re: logical replication restrictions)  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses Re: Time delayed LR (WAS Re: logical replication restrictions)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
At Wed, 14 Dec 2022 10:46:17 +0000, "Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com> wrote in 
> I have implemented and tested a change where workers wake up every
> wal_receiver_timeout/2 and send a keepalive. Basically it works well, but I
> found two problems. Do you have any good suggestions about them?
> 
> 1)
> 
> With this PoC at present, workers calculate the sending interval based on their
> wal_receiver_timeout, and sending is suppressed when the parameter is set to zero.
> 
> This means that the walsender may time out when wal_sender_timeout
> on the publisher and wal_receiver_timeout on the subscriber differ.
> Supposing that wal_sender_timeout is 2min, wal_receiver_timeout is 5min,

It seems to me that wal_receiver_status_interval is better suited for
this. It's enough for us to document that "wal_receiver_status_interval
should be shorter than wal_sender_timeout/2, especially when the logical
replication connection is using min_apply_delay. Otherwise you will
suffer repeated termination of walsender".
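
As a rough illustration of the rule proposed above, a minimal Python
sketch might check the two settings as follows. The function name and
the use of plain seconds are assumptions made for illustration; this is
not PostgreSQL code. Treating wal_receiver_status_interval = 0 as unsafe
is also an assumption of this sketch.

```python
def status_interval_is_safe(status_interval_s: float, sender_timeout_s: float) -> bool:
    """Return True if the status interval is shorter than half the sender timeout.

    Both arguments are in seconds. The function is hypothetical and only
    encodes the documentation rule suggested in this mail.
    """
    if sender_timeout_s == 0:
        # wal_sender_timeout = 0 disables the timeout, so nothing can expire.
        return True
    if status_interval_s == 0:
        # wal_receiver_status_interval = 0 disables periodic status updates;
        # this sketch treats that as unsafe when a sender timeout is set.
        return False
    return status_interval_s < sender_timeout_s / 2


# PostgreSQL defaults: wal_receiver_status_interval = 10s, wal_sender_timeout = 60s.
print(status_interval_is_safe(10, 60))
# An interval of 2.5min against a 2min sender timeout violates the rule.
print(status_interval_is_safe(150, 120))
```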

> and min_apply_delay is 10min. The worker on the subscriber will wake up every 2.5min and
> send keepalives, but the walsender exits before the message arrives at the publisher.
> 
> One idea to avoid that is to send the min_apply_delay subscriber option to the publisher
> and compare them, but it may not be sufficient, because the XXX_timeout GUC parameters
> could be modified later.

# Anyway, I don't think such an asymmetric setup is preferable.
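
To make the timing in the quoted scenario concrete, here is a small
Python sketch under simplifying assumptions: the worker replies exactly
every wal_receiver_timeout/2, messages arrive instantly, and the
walsender gives up once wal_sender_timeout passes without a reply. The
function name is hypothetical and the model ignores the keepalive
requests a real walsender sends by itself.

```python
def walsender_timeout_at(wake_interval_s, sender_timeout_s, delay_s):
    """Return the time (seconds) at which the modelled walsender times out,
    or None if it survives the whole apply delay."""
    if sender_timeout_s <= 0:
        return None  # timeout disabled
    last_reply = 0.0
    while True:
        deadline = last_reply + sender_timeout_s
        if deadline >= delay_s:
            return None  # the delay ends before the sender would give up
        next_reply = last_reply + wake_interval_s
        if wake_interval_s <= 0 or next_reply > deadline:
            return deadline  # no reply arrives before the deadline
        last_reply = next_reply


wal_sender_timeout_s = 2 * 60      # publisher: 2min
wal_receiver_timeout_s = 5 * 60    # subscriber: 5min
min_apply_delay_s = 10 * 60        # 10min

# The worker wakes every wal_receiver_timeout/2 = 150s, later than the 120s deadline.
print(walsender_timeout_at(wal_receiver_timeout_s / 2,
                           wal_sender_timeout_s,
                           min_apply_delay_s))
```

With the numbers from the mail, the first reply would come at 150s,
after the 120s deadline, so the modelled walsender exits first.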

> 2)
> 
> The issue reported by Vignesh-san[1] still remains. I have already analyzed it [2];
> the root cause is that the flushed WAL position is not updated and sent to the
> publisher. Even if workers send keepalive messages to the publisher during the
> delay, the flushed position cannot be modified.

I haven't looked closely, but my guess is that the cause is that
walsender doesn't exit until all WAL has been sent, while the logical
delay chokes the replication stream. Allowing walsender to finish while
ignoring the replication status wouldn't be great.  One idea is to let
logical workers send their delaying status.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


