RE: Logical replication timeout problem - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Logical replication timeout problem
Date
Msg-id OS3PR01MB6275D7FBDEBC143B792F93169E019@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Logical replication timeout problem  ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Thur, Feb 24, 2022 at 4:06 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> Dear Wang,
Thanks for your review.

> > According to our discussion, we need to send keepalive messages to
> > subscriber when skipping changes.
> > One approach is that **for each skipped change**, we try to send
> > keepalive message by calculating whether a timeout will occur based on
> > the current time and the last time the keepalive was sent. But this will brings
> slight overhead.
> > So I want to try another approach: after **constantly skipping some
> > changes**, we try to send keepalive message by calculating whether a
> > timeout will occur based on the current time and the last time the keepalive
> was sent.
> 
> You meant that calling system calls like GetCurrentTimestamp() should be
> reduced, right? I'm not sure how it affects but it seems reasonable.
Yes. There is no need to invoke frequently, and it will bring overhead.

> > IMO, we should send keepalive message after skipping a certain number
> > of changes constantly.
> > And I want to calculate the threshold dynamically by using a fixed
> > value to avoid adding too much code.
> > In addition, different users have different machine performance, and
> > users can modify wal_sender_timeout, so the threshold should be
> > dynamically calculated according to wal_sender_timeout.
> 
> Your experiment seems not bad, but the background cannot be understand
> from code comments. I prefer a static threshold because it's more simple, which
> as Amit said in the following too:
> 
> https://www.postgresql.org/message-id/CAA4eK1%2B-
> p_K_j%3DNiGGD6tCYXiJH0ypT4REX5PBKJ4AcUoF3gZQ%40mail.gmail.com
Yes, you are right. Fixed.
And I set the threshold to 10000.

> BTW, this patch cannot be applied to current master.
Thanks for reminder. Rebase it.
Kindly have a look at new patch shared in [1].

[1]
https://www.postgresql.org/message-id/OS3PR01MB6275FEB9F83081F1C87539B99E019%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei

pgsql-hackers by date:

Previous
From: "wangw.fnst@fujitsu.com"
Date:
Subject: RE: Logical replication timeout problem
Next
From: "tanghy.fnst@fujitsu.com"
Date:
Subject: RE: Design of pg_stat_subscription_workers vs pgstats