RE: Logical replication timeout problem - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Logical replication timeout problem
Date
Msg-id OS3PR01MB62754D7C91CE80B3A68FFC1A9EF39@OS3PR01MB6275.jpnprd01.prod.outlook.com
In response to Re: Logical replication timeout problem  ("Euler Taveira" <euler@eulerto.com>)
List pgsql-hackers
On Thu, Apr 14, 2022 at 8:21 PM Euler Taveira <euler@eulerto.com> wrote:
>
Thanks for your comments.

> + * For a large transaction, if we don't send any change to the downstream for a
> + * long time then it can timeout. This can happen when all or most of the
> + * changes are either not published or got filtered out.
>
> We should probably mention that "long time" is wal_receiver_timeout on
> subscriber.
Improved as suggested.
Added "(exceeds the wal_receiver_timeout of standby)" to explain what "long
time" means.

> +    * change as that can have overhead. Testing reveals that there is no
> +    * noticeable overhead in doing it after continuously processing 100 or so
> +    * changes.
>
> Tests revealed that ...
Improved as suggested.

> +    * We don't have a mechanism to get the ack for any LSN other than end xact
> +    * lsn from the downstream. So, we track lag only for end xact lsn's.
>
> s/lsn/LSN/ and s/lsn's/LSNs/
>
> I would say "end of transaction LSN".
Improved as suggested.

> + * If too many changes are processed then try to send a keepalive message to
> + * receiver to avoid timeouts.
>
> In logical replication, if too many changes are processed then try to send a
> keepalive message. It might avoid a timeout in the subscriber.
Improved as suggested.
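
For readers following the thread, the throttling idea in the comments quoted
above can be illustrated with a minimal standalone sketch. This is not the
patch code: ProgressState, note_change_processed, simulate_filtered_changes and
the exact CHANGES_THRESHOLD value are hypothetical names chosen for
illustration, and the keepalive send is replaced by a counter. It only shows
the pattern of trying a keepalive after every 100 or so processed changes
rather than after each one.

```c
#include <stdbool.h>

/*
 * Assumed threshold: the quoted comment only says "100 or so changes",
 * so the exact value here is illustrative.
 */
#define CHANGES_THRESHOLD 100

/* Hypothetical bookkeeping for one large transaction being decoded. */
typedef struct ProgressState
{
    int changes_count;      /* changes processed since the last keepalive */
    int keepalives_sent;    /* stand-in for keepalive messages actually sent */
} ProgressState;

/*
 * Call once per processed change, whether or not the change is published.
 * Returns true when a keepalive would be attempted. Counting avoids the
 * overhead of trying to send a keepalive after every single change.
 */
static bool
note_change_processed(ProgressState *state)
{
    if (++state->changes_count < CHANGES_THRESHOLD)
        return false;

    state->keepalives_sent++;   /* a real sender would emit a keepalive here */
    state->changes_count = 0;
    return true;
}

/*
 * Simulate a transaction whose changes are all filtered out, and report how
 * many keepalives the throttle would have attempted.
 */
static int
simulate_filtered_changes(int nchanges)
{
    ProgressState state = {0, 0};

    for (int i = 0; i < nchanges; i++)
        (void) note_change_processed(&state);

    return state.keepalives_sent;
}
```

With these assumptions, simulate_filtered_changes(1000) attempts 10 keepalives,
while a transaction with fewer than 100 changes attempts none.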

Please have a look at the new patch shared in [1].

[1] -
https://www.postgresql.org/message-id/OS3PR01MB627561344A2C7ECF68E41D7E9EF39%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei


