Re: Logical replication timeout problem - Mailing list pgsql-hackers

From: Euler Taveira
Subject: Re: Logical replication timeout problem
Date:
Msg-id: ef68bb18-37f9-4303-998e-a15d2fdb2563@www.fastmail.com
In response to: Re: Logical replication timeout problem (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Logical replication timeout problem (Amit Kapila <amit.kapila16@gmail.com>)
           RE: Logical replication timeout problem ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
List: pgsql-hackers
On Wed, Apr 13, 2022, at 7:45 AM, Amit Kapila wrote:
> On Mon, Apr 11, 2022 at 12:09 PM wangw.fnst@fujitsu.com
> >
> > So I skip tracking lag during a transaction just like the current HEAD.
> > Attach the new patch.
> >
>
> Thanks, please find the updated patch where I have slightly modified
> the comments.
>
> Sawada-San, Euler, do you have any opinion on this approach? I
> personally still prefer the approach implemented in v10 [1] especially
> due to the latest finding by Wang-San that we can't update the
> lag-tracker apart from when it is invoked at the transaction end.
> However, I am fine if we like this approach more.

It seems v15 is simpler and less error prone than v10. v10 has a mix of
OutputPluginUpdateProgress() and the new function update_progress(), and it
also calls update_progress() for every change action in pgoutput_change(). That
is not a good approach for maintainability -- new change types such as
sequences would need extra calls. However, as you mentioned, the lag-tracking
case still has to be handled.

Both patches change the OutputPluginUpdateProgress() signature, so they cannot
be backpatched as-is. Are you planning to backpatch this fix? If so, the
boolean variable (last_write or end_xacts, depending on which version you are
considering) could be added to LogicalDecodingContext instead. (You should
probably consider the same approach for skipped_xact too.)
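
For illustration, a minimal, hypothetical sketch of that idea -- the struct and
function below are stand-ins, not the actual PostgreSQL definitions. Keeping
the flag inside the decoding context leaves the OutputPluginUpdateProgress()
signature untouched, which is what would make backpatching viable:

#include <stdbool.h>

/* stand-in for LogicalDecodingContext; field names are only illustrative */
typedef struct DemoDecodingContext
{
    bool    end_xact;       /* processing the last change of a transaction? */
    bool    skipped_xact;   /* was the whole transaction skipped/filtered? */
} DemoDecodingContext;

/* only end-of-transaction LSNs are acknowledged, so only then track lag */
static bool
demo_should_track_lag(const DemoDecodingContext *ctx)
{
    return ctx->end_xact;
}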

+ * For a large transaction, if we don't send any change to the downstream for a
+ * long time then it can timeout. This can happen when all or most of the
+ * changes are either not published or got filtered out.

We should probably mention that "long time" means wal_receiver_timeout on the
subscriber.

+    * change as that can have overhead. Testing reveals that there is no
+    * noticeable overhead in doing it after continuously processing 100 or so
+    * changes.

Tests revealed that ...

+    * We don't have a mechanism to get the ack for any LSN other than end xact
+    * lsn from the downstream. So, we track lag only for end xact lsn's.

s/lsn/LSN/ and s/lsn's/LSNs/

I would say "end of transaction LSN".

+ * If too many changes are processed then try to send a keepalive message to
+ * receiver to avoid timeouts.

In logical replication, if too many changes are processed then try to send a
keepalive message. It might avoid a timeout in the subscriber.
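
To make the mechanism concrete, here is a rough, self-contained sketch of the
idea only -- the names are hypothetical and this is not the v15 patch, which
works through OutputPluginUpdateProgress() instead:

#include <stdbool.h>

/* the "100 or so changes" from the patch comment */
#define CHANGES_THRESHOLD 100

/* stand-in for the walsender's keepalive/progress machinery */
extern void demo_send_keepalive_if_needed(void);

static int  changes_count = 0;

static void
demo_update_progress(bool end_xact)
{
    if (end_xact)
    {
        /* end of transaction: the only LSN the subscriber acknowledges */
        demo_send_keepalive_if_needed();
        changes_count = 0;
        return;
    }

    /* checking after every filtered change would add overhead, so batch it */
    if (++changes_count >= CHANGES_THRESHOLD)
    {
        demo_send_keepalive_if_needed();
        changes_count = 0;
    }
}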

Does this same issue occur for long transactions? I mean keeping a transaction
open for a few hours while thousands of other transactions execute, for
example:

BEGIN;
INSERT INTO foo (a) VALUES(1);
-- wait a few hours while executing 10^x transactions
INSERT INTO foo (a) VALUES(2);
COMMIT;


--
Euler Taveira
