Re: Logical replication timeout problem - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Logical replication timeout problem
Msg-id CAA4eK1L2xNjQ7A6Wok00ai=7YB+YbVZCP18LywntDiFhFazqtA@mail.gmail.com
In response to Re: Logical replication timeout problem  (Fabrice Chapuis <fabrice636861@gmail.com>)
Responses RE: Logical replication timeout problem
List pgsql-hackers
On Fri, Jan 21, 2022 at 10:45 PM Fabrice Chapuis
<fabrice636861@gmail.com> wrote:
>
> I keep your patch 0001 and I add these two calls in function WalSndUpdateProgress without modifying
> WalSndKeepaliveIfNecessary, it works too.
>
> What do you think of this patch?
>

I think this will also work. The point here was just to identify the
exact problem and a possible approach to solving it; the actual patch
might differ from these ideas. So, let me summarize the problem and
the possible approach so that others can also share their opinions.

The problem is that we don't send keep-alive messages for a long time
while processing large transactions during logical replication when we
don't send any data for such transactions (say, because the table
modified in the transaction is not published). We do try to send a
keep-alive if necessary at the end of the transaction (via
WalSndWriteData()), but by that time the subscriber side can time out
and exit.

Now, one idea to solve this problem could be that whenever we skip
sending a change, we try to update the plugin progress via
OutputPluginUpdateProgress() (for walsender, it will invoke
WalSndUpdateProgress()), and there it tries to process replies and
send a keep-alive if necessary, as we do when we send some data via
OutputPluginWrite() (for walsender, it will invoke WalSndWriteData()).
I don't know whether it is a good idea to invoke such a mechanism for
every change we skip, or whether we should do it only after skipping
some threshold number of consecutive changes. I think the latter would
be preferred. Also, we might want to introduce a new parameter
send_keep_alive to this API so that callers can choose whether to
invoke this mechanism: we don't need it while we are actually sending
data, since there we just update the progress via this API before
sending.
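
To make the threshold idea concrete, here is a minimal standalone
sketch. It is a simulation, not walsender code: update_progress_sketch()
only stands in for OutputPluginUpdateProgress() with the proposed
send_keep_alive flag, and SKIPPED_CHANGES_THRESHOLD, SkipTracker,
change_sent(), change_skipped() and simulate_changes() are hypothetical
names with an arbitrary threshold value chosen for illustration.

```c
#include <stdbool.h>

/* Hypothetical threshold; the thread does not propose a specific value. */
#define SKIPPED_CHANGES_THRESHOLD 100

/* Counts keep-alive attempts so the behaviour of the sketch is observable. */
static int keepalive_attempts = 0;

/*
 * Stand-in for OutputPluginUpdateProgress() extended with the proposed
 * send_keep_alive flag. The real function takes different arguments;
 * this one only records that a keep-alive check was requested.
 */
static void
update_progress_sketch(bool send_keep_alive)
{
    if (send_keep_alive)
        keepalive_attempts++;   /* real code would process replies and
                                 * send a keep-alive if one is due */
}

/* Tracks how many consecutive changes were skipped (not sent). */
typedef struct SkipTracker
{
    int skipped_in_a_row;
} SkipTracker;

/* A change was sent: the data write path already handles keep-alives. */
static void
change_sent(SkipTracker *t)
{
    t->skipped_in_a_row = 0;
    update_progress_sketch(false);
}

/* A change was skipped: only ask for a keep-alive check at the threshold. */
static void
change_skipped(SkipTracker *t)
{
    if (++t->skipped_in_a_row >= SKIPPED_CHANGES_THRESHOLD)
    {
        update_progress_sketch(true);
        t->skipped_in_a_row = 0;
    }
}

/*
 * Simulates decoding n changes. Every published_every-th change is sent
 * (0 means nothing is published); all others are skipped. Returns the
 * number of keep-alive checks requested.
 */
static int
simulate_changes(int n, int published_every)
{
    SkipTracker t = {0};

    keepalive_attempts = 0;
    for (int i = 0; i < n; i++)
    {
        if (published_every > 0 && (i + 1) % published_every == 0)
            change_sent(&t);
        else
            change_skipped(&t);
    }
    return keepalive_attempts;
}
```

With this shape, a transaction that touches only unpublished tables
triggers one keep-alive check per SKIPPED_CHANGES_THRESHOLD skipped
changes, while a transaction that regularly sends data never takes the
extra path because the counter is reset on each send.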

Thoughts?

Note: I have added Simon and Petr J. to this thread as they introduced
the API OutputPluginUpdateProgress in commit 024711bb54 and know this
part of the code/design well, but ideas and suggestions from everyone
are welcome.

-- 
With Regards,
Amit Kapila.


