Re: Logical replication timeout problem - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Logical replication timeout problem |
Date | |
Msg-id | 20230208181841.bftk5trxmollfrzq@awork3.anarazel.de Whole thread Raw |
In response to | Re: Logical replication timeout problem (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Logical replication timeout problem
|
List | pgsql-hackers |
Hi, On 2023-02-08 13:36:02 +0530, Amit Kapila wrote: > On Wed, Feb 8, 2023 at 10:57 AM Andres Freund <andres@anarazel.de> wrote: > > > > On 2023-02-03 10:13:54 +0530, Amit Kapila wrote: > > > I am planning to push this to HEAD sometime next week (by Wednesday). > > > To backpatch this, we need to fix it in some non-standard way, like > > > without introducing a callback which I am not sure is a good idea. If > > > some other committers vote to get this in back branches with that or > > > some different idea that can be backpatched then we can do that > > > separately as well. I don't see this as a must-fix in back branches > > > because we have a workaround (increase timeout) or users can use the > > > streaming option (for >=14). > > > > I just saw the commit go in, and a quick scan over it makes me think neither > > this commit, nor f95d53eded, which unfortunately was already backpatched, is > > the right direction. The wrong direction likely started quite a bit earlier, > > with 024711bb544. > > > > It feels quite fundamentally wrong that bascially every output plugin needs to > > call a special function in nearly every callback. > > > > In 024711bb544 there was just one call to OutputPluginUpdateProgress() in > > pgoutput.c. Quite tellingly, it just updated pgoutput, without touching > > test_decoding. > > > > Then a8fd13cab0b added to more calls. 63cf61cdeb7 yet another. > > > > I think the original commit 024711bb544 forgets to call it in > test_decoding and the other commits followed the same and missed to > update test_decoding. I think that's a symptom of the wrong architecture having been chosen. This should *never* have been the task of output plugins. > > I don't think: > > /* > > * Wait until there is no pending write. Also process replies from the other > > * side and check timeouts during that. > > */ > > static void > > ProcessPendingWrites(void) > > > > Is really a good name. What are we processing? > > > > It is for sending the keep_alive message (if required). That is > normally done when we skipped processing a transaction to ensure sync > replication is not delayed. But how is that "processing pending writes"? For me "processing" implies we're doing some analysis on them or such. If we want to write data in WalSndUpdateProgress(), shouldn't we move the common code of WalSndWriteData() and WalSndUpdateProgress() into ProcessPendingWrites()? > It has been discussed previously [1][2] to > extend the WalSndUpdateProgress() interface. Basically, as explained > by Craig [2], this has to be done from plugin as it can do filtering > or there could be other reasons why the output plugin skips all > changes. We used the same interface for sending keep-alive message > when we processed a lot of (DDL) changes without sending anything to > plugin. > > [1] - https://www.postgresql.org/message-id/20200309183018.tzkzwu635sd366ej%40alap3.anarazel.de > [2] - https://www.postgresql.org/message-id/CAMsr%2BYE3o8Dt890Q8wTooY2MpN0JvdHqUAHYL-LNhBryXOPaKg%40mail.gmail.com I don't buy that this has to be done by the output plugin. The actual sending out of data happens via the LogicalDecodingContext callbacks, so we very well can know whether we recently did send out data or not. This really is a concern of the LogicalDecodingContext, it has pretty much nothing to do with output plugins. We should remove all calls of OutputPluginUpdateProgress() from pgoutput, and add the necessary calls to LogicalDecodingContext->update_progress() to generic code. And Additionally we should either rename WalSndUpdateProgress(), because it's now doing *far* more than "updating progress", or alternatively, split it into two functions. I don't think the syncrep logic in WalSndUpdateProgress really works as-is - consider what happens if e.g. the origin filter filters out entire transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases we'll be lucky because we'll return quickly to XLogSendLogical(), but not reliably. Greetings, Andres Freund
pgsql-hackers by date: