Rework LogicalOutputPluginWriterUpdateProgress (WAS Re: Logical replication timeout ...) - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Rework LogicalOutputPluginWriterUpdateProgress (WAS Re: Logical replication timeout ...) |
Date | |
Msg-id | CAA4eK1+DB66cYRRVyGcaMm7+tQ_u=q=+HWGjpu9X0pqMFWbsZQ@mail.gmail.com Whole thread Raw |
In response to | Re: Logical replication timeout problem (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Rework LogicalOutputPluginWriterUpdateProgress (WAS Re: Logical replication timeout ...)
(Amit Kapila <amit.kapila16@gmail.com>)
|
List | pgsql-hackers |
On Thu, Feb 9, 2023 at 1:33 AM Andres Freund <andres@anarazel.de> wrote: > > Hacking on a rough prototype how I think this should rather look, I had a few > questions / remarks: > > - We probably need to call UpdateProgress from a bunch of places in decode.c > as well? Indicating that we're lagging by a lot, just because all > transactions were in another database seems decidedly suboptimal. > We can do that but I think in all those cases we will reach quickly enough back to walsender logic (WalSndLoop - that will send keepalive if required) that we don't need to worry. After processing each record, the logic will return back to the main loop that will send keepalive if required. Also, while reading WAL if we need to block, it will call WalSndWaitForWal() which will send keepalive if required. The real problem we have seen in the field reports or tests is that when we process a large transaction where changes are queued in the reorderbuffer and while processing those we discard all or most of the changes. The patch calls update_progress in change_cb_wrapper and other wrappers which will miss the case of DDLs that generates a lot of data that is not processed by the plugin. I think for that we either need to call update_progress from reorderbuffer.c similar to what the patch has removed or we need some other way to address it. Do you have any better idea? > - Why should lag tracking only be updated at commit like points? That seems > like it adds odd discontinuinities? > We have previously experimented to call it from non-commit locations but that turned out to give inaccurate information about Lag. See email [1]. > - The mix of skipped_xact and ctx->end_xact in WalSndUpdateProgress() seems > somewhat odd. They have very overlapping meanings IMO. > > - there's no UpdateProgress calls in pgoutput_stream_abort(), but ISTM there > should be? It's legit progress. > Agreed with both of the above points. > - That's from 6912acc04f0: I find LagTrackerRead(), LagTrackerWrite() quite > confusing, naming-wise. IIUC "reading" is about receiving confirmation > messages, "writing" about the time the record was generated. ISTM that the > current time is a quite poor approximation in XLogSendPhysical(), but pretty > much meaningless in WalSndUpdateProgress()? Am I missing something? > Leaving it for Thomas to answer. > - Aren't the wal_sender_timeout / 2 checks in WalSndUpdateProgress(), > WalSndWriteData() missing wal_sender_timeout <= 0 checks? > It seems we are checking that via ProcessPendingWrites()->WalSndKeepaliveIfNecessary(). Do you think we need to check it before as well? > - I don't really understand why f95d53edged55 added !end_xact to the if > condition for ProcessPendingWrites(). Is the theory that we'll end up in an > outer loop soon? > Yes. For non-empty xacts, we will anyway send a commit message. For empty (skipped) xacts, we will send for synchronous replication case to avoid any delay. > > Attached is a current, quite rough, prototype. It addresses some of the points > raised, but far from all. There's also several XXXs/FIXMEs in it. I changed > the file-ending to .txt to avoid hijacking the CF entry. > I have started a separate thread to avoid such confusion. I hope that is fine with you. > > > I don't think the syncrep logic in WalSndUpdateProgress really works as-is - > > > consider what happens if e.g. the origin filter filters out entire > > > transactions. We'll afaics never get to WalSndUpdateProgress(). In some cases > > > we'll be lucky because we'll return quickly to XLogSendLogical(), but not > > > reliably. > > Which case are you worried about? As mentioned in one of the previous points I thought the timeout/keepalive handling in the callers should be enough. > > Is it actually the right thing to check SyncRepRequested() in that logic? It's > > quite common to set up syncrep so that individual users or transactions opt > > into syncrep, but to leave the default disabled. > > > > I don't really see an alternative to making this depend solely on > > sync_standbys_defined. Fair point. How about renaming ProcessPendingWrites to WaitToSendPendingWrites or WalSndWaitToSendPendingWrites? [1] - https://www.postgresql.org/message-id/OS3PR01MB62755D216245199554DDC8DB9EEA9%40OS3PR01MB6275.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
pgsql-hackers by date: