RE: Logical replication timeout problem - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Logical replication timeout problem
Date
Msg-id OS3PR01MB6275CFAEDCCE48884C94072F9EC59@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Logical replication timeout problem  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Fri, Jan 20, 2023 at 10:10 AM Peter Smith <smithpb2250@gmail.com> wrote:
> Here are some review comments for patch v3-0001.

Thanks for your comments.

> ======
> Commit message
> 
> 1.
> The problem is when there is a DDL in a transaction that generates lots of
> temporary data due to rewrite rules, these temporary data will not be
> processed
> by the pgoutput - plugin. Therefore, the previous fix (f95d53e) for DML had no
> impact on this case.
> 
> ~
> 
> 1a.
> IMO this comment needs to give a bit of background about the original
> problem here, rather than just starting with "The problem is" which is
> describing the flaws of the previous fix.

Added some related message.

> ~
> 
> 1b.
> "pgoutput - plugin" -> "pgoutput plugin" ??

Changed.

> ~~~
> 
> 2.
> 
> To fix this, we introduced a new ReorderBuffer callback -
> 'ReorderBufferUpdateProgressCB'. This callback is called to try to update the
> process after each change has been processed during sending data of a
> transaction (and its subtransactions) to the output plugin.
> 
> IIUC it's not really "after each change" - shouldn't this comment
> mention something about the CHANGES_THRESHOLD 100?

Changed.

> ~~~
> 
> 4. update_progress_cb_wrapper
> 
> +/*
> + * Update progress callback
> + *
> + * Try to update progress and send a keepalive message if too many changes
> were
> + * processed when processing txn.
> + *
> + * For a large transaction, if we don't send any change to the downstream for a
> + * long time (exceeds the wal_receiver_timeout of standby) then it can
> timeout.
> + * This can happen when all or most of the changes are either not published or
> + * got filtered out.
> + */
> 
> SUGGESTION (instead of the "Try to update" sentence)
> Send a keepalive message whenever more than <CHANGES_THRESHOLD>
> changes are encountered while processing a transaction.

Since it's possible that keep-alive messages won't be sent even if the
threshold is reached (see function WalSndKeepaliveIfNecessary), I thought it
might be better to use "try to".
And rewrote the comments here because the threshold logic is moved to the
function ReorderBufferProcessTXN.

> ~~~
> 
> 5.
> 
> +static void
> +update_progress_cb_wrapper(ReorderBuffer *cache, ReorderBufferTXN *txn,
> +    ReorderBufferChange *change)
> +{
> + LogicalDecodingContext *ctx = cache->private_data;
> + LogicalErrorCallbackState state;
> + ErrorContextCallback errcallback;
> + static int changes_count = 0; /* Static variable used to accumulate
> + * the number of changes while
> + * processing txn. */
> +
> 
> IMO this may be more readable if the static 'changes_count' local var
> was declared first and separated from the other vars by a blank line.

Changed.

> ~~~
> 
> 6.
> 
> + /*
> + * We don't want to try sending a keepalive message after processing each
> + * change as that can have overhead. Tests revealed that there is no
> + * noticeable overhead in doing it after continuously processing 100 or so
> + * changes.
> + */
> +#define CHANGES_THRESHOLD 100
> 
> 6a.
> I think it might be better to define this right at the top of the
> function adjacent to the 'changes_count' variable (e.g. a bit like the
> original HEAD code looked)

Changed.

> ~
> 
> 6b.
> SUGGESTION (for the comment)
> Sending keepalive messages after every change has some overhead, but
> testing showed there is no noticeable overhead if keepalive is only
> sent after every ~100 changes.

Changed.

> ~~~
> 
> 7.
> 
> +
> + /*
> + * After continuously processing CHANGES_THRESHOLD changes, we
> + * try to send a keepalive message if required.
> + */
> + if (++changes_count >= CHANGES_THRESHOLD)
> + {
> + ctx->update_progress(ctx, ctx->write_location, ctx->write_xid, false);
> + changes_count = 0;
> + }
> +
> 
> 7a.
> SUGGESTION (for comment)
> Send a keepalive message after every CHANGES_THRESHOLD changes.

Changed.

Regards,
Wang Wei

pgsql-hackers by date:

Previous
From: Amit Langote
Date:
Subject: Re: generic plans and "initial" pruning
Next
From: Peter Smith
Date:
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)