RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id OS3PR01MB62755190D8302CDDFB72C3189E6B9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Perform streaming logical transactions by background workers and parallel apply  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Fri, August 12, 2022 17:22 PM Peter Smith <smithpb2250@gmail.com> wrote:
> Here are some review comments for v20-0004:
> 
> (This completes my reviews of the v20* patch set. Sorry, the reviews
> are time consuming, so I am lagging slightly behind the latest posted
> version)

Thanks for your comments.

> 1. doc/src/sgml/ref/create_subscription.sgml
> 
> @@ -245,6 +245,11 @@ CREATE SUBSCRIPTION <replaceable
> class="parameter">subscription_name</replaceabl
>            also be the unique column on the publisher-side; 2) there cannot be
>            any non-immutable functions used by the subscriber-side replicated
>            table.
> +          When applying a streaming transaction, if either requirement is not
> +          met, the background worker will exit with an error.
> +          The <literal>parallel</literal> mode is disregarded when retrying;
> +          instead the transaction will be applied using <literal>on</literal>
> +          mode.
>           </para>
> 
> The "on mode" still sounds strange to me. Maybe it's just my personal
> opinion, but I don’t really consider 'on' and 'off' to be "modes".
> Anyway I already posted the same comment several times before [1,
> #4.3]. Let's see what others think.
> 
> SUGGESTION
> "using on mode" -> "using streaming = on"

Okay, I will follow the relevant comments later.

> 2. src/backend/replication/logical/worker.c - start_table_sync
> 
> @@ -3902,20 +3925,28 @@ start_table_sync(XLogRecPtr *origin_startpos,
> char **myslotname)
>   }
>   PG_CATCH();
>   {
> + /*
> + * Emit the error message, and recover from the error state to an idle
> + * state
> + */
> + HOLD_INTERRUPTS();
> +
> + EmitErrorReport();
> + AbortOutOfAnyTransaction();
> + FlushErrorState();
> +
> + RESUME_INTERRUPTS();
> +
> + /* Report the worker failed during table synchronization */
> + pgstat_report_subscription_error(MySubscription->oid, false);
> +
>   if (MySubscription->disableonerr)
> - DisableSubscriptionAndExit();
> - else
> - {
> - /*
> - * Report the worker failed during table synchronization. Abort
> - * the current transaction so that the stats message is sent in an
> - * idle state.
> - */
> - AbortOutOfAnyTransaction();
> - pgstat_report_subscription_error(MySubscription->oid, false);
> + DisableSubscriptionOnError();
> 
> - PG_RE_THROW();
> - }
> + /* Set the retry flag. */
> + set_subscription_retry(true);
> +
> + proc_exit(0);
>   }
>   PG_END_TRY();
> 
> Perhaps current code is OK, but I am not 100% sure if we should set
> the retry flag when the disable_on_error is set, because the
> subscription is not going to be retried (because it is disabled). And
> later, if/when the user does enable the subscription, presumably that
> will be after they have already addressed the problem that caused the
> error/disablement in the first place.

I think it is okay. Because even after addressing the problem, it is also
*retrying* to apply the failed transaction. And, in the worst case, it just
applies the first failed streaming transaction using "on" mode instead of
"parallel" mode.

> 3. src/backend/replication/logical/worker.c - start_apply
> 
>   PG_CATCH();
>   {
> + /*
> + * Emit the error message, and recover from the error state to an idle
> + * state
> + */
> + HOLD_INTERRUPTS();
> +
> + EmitErrorReport();
> + AbortOutOfAnyTransaction();
> + FlushErrorState();
> +
> + RESUME_INTERRUPTS();
> +
> + /* Report the worker failed while applying changes */
> + pgstat_report_subscription_error(MySubscription->oid,
> + !am_tablesync_worker());
> +
>   if (MySubscription->disableonerr)
> - DisableSubscriptionAndExit();
> - else
> - {
> - /*
> - * Report the worker failed while applying changes. Abort the
> - * current transaction so that the stats message is sent in an
> - * idle state.
> - */
> - AbortOutOfAnyTransaction();
> - pgstat_report_subscription_error(MySubscription-
> >oid, !am_tablesync_worker());
> + DisableSubscriptionOnError();
> 
> - PG_RE_THROW();
> - }
> + /* Set the retry flag. */
> + set_subscription_retry(true);
>   }
>   PG_END_TRY();
>  }
> 
> 3a.
> Same comment as #2
> 
> 3b.
> This PG_CATCH used to leave by either proc_exit(0) or PG_RE_THROW but
> what does it do now? My first impression is there is a bug here due to
> some missing code, because AFAICT the exception is caught and gobbled
> up and then what...?

=>3a.
See the reply to #2.
=>3b.
The function `proc_exit(0)` is invoked after invoking function start_apply. See
function ApplyWorkerMain.

> 4. src/backend/replication/logical/worker.c - set_subscription_retry
> 
> + if (MySubscription->retry == retry ||
> + am_apply_bgworker())
> + return;
> 
> 4a.
> I this this quick exit can be split and given some appropriate comments
> 
> SUGGESTION (for example)
> /* Fast path - if no state change then nothing to do */
> if (MySubscription->retry == retry)
> return;
> 
> /* Fast path - skip for apply background workers */
> if (am_apply_bgworker())
> return;

Changed.

> 5. .../subscription/t/032_streaming_apply.pl
> 
> @@ -78,9 +78,13 @@ my $timer =
> IPC::Run::timeout($PostgreSQL::Test::Utils::timeout_default);
>  my $h = $node_publisher->background_psql('postgres', \$in, \$out, $timer,
>   on_error_stop => 0);
> 
> +#
> =============================================================
> ===============
> 
> All those comment highlighting lines like "# ==============" really
> belong in the earlier patch (0003 ?) when this TAP test file was
> introduced.

Changed.

The new patches were attached in [1].

[1] -
https://www.postgresql.org/message-id/OS3PR01MB6275739E73E8BEC5D13FB6739E6B9%40OS3PR01MB6275.jpnprd01.prod.outlook.com

Regards,
Wang wei

pgsql-hackers by date:

Previous
From: "wangw.fnst@fujitsu.com"
Date:
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Next
From: "Drouvot, Bertrand"
Date:
Subject: Re: Patch proposal: New hooks in the connection path