RE: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers
From | wangw.fnst@fujitsu.com |
---|---|
Subject | RE: Perform streaming logical transactions by background workers and parallel apply |
Date | |
Msg-id | OS3PR01MB62755190D8302CDDFB72C3189E6B9@OS3PR01MB6275.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Perform streaming logical transactions by background workers and parallel apply (Peter Smith <smithpb2250@gmail.com>) |
List | pgsql-hackers |
On Fri, August 12, 2022 17:22 PM Peter Smith <smithpb2250@gmail.com> wrote: > Here are some review comments for v20-0004: > > (This completes my reviews of the v20* patch set. Sorry, the reviews > are time consuming, so I am lagging slightly behind the latest posted > version) Thanks for your comments. > 1. doc/src/sgml/ref/create_subscription.sgml > > @@ -245,6 +245,11 @@ CREATE SUBSCRIPTION <replaceable > class="parameter">subscription_name</replaceabl > also be the unique column on the publisher-side; 2) there cannot be > any non-immutable functions used by the subscriber-side replicated > table. > + When applying a streaming transaction, if either requirement is not > + met, the background worker will exit with an error. > + The <literal>parallel</literal> mode is disregarded when retrying; > + instead the transaction will be applied using <literal>on</literal> > + mode. > </para> > > The "on mode" still sounds strange to me. Maybe it's just my personal > opinion, but I don’t really consider 'on' and 'off' to be "modes". > Anyway I already posted the same comment several times before [1, > #4.3]. Let's see what others think. > > SUGGESTION > "using on mode" -> "using streaming = on" Okay, I will follow the relevant comments later. > 2. src/backend/replication/logical/worker.c - start_table_sync > > @@ -3902,20 +3925,28 @@ start_table_sync(XLogRecPtr *origin_startpos, > char **myslotname) > } > PG_CATCH(); > { > + /* > + * Emit the error message, and recover from the error state to an idle > + * state > + */ > + HOLD_INTERRUPTS(); > + > + EmitErrorReport(); > + AbortOutOfAnyTransaction(); > + FlushErrorState(); > + > + RESUME_INTERRUPTS(); > + > + /* Report the worker failed during table synchronization */ > + pgstat_report_subscription_error(MySubscription->oid, false); > + > if (MySubscription->disableonerr) > - DisableSubscriptionAndExit(); > - else > - { > - /* > - * Report the worker failed during table synchronization. Abort > - * the current transaction so that the stats message is sent in an > - * idle state. > - */ > - AbortOutOfAnyTransaction(); > - pgstat_report_subscription_error(MySubscription->oid, false); > + DisableSubscriptionOnError(); > > - PG_RE_THROW(); > - } > + /* Set the retry flag. */ > + set_subscription_retry(true); > + > + proc_exit(0); > } > PG_END_TRY(); > > Perhaps current code is OK, but I am not 100% sure if we should set > the retry flag when the disable_on_error is set, because the > subscription is not going to be retried (because it is disabled). And > later, if/when the user does enable the subscription, presumably that > will be after they have already addressed the problem that caused the > error/disablement in the first place. I think it is okay. Because even after addressing the problem, it is also *retrying* to apply the failed transaction. And, in the worst case, it just applies the first failed streaming transaction using "on" mode instead of "parallel" mode. > 3. src/backend/replication/logical/worker.c - start_apply > > PG_CATCH(); > { > + /* > + * Emit the error message, and recover from the error state to an idle > + * state > + */ > + HOLD_INTERRUPTS(); > + > + EmitErrorReport(); > + AbortOutOfAnyTransaction(); > + FlushErrorState(); > + > + RESUME_INTERRUPTS(); > + > + /* Report the worker failed while applying changes */ > + pgstat_report_subscription_error(MySubscription->oid, > + !am_tablesync_worker()); > + > if (MySubscription->disableonerr) > - DisableSubscriptionAndExit(); > - else > - { > - /* > - * Report the worker failed while applying changes. Abort the > - * current transaction so that the stats message is sent in an > - * idle state. > - */ > - AbortOutOfAnyTransaction(); > - pgstat_report_subscription_error(MySubscription- > >oid, !am_tablesync_worker()); > + DisableSubscriptionOnError(); > > - PG_RE_THROW(); > - } > + /* Set the retry flag. */ > + set_subscription_retry(true); > } > PG_END_TRY(); > } > > 3a. > Same comment as #2 > > 3b. > This PG_CATCH used to leave by either proc_exit(0) or PG_RE_THROW but > what does it do now? My first impression is there is a bug here due to > some missing code, because AFAICT the exception is caught and gobbled > up and then what...? =>3a. See the reply to #2. =>3b. The function `proc_exit(0)` is invoked after invoking function start_apply. See function ApplyWorkerMain. > 4. src/backend/replication/logical/worker.c - set_subscription_retry > > + if (MySubscription->retry == retry || > + am_apply_bgworker()) > + return; > > 4a. > I this this quick exit can be split and given some appropriate comments > > SUGGESTION (for example) > /* Fast path - if no state change then nothing to do */ > if (MySubscription->retry == retry) > return; > > /* Fast path - skip for apply background workers */ > if (am_apply_bgworker()) > return; Changed. > 5. .../subscription/t/032_streaming_apply.pl > > @@ -78,9 +78,13 @@ my $timer = > IPC::Run::timeout($PostgreSQL::Test::Utils::timeout_default); > my $h = $node_publisher->background_psql('postgres', \$in, \$out, $timer, > on_error_stop => 0); > > +# > ============================================================= > =============== > > All those comment highlighting lines like "# ==============" really > belong in the earlier patch (0003 ?) when this TAP test file was > introduced. Changed. The new patches were attached in [1]. [1] - https://www.postgresql.org/message-id/OS3PR01MB6275739E73E8BEC5D13FB6739E6B9%40OS3PR01MB6275.jpnprd01.prod.outlook.com Regards, Wang wei
pgsql-hackers by date: