Re: Perform streaming logical transactions by background workers and parallel apply - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Perform streaming logical transactions by background workers and parallel apply
Date
Msg-id CAHut+Pu660Kv0DX8ZHfkfTAJd+2wq5-QX6VHzV_muDPmOhejHQ@mail.gmail.com
Whole thread Raw
In response to RE: Perform streaming logical transactions by background workers and parallel apply  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
Responses RE: Perform streaming logical transactions by background workers and parallel apply
List pgsql-hackers
Here are some review comments for v20-0004:

(This completes my reviews of the v20* patch set. Sorry, the reviews
are time consuming, so I am lagging slightly behind the latest posted
version)

======

1. doc/src/sgml/ref/create_subscription.sgml

@@ -245,6 +245,11 @@ CREATE SUBSCRIPTION <replaceable
class="parameter">subscription_name</replaceabl
           also be the unique column on the publisher-side; 2) there cannot be
           any non-immutable functions used by the subscriber-side replicated
           table.
+          When applying a streaming transaction, if either requirement is not
+          met, the background worker will exit with an error.
+          The <literal>parallel</literal> mode is disregarded when retrying;
+          instead the transaction will be applied using <literal>on</literal>
+          mode.
          </para>

The "on mode" still sounds strange to me. Maybe it's just my personal
opinion, but I don’t really consider 'on' and 'off' to be "modes".
Anyway I already posted the same comment several times before [1,
#4.3]. Let's see what others think.

SUGGESTION
"using on mode" -> "using streaming = on"

======

2. src/backend/replication/logical/worker.c - start_table_sync

@@ -3902,20 +3925,28 @@ start_table_sync(XLogRecPtr *origin_startpos,
char **myslotname)
  }
  PG_CATCH();
  {
+ /*
+ * Emit the error message, and recover from the error state to an idle
+ * state
+ */
+ HOLD_INTERRUPTS();
+
+ EmitErrorReport();
+ AbortOutOfAnyTransaction();
+ FlushErrorState();
+
+ RESUME_INTERRUPTS();
+
+ /* Report the worker failed during table synchronization */
+ pgstat_report_subscription_error(MySubscription->oid, false);
+
  if (MySubscription->disableonerr)
- DisableSubscriptionAndExit();
- else
- {
- /*
- * Report the worker failed during table synchronization. Abort
- * the current transaction so that the stats message is sent in an
- * idle state.
- */
- AbortOutOfAnyTransaction();
- pgstat_report_subscription_error(MySubscription->oid, false);
+ DisableSubscriptionOnError();

- PG_RE_THROW();
- }
+ /* Set the retry flag. */
+ set_subscription_retry(true);
+
+ proc_exit(0);
  }
  PG_END_TRY();

Perhaps current code is OK, but I am not 100% sure if we should set
the retry flag when the disable_on_error is set, because the
subscription is not going to be retried (because it is disabled). And
later, if/when the user does enable the subscription, presumably that
will be after they have already addressed the problem that caused the
error/disablement in the first place.

~~~

3. src/backend/replication/logical/worker.c - start_apply

  PG_CATCH();
  {
+ /*
+ * Emit the error message, and recover from the error state to an idle
+ * state
+ */
+ HOLD_INTERRUPTS();
+
+ EmitErrorReport();
+ AbortOutOfAnyTransaction();
+ FlushErrorState();
+
+ RESUME_INTERRUPTS();
+
+ /* Report the worker failed while applying changes */
+ pgstat_report_subscription_error(MySubscription->oid,
+ !am_tablesync_worker());
+
  if (MySubscription->disableonerr)
- DisableSubscriptionAndExit();
- else
- {
- /*
- * Report the worker failed while applying changes. Abort the
- * current transaction so that the stats message is sent in an
- * idle state.
- */
- AbortOutOfAnyTransaction();
- pgstat_report_subscription_error(MySubscription->oid, !am_tablesync_worker());
+ DisableSubscriptionOnError();

- PG_RE_THROW();
- }
+ /* Set the retry flag. */
+ set_subscription_retry(true);
  }
  PG_END_TRY();
 }

3a.
Same comment as #2

3b.
This PG_CATCH used to leave by either proc_exit(0) or PG_RE_THROW but
what does it do now? My first impression is there is a bug here due to
some missing code, because AFAICT the exception is caught and gobbled
up and then what...?

~~~

4. src/backend/replication/logical/worker.c - set_subscription_retry

+ if (MySubscription->retry == retry ||
+ am_apply_bgworker())
+ return;

4a.
I this this quick exit can be split and given some appropriate comments

SUGGESTION (for example)
/* Fast path - if no state change then nothing to do */
if (MySubscription->retry == retry)
return;

/* Fast path - skip for apply background workers */
if (am_apply_bgworker())
return;

======

5. .../subscription/t/032_streaming_apply.pl

@@ -78,9 +78,13 @@ my $timer =
IPC::Run::timeout($PostgreSQL::Test::Utils::timeout_default);
 my $h = $node_publisher->background_psql('postgres', \$in, \$out, $timer,
  on_error_stop => 0);

+# ============================================================================

All those comment highlighting lines like "# ==============" really
belong in the earlier patch (0003 ?) when this TAP test file was
introduced.

------
[1] https://www.postgresql.org/message-id/CAHut%2BPvrw%2BtgCEYGxv%2BnKrqg-zbJdYEXee6o4irPAsYoXcuUcw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Introduce wait_for_subscription_sync for TAP tests
Next
From: Amit Kapila
Date:
Subject: Re: Support logical replication of DDLs