Re: Slow catchup of 2PC (twophase) transactions on replica in LR - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Date
Msg-id CAHut+PssnvZ86cZw5WGEoOU3mAzzMEBwYcyFG9Th0XM=n49H5A@mail.gmail.com
Whole thread Raw
In response to RE: Slow catchup of 2PC (twophase) transactions on replica in LR  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses RE: Slow catchup of 2PC (twophase) transactions on replica in LR
List pgsql-hackers
Hi Kuroda-san, Here are some review comments for all patches v9*

//////////
Patch v9-0001
//////////

There were no changes since v8-0001, so no comments.

//////////
Patch v9-0002
//////////

======
Commit Message

2.1.
Regarding the off->on case, the logical replication already has a
mechanism for it, so there is no need to do anything special for the
on->off case; however, we must connect to the publisher and expressly
change the parameter. The operation cannot be rolled back, and
altering the parameter from "on" to "off" within a transaction is
prohibited.

In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time.

~

This explanation seems to be going around in circles, without giving
any new information:

AFAICT, "Regarding the off->on case, the logical replication already
has a mechanism for it, so there is no need to do anything special for
the on->off case;"

is saying pretty much the same as:

"In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time."

But, what I hoped for in previous review comments was an explanation
somewhat less vague than "already has a mechanism" or "already had the
mechanism". Can't this have just 1 or 2 lines to say WHAT is that
existing mechanism for the "off" to "on" case, and WHY that means
there is nothing special to do in that scenario?

======
src/backend/commands/subscriptioncmds.c

2.2. AlterSubscription

  /*
- * The changed two_phase option of the slot can't be rolled
- * back.
+ * Since altering the two_phase option of subscriptions
+ * also leads to changing the slot option, this command
+ * cannot be rolled back. So prevent this if we are in a
+ * transaction block. In the opposite case, there is no
+ * need to prevent this because the logical replication
+ * worker already had the mechanism to alter the slot
+ * option at a convenient time.
  */

(Same previous review comments, and same as my review comment for the
commit message above).

I don't think "already had the mechanism" is enough explanation.

Also, the 2nd sentence doesn't make sense here because the comment
only said "altering the slot option" -- it didn't say it was altering
it to "on" or altering it to "off", so "the opposite case" has no
meaning.

~~~

2.3. AlterSubscription

  /*
- * Try to acquire the connection necessary for altering slot.
+ * Check the need to alter the replication slot. Failover and two_phase
+ * options are controlled by both the publisher (as a slot option) and the
+ * subscriber (as a subscription option). The slot option must be altered
+ * only when changing "on" to "off". Because in opposite case, the logical
+ * replication worker already has the mechanism to do so at a convenient
+ * time.
+ */
+ update_failover = replaces[Anum_pg_subscription_subfailover - 1];
+ update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] &&
+ !opts.twophase);

This is again the same as other review comments above. Probably, when
some better explanation can be found for "already has the mechanism to
do so at a convenient time." then all of these places can be changed
using similar text.

//////////
Patch v9-0003
//////////

There are some imperfect code comments but AFAIK they are the same
ones from patch 0002. I think patch 0003 is just moving those comments
to different places, so probably they would already be addressed by
patch 0002.

//////////
Patch v9-0004
//////////

======
doc/src/sgml/catalogs.sgml

4.1.
+     <row>
+      <entry role="catalog_table_entry"><para role="column_definition">
+       <structfield>subforcealter</structfield> <type>bool</type>
+      </para>
+      <para>
+       If true, the subscription can be altered <literal>two_phase</literal>
+       option, even if there are prepared transactions
+      </para></entry>
+     </row>
+

BEFORE
If true, the subscription can be altered <literal>two_phase</literal>
option, even if there are prepared transactions

SUGGESTION
If true, then the ALTER SUBSCRIPTION command can disable
<literal>two_phase</literal> option, even if there are uncommitted
prepared transactions from when <literal>two_phase</literal> was
enabled

======
doc/src/sgml/ref/alter_subscription.sgml

4.2.
-
-     <para>
-      The <literal>two_phase</literal> parameter can only be altered when the
-      subscription is disabled. When altering the parameter from
<literal>on</literal>
-      to <literal>off</literal>, the backend process checks for any incomplete
-      prepared transactions done by the logical replication worker (from when
-      <literal>two_phase</literal> parameter was still <literal>on</literal>)
-      and, if any are found, those are aborted.
-     </para>

Well, I still think there ought to be some mention of the relationship
between 'force_alter' and 'two_phase' given on this ALTER SUBSCRIPTION
page. Then the user can cross-reference to read what the 'force_alter'
actually does.

======
doc/src/sgml/ref/create_subscription.sgml

4.3.
+
+       <varlistentry id="sql-createsubscription-params-with-force-alter">
+        <term><literal>force_alter</literal> (<type>boolean</type>)</term>
+        <listitem>
+         <para>
+          Specifies whether the subscription can be altered
+          <literal>two_phase</literal> option, even if there are prepared
+          transactions. If specified, the backend process checks for any
+          incomplete prepared transactions done by the logical replication
+          worker (from when <literal>two_phase</literal> parameter was still
+          <literal>on</literal>), if any are found, those are aborted.
+          Otherwise, Altering the parameter from <literal>on</literal> to
+          <literal>off</literal> will give an error when there are prepared
+          transactions done by the logical replication worker.
+          The default is <literal>false</literal>.
+         </para>
+        </listitem>
+       </varlistentry>

This explanation seems a bit repetitive. I think it can be improved as follows:

SUGGESTION
Specifies if the ALTER SUBSCRIPTION can be forced to proceed instead
of giving an error.

There is currently only one scenario where this parameter has any
effect: When altering two_phase option from true to false it is
possible for there to be incomplete prepared transactions done by the
logical replication worker (from when two_phase parameter was still
true). If force_alter is false, then this will give an error; if
force_alter is true, then the incomplete prepared transactions are
aborted and the alter will proceed.

The default is false.

======
src/backend/commands/subscriptioncmds.c

4.4. CreateSubscription

  values[Anum_pg_subscription_subfailover - 1] = BoolGetDatum(opts.failover);
+ values[Anum_pg_subscription_subforcealter] = BoolGetDatum(opts.force_alter);
  values[Anum_pg_subscription_subconninfo - 1] =

Hmm, looks like a bug. Shouldn't that index say -1?

~~~
4.5. AlterSubscription

+ /*
+ * Abort prepared transactions only if
+ * 'force_alter' option is true. Otherwise raise
+ * an ERROR.
+ */
+ if (IsSet(opts.specified_opts, SUBOPT_FORCE_ALTER))
+ {
+ if (!opts.force_alter)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot alter %s when there are prepared transactions",
+ "two_phase = off"),
+ errhint("Resolve these transactions or set %s, and then try again.",
+ "force_alter = true")));
+ }
+ else
+ {
+ if (!sub->forcealter)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot alter %s when there are prepared transactions",
+ "two_phase = off"),
+ errhint("Resolve these transactions or set %s, and then try again.",
+ "force_alter = true")));
+ }
+

IIUC this code can be simplified to remove the error duplication.
Something like below:

SUGGESTION

bool raise_error = IsSet(opts.specified_opts, SUBOPT_FORCE_ALTER) ?
!opts.force_alter : !sub->forcealter;

if (raise_error)
  ereport(ERROR,
    (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
    errmsg("cannot alter %s when there are prepared transactions",
      "two_phase = off"),
    errhint("Resolve these transactions or set %s, and then try again.",
      "force_alter = true")));

======
src/bin/pg_dump/pg_dump.c

4.6. getSubscriptions

+ if (fout->remoteVersion >= 170000)
+ appendPQExpBufferStr(query,
+ " s.subforcealter\n");
+ else
+ appendPQExpBuffer(query,
+   " false AS subforcealter\n");
+
+

4.6a.
Should this just be combined with the existing "if
(fout->remoteVersion >= 170000)" for failover?

~

4.6b.
Double blank lines.

======
src/bin/psql/describe.c

4.7.
+ if (pset.sversion >= 170000)
+ appendPQExpBuffer(&buf,
+   ", subforcealter AS \"%s\"\n",
+   gettext_noop("Force_alter"));

IMO the column title should be "Force alter" (i.e. without the underscore)

======
src/include/catalog/pg_subscription.h

4.8. CATALOG

+ bool subforcealter; /* True if we allow to drop prepared transactions
+ * when altering two_phase "on"->"off". */

I think this is not actually the description of 'force_alter'. What
you wrote just happens to be the only case that this option currently
works for. Maybe a more correct description is something like below.

SUGGESTION
True allows the ALTER SUBSCRIPTION command to proceed under conditions
that would otherwise result in an error. Currently, 'force_alter' only
has an effect when altering the two_phase option from "true" to
"false".

~~~

4.9. struct Subscription

+ bool forcealter; /* True if we allow to drop prepared
+ * transactions when altering two_phase
+ * "on"->"off". */

Ditto the previous review comment.

======
src/test/regress/expected/subscription.out

4.10.
-
                                           List of subscriptions
-       Name       |           Owner           | Enabled | Publication
| Binary | Streaming | Two-phase commit | Disable on error | Origin |
Password required | Run as owner? | Failover | Synchronous commit |
      Conninfo           | Skip LSN

-------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+--------------------+-----------------------------+----------
- regress_testsub4 | regress_subscription_user | f       | {testpub}
| f      | off       | d                | f                | none   |
t                 | f             | f        | off                |
dbname=regress_doesnotexist | 0/0
+
                                                  List of
subscriptions
+       Name       |           Owner           | Enabled | Publication
| Binary | Streaming | Two-phase commit | Disable on error | Origin |
Password required | Run as owner? | Failover | Force_alter |
Synchronous commit |          Conninfo           | Skip LSN

+------------------+---------------------------+---------+-------------+--------+-----------+------------------+------------------+--------+-------------------+---------------+----------+-------------+--------------------+-----------------------------+----------

The column heading should be "Force alter", as already mentioned by an
earlier review comment (#4.7)

======
src/test/subscription/t/099_twophase_added.pl

4.11.

+# Alter the two_phase with the force_alter option. Apart from the the last
+# ALTER SUBSCRIPTION command, the command will abort the prepared transaction
+# and succeed.

There is typo "the the" and the wording is a bit strange. Why not just say:

SUGGESTION
Alter the two_phase true to false with the force_alter option enabled.
This command will succeed after aborting the prepared transaction.

======
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: JIT compilation per plan node
Next
From: Michael Paquier
Date:
Subject: Re: [PATCH] Fix bug when calling strncmp in check_authmethod_valid