Re: pg_upgrade and logical replication - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: pg_upgrade and logical replication
Date
Msg-id ZUHXaatBQOHIk_K0@paquier.xyz
Whole thread Raw
In response to Re: pg_upgrade and logical replication  (vignesh C <vignesh21@gmail.com>)
Responses Re: pg_upgrade and logical replication
List pgsql-hackers
On Mon, Oct 30, 2023 at 03:05:09PM +0530, vignesh C wrote:
> The patch was not applying because of recent commits. Here is a
> rebased version of the patches.

+     * We don't want the launcher to run while upgrading because it may start
+     * apply workers which could start receiving changes from the publisher
+     * before the physical files are put in place, causing corruption on the
+     * new cluster upgrading to, so setting max_logical_replication_workers=0
+     * to disable launcher.
      */
     if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
-        appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1");
+        appendPQExpBufferStr(&pgoptions, " -c max_slot_wal_keep_size=-1 -c max_logical_replication_workers=0");

At least that's consistent with the other side of the coin with
publications.  So 0001 looks basically OK seen from here.

The indentation of 0002 seems off in a few places.

+    <para>
+     Verify that all the subscription tables in the old subscriber are in
+     <literal>r</literal> (ready) state. Setup the
+     <link linkend="logical-replication-config-subscriber"> subscriber
+     configurations</link> in the new subscriber.
[...]
+    <para>
+     There is a prerequisites that all the subscription tables should be in
+     <literal>r</literal> (ready) state for
+     <application>pg_upgrade</application> to be able to upgrade the
+     subscriber. If this is not met an error will be reported.
+    </para>

This part is repeated.  Globally, this documentation addition does not
seem really helpful for the end-user as it describes the checks that
are done during the upgrade.  Shouldn't this part of the docs,
similarly to the publication part, focus on providing a check list of
actions to take to achieve a clean upgrade, with a list of commands
and configurations required?  The good part is that information about
what's copied is provided (pg_subscription_rel and the origin status),
still this could be improved.

+    <para>
+     Enable the subscriptions by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... ENABLE</command></link>.
+    </para>

This is something users can act on, but how does this operation help
with the upgrade?  Should this happen for all the descriptions
subscriptions?  Or you mean that this is something that needs to be
run after the upgrade?

+    <para>
+     Create all the new tables that were created in the publication and
+     refresh the publication by executing
+     <link linkend="sql-altersubscription"><command>ALTER SUBSCRIPTION ... REFRESH PUBLICATION</command></link>.
+    </para>

What does "new tables" refer to in this case?  Are you referring to
the case where new relations have been added on a publication node
after an upgrade and need to be copied?  Does one need to DISABLE the
subscriptions on the subscriber node before running the upgrade, or is
a REFRESH enough?  The test only uses a REFRESH, so the docs and the
code don't entirely agree with each other.

+  <para>
+   For upgradation of the subscriptions, all the subscription tables should be
+   in <literal>r</literal> (ready) state, or else the
+   <application>pg_upgrade</application> run will error.
+  </para>

"Upgradation"?

+# Set tables to 'i' state
+$old_sub->safe_psql(
+    'postgres',
+    "UPDATE pg_subscription_rel
+        SET srsubstate = 'i' WHERE srsubstate = 'r'");

I am not sure that doing catalog manipulation in the TAP test itself
is a good idea, because this can finish by being unpredictible in the
long-term for the test maintenance.  I think that this portion of the
test should just be removed.  poll_query_until() or wait queries
making sure that all the relations are in the state we want them to be
before the beginning of the upgrade is enough in terms of test
coverag, IMO.

+$result = $new_sub->safe_psql('postgres',
+    "SELECT remote_lsn FROM pg_replication_origin_status");

This assumes one row, but perhaps this had better do a match based on
external_id and/or local_id?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: pg_upgrade and logical replication
Next
From: Richard Guo
Date:
Subject: Re: Support "Right Semi Join" plan shapes