Re: Logical Replication of sequences - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Logical Replication of sequences
Date
Msg-id CAD21AoDNT4uTDVwg76gZXn9xOX4mNCg-ZVFJHbPdDVvqS8V6PQ@mail.gmail.com
Whole thread Raw
In response to RE: Logical Replication of sequences  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
Responses Re: Logical Replication of sequences
List pgsql-hackers
On Mon, Oct 20, 2025 at 4:59 AM Hayato Kuroda (Fujitsu)
<kuroda.hayato@fujitsu.com> wrote:
>
> > Here is the latest patch set which addressed Shveta[1], Amit[2], Chao[3][4],
> > Dilip[5], Sawada-San's[6] comments.
>
> I found the patch could not pass the sanity check, because 0001 missed a comma.
> Also, there was a duplicated entry for `REFRESH SEQUENCES`.
>
> Attached set fixed the issue.
>

Thank you for updating the patch! I have a few comments on the 0001 patch:

-       appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,
gpt.attrs\n"
-                        "       FROM pg_class c\n"
+       appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,
gpt.attrs");
+
+       if (support_relkind_seq)
+           appendStringInfo(&cmd, ", c.relkind\n");
+
+       appendStringInfo(&cmd, "   FROM pg_class c\n"
                         "         JOIN pg_namespace n ON n.oid =
c.relnamespace\n"
                         "         JOIN ( SELECT
(pg_get_publication_tables(VARIADIC array_agg(pubname::text))).*\n"
                         "                FROM pg_publication\n"
                         "                WHERE pubname IN ( %s )) AS gpt\n"
                         "             ON gpt.relid = c.oid\n",
                         pub_names->data);

I think we don't necessarily need to avoid getting relkind for servers
older than 19. IIUC the reason why we had to have check_columnlist was
that the attnames column was introduced to pg_publication_tables
catalog. But I think this patch is not the case since we get relkind
from pg_class. That is, I think we can get 4 columns from server >=16.

---
 /*
- * Check and log a warning if the publisher has subscribed to the same table,
- * its partition ancestors (if it's a partition), or its partition children (if
- * it's a partitioned table), from some other publishers. This check is
- * required in the following scenarios:
+ * Check and log a warning if the publisher has subscribed to the same relation
+ * (table or sequence), its partition ancestors (if it's a partition), or its
+ * partition children (if it's a partitioned table), from some other
publishers.
+ * This check is required in the following scenarios:
  *
  * 1) For CREATE SUBSCRIPTION and ALTER SUBSCRIPTION ... REFRESH PUBLICATION
  *    statements with "copy_data = true" and "origin = none":
  *    - Warn the user that data with an origin might have been copied.
- *    - This check is skipped for tables already added, as incremental sync via
- *      WAL allows origin tracking. The list of such tables is in
- *      subrel_local_oids.
+ *    - This check is skipped for tables and sequences already added, as
+ *      incremental sync via WAL allows origin tracking. The list of
such tables
+ *      is in subrel_local_oids.
  *
  * 2) For CREATE SUBSCRIPTION and ALTER SUBSCRIPTION ... REFRESH PUBLICATION
  *    statements with "retain_dead_tuples = true" and "origin = any", and for
@@ -2338,13 +2440,19 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
  *    - Warn the user that only conflict detection info for local changes on
  *      the publisher is retained. Data from other origins may lack sufficient
  *      details for reliable conflict detection.
+ *    - This check targets for tables only.
  *    - See comments atop worker.c for more details.
+ *
+ * 3) For ALTER SUBSCRIPTION ... REFRESH SEQUENCE statements with "origin =
+ *    none":
+ *    - Warn the user that sequence data from another origin might have been
+ *      copied.
  */

While this function is well documented, I find it's quite complex, and
this patch adds to that complexity. The function has 9 arguments,
making it difficult to understand which combinations of arguments
enable which checks. For example, the function header comment doesn't
explain when to use the only_sequences parameter. At first, I thought
only_sequences should be set to true when checking if the publisher
has subscribed to sequences from other publishers, but looking at the
code, I discovered it doesn't check sequences when check_rdt is true:

+   if (walrcv_server_version(wrconn) < 190000 || check_rdt)
+       appendStringInfo(&cmd, query,
+                        "(SELECT relid, TRUE as istable FROM
pg_get_publication_tables(P.pubname))");
+   else if (only_sequences)
+       appendStringInfo(&cmd, query,
+                        "(SELECT relid, FALSE as istable FROM
pg_get_publication_sequences(P.pubname))");
+   else
+       appendStringInfo(&cmd, query,
+                        "(SELECT relid, TRUE as istable FROM
pg_get_publication_tables(P.pubname) UNION ALL"
+                        " SELECT relid, FALSE as istable FROM
pg_get_publication_sequences(P.pubname))");
+

I find that the complexity might stem from checking different cases in
one function,  but I don't have better ideas to improve the logic for
now. I think we can at least describe what the caller can expect from
specifying only_sequence to true.

---
+    * apply workers initilization, and to handle origin creation dynamically

s/initilization/initialization/

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [PATCH v4] parallel pg_restore: avoid disk seeks when jumping short distance forward
Next
From: Tom Lane
Date:
Subject: Re: [PATCH v4] parallel pg_restore: avoid disk seeks when jumping short distance forward