Re: Logical Replication of sequences - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: Logical Replication of sequences |
Date | |
Msg-id | CAD21AoDNT4uTDVwg76gZXn9xOX4mNCg-ZVFJHbPdDVvqS8V6PQ@mail.gmail.com Whole thread Raw |
In response to | RE: Logical Replication of sequences ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
Responses |
Re: Logical Replication of sequences
|
List | pgsql-hackers |
On Mon, Oct 20, 2025 at 4:59 AM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote: > > > Here is the latest patch set which addressed Shveta[1], Amit[2], Chao[3][4], > > Dilip[5], Sawada-San's[6] comments. > > I found the patch could not pass the sanity check, because 0001 missed a comma. > Also, there was a duplicated entry for `REFRESH SEQUENCES`. > > Attached set fixed the issue. > Thank you for updating the patch! I have a few comments on the 0001 patch: - appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname, gpt.attrs\n" - " FROM pg_class c\n" + appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname, gpt.attrs"); + + if (support_relkind_seq) + appendStringInfo(&cmd, ", c.relkind\n"); + + appendStringInfo(&cmd, " FROM pg_class c\n" " JOIN pg_namespace n ON n.oid = c.relnamespace\n" " JOIN ( SELECT (pg_get_publication_tables(VARIADIC array_agg(pubname::text))).*\n" " FROM pg_publication\n" " WHERE pubname IN ( %s )) AS gpt\n" " ON gpt.relid = c.oid\n", pub_names->data); I think we don't necessarily need to avoid getting relkind for servers older than 19. IIUC the reason why we had to have check_columnlist was that the attnames column was introduced to pg_publication_tables catalog. But I think this patch is not the case since we get relkind from pg_class. That is, I think we can get 4 columns from server >=16. --- /* - * Check and log a warning if the publisher has subscribed to the same table, - * its partition ancestors (if it's a partition), or its partition children (if - * it's a partitioned table), from some other publishers. This check is - * required in the following scenarios: + * Check and log a warning if the publisher has subscribed to the same relation + * (table or sequence), its partition ancestors (if it's a partition), or its + * partition children (if it's a partitioned table), from some other publishers. + * This check is required in the following scenarios: * * 1) For CREATE SUBSCRIPTION and ALTER SUBSCRIPTION ... REFRESH PUBLICATION * statements with "copy_data = true" and "origin = none": * - Warn the user that data with an origin might have been copied. - * - This check is skipped for tables already added, as incremental sync via - * WAL allows origin tracking. The list of such tables is in - * subrel_local_oids. + * - This check is skipped for tables and sequences already added, as + * incremental sync via WAL allows origin tracking. The list of such tables + * is in subrel_local_oids. * * 2) For CREATE SUBSCRIPTION and ALTER SUBSCRIPTION ... REFRESH PUBLICATION * statements with "retain_dead_tuples = true" and "origin = any", and for @@ -2338,13 +2440,19 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId) * - Warn the user that only conflict detection info for local changes on * the publisher is retained. Data from other origins may lack sufficient * details for reliable conflict detection. + * - This check targets for tables only. * - See comments atop worker.c for more details. + * + * 3) For ALTER SUBSCRIPTION ... REFRESH SEQUENCE statements with "origin = + * none": + * - Warn the user that sequence data from another origin might have been + * copied. */ While this function is well documented, I find it's quite complex, and this patch adds to that complexity. The function has 9 arguments, making it difficult to understand which combinations of arguments enable which checks. For example, the function header comment doesn't explain when to use the only_sequences parameter. At first, I thought only_sequences should be set to true when checking if the publisher has subscribed to sequences from other publishers, but looking at the code, I discovered it doesn't check sequences when check_rdt is true: + if (walrcv_server_version(wrconn) < 190000 || check_rdt) + appendStringInfo(&cmd, query, + "(SELECT relid, TRUE as istable FROM pg_get_publication_tables(P.pubname))"); + else if (only_sequences) + appendStringInfo(&cmd, query, + "(SELECT relid, FALSE as istable FROM pg_get_publication_sequences(P.pubname))"); + else + appendStringInfo(&cmd, query, + "(SELECT relid, TRUE as istable FROM pg_get_publication_tables(P.pubname) UNION ALL" + " SELECT relid, FALSE as istable FROM pg_get_publication_sequences(P.pubname))"); + I find that the complexity might stem from checking different cases in one function, but I don't have better ideas to improve the logic for now. I think we can at least describe what the caller can expect from specifying only_sequence to true. --- + * apply workers initilization, and to handle origin creation dynamically s/initilization/initialization/ Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: