Home > mailing lists

Re: Logical Replication of sequences - Mailing list pgsql-hackers

From	vignesh C
Subject	Re: Logical Replication of sequences
Date	October 21, 2025 17:41:35
Msg-id	CALDaNm2Tx6FXyuDFj3L8nWfwMQQ2=wuVN0txA9boscbK58HxLQ@mail.gmail.com Whole thread Raw
In response to	Re: Logical Replication of sequences (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses	Re: Logical Replication of sequences Re: Logical Replication of sequences Re: Logical Replication of sequences
List	pgsql-hackers

Tree view

On Tue, 21 Oct 2025 at 03:49, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Mon, Oct 20, 2025 at 4:59 AM Hayato Kuroda (Fujitsu)
> <kuroda.hayato@fujitsu.com> wrote:
> >
> > > Here is the latest patch set which addressed Shveta[1], Amit[2], Chao[3][4],
> > > Dilip[5], Sawada-San's[6] comments.
> >
> > I found the patch could not pass the sanity check, because 0001 missed a comma.
> > Also, there was a duplicated entry for `REFRESH SEQUENCES`.
> >
> > Attached set fixed the issue.
> >
>
> Thank you for updating the patch! I have a few comments on the 0001 patch:
>
> -       appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,
> gpt.attrs\n"
> -                        "       FROM pg_class c\n"
> +       appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,
> gpt.attrs");
> +
> +       if (support_relkind_seq)
> +           appendStringInfo(&cmd, ", c.relkind\n");
> +
> +       appendStringInfo(&cmd, "   FROM pg_class c\n"
>                          "         JOIN pg_namespace n ON n.oid =
> c.relnamespace\n"
>                          "         JOIN ( SELECT
> (pg_get_publication_tables(VARIADIC array_agg(pubname::text))).*\n"
>                          "                FROM pg_publication\n"
>                          "                WHERE pubname IN ( %s )) AS gpt\n"
>                          "             ON gpt.relid = c.oid\n",
>                          pub_names->data);
>
> I think we don't necessarily need to avoid getting relkind for servers
> older than 19. IIUC the reason why we had to have check_columnlist was
> that the attnames column was introduced to pg_publication_tables
> catalog. But I think this patch is not the case since we get relkind
> from pg_class. That is, I think we can get 4 columns from server >=16.

Modified

> ---
>  /*
> - * Check and log a warning if the publisher has subscribed to the same table,
> - * its partition ancestors (if it's a partition), or its partition children (if
> - * it's a partitioned table), from some other publishers. This check is
> - * required in the following scenarios:
> + * Check and log a warning if the publisher has subscribed to the same relation
> + * (table or sequence), its partition ancestors (if it's a partition), or its
> + * partition children (if it's a partitioned table), from some other
> publishers.
> + * This check is required in the following scenarios:
>   *
>   * 1) For CREATE SUBSCRIPTION and ALTER SUBSCRIPTION ... REFRESH PUBLICATION
>   *    statements with "copy_data = true" and "origin = none":
>   *    - Warn the user that data with an origin might have been copied.
> - *    - This check is skipped for tables already added, as incremental sync via
> - *      WAL allows origin tracking. The list of such tables is in
> - *      subrel_local_oids.
> + *    - This check is skipped for tables and sequences already added, as
> + *      incremental sync via WAL allows origin tracking. The list of
> such tables
> + *      is in subrel_local_oids.
>   *
>   * 2) For CREATE SUBSCRIPTION and ALTER SUBSCRIPTION ... REFRESH PUBLICATION
>   *    statements with "retain_dead_tuples = true" and "origin = any", and for
> @@ -2338,13 +2440,19 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
>   *    - Warn the user that only conflict detection info for local changes on
>   *      the publisher is retained. Data from other origins may lack sufficient
>   *      details for reliable conflict detection.
> + *    - This check targets for tables only.
>   *    - See comments atop worker.c for more details.
> + *
> + * 3) For ALTER SUBSCRIPTION ... REFRESH SEQUENCE statements with "origin =
> + *    none":
> + *    - Warn the user that sequence data from another origin might have been
> + *      copied.
>   */
>
> While this function is well documented, I find it's quite complex, and
> this patch adds to that complexity. The function has 9 arguments,
> making it difficult to understand which combinations of arguments
> enable which checks. For example, the function header comment doesn't
> explain when to use the only_sequences parameter. At first, I thought
> only_sequences should be set to true when checking if the publisher
> has subscribed to sequences from other publishers, but looking at the
> code, I discovered it doesn't check sequences when check_rdt is true:
>
> +   if (walrcv_server_version(wrconn) < 190000 || check_rdt)
> +       appendStringInfo(&cmd, query,
> +                        "(SELECT relid, TRUE as istable FROM
> pg_get_publication_tables(P.pubname))");
> +   else if (only_sequences)
> +       appendStringInfo(&cmd, query,
> +                        "(SELECT relid, FALSE as istable FROM
> pg_get_publication_sequences(P.pubname))");
> +   else
> +       appendStringInfo(&cmd, query,
> +                        "(SELECT relid, TRUE as istable FROM
> pg_get_publication_tables(P.pubname) UNION ALL"
> +                        " SELECT relid, FALSE as istable FROM
> pg_get_publication_sequences(P.pubname))");
> +
>
> I find that the complexity might stem from checking different cases in
> one function,  but I don't have better ideas to improve the logic for
> now. I think we can at least describe what the caller can expect from
> specifying only_sequence to true.

Split this function into check_publications_origin_sequences and
check_publications_origin_tables to reduce the complexity. After this
change we log two warnings if both tables and sequences are subscriber
to the same tables and sequences like:
WARNING:  subscription "sub1" requested copy_data with origin = NONE
but might copy data that had a different origin
DETAIL:  The subscription subscribes to a publication ("pub1") that
contains tables that are written to by other subscriptions.
HINT:  Verify that initial data copied from the publisher tables did
not come from other origins.
WARNING:  subscription "sub1" requested copy_data with origin = NONE
but might copy data that had a different origin
DETAIL:  The subscription subscribes to a publication ("pub1") that
contains sequences that are written to by other subscriptions.
HINT:  Verify that initial data copied from the publisher sequences
did not come from other origins.

I felt it is ok, just highlighting here as this behavior will be seen
in the new version. Here it will clearly mention that first message is
from tables and second one is from sequences.

> ---
> +    * apply workers initilization, and to handle origin creation dynamically
>
> s/initilization/initialization/

Modified

Also Amit's comments from [1], Shveta's comments from [2] have been
handled and Shveta's first patch comments from [3] has been handled.

The attached patch has the changes for the same.

[1] - https://www.postgresql.org/message-id/CAA4eK1Kobtqw3gf7RupcV%3DMYS53iHjgshVHQFTfL-n83XxucPg%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/CAJpy0uCvHmDdp7W1kDvVEwFT2%2BV9gnj5boE5%2BmtUqVknE9-bvQ%40mail.gmail.com
[3] - https://www.postgresql.org/message-id/CAJpy0uDxms8ynrHWXHGFuxDLA7QzDLLASqpqdWnD%3DLa8UJPt7Q%40mail.gmail.com

Regards,
Vignesh

Attachment

pgsql-hackers by date:

From: Tom Lane
Date: 21 October 2025, 17:39:57
Subject: Re: CI: Add task that runs pgindent

From: Tomas Vondra
Date: 21 October 2025, 17:43:08
Subject: Re: Optimizing ResouceOwner to speed up COPY

Re: Logical Replication of sequences - Mailing list pgsql-hackers

Attachment

Previous

Next