RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id OS3PR01MB6275350A29E2A3B9156D08089E869@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Data is copied twice when specifying both child and parent table in publication  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Wed, Mar 22, 2023 at 14:32 PM Kuroda, Hayato/黒田 隼人 <kuroda.hayato@fujitsu.com> wrote:
> Dear Wang,
> 
> Thank you for updating patch! Following are comments form v19-0001.

Thanks for your comments.

> 01. logical-replication.sgml
> 
> I found a following statement in logical-replication.sgml. I think this may cause
> mis-reading because it's OK when publishers list partitions and publish_via_root
> is true.
> 
> ```
>   <para>
>    A subscriber node may have multiple subscriptions if desired.  It is
>    possible to define multiple subscriptions between a single
>    publisher-subscriber pair, in which case care must be taken to ensure
>    that the subscribed publication objects don't overlap.
>   </para>
> ```
> 
> How about adding "If publications are set publish_via_partition_root as true and
> they publish partitions that have same partitioned table, only a change to
> partitioned
> table is published from the publisher."or something like that?

I think these seem to be two different scenarios: The scenario mentioned here is
multiple subscriptions at the subscription node, while the scenario we fixed
this time is a single subscription at the subscription node. So, it seems that
these two notes are not strongly related.

> 02. filter_partitions
> 
> IIUC this function can refactor like following to avoid "skip" flag.
> How do you think?
> 
> ```
> @@ -209,7 +209,6 @@ filter_partitions(List *table_infos)
> 
>         foreach(lc, table_infos)
>         {
> -               bool                            skip = false;
>                 List                       *ancestors = NIL;
>                 ListCell                   *lc2;
>                 published_rel      *table_info = (published_rel *) lfirst(lc);
> @@ -224,13 +223,10 @@ filter_partitions(List *table_infos)
>                         /* Is ancestor exists in the published table list? */
>                         if (is_ancestor_member_tableinfos(ancestor, table_infos))
>                         {
> -                               skip = true;
> +                               table_infos = foreach_delete_current(table_infos, lc);
>                                 break;
>                         }
>                 }
> -
> -               if (skip)
> -                       table_infos = foreach_delete_current(table_infos, lc);
>         }
>  }
> ```

I think this approach deletes the cell of the list of the outer loop in the
inner loop. IIUC, we can only use function foreach_delete_current in the current
loop to delete the cell of the current loop.

> 03. fetch_table_list
> 
> ```
> +       /* Get the list of tables from the publisher. */
> +       if (server_version >= 160000)
> ```
> 
> I think boolean variable can be used to check it like check_columnlist.
> How about "use_extended_function" or something?

Since we only need it once, I think it's fine not to add a new variable.

Regards,
Wang Wei

pgsql-hackers by date:

Previous
From: "wangw.fnst@fujitsu.com"
Date:
Subject: RE: Data is copied twice when specifying both child and parent table in publication
Next
From: Guillaume Lelarge
Date:
Subject: Re: Issue attaching a table to a partitioned table with an auto-referenced foreign key