RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id OS3PR01MB6275D2940C734AF8C77435739E819@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: Data is copied twice when specifying both child and parent table in publication  (Peter Smith <smithpb2250@gmail.com>)
RE: Data is copied twice when specifying both child and parent table in publication  ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>)
List pgsql-hackers
On Mon, Mar 20, 2023 at 18:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>

Thanks for your comments.

> On Mon, Mar 20, 2023 at 1:02 PM Peter Smith <smithpb2250@gmail.com>
> wrote:
> >
> >
> > ======
> > src/include/catalog/pg_proc.dat
> >
> > 4.
> > +{ oid => '6119',
> > +  descr => 'get information of the tables in the given publication array',
> >
> > Should that be worded in a way to make it more clear that the
> > "publication array" is really an "array of publication names"?
> >
> 
> I don't know how important it is to tell that the array is an array of
> publication names but the current description can be improved. How
> about something like: "get information of the tables that are part of
> the specified publications"

Changed.

> Few other comments:
> =================
> 1.
>   foreach(lc2, ancestors)
>   {
>   Oid ancestor = lfirst_oid(lc2);
> + ListCell   *lc3;
> 
>   /* Check if the parent table exists in the published table list. */
> - if (list_member_oid(relids, ancestor))
> + foreach(lc3, table_infos)
>   {
> - skip = true;
> - break;
> + Oid relid = ((published_rel *) lfirst(lc3))->relid;
> +
> + if (relid == ancestor)
> + {
> + skip = true;
> + break;
> + }
>   }
> +
> + if (skip)
> + break;
>   }
> 
> - if (!skip)
> - result = lappend_oid(result, relid);
> + if (skip)
> + table_infos = foreach_delete_current(table_infos, lc);
> 
> The usage of skip looks a bit ugly to me. Can we move the code for the
> inner loop to a separate function (like
> is_ancestor_member_tableinfos()) and remove the current cell if it
> returns true?

Changed.

> 2.
>   * Filter out the partitions whose parent tables were also specified in
>   * the publication.
>   */
> -static List *
> -filter_partitions(List *relids)
> +static void
> +filter_partitions(List *table_infos)
> 
> The comment atop filter_partitions is no longer valid. Can we slightly
> change it to: "Filter out the partitions whose parent tables are also
> present in the list."?

Changed.

> 3.
> -# Note: We create two separate tables, not a partitioned one, so that we can
> -# easily identity through which relation were the changes replicated.
> +# Note: We only create one table (tab4) here. We specified
> +# publish_via_partition_root = true (see pub_all and pub_lower_level above), so
> +# all data will be replicated to that table.
>  $node_subscriber2->safe_psql('postgres',
>   "CREATE TABLE tab4 (a int PRIMARY KEY)");
> -$node_subscriber2->safe_psql('postgres',
> - "CREATE TABLE tab4_1 (a int PRIMARY KEY)");
> 
> I am not sure if it is a good idea to remove tab4_1 here. It is
> testing something different as mentioned in the comments. Also, I
> don't see any data in tab4 for the initial sync, so not sure if this
> tests the behavior changed by this patch.

Reverted this change. And inserted the initial sync data into table tab4 to test
this more clearly.

> 4.
> --- a/src/test/subscription/t/031_column_list.pl
> +++ b/src/test/subscription/t/031_column_list.pl
> @@ -959,7 +959,8 @@ $node_publisher->safe_psql(
>   CREATE TABLE test_root_1 PARTITION OF test_root FOR VALUES FROM (1) TO
> (10);
>   CREATE TABLE test_root_2 PARTITION OF test_root FOR VALUES FROM (10) TO
> (20);
> 
> - CREATE PUBLICATION pub_root_true FOR TABLE test_root (a) WITH
> (publish_via_partition_root = true);
> + CREATE PUBLICATION pub_root_true_1 FOR TABLE test_root (a) WITH
> (publish_via_partition_root = true);
> + CREATE PUBLICATION pub_root_true_2 FOR TABLE test_root_1 (a, b) WITH
> (publish_via_partition_root = true);
> 
>   -- initial data
>   INSERT INTO test_root VALUES (1, 2, 3);
> @@ -968,7 +969,7 @@ $node_publisher->safe_psql(
> 
>  $node_subscriber->safe_psql(
>   'postgres', qq(
> - CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION
> pub_root_true;
> + CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION
> pub_root_true_1, pub_root_true_2;
> 
> It is not clear to me what exactly you want to test here. Please add
> some comments.

Tried to add the following comment to make it clear:
```
+# Subscribe to pub_root_true_1 and pub_root_true_2 at the same time, which
+# means that the initial data will be synced once, and only the column list of
+# the parent table (test_root) in the publication pub_root_true_1 will be used
+# for both table sync and data replication.
 $node_subscriber->safe_psql(
     'postgres', qq(
-    CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_root_true;
+    CREATE SUBSCRIPTION sub1 CONNECTION '$publisher_connstr' PUBLICATION pub_root_true_1, pub_root_true_2;
```

> 5. I think you can merge the 0001 and 0003 patches.

Merged.

> Apart from the above, attached is a patch to change some of the
> comments in the patch.

Thanks for this improvement. I've checked and merged it.

Attach the new patch set.

Regards,
Wang Wei

Attachment

pgsql-hackers by date:

Previous
From: Eske Rahn
Date:
Subject: Options to rowwise persist result of stable/immutable function with RECORD result
Next
From: "wangw.fnst@fujitsu.com"
Date:
Subject: RE: Data is copied twice when specifying both child and parent table in publication