RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id OS3PR01MB627506056B78D533FBB331DC9E9E9@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: Data is copied twice when specifying both child and parent table in publication  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: Data is copied twice when specifying both child and parent table in publication
Re: Data is copied twice when specifying both child and parent table in publication
List pgsql-hackers
On Thur, Jul 28, 2022 at 17:17 PM Peter Smith <smithpb2250@gmail.com> wrote:
> Here are some review comments for the HEAD_v7-0001 patch:

Thanks for your comments.

> 2. Commit message.
> 
> 2a.
> 
> If there are two publications that publish the parent table and the child table
> separately, and both specify the option publish_via_partition_root, subscribing
> to both publications from one subscription causes initial copy twice.
> 
> SUGGESTION
> If there are two publications that publish the parent table and the child table
> respectively, but both specify publish_via_partition_root = true, subscribing
> to both publications from one subscription causes initial copy twice.
> 
> 2b. <General>
> 
> Actually, isn't it more subtle than what that comment is describing?
> Maybe nobody is explicitly publishing a parent table at all. Maybe
> pub1 publishes partition1 and pub2 publishes partition2, but both
> publications are using publish_via_partition_root = true. Is this
> scenario even tested? Does the logic of pg_get_publication_tables
> cover this scenario?

=>2a.
Okay, changed it as suggested.

=>2b.
This is not the case we are trying to fix. The problematic scenario is when the
a parent table is published via root partitioned table and in this case we need
to ignore other partitions. And I try to improve the commit message to make it
clear.

> 4.
> 
> + /* Filter by final published table. */
> + foreach(lc, results)
> + {
> + Oid *table_info = (Oid *) lfirst(lc);
> +
> + if (!list_member_oid(tables, table_info[0]))
> + results = foreach_delete_current(results, lc);
>   }
> 
> The comment did not convey enough meaning. Can you make it more
> descriptive to explain why/what the logic is doing here?

I think the comments above `tables = filter_partitions(tables);` explain this.

> 5. src/backend/commands/subscriptioncmds.c - fetch_table_list
> 
>   /* Get column lists for each relation if the publisher supports it */
> - if (check_columnlist)
> - appendStringInfoString(&cmd, ", t.attnames\n");
> + if (server_version >= 160000)
> + appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,\n"
> 
> That comment is exactly the same as it was before the patch. But it
> doesn't seem quite appropriate anymore for this new condition and this
> new query.

Improved the comments as following:
```
Get information of the tables belonging to the specified publications
```

The rest of the comments are improved as suggested.
I also rebased the patch based on the commit (0c20dd3) on HEAD, and made some
changes to the back-branch patches based on some of Peter's comments.

Attach the new patches.

Regards,
Wang wei

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Patch to address creation of PgStat* contexts with null parent context
Next
From: Thomas Munro
Date:
Subject: Re: Checking pgwin32_is_junction() errors