RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From wangw.fnst@fujitsu.com
Subject RE: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id OS3PR01MB6275CD6236DEB022C0785E9B9EE69@OS3PR01MB6275.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Data is copied twice when specifying both child and parent table in publication  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Responses RE: Data is copied twice when specifying both child and parent table in publication  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Thur, Mar 10, 2021 at 10:08 AM houzj.fnst@fujitsu.com wrote:
> Hi,
> 
> When reviewing some logical replication related features. I noticed another
> possible problem if the subscriber subscribes multiple publications which
> publish parent and child table.
> 
> For example:
> 
> ----pub
> create table t (a int, b int, c int) partition by range (a);
> create table t_1 partition of t for values from (1) to (10);
> 
> create publication pub1 for table t
>   with (PUBLISH_VIA_PARTITION_ROOT);
> create publication pub2 for table t_1
>   with (PUBLISH_VIA_PARTITION_ROOT);
> 
> ----sub
> ---- prepare table t and t_1
> CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres'
> PUBLICATION pub1, pub2;
> 
> select * from pg_subscription_rel ;
>  srsubid | srrelid | srsubstate | srsublsn
> ---------+---------+------------+-----------
>    16391 |   16385(t) | r          | 0/150D100
>    16391 |   16388(t_1) | r          | 0/150D138
> 
> If subscribe two publications one of them publish parent table with
> (pubviaroot=true) and another publish child table. Both the parent table and
> child table will exist in pg_subscription_rel which also means we will do
> initial copy for both tables.
> 
> But after initial copy, we only publish change with the schema of the parent
> table(t). It looks a bit inconsistent.
> 
> Based on the document of PUBLISH_VIA_PARTITION_ROOT option. I think the
> expected behavior could be we only store the top most parent(table t) in
> pg_subscription_rel and do initial copy for it if pubviaroot is on. I haven't
> thought about how to fix this and will investigate this later.
Hi,
I try to fix this bug. Attach the patch.

The current HEAD get table list for one publication by invoking function
pg_get_publication_tables. If multiple publications are subscribed, then this
function is invoked multiple times. So option PUBLISH_VIA_PARTITION_ROOT works
independently on every publication, I think it does not work correctly on
different publications of the same subscription.

So I fix this bug by the following two steps:
First step,
I get oids of subscribed tables by publication list. Then for tables with the
same topmost root table, I filter them base on the option
PUBLISH_VIA_PARTITION_ROOT(see new function filter_partitions_oids).
After filtering, I get the final oid list.
Second step,
I get the required informations(nspname and relname) base on the oid list of
first step.

Regards,
Wang wei

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: How about a psql backslash command to show GUCs?
Next
From: David Rowley
Date:
Subject: Re: How about a psql backslash command to show GUCs?