RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers
From | shiy.fnst@fujitsu.com |
---|---|
Subject | RE: Data is copied twice when specifying both child and parent table in publication |
Date | |
Msg-id | OSZPR01MB63109BE4F5521C542A75AC9BFDF29@OSZPR01MB6310.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | RE: Data is copied twice when specifying both child and parent table in publication ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>) |
Responses |
RE: Data is copied twice when specifying both child and parent table in publication
Re: Data is copied twice when specifying both child and parent table in publication |
List | pgsql-hackers |
On Tue, Apr 19, 2022 3:05 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > > -----Original Message----- > > From: Wang, Wei/王 威 <wangw.fnst@fujitsu.com> > On Thursday, April 7, 2022 11:08 AM > > > > On Thur, Mar 10, 2021 at 10:08 AM houzj.fnst@fujitsu.com wrote: > > > Hi, > > > > > > When reviewing some logical replication related features. I noticed another > > > possible problem if the subscriber subscribes multiple publications which > > > publish parent and child table. > > > > > > For example: > > > > > > ----pub > > > create table t (a int, b int, c int) partition by range (a); > > > create table t_1 partition of t for values from (1) to (10); > > > > > > create publication pub1 for table t > > > with (PUBLISH_VIA_PARTITION_ROOT); > > > create publication pub2 for table t_1 > > > with (PUBLISH_VIA_PARTITION_ROOT); > > > > > > ----sub > > > ---- prepare table t and t_1 > > > CREATE SUBSCRIPTION sub CONNECTION 'port=10000 dbname=postgres' > > > PUBLICATION pub1, pub2; > > > > > > select * from pg_subscription_rel ; > > > srsubid | srrelid | srsubstate | srsublsn > > > ---------+---------+------------+----------- > > > 16391 | 16385(t) | r | 0/150D100 > > > 16391 | 16388(t_1) | r | 0/150D138 > > > > > > If subscribe two publications one of them publish parent table with > > > (pubviaroot=true) and another publish child table. Both the parent table and > > > child table will exist in pg_subscription_rel which also means we will do > > > initial copy for both tables. > > > > > > But after initial copy, we only publish change with the schema of the parent > > > table(t). It looks a bit inconsistent. > > > > > > Based on the document of PUBLISH_VIA_PARTITION_ROOT option. I think > > the > > > expected behavior could be we only store the top most parent(table t) in > > > pg_subscription_rel and do initial copy for it if pubviaroot is on. I haven't > > > thought about how to fix this and will investigate this later. > > Hi, > > I try to fix this bug. Attach the patch. > > > > The current HEAD get table list for one publication by invoking function > > pg_get_publication_tables. If multiple publications are subscribed, then this > > function is invoked multiple times. So option PUBLISH_VIA_PARTITION_ROOT > > works > > independently on every publication, I think it does not work correctly on > > different publications of the same subscription. > > > > So I fix this bug by the following two steps: > > First step, > > I get oids of subscribed tables by publication list. Then for tables with the > > same topmost root table, I filter them base on the option > > PUBLISH_VIA_PARTITION_ROOT(see new function filter_partitions_oids). > > After filtering, I get the final oid list. > > Second step, > > I get the required informations(nspname and relname) base on the oid list of > > first step. > > Thanks for updating the patch. > I confirmed that the bug is fixed by this patch. > > One suggestion is that can we simplify the code by moving the logic of checking > the ancestor into the SQL ?. For example, we could filter the outpout of > pg_publication_tables by adding A WHERE clause which checks whether the table > is a partition and if its ancestor is also in the output. I think we can also > filter the needless partition in this approach. > I agreed with you and I tried to fix this problem in a simpler way. What we want is to exclude the partitioned table whose ancestor is also need to be replicated, so how about implementing that by using the following SQL when getting the table list from publisher? SELECT DISTINCT ns.nspname, c.relname FROM pg_catalog.pg_publication_tables t JOIN pg_catalog.pg_namespace ns ON ns.nspname = t.schemaname JOIN pg_catalog.pg_class c ON c.relname = t.tablename AND c.relnamespace = ns.oid WHERE t.pubname IN ('p0','p2') AND (c.relispartition IS FALSE OR NOT EXISTS (SELECT 1 FROM pg_partition_ancestors(c.oid) WHERE relid IN ( SELECT DISTINCT (schemaname||'.'||tablename)::regclass::oid FROM pg_catalog.pg_publication_tables t WHERE t.pubname IN ('p0','p2') ) AND relid != c.oid)); Please find the attached patch which used this approach, I also merged the test in Wang's patch into it. Regards, Shi yu
Attachment
pgsql-hackers by date: