Home > mailing lists

RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From	wangw.fnst@fujitsu.com
Subject	RE: Data is copied twice when specifying both child and parent table in publication
Date	April 24, 2022 06:16:04
Msg-id	OS3PR01MB6275CC45E9BC9297261865559EF99@OS3PR01MB6275.jpnprd01.prod.outlook.com Whole thread Raw
In response to	Re: Data is copied twice when specifying both child and parent table in publication (Amit Kapila <amit.kapila16@gmail.com>)
Responses	RE: Data is copied twice when specifying both child and parent table in publication RE: Data is copied twice when specifying both child and parent table in publication
List	pgsql-hackers

Tree view

On Thur, Apr 21, 2022 at 5:41 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
Thanks for your comments.

> On Tue, Apr 19, 2022 at 2:23 PM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com>
> wrote:
> >
> > On Tue, Apr 19, 2022 3:05 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> > >
> > > One suggestion is that can we simplify the code by moving the logic
> > > of checking the ancestor into the SQL ?. For example, we could
> > > filter the outpout of pg_publication_tables by adding A WHERE clause
> > > which checks whether the table is a partition and if its ancestor is
> > > also in the output. I think we can also filter the needless partition in this
> approach.
> > >
> >
> > I agreed with you and I tried to fix this problem in a simpler way.
> > What we want is to exclude the partitioned table whose ancestor is
> > also need to be replicated, so how about implementing that by using
> > the following SQL when getting the table list from publisher?
> >
> > SELECT DISTINCT ns.nspname, c.relname
> > FROM pg_catalog.pg_publication_tables t JOIN pg_catalog.pg_namespace
> > ns ON ns.nspname = t.schemaname JOIN pg_catalog.pg_class c ON
> > c.relname = t.tablename AND c.relnamespace = ns.oid WHERE t.pubname IN
> > ('p0','p2') AND (c.relispartition IS FALSE OR NOT EXISTS (SELECT 1
> > FROM pg_partition_ancestors(c.oid) WHERE relid IN ( SELECT DISTINCT
> > (schemaname||'.'||tablename)::regclass::oid
> > FROM pg_catalog.pg_publication_tables t WHERE t.pubname IN ('p0','p2')
> > ) AND relid != c.oid));
> >
> > Please find the attached patch which used this approach, I also merged
> > the test in Wang's patch into it.
> >
> 
> I think this will work but do we need "... relid != c.oid" at the end of the query? If
> so, why? Please use an alias for pg_partition_ancestors to make the statement
> understandable.
I think we need this (relid != c.oid). Because when we use function
pg_partition_ancestors(c.oid), its return value not only has ancestors, but
also the input table. That is to say, when we use the table (c.oid) of the
outer query to filter in the sub-query, the table of the outer query will also
appear in the result of the sub-query.
So, I think we need this condition to prevent filtering out itself.

> Now, this solution will work but I find this query a bit complex and will add some
> overhead as we are calling pg_publication_tables multiple times. So, I was
> wondering if we can have a new function pg_get_publication_tables which
> takes multiple publications as input and return the list of qualified tables? I think
> for back branches we need something on the lines of what you have proposed
> but for HEAD we can have a better solution.
Yes, it sounds reasonable to me. Now, to fix this bug:
In the patch for HEAD, add a new function pg_get_publications_tables to get
tables info from a publications array.
In the patch for back-branch (now just share the patch for REL14), modify the
SQL to get tables info.

> IIRC, the column list and row filter also have some issues exactly due to this
> reason, so, I would like those cases to be also mentioned here and probably
> include the tests for them in the patch for HEAD.
Improve the test case about the column list and row filter to cover this bug.

Attach the new patches.[suggestions by Amit-San]
The patch for HEAD:
1. Add a new function to get tables info by a publications array.
The patch for REL14:
1. Use an alias to make the statement understandable. BTW, I adjusted the alignment.
2. Improve the test cast about the column list and row filter to cover this bug.

Regards,
Wang wei

Attachment

pgsql-hackers by date:

From: David Rowley
Date: 24 April 2022, 05:16:25
Subject: Re: Use generation context to speed up tuplesorts

From: Andrew Dunstan
Date: 24 April 2022, 17:09:08
Subject: Re: pgsql: Allow db.schema.table patterns, but complain about random garbag

RE: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

Attachment

Previous

Next