Re: Data is copied twice when specifying both child and parent table in publication - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Data is copied twice when specifying both child and parent table in publication
Date
Msg-id CAHut+Pt+1PNx6VsZ-xKzAU-18HmNXhjCC1TGakKX46Wg7YNT1Q@mail.gmail.com
Whole thread Raw
In response to RE: Data is copied twice when specifying both child and parent table in publication  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
Responses RE: Data is copied twice when specifying both child and parent table in publication  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
List pgsql-hackers
On Mon, Oct 17, 2022 at 4:49 PM wangw.fnst@fujitsu.com
<wangw.fnst@fujitsu.com> wrote:
>
> On Wed, Oct 5, 2022 at 11:08 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > Hi Wang-san. Here are my review comments for HEAD_v12-0001 patch.
>
...
> >
> > 3. QUESTION - pg_get_publication_tables / fetch_table_list
> >
> > When the same table is published by different publications (but there
> > are other differences like row-filters/column-lists in each
> > publication) the result tuple of this function does not include the
> > pubid. Maybe the SQL of pg_publication_tables/fetch_table_list() is OK
> > as-is but how does it manage to associate each table with the correct
> > tuple?
> >
> > I know it apparently all seems to work but I’m not how does that
> > happen? Can you explain why a puboid is not needed for the result
> > tuple of this function?
>
> Sorry, I am not sure I understand your question.
> I try to answer your question by explaining the two functions you mentioned:
>
> First, the function pg_get_publication_tables gets the list (see table_infos)
> that included published table and the corresponding publication. Then based
> on this list, the function pg_get_publication_tables returns information
> (scheme, relname, row filter and column list) about the published tables in the
> publications list. It just doesn't return pubid.
>
> Then, the SQL in the function fetch_table_list will get the columns in the
> column list from pg_attribute. (This is to return all columns when the column
> list is not specified)
>

I meant, for example, if the different publications specified
different col-lists for the same table then IIUC the
fetch_table_lists() is going to return 2 list elements
(schema,rel_name,row_filter,col_list). But when the schema/rel_name
are the same for 2 elements then (without also a pubid) how are you
going to know where the list element came from, and how come that is
not important?

> > ~~
> >
> > test_pub=# create table t1(a int, b int, c int);
> > CREATE TABLE
> > test_pub=# create publication pub1 for table t1(a) where (a > 99);
> > CREATE PUBLICATION
> > test_pub=# create publication pub2 for table t1(a,b) where (b < 33);
> > CREATE PUBLICATION
> >
> > Following seems OK when I swap orders of publication names...
> >
> > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual,
> > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC
> > ARRAY['pub2','pub1']) gpt(relid, attrs, qual);
> >  relid | attrs | rowfilter
> > -------+-------+-----------
> >  16385 | 1 2   | (b < 33)
> >  16385 | 1     | (a > 99)
> > (2 rows)
> >
> > test_pub=# SELECT gpt.relid, gpt.attrs, pg_get_expr(gpt.qual,
> > gpt.relid) AS rowfilter from pg_get_publication_tables(VARIADIC
> > ARRAY['pub1','pub2']) gpt(relid, attrs, qual);
> >  relid | attrs | rowfilter
> > -------+-------+-----------
> >  16385 | 1     | (a > 99)
> >  16385 | 1 2   | (b < 33)
> > (2 rows)
> >
> > But what about this (this is similar to the SQL fragment from
> > fetch_table_list); I swapped the pub names but the results are the
> > same...
> >
> > test_pub=# SELECT pg_get_publication_tables(VARIADIC
> > array_agg(p.pubname)) from pg_publication p where pubname
> > IN('pub2','pub1');
> >
> >                      pg_get_publication_tables
> >
> > -------------------------------------------------------------------------------------------------
> > ---------------------------------------------------------------------
> > -------------------------------------------------------------------------------------------------
> > ---------------------------------------------------------------------
> > -------------------------------------------------------------------
> >  (16385,1,"{OPEXPR :opno 521 :opfuncid 147 :opresulttype 16 :opretset
> > false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1
> > :vartype 23 :vartypmod -1 :var
> > collid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 47}
> > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > :constbyval true :constisnull false :
> > location 51 :constvalue 4 [ 99 0 0 0 0 0 0 0 ]}) :location 49}")
> >  (16385,"1 2","{OPEXPR :opno 97 :opfuncid 66 :opresulttype 16
> > :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1
> > :varattno 2 :vartype 23 :vartypmod -1 :v
> > arcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location 49}
> > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > :constbyval true :constisnull false
> >  :location 53 :constvalue 4 [ 33 0 0 0 0 0 0 0 ]}) :location 51}")
> > (2 rows)
> >
> > test_pub=# SELECT pg_get_publication_tables(VARIADIC
> > array_agg(p.pubname)) from pg_publication p where pubname
> > IN('pub1','pub2');
> >
> >                      pg_get_publication_tables
> >
> > -------------------------------------------------------------------------------------------------
> > ---------------------------------------------------------------------
> > -------------------------------------------------------------------------------------------------
> > ---------------------------------------------------------------------
> > -------------------------------------------------------------------
> >  (16385,1,"{OPEXPR :opno 521 :opfuncid 147 :opresulttype 16 :opretset
> > false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1 :varattno 1
> > :vartype 23 :vartypmod -1 :var
> > collid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 1 :location 47}
> > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > :constbyval true :constisnull false :
> > location 51 :constvalue 4 [ 99 0 0 0 0 0 0 0 ]}) :location 49}")
> >  (16385,"1 2","{OPEXPR :opno 97 :opfuncid 66 :opresulttype 16
> > :opretset false :opcollid 0 :inputcollid 0 :args ({VAR :varno 1
> > :varattno 2 :vartype 23 :vartypmod -1 :v
> > arcollid 0 :varlevelsup 0 :varnosyn 1 :varattnosyn 2 :location 49}
> > {CONST :consttype 23 :consttypmod -1 :constcollid 0 :constlen 4
> > :constbyval true :constisnull false
> >  :location 53 :constvalue 4 [ 33 0 0 0 0 0 0 0 ]}) :location 51}")
> > (2 rows)
>
> I think this is because the usage of SELECT statement. The order seems depend
> on pg_publication. Such as:
>
> postgres=# SELECT array_agg(p.pubname) FROM pg_publication p WHERE pubname IN ('pub1','pub2');
>   array_agg
> -------------
>  {pub1,pub2}
> (1 row)
>
> postgres=# SELECT array_agg(p.pubname) FROM pg_publication p WHERE pubname IN ('pub2','pub1');
>   array_agg
> -------------
>  {pub1,pub2}
> (1 row)
>

Right, so I felt it was a bit dubious that the result of the function
“seems to depend on” something. That’s why I was asking why the list
elements did not include a pubid. Then a caller could be certain what
element belonged with what publication. It's not quite clear to me why
that is not important for this patch - but anyway, even if it's not
necessary for this patch's usage, this is a function that is exposed
to users who might have different needs/expectations than this patch
has, so shouldn't the result be less fuzzy for them?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: fix stats_fetch_consistency value in postgresql.conf.sample
Next
From: Peter Smith
Date:
Subject: Re: Data is copied twice when specifying both child and parent table in publication