RE: bogus: logical replication rows/cols combinations - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: bogus: logical replication rows/cols combinations
Date
Msg-id OS0PR01MB5716B82315A067F1D78F247E94FA9@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: bogus: logical replication rows/cols combinations  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wednesday, April 27, 2022 12:56 PM From: Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Tue, Apr 26, 2022 at 4:00 AM Tomas Vondra
> <tomas.vondra@enterprisedb.com> wrote:
> >
> > On 4/25/22 17:48, Alvaro Herrera wrote:
> >
> > > The desired result on subscriber is:
> > >
> > > table uno;
> > >  a  │ b │ c
> > > ────┼───┼───
> > >   1 │ 2 │
> > >  -1 │   │ 4
> > >
> > >
> > > Thoughts?
> > >
> >
> > I'm not quite sure which of the two behaviors is more "desirable". In a
> > way, it's somewhat similar to publish_as_relid, which is also calculated
> > not considering which of the row filters match?
> >
> 
> Right, or in other words, we check all publications to decide it and
> similar is the case for publication actions which are also computed
> independently for all publications.
> 
> > But maybe you're right and it should behave the way you propose ... the
> > example I have in mind is a use case replicating table with two types of
> > rows - sensitive and non-sensitive. For sensitive, we replicate only
> > some of the columns, for non-sensitive we replicate everything. Which
> > could be implemented as two publications
> >
> > create publication sensitive_rows
> >    for table t (a, b) where (is_sensitive);
> >
> > create publication non_sensitive_rows
> >    for table t where (not is_sensitive);
> >
> > But the way it's implemented now, we'll always replicate all columns,
> > because the second publication has no column list.
> >
> > Changing this to behave the way you expect would be quite difficult,
> > because at the moment we build a single OR expression from all the row
> > filters. We'd have to keep the individual expressions, so that we can
> > build a column list for each of them (in order to ignore those that
> > don't match).
> >
> > We'd have to remove various other optimizations - for example we can't
> > just discard row filters if we found "no_filter" publication.
> >
> 
> I don't think that is the right way. We need some way to combine
> expressions and I feel the current behavior is sane. I mean to say
> that even if there is one publication that has no filter (column/row),
> we should publish all rows with all columns. Now, as mentioned above
> combining row filters or column lists for all publications appears to
> be consistent with what we already do and seems correct behavior to
> me.
> 
> To me, it appears that the method used to decide whether a particular
> table is published or not is also similar to what we do for row
> filters or column lists. Even if there is one publication that
> publishes all tables, we consider the current table to be published
> irrespective of whether other publications have published that table
> or not.
> 
> > Or more
> > precisely, we'd have to consider column lists too.
> >
> > In other words, we'd have to merge pgoutput_column_list_init into
> > pgoutput_row_filter_init, and then modify pgoutput_row_filter to
> > evaluate the row filters one by one, and build the column list.
> >
> 
> Hmm, I think even if we want to do something here, we also need to
> think about how to achieve similar behavior for initial tablesync
> which will be more tricky.

I think it could be difficult to make the initial tablesync behave the same.
Currently, we make a "COPY" command to do the table sync, I am not sure
how to change the "COPY" query to achieve the expected behavior here.

BTW, For the use case mentioned here:
"""
replicating table with two types of
rows - sensitive and non-sensitive. For sensitive, we replicate only
some of the columns, for non-sensitive we replicate everything.
""" 

One approach to do this is to create two subscriptions and two
publications which seems a workaround.
-----
create publication uno for table uno (a, b) where (a > 0);
create publication dos for table uno (a, c) where (a < 0);

create subscription sub_uno connection 'port=55432 dbname=alvherre' publication uno;
create subscription sub_dos connection 'port=55432 dbname=alvherre' publication dos;
-----

Best regards,
Hou zj

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: BUG #17448: In Windows 10, version 1703 and later, huge_pages doesn't work.
Next
From: Пантюшин Александр Иванович
Date:
Subject: Re: Wrong rows count in EXPLAIN