Re: bogus: logical replication rows/cols combinations - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: bogus: logical replication rows/cols combinations
Date
Msg-id CAA4eK1KsRLSEU-0Spny7TyEyhnvq8sCHqC9wGu_DeUvNT3BktA@mail.gmail.com
In response to Re: bogus: logical replication rows/cols combinations  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: bogus: logical replication rows/cols combinations  (Michael Paquier <michael@paquier.xyz>)
Re: bogus: logical replication rows/cols combinations  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
RE: bogus: logical replication rows/cols combinations  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Tue, Apr 26, 2022 at 4:00 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 4/25/22 17:48, Alvaro Herrera wrote:
>
> > The desired result on subscriber is:
> >
> > table uno;
> >  a  │ b │ c
> > ────┼───┼───
> >   1 │ 2 │
> >  -1 │   │ 4
> >
> >
> > Thoughts?
> >
>
> I'm not quite sure which of the two behaviors is more "desirable". In a
> way, it's somewhat similar to publish_as_relid, which is also calculated
> not considering which of the row filters match?
>

Right. In other words, we check all publications to decide it, and the
same is true for publication actions, which are also computed
independently for all publications.

> But maybe you're right and it should behave the way you propose ... the
> example I have in mind is a use case replicating table with two types of
> rows - sensitive and non-sensitive. For sensitive, we replicate only
> some of the columns, for non-sensitive we replicate everything. Which
> could be implemented as two publications
>
> create publication sensitive_rows
>    for table t (a, b) where (is_sensitive);
>
> create publication non_sensitive_rows
>    for table t where (not is_sensitive);
>
> But the way it's implemented now, we'll always replicate all columns,
> because the second publication has no column list.
>
> Changing this to behave the way you expect would be quite difficult,
> because at the moment we build a single OR expression from all the row
> filters. We'd have to keep the individual expressions, so that we can
> build a column list for each of them (in order to ignore those that
> don't match).
>
> We'd have to remove various other optimizations - for example we can't
> just discard row filters if we found "no_filter" publication.
>

I don't think that is the right way. We need some way to combine
expressions, and I feel the current behavior is sane. I mean to say
that if even one publication has no filter (column/row), we should
publish all rows with all columns. As mentioned above, combining row
filters or column lists across all publications appears to be
consistent with what we already do and seems to be the correct
behavior to me.
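
To make the combining rule above concrete, here is a minimal C sketch.
It is only a model and not the pgoutput code: the struct PubSpec, the
function effective_columns, and the bitmask encoding of columns are
hypothetical names introduced for illustration. It assumes each
publication either carries a column list or carries none, and that a
publication with no column list contributes every column.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one publication's column settings for a table. */
typedef struct PubSpec
{
    bool     has_column_list;  /* false means "publish all columns" */
    uint32_t columns;          /* bit i set => column i is in the list */
} PubSpec;

/*
 * Union the column lists of all publications the subscription uses.
 * A single publication without a column list widens the result to
 * every column, regardless of what the other publications say.
 */
uint32_t
effective_columns(const PubSpec *pubs, size_t npubs, uint32_t all_columns)
{
    uint32_t result = 0;

    assert(pubs != NULL || npubs == 0);

    for (size_t i = 0; i < npubs; i++)
    {
        if (!pubs[i].has_column_list)
            return all_columns;     /* no list: everything is published */
        result |= pubs[i].columns;  /* otherwise accumulate the union */
    }

    /* Ignore bits that do not correspond to real columns. */
    return result & all_columns;
}
```

In this model, combining a publication whose column list covers two
columns with a publication that has no column list yields all_columns,
which corresponds to the "we'll always replicate all columns" behavior
described in the quoted text. The column set is computed without
looking at which row filter matched a given row.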

To me, it appears that the method used to decide whether a particular
table is published or not is also similar to what we do for row
filters or column lists. Even if there is one publication that
publishes all tables, we consider the current table to be published
irrespective of whether other publications have published that table
or not.
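
The same "any publication is enough" rule can be sketched for the
published-or-not decision. Again, this is a hedged illustration rather
than the actual implementation: PubTables, table_is_published, and the
integer table identifiers are assumptions made up for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of which tables one publication covers. */
typedef struct PubTables
{
    bool        all_tables;  /* models a publication of all tables */
    const int  *table_ids;   /* explicitly listed tables, if any */
    size_t      ntables;
} PubTables;

/*
 * A table counts as published if at least one publication covers it,
 * either by publishing all tables or by listing the table explicitly.
 * What the remaining publications say does not matter.
 */
bool
table_is_published(const PubTables *pubs, size_t npubs, int table_id)
{
    for (size_t i = 0; i < npubs; i++)
    {
        if (pubs[i].all_tables)
            return true;

        for (size_t j = 0; j < pubs[i].ntables; j++)
        {
            if (pubs[i].table_ids[j] == table_id)
                return true;
        }
    }
    return false;
}
```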

> Or more
> precisely, we'd have to consider column lists too.
>
> In other words, we'd have to merge pgoutput_column_list_init into
> pgoutput_row_filter_init, and then modify pgoutput_row_filter to
> evaluate the row filters one by one, and build the column list.
>

Hmm, I think even if we want to do something here, we also need to
think about how to achieve similar behavior for the initial tablesync,
which will be trickier.

> I can take a stab at it, but it seems strange to not apply the same
> logic to evaluation of publish_as_relid.
>

Yeah, the current behavior seems to be consistent with what we already do.

> I wonder what Amit thinks about
> this, as he wrote the row filter stuff.
>

I feel we can explain a bit more about this in the docs. We already
have some explanation of how row filters are combined [1]. We can
probably add a few examples for column lists.

[1] -
https://www.postgresql.org/docs/devel/logical-replication-row-filter.html#LOGICAL-REPLICATION-ROW-FILTER-COMBINING

--
With Regards,
Amit Kapila.


