Re: bogus: logical replication rows/cols combinations - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: bogus: logical replication rows/cols combinations
Date
Msg-id CAA4eK1+9LCX3qKyo9S4uaaciM0WqZrP6m=kMzufgpuv=a=PgXQ@mail.gmail.com
Whole thread Raw
In response to Re: bogus: logical replication rows/cols combinations  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: bogus: logical replication rows/cols combinations  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: bogus: logical replication rows/cols combinations  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
List pgsql-hackers
On Wed, Apr 27, 2022 at 3:13 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2022-Apr-26, Tomas Vondra wrote:
>
> > I'm not quite sure which of the two behaviors is more "desirable". In a
> > way, it's somewhat similar to publish_as_relid, which is also calculated
> > not considering which of the row filters match?
>
> I grepped doc/src/sgml for `publish_as_relid` and found no hits, so
> I suppose it's not a user-visible feature as such.
>

`publish_as_relid` is computed based on 'publish_via_partition_root'
setting of publication which is a user-visible feature.

> > But maybe you're right and it should behave the way you propose ... the
> > example I have in mind is a use case replicating table with two types of
> > rows - sensitive and non-sensitive. For sensitive, we replicate only
> > some of the columns, for non-sensitive we replicate everything.
>
> Exactly.  If we blindly publish row/column values that aren't in *any*
> publications, this may lead to leaking protected values.
>
> > Changing this to behave the way you expect would be quite difficult,
> > because at the moment we build a single OR expression from all the row
> > filters. We'd have to keep the individual expressions, so that we can
> > build a column list for each of them (in order to ignore those that
> > don't match).
>
> I think we should do that, yeah.
>

This can hit the performance as we need to evaluate each expression
for each row.

> > I can take a stab at it, but it seems strange to not apply the same
> > logic to evaluation of publish_as_relid. I wonder what Amit thinks about
> > this, as he wrote the row filter stuff.
>
> By grepping publicationcmds.c, it seems that publish_as_relid refers to
> the ancestor partitioned table that is used for column list and
> rowfilter determination, when a partition is being published as part of
> it.
>

Yeah, this is true when the corresponding publication has set
'publish_via_partition_root' as true.

>  I don't think these things are exactly parallel.
>

Currently, when the subscription has multiple publications, we combine
the objects, and actions of those publications. It happens for
'publish_via_partition_root', publication actions, tables, column
lists, or row filters. I think the whole design works on this idea
even the initial table sync. I think it might need a major change
(which I am not sure about at this stage) if we want to make the
initial sync also behave similar to what you are proposing.

I feel it would be much easier to create two different subscriptions
as mentioned by Hou-San [1] for the case you are talking about if the
user really needs something like that.

> ... In fact I think they are quite orthogonal: probably you should be
> able to publish a partitioned table in two publications, with different
> rowfilters and different column lists (which can come from the
> topmost partitioned table), and each partition should still work in the
> way I describe above.
>

We consider the column lists or row filters for either the partition
(on which the current operation is performed) or partitioned table
based on 'publish_via_partition_root' parameter of publication.

[1] -
https://www.postgresql.org/message-id/OS0PR01MB5716B82315A067F1D78F247E94FA9%40OS0PR01MB5716.jpnprd01.prod.outlook.com

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: pgsql: Add contrib/pg_walinspect.
Next
From: Andrey Borodin
Date:
Subject: Re: Wrong rows count in EXPLAIN