Re: Column Filtering in Logical Replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Column Filtering in Logical Replication
Date
Msg-id CAA4eK1+FoJ-J7wUG5s8zCtY0iBuN9LcjQcYhV4BD17xhuHfoug@mail.gmail.com
Whole thread Raw
In response to RE: Column Filtering in Logical Replication  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Fri, Dec 17, 2021 at 3:16 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Friday, December 17, 2021 1:55 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > On 2021-Dec-16, houzj.fnst@fujitsu.com wrote:
> >
> > > The patch ensures all columns of RT are in column list when
> > > CREATE/ALTER publication, but it seems doesn't prevent user from
> > > changing the replica identity or dropping the index used in replica
> > > identity. Do we also need to check those cases ?
> >
> > Yes, we do.  As it happens, I spent a couple of hours yesterday writing code for
> > that, at least partially.  I haven't yet checked what happens with cases like
> > REPLICA NOTHING, or REPLICA INDEX <xyz> and then dropping that index.
> >
> > My initial ideas were a bit wrong BTW: I thought we should check the
> > combination of column lists in all publications (a bitwise-OR of column bitmaps,
> > so to speak).  But conceptually that's wrong: we need to check the column list
> > of each publication individually instead.  Otherwise, if you wanted to hide a
> > column from some publication but that column was part of the replica identity,
> > there'd be no way to identify the tuple in the replica.  (Or, if the pgouput code
> > disobeys the column list and sends the replica identity even if it's not in the
> > column list, then you'd be potentially publishing data that you wanted to hide.)
>
> Thanks for the explanation.
>
> Apart from ALTER REPLICA IDENTITY and DROP INDEX, I think there could be
> some other cases we need to handle for the replica identity check:
>
> 1)
> When adding a partitioned table with column list to the publication, I think we
> need to check the RI of all its leaf partition. Because the RI on the partition
> is the one actually takes effect.
>
> 2)
> ALTER TABLE ADD PRIMARY KEY;
> ALTER TABLE DROP CONSTRAINT "PRIMAEY KEY";
>
> If the replica identity is default, it will use the primary key. we might also
> need to prevent user from adding or removing primary key in this case.
>
>
> Based on the above cases, the RI check seems could bring considerable amount of
> code. So, how about we follow what we already did in CheckCmdReplicaIdentity(),
> we can put the check for RI in that function, so that we can cover all the
> cases and reduce the code change. And if we are worried about the cost of do
> the check for UPDATE and DELETE every time, we can also save the result in the
> relcache. It's safe because every operation change the RI will invalidate the
> relcache. We are using this approach in row filter patch to make sure all
> columns in row filter expression are part of RI.
>

Another point related to RI is that this patch seems to restrict
specifying the RI columns in the column filter list irrespective of
publish action. Do we need to have such a restriction if the
publication publishes 'insert' or 'truncate'?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Column Filtering in Logical Replication
Next
From: Tom Lane
Date:
Subject: Re: sequences vs. synchronous replication