RE: row filtering for logical replication - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: row filtering for logical replication
Date
Msg-id OS0PR01MB57166F44723AB76866F74A0694679@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Mon, Nov 29, 2021 6:11 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Mon, Nov 29, 2021 at 12:10 PM Greg Nancarrow <gregn4422@gmail.com>
> wrote:
> >
> > On Fri, Nov 26, 2021 at 12:40 AM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > When researching and writing a top-up patch about this.
> > > I found a possible issue which I'd like to confirm first.
> > >
> > > It's possible the table is published in two publications A and B,
> > > publication A only publish "insert" , publication B publish
> > > "update". When UPDATE, both row filter in A and B will be executed. Is this
> behavior expected?
> > >
> > > For example:
> > > ---- Publication
> > > create table tbl1 (a int primary key, b int); create publication A
> > > for table tbl1 where (b<2) with(publish='insert'); create
> > > publication B for table tbl1 where (a>1) with(publish='update');
> > >
> > > ---- Subscription
> > > create table tbl1 (a int primary key); CREATE SUBSCRIPTION sub
> > > CONNECTION 'dbname=postgres host=localhost port=10000'
> PUBLICATION
> > > A,B;
> > >
> > > ---- Publication
> > > update tbl1 set a = 2;
> > >
> > > The publication can be created, and when UPDATE, the rowfilter in A
> > > (b<2) will also been executed but the column in it is not part of replica
> identity.
> > > (I am not against this behavior just confirm)
> > >
> >
> > There seems to be problems related to allowing the row filter to
> > include columns that are not part of the replica identity (in the case
> > of publish=insert).
> > In your example scenario, the tbl1 WHERE clause "(b < 2)" for
> > publication A, that publishes inserts only, causes a problem, because
> > column "b" is not part of the replica identity.
> > To see this, follow the simple example below:
> > (and note, for the Subscription, the provided tbl1 definition has an
> > error, it should also include the 2nd column "b int", same as in the
> > publisher)
> >
> > ---- Publisher:
> > INSERT INTO tbl1 VALUES (1,1);
> > UPDATE tbl1 SET a = 2;
> >
> > Prior to the UPDATE above:
> > On pub side, tbl1 contains (1,1).
> > On sub side, tbl1 contains (1,1)
> >
> > After the above UPDATE:
> > On pub side, tbl1 contains (2,1).
> > On sub side, tbl1 contains (1,1), (2,1)
> >
> > So the UPDATE on the pub side has resulted in an INSERT of (2,1) on
> > the sub side.
> >
> > This is because when (1,1) is UPDATEd to (2,1), it attempts to use the
> > "insert" filter "(b<2)" to determine whether the old value had been
> > inserted (published to subscriber), but finds there is no "b" value
> > (because it only uses RI cols for UPDATE) and so has to assume the old
> > tuple doesn't exist on the subscriber, hence the UPDATE ends up doing
> > an INSERT.
> > INow if the use of RI cols were enforced for the insert filter case,
> > we'd properly know the answer as to whether the old row value had been
> > published and it would have correctly performed an UPDATE instead of
> > an INSERT in this case.
> >
> 
> I don't think it is a good idea to combine the row-filter from the publication
> that publishes just 'insert' with the row-filter that publishes 'updates'. We
> shouldn't apply the 'insert' filter for 'update' and similarly for publication
> operations. We can combine the filters when the published operations are the
> same. So, this means that we might need to cache multiple row-filters but I
> think that is better than having another restriction that publish operation
> 'insert'
> should also honor RI columns restriction.

Personally, I agreed that an UPDATE operation should only apply a row filter that
is part of a publication that has only UPDATE.

Best regards,
Hou zj

pgsql-hackers by date:

Previous
From: Greg Nancarrow
Date:
Subject: Re: Correct handling of blank/commented lines in PSQL interactive-mode history
Next
From: Michael Paquier
Date:
Subject: Re: Fix typos