Re: row filtering for logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: row filtering for logical replication
Date
Msg-id CAA4eK1LjrEFOrpZ7t_uCp4-qGA3VyNaL2h_+wY26=3uEyE66vg@mail.gmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Dilip Kumar <dilipbalaut@gmail.com>)
Responses Re: row filtering for logical replication
List pgsql-hackers
On Tue, Sep 21, 2021 at 9:54 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Sep 21, 2021 at 8:58 AM Ajin Cherian <itsajin@gmail.com> wrote:
> > > I understand why this is done, but I have 2 concerns here 1) We are
> > > having extra deform and copying the field from new to old in case it
> > > is unchanged replica identity.  2) The same unchanged attribute values
> > > get qualified in the old tuple as well as in the new tuple.  What
> > > exactly needs to be done is that the only updated field should be
> > > validated as part of the old as well as the new tuple, the unchanged
> > > field does not make sense to have redundant validation.   For that we
> > > will have to change the filter for the old tuple to just validate the
> > > attributes which are actually modified and remaining unchanged and new
> > > values will anyway get validated in the new tuple.
> > >
> > But what if the filter expression depends on multiple columns, say (a+b) > 100
> > where a is unchanged while b is changed. Then we will still need both
> > columns for applying
>
> In such a case, we need to.
>
> > the filter even though one is unchanged. Also, I am not aware of any
> > mechanism by which
> > we can apply a filter expression on individual attributes. The current
> > mechanism does it
> > on a tuple. Do let me know if you have any ideas there?
>
> What I suggested is to modify the filter for the old tuple, e.g.
> filter is (a > 10 and b < 20 and c+d = 20), now only if a and c are
> modified then we can process the expression and we can transform this
> filter to (a > 10 and c+d=20).
>

If you have only a and c in the old tuple, how will it evaluate
expression c + d? I think the point is if for some expression some
values are in old tuple and others are in new then the idea proposed
in the patch seems sane. Moreover, I think in your idea for each tuple
we might need to build a new expression and sometimes twice that will
beat the purpose of cache we have kept in the patch and I am not sure
if it is less costly.

See another example where splitting filter might not give desired results:

Say filter expression: (a = 10 and b = 20 and c = 30)

Now, old_tuple has values for columns a and c and say values are 10
and 30. So, the old_tuple will match the filter if we split it as per
your suggestion. Now say new_tuple has values (a = 5, b = 15, c = 25).
In such a situation dividing the filter will give us the result that
the old_tuple is matching but new tuple is not matching which seems
incorrect. I think dividing filter conditions among old and new tuples
might not retain its sanctity.

> >
> > Even if it were done, there would still be the overhead of deforming the tuple.
>
> Suppose filter is just (a > 10 and b < 20) and only if the a is
> updated, and if we are able to modify the filter for the oldtuple to
> be just (a>10) then also do we need to deform?
>

Without deforming, how will you determine which columns are part of
the old tuple?

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Amul Sul
Date:
Subject: Re: Deduplicate code updating ControleFile's DBState.
Next
From: Amit Kapila
Date:
Subject: Re: relation OID in ReorderBufferToastReplace error message