Re: row filtering for logical replication - Mailing list pgsql-hackers

From Peter Smith
Subject Re: row filtering for logical replication
Date
Msg-id CAHut+Pvn+S8upyrkBi3m=rc4b5Ttw3xGsg09rNec=qJK0pD5mQ@mail.gmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
...
>
> Hmm, I think the gain via caching is not visible because we are using
> simple expressions. It will be visible when we use somewhat complex
> expressions where expression evaluation cost is significant.
> Similarly, the impact of this change will magnify and it will also be
> visible when a publication has many tables. Apart from performance,
> this change is logically correct as well because it would be any way
> better if we don't invalidate the cached expressions unless required.

Please tell me what is your idea of a "complex" row filter expression.
Do you just mean a filter that has multiple AND conditions in it? I
don't really know if few complex expressions would amount to any
significant evaluation costs, so I would like to run some timing tests
with some real examples to see the results.

On Wed, Aug 25, 2021 at 6:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Aug 25, 2021 at 10:57 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Aug 25, 2021 at 5:52 AM Euler Taveira <euler@eulerto.com> wrote:
> > >
> > > On Tue, Aug 24, 2021, at 4:46 AM, Peter Smith wrote:
>
> > >
> > > Anyway, I have implemented the suggested cache change because I agree
> > > it is probably theoretically superior, even if in practice there is
> > > almost no difference.
> > >
> > > I didn't inspect your patch carefully but it seems you add another List to
> > > control this new cache mechanism. I don't like it. IMO if we can use the data
> > > structures that we have now, let's implement your idea; otherwise, -1 for this
> > > new micro optimization.
> > >
> >
> > As mentioned above, without this we will invalidate many cached
> > expressions even though it is not required. I don't deny that there
> > might be a better way to achieve the same and if you or Peter have any
> > ideas, I am all ears.
> >
>
> I see that the new list is added to store row_filter node which we
> later use to compute expression. This is not required for invalidation
> but for delaying the expression evaluation till it is required (for
> example, for truncate, we may not need the row evaluation, so there is
> no need to compute it). Can we try to postpone the syscache lookup to
> a later stage when we are actually doing row_filtering? If we can do
> that, then I think we can avoid having this extra list?

Yes, you are correct - that Node list was re-instated only because you
had requested that the ExprState evaluation should be deferred until
it is needed by the pgoutput_row_filter. Otherwise, the additional
list would not be needed so everything would be much the same as in
v23 except the invalidations would be more focussed on single tables.

I don't think the syscache lookup can be easily postponed. That logic
of get_rel_sync_entry processes the table filters of *all*
publications, so moving that publications loop (including the
partition logic) into the pgoutput_row_filter seems a bridge too far
IMO.

Furthermore, I am not yet convinced that this ExprState postponement
is very useful. It may be true that for truncate there is no need to
compute it, but consider that the user would never even define a row
filter in the first place unless they intended there will be some CRUD
operations. So even if the truncate does not need the filter,
*something* is surely going to need it. In other words, IIUC this
postponement is not going to save any time overall - it only shifting
when the (one time) expression evaluation will happen.

I feel it would be better to just remove the postponed evaluation of
the ExprState added in v24. That will remove any need for the extra
Node list (which I think is Euler's concern). The ExprState cache will
still be slightly improved from how it was implemented before because
it is "logically correct" that we don't invalidate the cached
expressions unless required.

Thoughts?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Separate out FileSet from SharedFileSet (was Re: pgsql: pgstat: Bring up pgstat in BaseInit() to fix uninitialized use o)
Next
From: Ajin Cherian
Date:
Subject: Re: Failure of subscription tests with topminnow