Re: row filtering for logical replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: row filtering for logical replication
Date
Msg-id CAA4eK1JL2q+HENgiCf1HLRU7nD9jCcttB9sEqV1tech4mMv_0A@mail.gmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: row filtering for logical replication
Re: row filtering for logical replication
List pgsql-hackers
On Mon, Jul 12, 2021 at 3:01 PM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 7/12/21 6:46 AM, Amit Kapila wrote:
> > On Mon, Jul 12, 2021 at 7:19 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > Now, the other idea I had in mind was to traverse the WHERE clause
> > expression in publication_add_relation and identify if it contains
> > anything other than the ANDed list of 'foo.bar op constant'
> > expressions. OTOH, for index where clause expressions or policy check
> > expressions, we use a technique similar to what we have in the patch
> > to prohibit certain kinds of expressions.
> >
> > Do you have any preference on how this should be addressed?
> >
>
> I don't think this is sufficient, because who knows where "op" comes
> from? It might be from an extension, in which case the problem pointed
> out by Petr Jelinek [1] would apply. OTOH I suppose we could allow
> expressions like (Var op Var), i.e. "a < b" or something like that. And
> then why not allow (a+b < c-10) and similar "more complex" expressions,
> as long as all the operators are built-in?
>

Yeah, and the patch already disallows the user-defined operators in
filters. I think ideally if the operator doesn't refer to UDFs, we can
allow to directly use such an OP in the filter as we can add a
dependency for the same.

> In terms of implementation, I think there are two basic options - either
> we can define a new "expression" type in gram.y, which would be a subset
> of a_expr etc. Or we can do it as some sort of expression walker, kinda
> like what the transform* functions do now.
>

I think it is better to use some form of walker here rather than
extending the grammar for this. However, the question is do we need
some special kind of expression walker here or can we handle all
required cases via transformWhereClause() call as the patch is trying
to do. AFAIU, the main things we want to prohibit in the filter are:
(a) it doesn't refer to any relation other than catalog in where
clause, (b) it doesn't use UDFs in any way (in expressions, in
user-defined operators, user-defined types, etc.), (c) the columns
referred to in the filter should be part of PK or Replica Identity.
Now, if all such things can be detected by the approach patch has
taken then why do we need a special kind of expression walker? OTOH,
if we can't detect some of this then probably we can use a special
walker.

I think in the long run one idea to allow UDFs is probably by
explicitly allowing users to specify whether the function is
publication predicate safe and if so, then we can allow such functions
in the filter clause.

-- 
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates
Next
From: Tatsuo Ishii
Date:
Subject: Re: [HACKERS] WIP aPatch: Pgbench Serialization and deadlock errors