Re: row filtering for logical replication - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: row filtering for logical replication |
Date | |
Msg-id | 4d9e0142-0e80-daee-5236-154cd785828c@enterprisedb.com Whole thread Raw |
In response to | Re: row filtering for logical replication (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: row filtering for logical replication
|
List | pgsql-hackers |
On 7/13/21 12:57 PM, Amit Kapila wrote: > On Tue, Jul 13, 2021 at 10:24 AM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Mon, Jul 12, 2021 at 3:01 PM Tomas Vondra >> <tomas.vondra@enterprisedb.com> wrote: >> >>> In terms of implementation, I think there are two basic options - either >>> we can define a new "expression" type in gram.y, which would be a subset >>> of a_expr etc. Or we can do it as some sort of expression walker, kinda >>> like what the transform* functions do now. >>> >> >> I think it is better to use some form of walker here rather than >> extending the grammar for this. However, the question is do we need >> some special kind of expression walker here or can we handle all >> required cases via transformWhereClause() call as the patch is trying >> to do. AFAIU, the main things we want to prohibit in the filter are: >> (a) it doesn't refer to any relation other than catalog in where >> clause, (b) it doesn't use UDFs in any way (in expressions, in >> user-defined operators, user-defined types, etc.), (c) the columns >> referred to in the filter should be part of PK or Replica Identity. >> Now, if all such things can be detected by the approach patch has >> taken then why do we need a special kind of expression walker? OTOH, >> if we can't detect some of this then probably we can use a special >> walker. >> >> I think in the long run one idea to allow UDFs is probably by >> explicitly allowing users to specify whether the function is >> publication predicate safe and if so, then we can allow such functions >> in the filter clause. >> > > Another idea here could be to read the publication-related catalog > with the latest snapshot instead of a historic snapshot. If we do that > then if the user faces problems as described by Petr [1] due to > missing dependencies via UDFs then she can Alter the Publication to > remove/change the filter clause and after that, we would be able to > recognize the updated filter clause and the system will be able to > move forward. > > I might be missing something but reading publication catalogs with > non-historic snapshots shouldn't create problems as we use the > historic snapshots are required to decode WAL. > IMHO the best option for v1 is to just restrict the filters to known-safe expressions. That is, just built-in operators, no UDFs etc. Yes, it's not great, but both alternative proposals (allowing UDFs or using current snapshot) are problematic for various reasons. Even with those restrictions the row filtering seems quite useful, and we can relax those restrictions later if we find acceptable compromise and/or decide it's worth the risk. Seems better than having to introduce new restrictions later. > I think the problem described by Petr[1] is also possible today if the > user drops the publication and there is a corresponding subscription, > basically, the system will stuck with error: "ERROR: publication > "mypub" does not exist. I think allowing to use non-historic snapshots > just for publications will resolve that problem as well. > > [1] - https://www.postgresql.org/message-id/92e5587d-28b8-5849-2374-5ca3863256f1%402ndquadrant.com > That seems like a completely different problem, TBH. For example the slot is dropped too, which means the WAL is likely gone etc. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: