Re: row filtering for logical replication - Mailing list pgsql-hackers

From Peter Smith
Subject Re: row filtering for logical replication
Date
Msg-id CAHut+Pt6+=w7_r=CHBCS+yZXk5V+tnrzHLi3b2ZOVP1LHL2W9w@mail.gmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: row filtering for logical replication
List pgsql-hackers
I have attached a POC row-filter validation patch implemented using a
parse-tree 'walker' function.

PSA the incremental patch v28-0003.

v28-0001 --> v28-0001 (same as before - base patch)
v28-0002 --> v28-0002 (same as before - replica identity validation patch)
                     v28-0003 (NEW POC PATCH using "walker" validation)

~~

This kind of 'walker' validation has been proposed/recommended already
several times up-thread. [1][2][3].

For this POC patch, I have removed all the existing
EXPR_KIND_PUBLICATION_WHERE parser errors. I am not 100% sure this is
the best idea (see below), but for now, the parser errors are
temporarily #if 0 in the code. I will clean up this patch and re-post
later when there is some feedback/consensus on how to proceed.

~

1. PROS

1.1 Using a 'walker' validator allows the row filter expression
validation to be 'opt-in' instead of 'opt-out' checking logic. This
may be considered *safer* because now we can have a very
controlled/restricted set of allowed nodes - e.g. only allow simple
(Var op Const) expressions. This eliminates the risk that some
unforeseen dangerous loophole could be exploited.

1.2 It is convenient to have all the row-filter validation errors in
one place, instead of being scattered across the parser code based on
EXPR_KIND_PUBLICATION_WHERE. Indeed, there seems some confusion
already caused by the existing scattering of row-filter validation
(patch 0001). For example, I found some of the new "aggregate
functions are not allowed" errors are not even reachable because they
are shielded by the earlier "functions are not allowed" error.

2. CONS

2.1 Error messages thrown from the parser can include the character
location of the problem. Actually, this is also possible using the
'walker' (I have done it locally) but it requires passing the
ParseState into the walker code - something I thought seemed a bit
unusual, so I did not include that in this 0003 POC patch.

~~

Perhaps a hybrid validation is preferred. e.g. retain some/all of the
parser validation errors from the 0001 patch, but also keep the walker
validation as a 'catch-all' to trap anything unforeseen that may slip
through the parsing. Or perhaps this 'walker' validator is fine as the
only validator and all the current parser errors for
EXPR_KIND_PUBLICATION_WHERE can just be permanently removed.

I am not sure what is the best approach, so I am hoping for some
feedback and/or review comments.

------
[1] https://www.postgresql.org/message-id/33c033f7-be44-e241-5fdf-da1b328c288d%40enterprisedb.com
[2] https://www.postgresql.org/message-id/CAA4eK1Jumuio6jZK8AVQd6z7gpDsZydQhK6d%3DMUARxk3nS7%2BPw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CAA4eK1JL2q%2BHENgiCf1HLRU7nD9jCcttB9sEqV1tech4mMv_0A%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: postgres.h included from relcache.h - but removing it breaks pg_upgrade
Next
From: Amit Kapila
Date:
Subject: Re: PG Docs - CREATE SUBSCRIPTION option list order