Re: row filtering for logical replication - Mailing list pgsql-hackers

From Euler Taveira
Subject Re: row filtering for logical replication
Date
Msg-id 48e13621-10f7-49ef-bcbe-ed530195efd2@www.fastmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:
1. if you use REPLICA IDENTITY FULL, then the expressions would work
even if they use any other column with DELETE.  Maybe it would be
reasonable to test for this in the code and raise an error if the
expression requires a column that's not part of the replica identity.
(But that could be relaxed if the publication does not publish
updates/deletes.)
I thought about it but came to the conclusion that it doesn't worth it.  Even
with REPLICA IDENTITY FULL expression evaluates to false if the column allows
NULL values. Besides that REPLICA IDENTITY is changed via another DDL (ALTER
TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY
because some row filter uses the column you want to remove from it.

2. For UPDATE, does the expression apply to the old tuple or to the new
tuple?  You say it's the new tuple, but from the user point of view I
think it would make more sense that it would apply to the old tuple.
(Of course, if you're thinking that the R.I. is the PK and the PK is
never changed, then you don't really care which one it is, but I bet
that some people would not like that assumption.)
New tuple. The main reason is that new tuple is always there for UPDATEs.
Hence, row filter might succeed even if the row filter contains a column that
is not part of PK or REPLICA IDENTITY. pglogical also chooses to use new tuple
when it is available (e.g. for INSERT and UPDATE). If you don't like this
approach we can (a) create a new publication option to choose between old tuple
and new tuple for UPDATEs or (b) qualify columns using a special reference
(such as NEW.id or OLD.foo). Both options can provide flexibility but (a) is
simpler.

I think it is sensible that it's the old tuple that is matched, not the
new; consider what happens if you change the PK in the update and the
replica already has that tuple.  If you match on the new tuple and it
doesn't match the expression (so you filter out the update), but the old
tuple does match the expression, then the replica will retain the
mismatching tuple forever.

3. You say that a NULL value in any of those columns causes the
expression to become false and thus the tuple is not published.  This
seems pretty unfriendly, but maybe it would be useful to have examples
of the behavior.  Does ExecInitCheck() handle things in the other way,
and if so does using a similar trick give more useful behavior?
ExecInitCheck() is designed for CHECK constraints and SQL standard requires
taht NULL constraint conditions are not treated as errors. This feature uses a
WHERE clause and behaves like it. I mean, a NULL result does not return the
row. See ExecQual().


--
Euler Taveira

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: enable_resultcache confusion
Next
From: David Rowley
Date:
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates