Home > mailing lists

Re: row filtering for logical replication - Mailing list pgsql-hackers

From	Euler Taveira
Subject	Re: row filtering for logical replication
Date	July 14, 2021 03:57:44
Msg-id	48e13621-10f7-49ef-bcbe-ed530195efd2@www.fastmail.com Whole thread Raw
In response to	Re: row filtering for logical replication (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Responses	Re: row filtering for logical replication (Amit Kapila <amit.kapila16@gmail.com>)
List	pgsql-hackers

Tree view

On Tue, Jul 13, 2021, at 6:06 PM, Alvaro Herrera wrote:

1. if you use REPLICA IDENTITY FULL, then the expressions would work
even if they use any other column with DELETE. Maybe it would be
reasonable to test for this in the code and raise an error if the
expression requires a column that's not part of the replica identity.
(But that could be relaxed if the publication does not publish
updates/deletes.)

I thought about it but came to the conclusion that it doesn't worth it. Even

with REPLICA IDENTITY FULL expression evaluates to false if the column allows

NULL values. Besides that REPLICA IDENTITY is changed via another DDL (ALTER

TABLE) and you have to make sure you don't allow changing REPLICA IDENTITY

because some row filter uses the column you want to remove from it.

2. For UPDATE, does the expression apply to the old tuple or to the new
tuple? You say it's the new tuple, but from the user point of view I
think it would make more sense that it would apply to the old tuple.
(Of course, if you're thinking that the R.I. is the PK and the PK is
never changed, then you don't really care which one it is, but I bet
that some people would not like that assumption.)

New tuple. The main reason is that new tuple is always there for UPDATEs.

Hence, row filter might succeed even if the row filter contains a column that

is not part of PK or REPLICA IDENTITY. pglogical also chooses to use new tuple

when it is available (e.g. for INSERT and UPDATE). If you don't like this

approach we can (a) create a new publication option to choose between old tuple

and new tuple for UPDATEs or (b) qualify columns using a special reference

(such as NEW.id or OLD.foo). Both options can provide flexibility but (a) is

simpler.

I think it is sensible that it's the old tuple that is matched, not the
new; consider what happens if you change the PK in the update and the
replica already has that tuple. If you match on the new tuple and it
doesn't match the expression (so you filter out the update), but the old
tuple does match the expression, then the replica will retain the
mismatching tuple forever.

3. You say that a NULL value in any of those columns causes the
expression to become false and thus the tuple is not published. This
seems pretty unfriendly, but maybe it would be useful to have examples
of the behavior. Does ExecInitCheck() handle things in the other way,
and if so does using a similar trick give more useful behavior?

ExecInitCheck() is designed for CHECK constraints and SQL standard requires

taht NULL constraint conditions are not treated as errors. This feature uses a

WHERE clause and behaves like it. I mean, a NULL result does not return the

row. See ExecQual().

Euler Taveira

EDB https://www.enterprisedb.com/

pgsql-hackers by date:

From: David Rowley
Date: 14 July 2021, 03:46:44
Subject: Re: enable_resultcache confusion

From: David Rowley
Date: 14 July 2021, 04:15:27
Subject: Re: Add proper planner support for ORDER BY / DISTINCT aggregates

Re: row filtering for logical replication - Mailing list pgsql-hackers

Previous

Next