Re: row filtering for logical replication - Mailing list pgsql-hackers

From Alvaro Herrera
Subject Re: row filtering for logical replication
Date
Msg-id 202107132106.wvjgvjgcyezo@alvherre.pgsql
Whole thread Raw
In response to Re: row filtering for logical replication  ("Euler Taveira" <euler@eulerto.com>)
Responses Re: row filtering for logical replication  ("Euler Taveira" <euler@eulerto.com>)
List pgsql-hackers
On 2021-Jul-13, Euler Taveira wrote:

> +  <para>
> +   The <literal>WHERE</literal> clause should contain only columns that are
> +   part of the primary key or be covered  by <literal>REPLICA
> +   IDENTITY</literal> otherwise, <command>DELETE</command> operations will not
> +   be replicated. That's because old row is used and it only contains primary
> +   key or columns that are part of the <literal>REPLICA IDENTITY</literal>; the
> +   remaining columns are <literal>NULL</literal>. For <command>INSERT</command>
> +   and <command>UPDATE</command> operations, any column might be used in the
> +   <literal>WHERE</literal> clause. New row is used and it contains all
> +   columns. A <literal>NULL</literal> value causes the expression to evaluate
> +   to false; avoid using columns without not-null constraints in the
> +   <literal>WHERE</literal> clause. The <literal>WHERE</literal> clause does
> +   not allow functions and user-defined operators.
> +  </para>

There's a couple of points in this paragraph ..

1. if you use REPLICA IDENTITY FULL, then the expressions would work
even if they use any other column with DELETE.  Maybe it would be
reasonable to test for this in the code and raise an error if the
expression requires a column that's not part of the replica identity.
(But that could be relaxed if the publication does not publish
updates/deletes.)

2. For UPDATE, does the expression apply to the old tuple or to the new
tuple?  You say it's the new tuple, but from the user point of view I
think it would make more sense that it would apply to the old tuple.
(Of course, if you're thinking that the R.I. is the PK and the PK is
never changed, then you don't really care which one it is, but I bet
that some people would not like that assumption.)

I think it is sensible that it's the old tuple that is matched, not the
new; consider what happens if you change the PK in the update and the
replica already has that tuple.  If you match on the new tuple and it
doesn't match the expression (so you filter out the update), but the old
tuple does match the expression, then the replica will retain the
mismatching tuple forever.

3. You say that a NULL value in any of those columns causes the
expression to become false and thus the tuple is not published.  This
seems pretty unfriendly, but maybe it would be useful to have examples
of the behavior.  Does ExecInitCheck() handle things in the other way,
and if so does using a similar trick give more useful behavior?

<para>
 The WHERE clause may only contain references to columns that are part
 of the table's replica identity.
 If <>DELETE</> or <>UPDATE</> operations are published, this
 restriction can be bypassed by making the replica identity be the whole
 row with <command>ALTER TABLE .. SET REPLICA IDENTITY FULL</command>.
 The <literal>WHERE</literal> clause does not allow functions or
 user-defined operators.
</para>

-- 
Álvaro Herrera              Valdivia, Chile  —  https://www.EnterpriseDB.com/
"The Gord often wonders why people threaten never to come back after they've
been told never to return" (www.actsofgord.com)



pgsql-hackers by date:

Previous
From: "Euler Taveira"
Date:
Subject: Re: row filtering for logical replication
Next
From: "r.takahashi_2@fujitsu.com"
Date:
Subject: RE: Transactions involving multiple postgres foreign servers, take 2