Re: row filtering for logical replication - Mailing list pgsql-hackers

From Euler Taveira
Subject Re: row filtering for logical replication
Date
Msg-id 7e4b5c48-a3d3-4b7e-b455-2d50c60a0ab7@www.fastmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Rahila Syed <rahilasyed90@gmail.com>)
Responses Re: row filtering for logical replication  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
List pgsql-hackers
On Thu, Mar 18, 2021, at 7:51 AM, Rahila Syed wrote:
1. 
I think the docs are being incorrectly updated to add a column to pg_partitioned_table
instead of pg_publication_rel.
Good catch.

2.   +typedef struct PublicationRelationQual
 +{
+       Oid                     relid;
+       Relation        relation;
+       Node       *whereClause;
+} PublicationRelationQual;

Can this be given a more generic name like PublicationRelationInfo, so that the same struct 
can be used to store additional relation information in future, for ex. column names, if column filtering is introduced.
Good idea. I rename it and it'll be in this next patch set.

3. Also, in the above structure, it seems that we can do with storing just relid and derive relation information from it
using table_open when needed. Am I missing something?
We need the Relation. See OpenTableList(). The way this code is organized, it
opens all publication tables and append each Relation to a list. This list is
used in PublicationAddTables() to update the catalog. I tried to minimize the
number of refactors while introducing this feature. We could probably revise
this code in the future (someone said in a previous discussion that it is weird
to open relations in one source code file -- publicationcmds.c -- and use it
into another one -- pg_publication.c).

4.  Currently in logical replication, I noticed that an UPDATE is being applied on the subscriber even if the column values
 are unchanged. Can row-filtering feature be used to change it such that, when all the OLD.columns = NEW.columns, filter out 
the row from being sent to the subscriber. I understand this would need REPLICA IDENTITY FULL to work, but would be an
improvement from the existing state.
This is how Postgres works.

postgres=# create table foo (a integer, b integer);
CREATE TABLE
postgres=# insert into foo values(1, 100);
INSERT 0 1
postgres=# select ctid, xmin, xmax, a, b from foo;
ctid  |  xmin  | xmax | a |  b 
-------+--------+------+---+-----
(0,1) | 488920 |    0 | 1 | 100
(1 row)

postgres=# update foo set b = 101 where a = 1;
UPDATE 1
postgres=# select ctid, xmin, xmax, a, b from foo;
ctid  |  xmin  | xmax | a |  b 
-------+--------+------+---+-----
(0,2) | 488921 |    0 | 1 | 101
(1 row)

postgres=# update foo set b = 101 where a = 1;
UPDATE 1
postgres=# select ctid, xmin, xmax, a, b from foo;
ctid  |  xmin  | xmax | a |  b 
-------+--------+------+---+-----
(0,3) | 488922 |    0 | 1 | 101
(1 row)

You could probably abuse this feature and skip some UPDATEs when old tuple is
identical to new tuple. The question is: why would someone issue the same
command multiple times? A broken application? I would say: don't do it. Besides
that, this feature could impose an overhead into a code path that already
consume substantial CPU time. I've seen some tables with RIF and dozens of
columns that would certainly contribute to increase the replication lag.

5. Currently, any existing rows that were not replicated, when updated to match the publication quals
using UPDATE tab SET pub_qual_column = 'not_filtered' where a = 1; won't be applied, as row 
does not exist on the subscriber.  It would be good if ALTER SUBSCRIBER REFRESH PUBLICATION
would help fetch such existing rows from publishers that match the qual now(either because the row changed
or the qual changed)
I see. This should be addressed by a resynchronize feature. Such option is
useful when you have to change the row filter. It should certainly be implement
as an ALTER SUBSCRIPTION subcommand.

I attached a new patch set that addresses:

* fix documentation;
* rename PublicationRelationQual to PublicationRelationInfo;
* remove the memset that was leftover from a previous patch set;
* add new tests to improve coverage (INSERT/UPDATE/DELETE to exercise the row
  filter code).


--
Euler Taveira

Attachment

pgsql-hackers by date:

Previous
From: "Euler Taveira"
Date:
Subject: Re: row filtering for logical replication
Next
From: Fujii Masao
Date:
Subject: Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.