Home > mailing lists

Re: row filtering for logical replication - Mailing list pgsql-hackers

From	Euler Taveira
Subject	Re: row filtering for logical replication
Date	March 22, 2021 05:15:10
Msg-id	7e4b5c48-a3d3-4b7e-b455-2d50c60a0ab7@www.fastmail.com Whole thread Raw
In response to	Re: row filtering for logical replication (Rahila Syed <rahilasyed90@gmail.com>)
Responses	Re: row filtering for logical replication
List	pgsql-hackers

Tree view

On Thu, Mar 18, 2021, at 7:51 AM, Rahila Syed wrote:

1.
I think the docs are being incorrectly updated to add a column to pg_partitioned_table
instead of pg_publication_rel.

Good catch.

2. +typedef struct PublicationRelationQual
+{
+ Oid relid;
+ Relation relation;
+ Node *whereClause;
+} PublicationRelationQual;

Can this be given a more generic name like PublicationRelationInfo, so that the same struct
can be used to store additional relation information in future, for ex. column names, if column filtering is introduced.

Good idea. I rename it and it'll be in this next patch set.

3. Also, in the above structure, it seems that we can do with storing just relid and derive relation information from it
using table_open when needed. Am I missing something?

We need the Relation. See OpenTableList(). The way this code is organized, it

opens all publication tables and append each Relation to a list. This list is

used in PublicationAddTables() to update the catalog. I tried to minimize the

number of refactors while introducing this feature. We could probably revise

this code in the future (someone said in a previous discussion that it is weird

to open relations in one source code file -- publicationcmds.c -- and use it

into another one -- pg_publication.c).

4. Currently in logical replication, I noticed that an UPDATE is being applied on the subscriber even if the column values
are unchanged. Can row-filtering feature be used to change it such that, when all the OLD.columns = NEW.columns, filter out
the row from being sent to the subscriber. I understand this would need REPLICA IDENTITY FULL to work, but would be an
improvement from the existing state.

This is how Postgres works.

postgres=# create table foo (a integer, b integer);

CREATE TABLE

postgres=# insert into foo values(1, 100);

INSERT 0 1

postgres=# select ctid, xmin, xmax, a, b from foo;

ctid | xmin | xmax | a | b

-------+--------+------+---+-----

(0,1) | 488920 | 0 | 1 | 100

(1 row)

postgres=# update foo set b = 101 where a = 1;

UPDATE 1

postgres=# select ctid, xmin, xmax, a, b from foo;

ctid | xmin | xmax | a | b

-------+--------+------+---+-----

(0,2) | 488921 | 0 | 1 | 101

(1 row)

postgres=# update foo set b = 101 where a = 1;

UPDATE 1

postgres=# select ctid, xmin, xmax, a, b from foo;

ctid | xmin | xmax | a | b

-------+--------+------+---+-----

(0,3) | 488922 | 0 | 1 | 101

(1 row)

You could probably abuse this feature and skip some UPDATEs when old tuple is

identical to new tuple. The question is: why would someone issue the same

command multiple times? A broken application? I would say: don't do it. Besides

that, this feature could impose an overhead into a code path that already

consume substantial CPU time. I've seen some tables with RIF and dozens of

columns that would certainly contribute to increase the replication lag.

5. Currently, any existing rows that were not replicated, when updated to match the publication quals
using UPDATE tab SET pub_qual_column = 'not_filtered' where a = 1; won't be applied, as row
does not exist on the subscriber. It would be good if ALTER SUBSCRIBER REFRESH PUBLICATION
would help fetch such existing rows from publishers that match the qual now(either because the row changed
or the qual changed)

I see. This should be addressed by a resynchronize feature. Such option is

useful when you have to change the row filter. It should certainly be implement

as an ALTER SUBSCRIPTION subcommand.

I attached a new patch set that addresses:

* fix documentation;

* rename PublicationRelationQual to PublicationRelationInfo;

* remove the memset that was leftover from a previous patch set;

* add new tests to improve coverage (INSERT/UPDATE/DELETE to exercise the row

filter code).

Euler Taveira

EDB https://www.enterprisedb.com/

Attachment

pgsql-hackers by date:

From: "Euler Taveira"
Date: 22 March 2021, 04:58:14
Subject: Re: row filtering for logical replication

From: Fujii Masao
Date: 22 March 2021, 05:22:07
Subject: Re: Wrong statistics for size of XLOG_SWITCH during pg_waldump.

Re: row filtering for logical replication - Mailing list pgsql-hackers

Attachment

Previous

Next