Re: row filtering for logical replication - Mailing list pgsql-hackers

From Euler Taveira
Subject Re: row filtering for logical replication
Date
Msg-id 86d76e89-9a96-411a-8077-f8918cbae2e1@www.fastmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
Re: row filtering for logical replication  (Andres Freund <andres@anarazel.de>)
Re: row filtering for logical replication  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Tue, Mar 30, 2021, at 8:23 AM, Amit Kapila wrote:
On Mon, Mar 29, 2021 at 6:47 PM Euler Taveira <euler@eulerto.com> wrote:
>
Few comments:
==============
1. How can we specify row filters for multiple tables for a
publication? Consider a case as below:
It is not possible. Row filter is a per table option. Isn't it clear from the
synopsis? The current design allows different row filter for tables in the same
publication. It is more flexible than a single row filter for a set of tables
(even if we would support such variant, there are some cases where the
condition should be different because the column names are not the same). You
can easily build a CREATE PUBLICATION command that adds the same row filter
multiple times using a DO block or use a similar approach in your favorite
language.

postgres=# CREATE TABLE tab_rowfilter_1 (a int primary key, b text);
CREATE TABLE
postgres=# CREATE TABLE tab_rowfilter_2 (c int primary key);
CREATE TABLE

postgres=# CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1,
tab_rowfilter_2 WHERE (a > 1000 AND b <> 'filtered');
ERROR:  column "a" does not exist
LINE 1: ...FOR TABLE tab_rowfilter_1, tab_rowfilter_2 WHERE (a > 1000 A...

                                                             ^

postgres=# CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1,
tab_rowfilter_2  WHERE (c > 1000);
CREATE PUBLICATION

It gives an error when I tried to specify the columns corresponding to
the first relation but is fine for columns for the second relation.
Then, I tried few more combinations like below but that didn't work.
CREATE PUBLICATION tap_pub_1 FOR TABLE tab_rowfilter_1 As t1,
tab_rowfilter_2 As t2 WHERE (t1.a > 1000 AND t1.b <> 'filtered');

Will users be allowed to specify join conditions among columns from
multiple tables?
It seems you are envisioning row filter as a publication property instead of a
publication-relation property. Due to the flexibility that the later approach
provides, I decided to use it because it covers more use cases. Regarding
allowing joins, it could possibly slow down a critical path, no? This code path
is executed by every change. If there are interest in the join support, we
might add it in a future patch.

2.
+ /*
+ * Although ALTER PUBLICATION grammar allows WHERE clause to be specified
+ * for DROP TABLE action, it doesn't make sense to allow it. We implement
+ * this restriction here, instead of complicating the grammar to enforce
+ * it.
+ */
+ if (stmt->tableAction == DEFELEM_DROP)
+ {
+ ListCell   *lc;
+
+ foreach(lc, stmt->tables)
+ {
+ PublicationTable *t = lfirst(lc);
+
+ if (t->whereClause)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot use a WHERE clause when removing table from
publication \"%s\"",
+ NameStr(pubform->pubname))));
+ }
+ }

Is there a reason to deal with this here separately rather than in the
ALTER PUBLICATION grammar?
Good question. IIRC the issue is that AlterPublicationStmt->tables has a list
element that was a relation_expr_list and was converted to
publication_table_list. If we share 'tables' with relation_expr_list (for ALTER
PUBLICATION ... DROP TABLE) and publication_table_list (for the other ALTER
PUBLICATION ... ADD|SET TABLE), the OpenTableList() has to know what list
element it is dealing with. I think I came to the conclusion that it is less
uglier to avoid changing OpenTableList() and CloseTableList().

[Doing some experimentation...]

Here is a patch that remove the referred code. It uses 2 distinct list
elements: relation_expr_list for ALTER PUBLICATION ... DROP TABLE and
publication_table_list for for ALTER PUBLICATION ... ADD|SET TABLE. A new
parameter was introduced to deal with the different elements of the list
'tables'.


--
Euler Taveira

Attachment

pgsql-hackers by date:

Previous
From: Andy Fan
Date:
Subject: Re: What to call an executor node which lazily caches tuples in a hash table?
Next
From: Tom Lane
Date:
Subject: Re: [PATCH v3 1/1] Fix detection of preadv/pwritev support for OSX.