Re: row filtering for logical replication - Mailing list pgsql-hackers

From Peter Smith
Subject Re: row filtering for logical replication
Date
Msg-id CAHut+PspqgAY1SANix1UKUEBYsW2DZjstAkmLXP=K7H2-1pgZg@mail.gmail.com
Whole thread Raw
In response to Re: row filtering for logical replication  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: row filtering for logical replication  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
The current PG docs text for CREATE PUBLICATION (in the v54-0001
patch) has a part that says

+   A nullable column in the <literal>WHERE</literal> clause could cause the
+   expression to evaluate to false; avoid using columns without not-null
+   constraints in the <literal>WHERE</literal> clause.

I felt that the caution to "avoid using" nullable columns is too
strongly worded. AFAIK nullable columns will work perfectly fine so
long as you take due care of them in the WHERE clause. In fact, it
might be very useful sometimes to filter on nullable columns.

Here is a small test example:

// publisher
test_pub=# create table t1 (id int primary key, msg text null);
test_pub=# create publication p1 for table t1 where (msg != 'three');
// subscriber
test_sub=# create table t1 (id int primary key, msg text null);
test_sub=# CREATE SUBSCRIPTION sub1 CONNECTION 'host=localhost
dbname=test_pub application_name=sub1' PUBLICATION p1;

// insert some data
test_pub=# insert into t1 values (1, 'one'), (2, 'two'), (3, 'three'),
(4, null), (5, 'five');
test_pub=# select * from t1;
 id |  msg
----+-------
  1 | one
  2 | two
  3 | three
  4 |
  5 | five
(5 rows)

// data at sub
test_sub=# select * from t1;
 id | msg
----+------
  1 | one
  2 | two
  5 | five
(3 rows)

Notice the row 4 with the NULL is also not replicated. But, perhaps we
were expecting it to be replicated (because NULL is not 'three'). To
do this, simply rewrite the WHERE clause to properly account for
nulls.

// truncate both sides
test_pub=# truncate table t1;
test_sub=# truncate table t1;

// alter the WHERE clause
test_pub=# alter publication p1 set table t1 where (msg is null or msg
!= 'three');

// insert data at pub
test_pub=# insert into t1 values (1, 'one'), (2, 'two'), (3, 'three'),
(4, null), (5, 'five');
INSERT 0 5
test_pub=# select * from t1;
 id |  msg
----+-------
  1 | one
  2 | two
  3 | three
  4 |
  5 | five
(5 rows)

// data at sub (not it includes the row 4)
test_sub=# select * from t1;
 id | msg
----+------
  1 | one
  2 | two
  4 |
  5 | five
(4 rows)

~~

So, IMO the PG docs wording for this part should be relaxed a bit.

e.g.
BEFORE:
+   A nullable column in the <literal>WHERE</literal> clause could cause the
+   expression to evaluate to false; avoid using columns without not-null
+   constraints in the <literal>WHERE</literal> clause.
AFTER:
+   A nullable column in the <literal>WHERE</literal> clause could cause the
+   expression to evaluate to false. To avoid unexpected results, any possible
+   null values should be accounted for.

Thoughts?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Allow escape in application_name
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: sequences vs. synchronous replication