Re: row filtering for logical replication - Mailing list pgsql-hackers

From japin
Subject Re: row filtering for logical replication
Date
Msg-id ME3P282MB16678160F5EA09DCB962474EB6B69@ME3P282MB1667.AUSP282.PROD.OUTLOOK.COM
Whole thread Raw
In response to Re: row filtering for logical replication  ("Euler Taveira" <euler@eulerto.com>)
Responses Re: row filtering for logical replication
List pgsql-hackers
On Mon, 01 Feb 2021 at 08:23, Euler Taveira <euler@eulerto.com> wrote:
> On Mon, Mar 16, 2020, at 10:58 AM, David Steele wrote:
>> Please submit to a future CF when a new patch is available.
> Hi,
>
> This is another version of the row filter patch. Patch summary:
>
> 0001: refactor to remove dead code
> 0002: grammar refactor for row filter
> 0003: core code, documentation, and tests
> 0004: psql code
> 0005: pg_dump support
> 0006: debug messages (only for test purposes)
> 0007: measure row filter overhead (only for test purposes)
>

Thanks for updating the patch.  Here are some comments:

(1)
+         <para>
+          If this parameter is <literal>false</literal>, it uses the
+          <literal>WHERE</literal> clause from the partition; otherwise,the
+          <literal>WHERE</literal> clause from the partitioned table is used.
          </para>

otherwise,the -> otherwise, the

(2)
+  <para>
+  Columns used in the <literal>WHERE</literal> clause must be part of the
+  primary key or be covered by <literal>REPLICA IDENTITY</literal> otherwise
+  <command>UPDATE</command> and <command>DELETE</command> operations will not
+  be replicated.
+  </para>
+

IMO we should indent one space here.

(3)
+
+  <para>
+  The <literal>WHERE</literal> clause expression is executed with the role used
+  for the replication connection.
+  </para>

Same as (2).

The documentation says:

>  Columns used in the <literal>WHERE</literal> clause must be part of the
>  primary key or be covered by <literal>REPLICA IDENTITY</literal> otherwise
>  <command>UPDATE</command> and <command>DELETE</command> operations will not
>  be replicated.

Why we need this limitation? Am I missing something?

When I tested, I find that the UPDATE can be replicated, while the DELETE
cannot be replicated.  Here is my test-case:

    -- 1. Create tables and publications on publisher
    CREATE TABLE t1 (a int primary key, b int);
        CREATE TABLE t2 (a int primary key, b int);
        INSERT INTO t1 VALUES (1, 11);
        INSERT INTO t2 VALUES (1, 11);
    CREATE PUBLICATION mypub1 FOR TABLE t1;
        CREATE PUBLICATION mypub2 FOR TABLE t2 WHERE (b > 10);

    -- 2. Create tables and subscriptions on subscriber
        CREATE TABLE t1 (a int primary key, b int);
        CREATE TABLE t2 (a int primary key, b int);
        CREATE SUBSCRIPTION mysub1 CONNECTION 'host=localhost port=8765 dbname=postgres' PUBLICATION mypub1;
        CREATE SUBSCRIPTION mysub2 CONNECTION 'host=localhost port=8765 dbname=postgres' PUBLICATION mypub2;

    -- 3. Check publications on publisher
        postgres=# \dRp+
                               Publication mypub1
     Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
    -------+------------+---------+---------+---------+-----------+----------
     japin | f          | t       | t       | t       | t         | f
    Tables:
        "public.t1"
    
                               Publication mypub2
     Owner | All tables | Inserts | Updates | Deletes | Truncates | Via root
    -------+------------+---------+---------+---------+-----------+----------
     japin | f          | t       | t       | t       | t         | f
    Tables:
        "public.t2"  WHERE (b > 10)

    -- 4. Check initialization data on subscriber
    postgres=# table t1;
     a | b
    ---+----
     1 | 11
    (1 row)
    
    postgres=# table t2;
     a | b
    ---+----
     1 | 11
    (1 row)

    -- 5. The update on publisher
    postgres=# update t1 set b = 111 where b = 11;
    UPDATE 1
    postgres=# table t1;
     a |  b
    ---+-----
     1 | 111
    (1 row)

    postgres=# update t2 set b = 111 where b = 11;
    UPDATE 1
    postgres=# table t2;
     a |  b
    ---+-----
     1 | 111
    (1 row)

    -- 6. check the updated records on subscriber
    postgres=# table  t1;
     a |  b
    ---+-----
     1 | 111
    (1 row)
    
    postgres=# table  t2;
     a |  b
    ---+-----
     1 | 111
    (1 row)

    -- 7. Delete records on publisher
    postgres=# delete from t1 where b = 111;
    DELETE 1
    postgres=# table t1;
     a | b
    ---+---
    (0 rows)
    
    postgres=# delete from t2 where b = 111;
    DELETE 1
    postgres=# table t2;
     a | b
    ---+---
    (0 rows)

    -- 8. Check the deleted records on subscriber
    postgres=# table t1;
     a | b
    ---+---
    (0 rows)
    
    postgres=# table t2;
     a |  b
    ---+-----
     1 | 111
    (1 row)

I do a simple debug, and find that the pgoutput_row_filter() return false when I
execute "delete from t2 where b = 111;".

Does the publication only load the REPLICA IDENTITY columns into oldtuple when we
execute DELETE? So the pgoutput_row_filter() cannot find non REPLICA IDENTITY
columns, which cause it return false, right?  If that's right, the UPDATE might
not be limitation by REPLICA IDENTITY, because all columns are in newtuple,
isn't it?

-- 
Regrads,
Japin Li.
ChengDu WenWu Information Technology Co.,Ltd.



pgsql-hackers by date:

Previous
From: wenjing
Date:
Subject: Re: [Proposal] Global temporary tables
Next
From: "Hou, Zhijie"
Date:
Subject: RE: Parallel INSERT (INTO ... SELECT ...)