Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CACawEhWdoiaXwKjn5ZfAKF_Toz=t1aqT1ewHmJLRnXmKOQKwyg@mail.gmail.com
Whole thread Raw
In response to RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  ("shiy.fnst@fujitsu.com" <shiy.fnst@fujitsu.com>)
Responses RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  ("wangw.fnst@fujitsu.com" <wangw.fnst@fujitsu.com>)
List pgsql-hackers
Hi,

Thanks for the review!


Here are some comments on v17 patch.

1.
-LogicalRepRelMapEntry *
+LogicalRepPartMapEntry *
 logicalrep_partition_open(LogicalRepRelMapEntry *root,
                                                  Relation partrel, AttrMap *map)
 {

Is there any reason to change the return type of logicalrep_partition_open()? It
seems ok without this change.

I think you are right, I probably needed that in some of my earlier iterations of the patch, but now it seems redundant. Reverted back to the original version.
 

2.

+                * of the relation cache entry (e.g., such as ANALYZE or
+                * CREATE/DROP index on the relation).

"e.g." and "such as" mean the same. I think we remove one of them.

fixed
 

3.
+$node_subscriber->poll_query_until(
+       'postgres', q{select (idx_scan = 2) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for'check subscriber tap_sub_rep_full deletes one row via index";
+

+$node_subscriber->poll_query_until(
+       'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idy';}
+) or die "Timed out while waiting for'check subscriber tap_sub_rep_full deletes one row via index";


"for'check" -> "for check"

fixed
 

3.
+$node_subscriber->safe_psql('postgres',
+       "SELECT pg_reload_conf();");
+
+# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
+# ====================================================================
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
+

"Testcase start" in the comment should be "Testcase end".


fixed
 
4.
There seems to be a problem in the following scenario, which results in
inconsistent data between publisher and subscriber.

-- publisher
CREATE TABLE test_replica_id_full (x int, y int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

-- subscriber
CREATE TABLE test_replica_id_full (x int, y int);
CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(x);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

-- publisher
INSERT INTO test_replica_id_full VALUES (NULL,1);
INSERT INTO test_replica_id_full VALUES (NULL,2);
INSERT INTO test_replica_id_full VALUES (NULL,3);
update test_replica_id_full SET x=1 where y=2;

The data in publisher:
postgres=# select * from test_replica_id_full order by y;
 x | y
---+---
   | 1
 1 | 2
   | 3
(3 rows)

The data in subscriber:
postgres=# select * from test_replica_id_full order by y;
 x | y
---+---
   | 2
 1 | 2
   | 3
(3 rows)

There is no such problem on master branch.


Uff, the second problem reported regarding NULL values for this patch (both by you). First, v18 contains the fix for the problem. It turns out that my idea of treating all unique indexes (pkey, replica identity and unique regular indexes) the same proved to be wrong.  The former two require all the involved columns to have NOT NULL. The latter not. 

This resulted in RelationFindReplTupleByIndex() to skip tuples_equal() for regular unique indexes (e.g., non pkey/replid). Hence, the first NULL value is considered the matching tuple. Instead, we should be doing a full tuple equality check (e.g., tuples_equal). This is what v18 does. Also, add the above scenario as a test.

I think we can probably skip tuples_equal() for unique indexes that consist of only NOT NULL columns. However, that seems like an over-optimization. If you have such a unique index, why not create a primary key anyway?  That's why I don't see much value in compicating the code for that use case.

Thanks for the review & testing. I'll focus more on the NULL values on my own testing as well. Still, I wanted to push my changes so that you can also have a look if possible.

Attach v18. 

Onder KALACI

 
Attachment

pgsql-hackers by date:

Previous
From: Karina Litskevich
Date:
Subject: Re: Error for WITH options on partitioned tables
Next
From: "Drouvot, Bertrand"
Date:
Subject: Add regular expression testing for user name mapping in the peer authentication TAP test