Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CACawEhUUHcSVhp-bODSAnYNUQHY=4gvY29DXM=yuphgrkcqqmQ@mail.gmail.com
Whole thread Raw
In response to RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  ("shiy.fnst@fujitsu.com" <shiy.fnst@fujitsu.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
List pgsql-hackers
Hi Shi yu, all


In execReplication.c:

+       TypeCacheEntry **eq = NULL; /* only used when the index is not unique */

Maybe the comment here should be changed. Now it is used when the index is not
primary key or replica identity index.


makes sense, updated
 
2.
+# wait until the index is created
+$node_subscriber->poll_query_until(
+       'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates one row via index";

The message doesn't seem right,  should it be changed to "Timed out while
waiting for creating index test_replica_id_full_idx"?

yes, updated
 

3.
+# now, ingest more data and create index on column y which has higher cardinality
+# then create an index on column y so that future commands uses the index on column
+$node_publisher->safe_psql('postgres',
+       "INSERT INTO test_replica_id_full SELECT 50, i FROM generate_series(0,3100)i;");

The comment say "create (an) index on column y" twice, maybe it can be changed
to:

now, ingest more data and create index on column y which has higher cardinality,
so that future commands will use the index on column y


fixed
 
4.
+# deletes 200 rows
+$node_publisher->safe_psql('postgres',
+       "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+       'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates 200 rows via index";

It would be better to call wait_for_catchup() after DELETE. (And some other
places in this file.)

Hmm, I cannot follow this easily. 

Why do you think wait_for_catchup() should be called? In general, I tried to follow a pattern where we call poll_query_until() so that we are sure that all the changes are replicated via the index. And then, an additional check with `is($result, ..` such that we also verify the correctness of the data. 

One alternative could be to use wait_for_catchup() and then have multiple  `is($result, ..` to check both pg_stat_all_indexes and the correctness of the data.

One minor advantage I see with the current approach is that every  `is($result, ..` adds one step to the test. So, if I use  `is($result, ..`  for pg_stat_all_indexes queries, then I'd be adding multiple steps for a single test. It felt it is more natural/common to test roughly once with  `is($result, ..` on each test. Or, at least do not add additional ones for pg_stat_all_indexes checks.

 
Besides, the "updates" in the message should be "deletes".


fixed
 
5.
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+       'postgres', q{select sum(idx_scan)=10 from pg_stat_all_indexes where indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates partitioned table";

Maybe we should say "updates partitioned table with index" in this message.


Fixed

Attached v20.

Thanks!

Onder KALACI
Attachment

pgsql-hackers by date:

Previous
From: "Drouvot, Bertrand"
Date:
Subject: Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Next
From: Justin Pryzby
Date:
Subject: Re: parse partition strategy string in gram.y