Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CACawEhX6TvX+j8EpcpCKvnMGao8Gcp8W43Sgc87pg9o6-Xbf2Q@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List pgsql-hackers
Hi hackers,

I rebased the changes to the current master branch, reflected pg_indent suggestions and also made a few minor style changes.

Also, tested the patch with a few new PG 15 features in combination (such as row/column filter in logical replication, NULLS NOT DISTINCT indexes etc.)  as well somethings that I haven't tested before such as publish_via_partition_root.

I have not added those tests to the regression tests as the existing tests of this patch are already bulky and I don't see a specific reason to add all combinations. Still, if anyone thinks that it is a good idea to add more tests, I can do that. For reference, here are the tests that I did manually: More Replication Index Tests (github.com)

Attached v21.

Onder KALACI
  


Önder Kalacı <onderkalaci@gmail.com>, 21 Eki 2022 Cum, 14:14 tarihinde şunu yazdı:
Hi Shi yu, all


In execReplication.c:

+       TypeCacheEntry **eq = NULL; /* only used when the index is not unique */

Maybe the comment here should be changed. Now it is used when the index is not
primary key or replica identity index.


makes sense, updated
 
2.
+# wait until the index is created
+$node_subscriber->poll_query_until(
+       'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates one row via index";

The message doesn't seem right,  should it be changed to "Timed out while
waiting for creating index test_replica_id_full_idx"?

yes, updated
 

3.
+# now, ingest more data and create index on column y which has higher cardinality
+# then create an index on column y so that future commands uses the index on column
+$node_publisher->safe_psql('postgres',
+       "INSERT INTO test_replica_id_full SELECT 50, i FROM generate_series(0,3100)i;");

The comment say "create (an) index on column y" twice, maybe it can be changed
to:

now, ingest more data and create index on column y which has higher cardinality,
so that future commands will use the index on column y


fixed
 
4.
+# deletes 200 rows
+$node_publisher->safe_psql('postgres',
+       "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+       'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates 200 rows via index";

It would be better to call wait_for_catchup() after DELETE. (And some other
places in this file.)

Hmm, I cannot follow this easily. 

Why do you think wait_for_catchup() should be called? In general, I tried to follow a pattern where we call poll_query_until() so that we are sure that all the changes are replicated via the index. And then, an additional check with `is($result, ..` such that we also verify the correctness of the data. 

One alternative could be to use wait_for_catchup() and then have multiple  `is($result, ..` to check both pg_stat_all_indexes and the correctness of the data.

One minor advantage I see with the current approach is that every  `is($result, ..` adds one step to the test. So, if I use  `is($result, ..`  for pg_stat_all_indexes queries, then I'd be adding multiple steps for a single test. It felt it is more natural/common to test roughly once with  `is($result, ..` on each test. Or, at least do not add additional ones for pg_stat_all_indexes checks.

 
Besides, the "updates" in the message should be "deletes".


fixed
 
5.
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+       'postgres', q{select sum(idx_scan)=10 from pg_stat_all_indexes where indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates partitioned table";

Maybe we should say "updates partitioned table with index" in this message.


Fixed

Attached v20.

Thanks!

Onder KALACI
Attachment

pgsql-hackers by date:

Previous
From: Christoph Berg
Date:
Subject: Re: [PATCH] Support % wildcard in extension upgrade filenames
Next
From: Tom Lane
Date:
Subject: Re: ExecRTCheckPerms() and many prunable partitions