Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers
From | Önder Kalacı |
---|---|
Subject | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Date | |
Msg-id | CACawEhX6TvX+j8EpcpCKvnMGao8Gcp8W43Sgc87pg9o6-Xbf2Q@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Önder Kalacı <onderkalaci@gmail.com>) |
Responses |
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
|
List | pgsql-hackers |
Hi hackers,
I rebased the changes to the current master branch, reflected pg_indent suggestions and also made a few minor style changes.
Also, tested the patch with a few new PG 15 features in combination (such as row/column filter in logical replication, NULLS NOT DISTINCT indexes etc.) as well somethings that I haven't tested before such as publish_via_partition_root.
I have not added those tests to the regression tests as the existing tests of this patch are already bulky and I don't see a specific reason to add all combinations. Still, if anyone thinks that it is a good idea to add more tests, I can do that. For reference, here are the tests that I did manually: More Replication Index Tests (github.com)
Attached v21.
Onder KALACI
Önder Kalacı <onderkalaci@gmail.com>, 21 Eki 2022 Cum, 14:14 tarihinde şunu yazdı:
Hi Shi yu, all
In execReplication.c:
+ TypeCacheEntry **eq = NULL; /* only used when the index is not unique */
Maybe the comment here should be changed. Now it is used when the index is not
primary key or replica identity index.makes sense, updated2.
+# wait until the index is created
+$node_subscriber->poll_query_until(
+ 'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates one row via index";
The message doesn't seem right, should it be changed to "Timed out while
waiting for creating index test_replica_id_full_idx"?yes, updated
3.
+# now, ingest more data and create index on column y which has higher cardinality
+# then create an index on column y so that future commands uses the index on column
+$node_publisher->safe_psql('postgres',
+ "INSERT INTO test_replica_id_full SELECT 50, i FROM generate_series(0,3100)i;");
The comment say "create (an) index on column y" twice, maybe it can be changed
to:
now, ingest more data and create index on column y which has higher cardinality,
so that future commands will use the index on column yfixed4.
+# deletes 200 rows
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates 200 rows via index";
It would be better to call wait_for_catchup() after DELETE. (And some other
places in this file.)Hmm, I cannot follow this easily.Why do you think wait_for_catchup() should be called? In general, I tried to follow a pattern where we call poll_query_until() so that we are sure that all the changes are replicated via the index. And then, an additional check with `is($result, ..` such that we also verify the correctness of the data.One alternative could be to use wait_for_catchup() and then have multiple `is($result, ..` to check both pg_stat_all_indexes and the correctness of the data.One minor advantage I see with the current approach is that every `is($result, ..` adds one step to the test. So, if I use `is($result, ..` for pg_stat_all_indexes queries, then I'd be adding multiple steps for a single test. It felt it is more natural/common to test roughly once with `is($result, ..` on each test. Or, at least do not add additional ones for pg_stat_all_indexes checks.Besides, the "updates" in the message should be "deletes".fixed5.
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select sum(idx_scan)=10 from pg_stat_all_indexes where indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates partitioned table";
Maybe we should say "updates partitioned table with index" in this message.FixedAttached v20.Thanks!Onder KALACI
Attachment
pgsql-hackers by date: