Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CAA4eK1K9K5r9yo6gCtsOyHBEX2HCSmv3FWwnreXmWE5pY3NS4A@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
On Fri, Mar 10, 2023 at 7:58 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>
> I think one option could be to drop some cases altogether, but not sure we'd want that.
>
> As a semi-related question: Are you aware of any setting that'd make pg_stat_all_indexes
> reflect the changes sooner? It is hard to debug what is the bottleneck in the tests, but
> I have a suspicion that there might be several poll_query_until() calls on
> pg_stat_all_indexes, which might be the reason?
>

Yeah, I also think poll_query_until() calls on pg_stat_all_indexes is
the main reason for these tests taking more time. When I commented
those polls, it drastically reduces the test time. On looking at
pgstat_report_stat(), it seems we don't report stats sooner than 1s
and as most of this patch's test relies on stats, it leads to taking
more time. I don't have a better idea to verify this patch without
checking whether the index scan is really used by referring to
pg_stat_all_indexes. I think trying to reduce the poll calls may help
in reducing the test timings further. Some ideas on those lines are as
follows:
1.
+# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA VIA
+# A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY

No need to use Delete test separate for this.

2.
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");
+
+# check if the index is used even when the index has NULL values
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates test_replica_id_full table";

Here, I think only one update is sufficient.

3.
+$node_subscriber->safe_psql('postgres',
+ "CREATE INDEX people_last_names ON people(lastname)");
+
+# wait until the index is created
+$node_subscriber->poll_query_until(
+ 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
indexrelname = 'people_last_names';}
+) or die "Timed out while waiting for creating index people_last_names";

I don't think we need this poll.

4.
+# update 2 rows
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'no-name' WHERE firstname = 'first_name_1';");
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'no-name' WHERE firstname =
'first_name_3' AND lastname = 'last_name_3';");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'people_names';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates two rows via index scan with index on
expressions and columns";
+
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM people WHERE firstname = 'no-name';");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=4 from pg_stat_all_indexes where
indexrelname = 'people_names';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full deletes two rows via index scan with index on
expressions and columns";

I think having one update or delete should be sufficient.

5.
+# update rows, moving them to other partitions
+$node_publisher->safe_psql('postgres',
+ "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select sum(idx_scan)=1 from pg_stat_all_indexes where
indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for updates on partitioned table with index";
+
+# delete rows from different partitions
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select sum(idx_scan)=3 from pg_stat_all_indexes where
indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates partitioned table";
+

Can we combine these two polls?

6.
+# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS, ALSO
+# DROPS COLUMN

In this test, let's try to update/delete 2-3 rows instead of 20. And
after drop columns, let's keep just one of the update or delete.

7. Apart from the above, I think it is better to use
wait_for_catchup() consistently before trying to verify the data on
the subscriber. We always use it in other tests. I guess here you are
relying on the poll for index scans to ensure that data is replicated
but I feel it may still be better to use wait_for_catchup().
Similarly, wait_for_subscription_sync uses the publisher name and
appname in other tests, so it is better to be consistent. It can avoid
random failures by ensuring data is synced.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Alexander Lakhin
Date:
Subject: Re: Add LZ4 compression in pg_dump
Next
From: "Regina Obe"
Date:
Subject: RE: Ability to reference other extensions by schema in extension scripts