Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CACawEhUvJEzcy8tNSkwDTacHmjaYo_2mfZ4dRr_6UcsUbf+nxA@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi Amit, all

1.
+# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA VIA
+# A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY

No need to use Delete test separate for this.

Yeah, there is really no difference between update/delete for this patch,
so it makes sense. I initially added it for completeness for the coverage,
but as it has the perf overhead for the tests, I agree that we could 
drop some of those.
 

2.
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");
+
+# check if the index is used even when the index has NULL values
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates test_replica_id_full table";

Here, I think only one update is sufficient.

done. I guess you requested this change so that we would wait
for idx_scan=1 not idx_scan=2, which could help.


3.
+$node_subscriber->safe_psql('postgres',
+ "CREATE INDEX people_last_names ON people(lastname)");
+
+# wait until the index is created
+$node_subscriber->poll_query_until(
+ 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
indexrelname = 'people_last_names';}
+) or die "Timed out while waiting for creating index people_last_names";

I don't think we need this poll.

 true, not sure why I have this. none of the tests has this anyway.


4.
+# update 2 rows
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'no-name' WHERE firstname = 'first_name_1';");
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'no-name' WHERE firstname =
'first_name_3' AND lastname = 'last_name_3';");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'people_names';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates two rows via index scan with index on
expressions and columns";
+
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM people WHERE firstname = 'no-name';");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=4 from pg_stat_all_indexes where
indexrelname = 'people_names';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full deletes two rows via index scan with index on
expressions and columns";

I think having one update or delete should be sufficient.

So, I dropped the 2nd update, but kept 1 update and 1 delete.
The latter deletes the tuple updated by the former. Seems like
an interesting test to keep.

Still, I dropped one of the extra poll_query_until, which is probably
good enough for this one? Let me know if you think otherwise.
 

5.
+# update rows, moving them to other partitions
+$node_publisher->safe_psql('postgres',
+ "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select sum(idx_scan)=1 from pg_stat_all_indexes where
indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for updates on partitioned table with index";
+
+# delete rows from different partitions
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select sum(idx_scan)=3 from pg_stat_all_indexes where
indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates partitioned table";
+

Can we combine these two polls?

Looking at it closely, the first one seems like an unnecessary poll anyway.
We can simply check the idxscan at the end of the test, I don't see
value in checking earlier.
 

6.
+# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS, ALSO
+# DROPS COLUMN

In this test, let's try to update/delete 2-3 rows instead of 20. And
after drop columns, let's keep just one of the update or delete.

changed to 3 rows
 

7. Apart from the above, I think it is better to use
wait_for_catchup() consistently before trying to verify the data on
the subscriber. We always use it in other tests. I guess here you are
relying on the poll for index scans to ensure that data is replicated
but I feel it may still be better to use wait_for_catchup().

Yes, that was my understanding & expectation. I'm not convinced that
 wait_for_catchup() is strictly needed, as without catching up, how could
pg_stat_all_indexes be updated? Still, it is good to be consistent
with the test suite. So, applied your suggestion.

Similarly, wait_for_subscription_sync uses the publisher name and
appname in other tests, so it is better to be consistent. It can avoid
random failures by ensuring data is synced.
 
makes sense.

I'll attach a new patch in the next e-mail, along with your 
other comments.


Thanks,
Onder KALACI
 
 

pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Implement IF NOT EXISTS for CREATE PUBLICATION AND CREATE SUBSCRIPTION
Next
From: Önder Kalacı
Date:
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher