Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers
From | Önder Kalacı |
---|---|
Subject | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Date | |
Msg-id | CACawEhWA5P7PkW6GqLt-iBQposVxJZeu_=FNTQF-6ch6SkyZhw@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
List | pgsql-hackers |
Hi Peter,
1. src/backend/executor/execReplication.c - build_replindex_scan_key
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * This is not generic routine, it expects the idxrel to be an index
+ * that planner would choose if the searchslot includes all the columns
+ * (e.g., REPLICA IDENTITY FULL on the source).
*/
-static bool
+static int
build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
TupleTableSlot *searchslot)
(I know this is not caused by your patch but maybe fix it at the same time?)
"This is not generic routine, it expects..." -> "This is not a generic
routine - it expects..."
Fixed
2.
+ IndexInfo *indexInfo PG_USED_FOR_ASSERTS_ONLY;
+
+ /*
+ * This attribute is an expression, and
+ * SuitablePathsForRepIdentFull() was called earlier while the
+ * index for subscriber is selected. There, the indexes comprising
+ * *only* expressions have already been eliminated.
+ *
+ * We sanity check this now.
+ */
+#ifdef USE_ASSERT_CHECKING
+ indexInfo = BuildIndexInfo(idxrel);
+ Assert(!IndexOnlyOnExpression(indexInfo));
+#endif
2a.
"while the index for subscriber is selected..." -> "when the index for
the subscriber was selected...”
fixed
~
2b.
Because there is only one declaration in this code block you could
simplify this a bit if you wanted to.
SUGGESTION
/*
* This attribute is an expression, and
* SuitablePathsForRepIdentFull() was called earlier while the
* index for subscriber is selected. There, the indexes comprising
* *only* expressions have already been eliminated.
*
* We sanity check this now.
*/
#ifdef USE_ASSERT_CHECKING
IndexInfo *indexInfo = BuildIndexInfo(idxrel);
Assert(!IndexOnlyOnExpression(indexInfo));
#endif
Makes sense, no reason to declare above
~~~
3. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex
+ /* Start an index scan. */
+ scan = index_beginscan(rel, idxrel, &snap, skey_attoff, 0);
retry:
found = false;
It might be better to have a blank line before that ‘retry’ label,
like in the original code.
agreed, fixed
======
4. src/backend/replication/logical/relation.c
+/* see LogicalRepPartMapEntry for details in logicalrelation.h */
static HTAB *LogicalRepPartMap = NULL;
Personally, I'd word that something like:
"/* For LogicalRepPartMap details see LogicalRepPartMapEntry in
logicalrelation.h */"
but YMMV.
I also don't have any strong opinions on that, updated to your suggestion.
~~~
5. src/backend/replication/logical/relation.c -
GenerateDummySelectPlannerInfoForRelation
+/*
+ * This is not a generic function, helper function for
+ * GetCheapestReplicaIdentityFullPath. The function creates
+ * a dummy PlannerInfo for the given relationId as if the
+ * relation is queried with SELECT command.
+ */
+static PlannerInfo *
+GenerateDummySelectPlannerInfoForRelation(Oid relationId)
(mentioned this one in my previous post)
"This is not a generic function, helper function" -> "This is not a
generic function. It is a helper function"
Yes, applied.
~~~
6. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath
+/*
+ * Generate all the possible paths for the given subscriber relation,
+ * for the cases that the source relation is replicated via REPLICA
+ * IDENTITY FULL. The function returns the cheapest Path among the
+ * eligible paths, see SuitablePathsForRepIdentFull().
+ *
+ * The function guarantees to return a path, because it adds sequential
+ * scan path if needed.
+ *
+ * The function assumes that all the columns will be provided during
+ * the execution phase, given that REPLICA IDENTITY FULL guarantees
+ * that.
+ */
+static Path *
+GetCheapestReplicaIdentityFullPath(Relation localrel)
"for the cases that..." -> "for cases where..."
sounds good
~~~
7.
+ /*
+ * Currently it is not possible for planner to pick a partial index or
+ * indexes only on expressions. We still want to be explicit and eliminate
+ * such paths proactively.
"for planner..." -> "for the planner..."
fixed
======
8. .../subscription/t/032_subscribe_use_index.pl - general
8a.
(remove the 'we')
"# we wait until..." -> "# wait until..." X many occurrences
~
8b.
(remove the 'we')
"# we show that..." -> “# show that..." X many occurrences
Ok, removed all "we"s in the test
~~~
9.
There is inconsistent wording for some of your test case start/end comments
9a.
e.g.
start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
end: SUBSCRIPTION USES INDEX MODIFIES MULTIPLE ROWS
~
9b.
e.g.
start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
end: SUBSCRIPTION USES INDEX MODIFIES MULTIPLE ROWS
thanks, fixed all
~~~
10.
I did not really understand the point of having special subscription names
tap_sub_rep_full_0
tap_sub_rep_full_2
tap_sub_rep_full_3
tap_sub_rep_full_4
etc...
Since you drop/recreate these for each test case can't they just be
called 'tap_sub_rep_full'?
There is no special reason for that, updated all to tap_sub_rep_full.
I think I initially made it in order to distinguish certain error messages in the tests, but then we already have unique messages regardless of the subscription name.
~~~
11. SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
+# updates 200 rows
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
The comment says update but this is doing delete
fixed
~~~
12. SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS
+# cleanup sub
+$node_subscriber->safe_psql('postgres',
+ "DROP SUBSCRIPTION tap_sub_rep_full_4");
Unusual wrapping?
Fixed
~~~
13. SUBSCRIPTION USES INDEX ON PARTITIONED TABLES
+# updates rows and moves between partitions
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
The comment says update but SQL says delete
fixed
~~~
14. SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS
+# update 1 row and delete 1 row using index_b, so index_a still has 2 idx_scan
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'index_a';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full_0 updates two rows via index scan with index on high
cardinality column-3";
+
The comment seems misplaced. Doesn't it belong on the lines above this
where the update/delete is being done?
Yes, it seems so. moved
~~~
15. SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE - INHERITED TABLE
+# ANALYZING child will change the index used on child_1 and going to
use index_on_child_1_b
+$node_subscriber->safe_psql('postgres', "ANALYZE child_1");
Should the comment say 'child_1' instead of child?
------
Seems better, changed.
Thanks for the reviews, attached v12.
Onder Kalaci
Attachment
pgsql-hackers by date: