Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CACawEhWA5P7PkW6GqLt-iBQposVxJZeu_=FNTQF-6ch6SkyZhw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Peter Smith <smithpb2250@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List pgsql-hackers
Hi Peter,

 

1. src/backend/executor/execReplication.c - build_replindex_scan_key

- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * This is not generic routine, it expects the idxrel to be an index
+ * that planner would choose if the searchslot includes all the columns
+ * (e.g., REPLICA IDENTITY FULL on the source).
  */
-static bool
+static int
 build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
  TupleTableSlot *searchslot)


(I know this is not caused by your patch but maybe fix it at the same time?)

"This is not generic routine, it expects..." -> "This is not a generic
routine - it expects..."


Fixed
 

2.

+ IndexInfo  *indexInfo PG_USED_FOR_ASSERTS_ONLY;
+
+ /*
+ * This attribute is an expression, and
+ * SuitablePathsForRepIdentFull() was called earlier while the
+ * index for subscriber is selected. There, the indexes comprising
+ * *only* expressions have already been eliminated.
+ *
+ * We sanity check this now.
+ */
+#ifdef USE_ASSERT_CHECKING
+ indexInfo = BuildIndexInfo(idxrel);
+ Assert(!IndexOnlyOnExpression(indexInfo));
+#endif

2a.
"while the index for subscriber is selected..." -> "when the index for
the subscriber was selected...”


fixed
 
~

2b.
Because there is only one declaration in this code block you could
simplify this a bit if you wanted to.

SUGGESTION
/*
 * This attribute is an expression, and
 * SuitablePathsForRepIdentFull() was called earlier while the
 * index for subscriber is selected. There, the indexes comprising
 * *only* expressions have already been eliminated.
 *
 * We sanity check this now.
 */
#ifdef USE_ASSERT_CHECKING
IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
Assert(!IndexOnlyOnExpression(indexInfo));
#endif


Makes sense, no reason to declare above
 
~~~

3. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex

+ /* Start an index scan. */
+ scan = index_beginscan(rel, idxrel, &snap, skey_attoff, 0);
 retry:
  found = false;

It might be better to have a blank line before that ‘retry’ label,
like in the original code.

agreed, fixed
 

======

4. src/backend/replication/logical/relation.c

+/* see LogicalRepPartMapEntry for details in logicalrelation.h */
 static HTAB *LogicalRepPartMap = NULL;

Personally, I'd word that something like:
"/* For LogicalRepPartMap details see LogicalRepPartMapEntry in
logicalrelation.h */"

but YMMV.

I also don't have any strong opinions on that, updated to your suggestion.
 

~~~

5. src/backend/replication/logical/relation.c -
GenerateDummySelectPlannerInfoForRelation

+/*
+ * This is not a generic function, helper function for
+ * GetCheapestReplicaIdentityFullPath. The function creates
+ * a dummy PlannerInfo for the given relationId as if the
+ * relation is queried with SELECT command.
+ */
+static PlannerInfo *
+GenerateDummySelectPlannerInfoForRelation(Oid relationId)

(mentioned this one in my previous post)

"This is not a generic function, helper function" -> "This is not a
generic function. It is a helper function"

Yes, applied. 
 

~~~

6. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+/*
+ * Generate all the possible paths for the given subscriber relation,
+ * for the cases that the source relation is replicated via REPLICA
+ * IDENTITY FULL. The function returns the cheapest Path among the
+ * eligible paths, see SuitablePathsForRepIdentFull().
+ *
+ * The function guarantees to return a path, because it adds sequential
+ * scan path if needed.
+ *
+ * The function assumes that all the columns will be provided during
+ * the execution phase, given that REPLICA IDENTITY FULL guarantees
+ * that.
+ */
+static Path *
+GetCheapestReplicaIdentityFullPath(Relation localrel)


"for the cases that..." -> "for cases where..."


sounds good 
 
~~~

7.

+ /*
+ * Currently it is not possible for planner to pick a partial index or
+ * indexes only on expressions. We still want to be explicit and eliminate
+ * such paths proactively.

"for planner..." -> "for the planner..."

fixed 
 

======

8. .../subscription/t/032_subscribe_use_index.pl - general

8a.
(remove the 'we')
"# we wait until..." -> "# wait until..." X many occurrences

~

8b.
(remove the 'we')
"# we show that..." -> “# show that..." X many occurrences

Ok, removed all "we"s in the test 

~~~

9.

There is inconsistent wording for some of your test case start/end comments

9a.
e.g.
start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
end: SUBSCRIPTION USES INDEX MODIFIES MULTIPLE ROWS

~

9b.
e.g.
start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
end: SUBSCRIPTION USES INDEX MODIFIES MULTIPLE ROWS


thanks, fixed all

 
~~~

10.

I did not really understand the point of having special subscription names
tap_sub_rep_full_0
tap_sub_rep_full_2
tap_sub_rep_full_3
tap_sub_rep_full_4
etc...

Since you drop/recreate these for each test case can't they just be
called 'tap_sub_rep_full'?


There is no special reason for that, updated all to tap_sub_rep_full. 

I think I initially made it in order to distinguish certain error messages in the tests, but then we already have unique messages regardless of the subscription name.
 
~~~

11. SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS

+# updates 200 rows
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");

The comment says update but this is doing delete


fixed
 

~~~

12. SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS

+# cleanup sub
+$node_subscriber->safe_psql('postgres',
+ "DROP SUBSCRIPTION tap_sub_rep_full_4");

Unusual wrapping?

Fixed 
 

~~~

13. SUBSCRIPTION USES INDEX ON PARTITIONED TABLES

+# updates rows and moves between partitions
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");

The comment says update but SQL says delete


fixed
 
~~~

14. SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS

+# update 1 row and delete 1 row using index_b, so index_a still has 2 idx_scan
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'index_a';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full_0 updates two rows via index scan with index on high
cardinality column-3";
+

The comment seems misplaced. Doesn't it belong on the lines above this
where the update/delete is being done?

 
Yes, it seems so. moved
 
~~~

15. SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE - INHERITED TABLE

+# ANALYZING child will change the index used on child_1 and going to
use index_on_child_1_b
+$node_subscriber->safe_psql('postgres', "ANALYZE child_1");

Should the comment say 'child_1' instead of child?

------

Seems better, changed.

Thanks for the reviews, attached v12.

Onder Kalaci

 
Attachment

pgsql-hackers by date:

Previous
From: Önder Kalacı
Date:
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Next
From: Michael Paquier
Date:
Subject: Re: pg_create_logical_replication_slot argument incongruency