Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Peter Smith
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CAHut+PtomDp=9UOkE+epQ6Tq00PCFnustRX3qZJWxC_5rZxcDg@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List pgsql-hackers
Here are some review comments for the latest v10 patch.

(Mostly these are just nitpick wording/comments etc)

======

1. Commit message

It is often not feasible to use `REPLICA IDENTITY FULL` on the publication
because it leads to full table scan per tuple change on the subscription.
This makes `REPLICA IDENTITY FULL` impracticable -- probably other than
some small number of use cases.

~

The "often not feasible" part seems repeated by the "impracticable" part.

SUGGESTION
Using `REPLICA IDENTITY FULL` on the publication leads to a full table
scan per tuple change on the subscription. This makes `REPLICA
IDENTITY FULL` impracticable -- probably other than some small number
of use cases.

~~~

2.

The Majority of the logic on the subscriber side already exists in
the code.

"Majority" -> "majority"

~~~

3.

The ones familiar
with this part of the code could realize that the sequential scan
code on the subscriber already implements the `tuples_equal()`
function.

SUGGESTION
Anyone familiar with this part of the code might recognize that...

~~~

4.

In short, the changes on the subscriber is mostly
combining parts of (unique) index scan and sequential scan codes.

"is mostly" -> "are mostly"

~~~

5.

From the performance point of view, there are few things to note.

"are few" -> "are a few"

======

6. src/backend/executor/execReplication.c - build_replindex_scan_key

+static int
 build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
  TupleTableSlot *searchslot)
 {
- int attoff;
+ int index_attoff;
+ int scankey_attoff = 0;

Should it be called 'skey_attoff' for consistency with the param 'skey'?

~~~

7.

  Oid operator;
  Oid opfamily;
  RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);

Maybe the 'optype' should be adjacent to the other Oid opXXX
declarations just to keep them all together?

~~~

8.

+ if (!AttributeNumberIsValid(table_attno))
+ {
+ IndexInfo *indexInfo PG_USED_FOR_ASSERTS_ONLY;
+
+ /*
+ * There are two cases to consider. First, if the index is a primary or
+ * unique key, we cannot have any indexes with expressions. So, at this
+ * point we are sure that the index we are dealing with is not these.
+ */
+ Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+    RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
+
+ /*
+ * At this point, we are also sure that the index is not consisting
+ * of only expressions.
+ */
+#ifdef USE_ASSERT_CHECKING
+ indexInfo = BuildIndexInfo(idxrel);
+ Assert(!IndexOnlyOnExpression(indexInfo));
+#endif

I was a bit confused by the comment. IIUC the code has already called
the FilterOutNotSuitablePathsForReplIdentFull some point prior so all
the unwanted indexes are already filtered out. Therefore these
assertions are just for no reason, other than sanity checking that
fact, right? If my understand is correct perhaps a simpler single
comment is possible:

SUGGESTION (or something like this)
This attribute is an expression, however
FilterOutNotSuitablePathsForReplIdentFull was called earlier during
[...] and the indexes comprising only expressions have already been
eliminated. We sanity check this now. Furthermore, because primary key
and unique key indexes can't include expressions we also sanity check
the index is neither of those kinds.

~~~

9.
- return hasnulls;
+ /* We should always use at least one attribute for the index scan */
+ Assert (scankey_attoff > 0);

SUGGESTION
There should always be at least one attribute for the index scan.

~~~

10. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex

ScanKeyData skey[INDEX_MAX_KEYS];
IndexScanDesc scan;
SnapshotData snap;
TransactionId xwait;
Relation idxrel;
bool found;
TypeCacheEntry **eq = NULL; /* only used when the index is not unique */
bool indisunique;
int scankey_attoff;

10a.
Should 'scankey_attoff' be called 'skey_attoff' for consistency with
the 'skey' array?

~

10b.
Also, it might be tidier to declare the 'skey_attoff' adjacent to the 'skey'.

======

11. src/backend/replication/logical/relation.c

For LogicalRepPartMap, I was wondering if it should keep a small
comment to xref back to the long comment which was moved to
logicalreplication.h

e.g.
/* Refer to the LogicalRepPartMapEntry comment in logicalrelation.h */

~~~

12. src/backend/replication/logical/relation.c - logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It occurs when
+ * an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ entry->usableIndexOid = FindLogicalRepUsableIndex(entry->localrel, remoterel);
  entry->localrelvalid = true;

Should there be a blank line between those assignments? (just for
consistency with the other code of this patch in a later function that
does exactly the same assignments).

~~~

13. src/backend/replication/logical/relation.c -
FilterOutNotSuitablePathsForReplIdentFull

Not sure about this function name. Maybe should be something like
'FilterOutUnsuitablePathsForReplIdentFull', or just
'SuitablePathsForReplIdentFull'

~~~

14.

+ else
+ {
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;

Is that another var that could be renamed 'idxoid' like all the others?

~~~

15. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+ typentry = lookup_type_cache(attr->atttypid,
+ TYPECACHE_EQ_OPR_FINFO);

Seems unnecessary wrapping.

~~~

15.

+ /*
+ * Currently it is not possible for planner to pick a
+ * partial index or indexes only on expressions. We
+ * still want to be explicit and eliminate such
+ * paths proactively.
...
...
+ */

This large comment seems unusually skinny. Needs pg_indent.

~~~

16. src/backend/replication/logical/worker.c - check_relation_updatable

@@ -1753,7 +1738,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
  * We are in error mode so it's fine this is somewhat slow. It's better to
  * give user correct error.
  */
- if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
+ if (OidIsValid(rel->usableIndexOid))

The original comment about it being "somewhat slow" does not seem
relevant anymore because it is no longer calling a function in this
condition.

~~~

17. src/backend/replication/logical/worker.c - usable_indexoid_internal

+ relmapentry = &(part_entry->relmapentry);

The parentheses seem overkill, and code is not written like this
elsewhere in the same patch.

~~~

18. src/backend/replication/logical/worker.c - apply_handle_tuple_routing

@@ -2202,13 +2225,15 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
  * suitable partition.
  */
  {
+ LogicalRepRelMapEntry *entry = &part_entry->relmapentry;

I think elsewhere in the patch the same variable is called
'relmapentry' (which seems a bit better than just 'entry')

======

19. .../subscription/t/032_subscribe_use_index.pl

+# ANALYZING child will change the index used on child_1 and going to
use index_on_child_1_b
+$node_subscriber->safe_psql('postgres', "ANALYZE child_1");

19a.
"ANALYZING child" ? Should that be worded differently? There is
nothing named 'child' that I could see.

~

19b.
"and going to use" ? wording ? "which will be used for " ??

------
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: remove_useless_groupby_columns is too enthusiastic
Next
From: Peter Smith
Date:
Subject: Re: why can't a table be part of the same publication as its schema