RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From kuroda.hayato@fujitsu.com
Subject RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id TYAPR01MB58664E26DD2DE6F2273BC940F55C9@TYAPR01MB5866.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
List pgsql-hackers
Dear Önder,

Thank you for updating the patch! At first I replied to your comments.

> My thinking on those functions is that they should probably stay
> in src/backend/replication/logical/relation.c. My main motivation is that
> those functions are so much tailored to the purposes of this file that I
> cannot see any use-case for these functions in any other context.

I was not sure what should be, but I agreed that functions will be not used from other parts.

> Hmm, I cannot follow this comment. Can you please clarify?

In your patch:

```
+       /* Simple case, we already have a primary key or a replica identity index */
+       idxoid = GetRelationIdentityOrPK(localrel);
+       if (OidIsValid(idxoid))
+               return idxoid;
+
+       /*
+        * If index scans are disabled, use a sequential scan.
+        *
+        * Note that we still allowed index scans above when there is a primary
+        * key or unique index replica identity, but that is the legacy behaviour
+        * (even when enable_indexscan is false), so we hesitate to move this
+        * enable_indexscan check to be done earlier in this function.
+        */ 
```

And the paragraph " Note that we..." should be at above of GetRelationIdentityOrPK().
Future readers will read the function from top to bottom,
and when they read around GetRelationIdentityOrPK() they may be confused.

> So, I think it is better to have specific names, no?

OK.

> The inlined comment in the function has a similar comment. Is that clear
> enough?
> /* * Generate restrictions for all columns in the form of col_1 = $1 AND *
> col_2 = $2 ... */

Actually I missed it, but I still think that whole of emulated SQL should be clarified. 

> Though, I agree that we can improve the code a bit. I now
> use targetrelkind and dropped localrelid to check whether the target is a
> partitioned table. Is this better?

Great improvement. Genus!

> Well, I'm not sure if it is worth the complexity. There are only 4 usages
> of the same table, and these are all pretty simple statements, and all
> other tests seem to have a similar pattern. I have not seen any tests where
> these simple statements are done in a function even if they are repeated.
> I'd rather keep it so that this doesn't lead to other style discussions?

If other tests do not combine such parts, it's OK.
My motivation of these comments were to reduce the number of row for the test code.

> Oh, I didn't know about this, thanks!

Now meson test system do your test. OK.


And followings are the comments for v14. They are mainly about comments.

===
01. relation.c - logicalrep_rel_open

```
+               /*
+                * Finding a usable index is an infrequent task. It occurs when an
+                * operation is first performed on the relation, or after invalidation
+                * of the relation cache entry (e.g., such as ANALYZE).
+                */
+               entry->usableIndexOid = FindLogicalRepUsableIndex(entry->localrel, remoterel);
```

I thought you can mention CREATE INDEX in the comment.

According to your analysis [1] the relation cache will be invalidated if users do CREATE INDEX
At that time the hash entry will be removed (logicalrep_relmap_invalidate_cb) and "usable" index
will be checked again.

~~~
02. relation.c - logicalrep_partition_open

```
+       /*
+        * Finding a usable index is an infrequent task. It occurs when an
+        * operation is first performed on the relation, or after invalidation of
+        * the relation cache entry (e.g., such as ANALYZE).
+        */
+       entry->usableIndexOid = FindLogicalRepUsableIndex(partrel, remoterel);
+
```

Same as above

~~~
03. relation.c - GetIndexOidFromPath

```
+       if (path->pathtype == T_IndexScan || path->pathtype == T_IndexOnlyScan)
+       {
+               IndexPath  *index_sc = (IndexPath *) path;
+
+               return index_sc->indexinfo->indexoid;
+       }
```

I thought Assert(OidIsValid(indexoid)) may be added here. Or is it quite trivial?

~~~
04. relation.c - IndexOnlyOnExpression

This method just returns "yes" or "no", so the name of method should be start "Has" or "Is".

~~~
05. relation.c - SuitablePathsForRepIdentFull

```
+/*
+ * Iterates over the input path list and returns another
+ * path list that includes index [only] scans where paths
+ * with non-btree indexes, partial indexes or
+ * indexes on only expressions have been removed.
+ */
```

These lines seems to be around 60 columns. Could you expand around 80?

~~~
06. relation.c - SuitablePathsForRepIdentFull

```
+                       Relation        indexRelation;
+                       IndexInfo  *indexInfo;
+                       bool            is_btree;
+                       bool            is_partial;
+                       bool            is_only_on_expression;
+
+                       indexRelation = index_open(idxoid, AccessShareLock);
+                       indexInfo = BuildIndexInfo(indexRelation);
+                       is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+                       is_partial = (indexInfo->ii_Predicate != NIL);
+                       is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+                       index_close(indexRelation, AccessShareLock);
+
+                       if (is_btree && !is_partial && !is_only_on_expression)
+                               suitableIndexList = lappend(suitableIndexList, path);
```

Please add a comment like "eliminating not suitable path" or something.

~~~
07. relation.c - GenerateDummySelectPlannerInfoForRelation

```
+/*
+ * This is not a generic function. It is a helper function
+ * for GetCheapestReplicaIdentityFullPath. The function
+ * creates a dummy PlannerInfo for the given relationId
+ * as if the relation is queried with SELECT command.
+ */
```

These lines seems to be around 60 columns. Could you expand around 80?

~~~
08. relation.c - FindLogicalRepUsableIndex

```
+/*
+ * Returns an index oid if we can use an index for subscriber . If not,
+ * returns InvalidOid.
+ */
```

"subscriber ." should be "subscriber.", blank is not needed.

~~~
09. worker.c - get_usable_indexoid

```
+               Assert(targetResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+                          RELKIND_PARTITIONED_TABLE);
```

I thought this assertion seems to be not needed, because this is completely same as the condition of if-statement.

[1] https://www.postgresql.org/message-id/CACawEhXgP_Kj_1iyNAp16MYos4Anrtz%2BOZVtj2z-QOPGdPCt_A%40mail.gmail.com


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Next
From: Junwang Zhao
Date:
Subject: [PATCH v1] [meson] fix some typo to make it more readable