Home > mailing lists

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From	Önder Kalacı
Subject	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date	October 7, 2022 11:54:00
Msg-id	CACawEhWUtLbhMF-AJyGOyw3L7Vs53gSBL-3kheG=TNS0PfkidA@mail.gmail.com Whole thread
In response to	RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher ("kuroda.hayato@fujitsu.com" <kuroda.hayato@fujitsu.com>)
Responses	RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List	pgsql-hackers

Tree view

Hi Kurado Hayato,

In your patch:

```
+ /* Simple case, we already have a primary key or a replica identity index */
+ idxoid = GetRelationIdentityOrPK(localrel);
+ if (OidIsValid(idxoid))
+ return idxoid;
+
+ /*
+ * If index scans are disabled, use a sequential scan.
+ *
+ * Note that we still allowed index scans above when there is a primary
+ * key or unique index replica identity, but that is the legacy behaviour
+ * (even when enable_indexscan is false), so we hesitate to move this
+ * enable_indexscan check to be done earlier in this function.
+ */
```

And the paragraph " Note that we..." should be at above of GetRelationIdentityOrPK().
Future readers will read the function from top to bottom,
and when they read around GetRelationIdentityOrPK() they may be confused.

Ah, makes sense, now I applied your feedback (with some different wording).

> The inlined comment in the function has a similar comment. Is that clear
> enough?
> /* * Generate restrictions for all columns in the form of col_1 = $1 AND *
> col_2 = $2 ... */

Actually I missed it, but I still think that whole of emulated SQL should be clarified.

Alright, it makes sense. I added the emulated SQL to the function comment of GetCheapestReplicaIdentityFullPath.

And followings are the comments for v14. They are mainly about comments.

===
01. relation.c - logicalrep_rel_open

```
+ /*
+ * Finding a usable index is an infrequent task. It occurs when an
+ * operation is first performed on the relation, or after invalidation
+ * of the relation cache entry (e.g., such as ANALYZE).
+ */
+ entry->usableIndexOid = FindLogicalRepUsableIndex(entry->localrel, remoterel);
```

I thought you can mention CREATE INDEX in the comment.

According to your analysis [1] the relation cache will be invalidated if users do CREATE INDEX
At that time the hash entry will be removed (logicalrep_relmap_invalidate_cb) and "usable" index
will be checked again.

Yes, that is right. I think it makes sense to mention that as well. In fact, I also decided to add such a test.

I realized that all tests use ANALYZE for re-calculation of the index. Now, I added an explicit test that uses CREATE/DROP index to re-calculate the index.

see # Testcase start: SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX.

~~~
02. relation.c - logicalrep_partition_open

```
+ /*
+ * Finding a usable index is an infrequent task. It occurs when an
+ * operation is first performed on the relation, or after invalidation of
+ * the relation cache entry (e.g., such as ANALYZE).
+ */
+ entry->usableIndexOid = FindLogicalRepUsableIndex(partrel, remoterel);
+
```

Same as above

done

~~~
03. relation.c - GetIndexOidFromPath

```
+ if (path->pathtype == T_IndexScan || path->pathtype == T_IndexOnlyScan)
+ {
+ IndexPath *index_sc = (IndexPath *) path;
+
+ return index_sc->indexinfo->indexoid;
+ }
```

I thought Assert(OidIsValid(indexoid)) may be added here. Or is it quite trivial?

Looking at the PG code, I couldn't see any place that asserts the information. That seems like fundamental information that is never invalid.

Btw, even if it returns InvalidOid for some reason, we'd not be crashing. Only not able to use any indexes, fall back to seq. scan.

~~~
04. relation.c - IndexOnlyOnExpression

This method just returns "yes" or "no", so the name of method should be start "Has" or "Is".

Yes, it seems like that is a common convention.

~~~
05. relation.c - SuitablePathsForRepIdentFull

```
+/*
+ * Iterates over the input path list and returns another
+ * path list that includes index [only] scans where paths
+ * with non-btree indexes, partial indexes or
+ * indexes on only expressions have been removed.
+ */
```

These lines seems to be around 60 columns. Could you expand around 80?

done

~~~
06. relation.c - SuitablePathsForRepIdentFull

```
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;
+
+ indexRelation = index_open(idxoid, AccessShareLock);
+ indexInfo = BuildIndexInfo(indexRelation);
+ is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ is_partial = (indexInfo->ii_Predicate != NIL);
+ is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+ index_close(indexRelation, AccessShareLock);
+
+ if (is_btree && !is_partial && !is_only_on_expression)
+ suitableIndexList = lappend(suitableIndexList, path);
```

Please add a comment like "eliminating not suitable path" or something.

done

~~~
07. relation.c - GenerateDummySelectPlannerInfoForRelation

```
+/*
+ * This is not a generic function. It is a helper function
+ * for GetCheapestReplicaIdentityFullPath. The function
+ * creates a dummy PlannerInfo for the given relationId
+ * as if the relation is queried with SELECT command.
+ */
```

These lines seems to be around 60 columns. Could you expand around 80?

done

~~~
08. relation.c - FindLogicalRepUsableIndex

```
+/*
+ * Returns an index oid if we can use an index for subscriber . If not,
+ * returns InvalidOid.
+ */
```

"subscriber ." should be "subscriber.", blank is not needed.

fixed

~~~
09. worker.c - get_usable_indexoid

```
+ Assert(targetResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+ RELKIND_PARTITIONED_TABLE);
```

I thought this assertion seems to be not needed, because this is completely same as the condition of if-statement.

Yes, the refactor we made in the previous iteration made this assertion obsolete as you noted.

Attached v15, thanks for the reviews.

Thanks,

Onder KALACI

Attachment

v15_0001_use_index_on_subs_when_pub_rep_ident_full.patch

pgsql-hackers by date:

From: Maxim Orlov
Date: 07 October 2022, 11:04:09
Subject: Re: Add 64-bit XIDs into PostgreSQL 15

From: Robert Haas
Date: 07 October 2022, 11:59:08
Subject: Re: use has_privs_of_role() for pg_hba.conf

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

Attachment

Previous

Next