Home > mailing lists

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From	Önder Kalacı
Subject	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date	September 1, 2022 06:23:11
Msg-id	CACawEhVLfW7dxW2WLz5wsChnuDH=OPfpsab+EL=1=Qnjxvut_A@mail.gmail.com Whole thread
In response to	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Peter Smith <smithpb2250@gmail.com>)
List	pgsql-hackers

Tree view

Hi again,

======

1. Commit message

1a.
With this patch, I'm proposing the following change: If there is an
index on the subscriber, use the index as long as the planner
sub-modules picks any index over sequential scan. The index should be
a btree index, not a partital index. Finally, the index should have at
least one column reference (e.g., cannot consists of only
expressions).

SUGGESTION
With this patch, I'm proposing the following change: If there is any
index on the subscriber, let the planner sub-modules compare the costs
of index versus sequential scan and choose the cheapest. The index
should be a btree index, not a partial index, and it should have at
least one column reference (e.g., cannot consist of only expressions).

makes sense.

~

1b.
The Majority of the logic on the subscriber side exists in the code.

"exists" -> "already exists"

fixed

~

1c.
psql -c "truncate pgbench_accounts;" -p 9700 postgres

"truncate" -> "TRUNCATE"

fixed

~

1d.
Try to wrap this message text at 80 char width.

fixed

======

2. src/backend/replication/logical/relation.c - logicalrep_rel_open

+ /*
+ * Finding a usable index is an infrequent task. It is performed
+ * when an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ entry->usableIndexOid = LogicalRepUsableIndex(entry->localrel, remoterel);

Seemed a bit odd to say "performed" 2x in the same sentence.

"It is performed when..." -> "It occurs when...” (?)

fixed

~~~

3. src/backend/replication/logical/relation.c - logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It is performed
+ * when an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ part_entry->relmapentry.usableIndexOid =
+ LogicalRepUsableIndex(partrel, remoterel);

3a.
Same as comment #2 above.

done

~

3b.
The jumping between 'part_entry' and 'entry' is confusing. Since
'entry' is already assigned to be &part_entry->relmapentry can't you
use that here?

SUGGESTION
entry->usableIndexOid = LogicalRepUsableIndex(partrel, remoterel);

Yes, sure it makes sense.

~~~

4. src/backend/replication/logical/relation.c - GetIndexOidFromPath

+/*
+ * Returns a valid index oid if the input path is an index path.
+ * Otherwise, return invalid oid.
+ */
+static Oid
+GetIndexOidFromPath(Path *path)

Perhaps may this function comment more consistent with others (like
GetRelationIdentityOrPK, LogicalRepUsableIndex) and refer to the
InvalidOid.

SUGGESTION
/*
* Returns a valid index oid if the input path is an index path.
*
* Otherwise, returns InvalidOid.
*/

sounds good

~~~

5. src/backend/replication/logical/relation.c - IndexOnlyOnExpression

+bool
+IndexOnlyOnExpression(IndexInfo *indexInfo)
+{
+ int i;
+ for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_IndexAttrNumbers[i];
+ if (AttributeNumberIsValid(attnum))
+ return false;
+ }
+
+ return true;
+}

5a.
Add a blank line after those declarations.

Done, also went over all the functions and ensured we don't have this anymore

~

5b.
AFAIK the C99 style for loop declarations should be OK [1] for new
code, so declaring like below would be cleaner:

for (int i = 0; ...

Done

~~~

6. src/backend/replication/logical/relation.c -
FilterOutNotSuitablePathsForReplIdentFull

+/*
+ * Iterates over the input path list and returns another path list
+ * where paths with non-btree indexes, partial indexes or
+ * indexes on only expressions are eliminated from the list.
+ */
+static List *
+FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)

"are eliminated from the list." -> "have been removed."

Done

~~~

7.

+ foreach(lc, pathlist)
+ {
+ Path *path = (Path *) lfirst(lc);
+ Oid indexOid = GetIndexOidFromPath(path);
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;
+
+ if (!OidIsValid(indexOid))
+ {
+ /* Unrelated Path, skip */
+ suitableIndexList = lappend(suitableIndexList, path);
+ }
+ else
+ {
+ indexRelation = index_open(indexOid, AccessShareLock);
+ indexInfo = BuildIndexInfo(indexRelation);
+ is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ is_partial = (indexInfo->ii_Predicate != NIL);
+ is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+ index_close(indexRelation, NoLock);
+
+ if (is_btree && !is_partial && !is_only_on_expression)
+ suitableIndexList = lappend(suitableIndexList, path);
+ }
+ }

I think most of those variables are only used in the "else" block so
maybe it's better to declare them at that scope.

+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;

Makes sense

~~~

8. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+ * Indexes that consists of only expressions (e.g.,
+ * no simple column references on the index) are also
+ * eliminated with a similar reasoning.

"consists" -> "consist"

"with a similar reasoning" -> "with similar reasoning"

fixed

~~~

9.

+ * We also eliminate non-btree indexes, which could be relaxed
+ * if needed. If we allow non-btree indexes, we should adjust
+ * RelationFindReplTupleByIndex() to support such indexes.

This looks like another of those kinds of comments that should have
"XXX" prefix as a note to the future.

added

~~~

10. src/backend/replication/logical/relation.c -
FindUsableIndexForReplicaIdentityFull

+/*
+ * Returns an index oid if the planner submodules picks index scans
+ * over sequential scan.

10a
"picks" -> "pick"

done

~

10b.
Maybe this should also say ", otherwise returns InvalidOid" (?)

Makes sense, added similar to above suggestion

~~~

11.

+FindUsableIndexForReplicaIdentityFull(Relation localrel)
+{
+ MemoryContext usableIndexContext;
+ MemoryContext oldctx;
+ Path *cheapest_total_path;
+ Oid indexOid;

In the following function, and in the one after that, you've named the
index Oid as 'idxoid' (not 'indexOid'). IMO it's better to use
consistent naming everywhere.

Ok, existing functions use idxoid, switched to that.

~~~

12. src/backend/replication/logical/relation.c - GetRelationIdentityOrPK

12a.
I wondered what is the benefit of having this function. IIUC it is
only called from one place (LogicalRepUsableIndex) and IMO the code
would probably be easier if you just inline this logic in that
function...

I just moved that from src/backend/replication/logical/worker.c, so probably better not to remove it in this patch?

Tbh, I like the simplicity it provides.

~

12b.
+/*
+ * Get replica identity index or if it is not defined a primary key.
+ *
+ * If neither is defined, returns InvalidOid
+ */

If you want to keep the function for some reason (e.g. see #12a) then
I thought the function comment could be better.

SUGGESTION
/*
* Returns OID of the relation's replica identity index, or OID of the
* relation's primary key index.
*
* If neither is defined, returns InvalidOid.
*/

As I noted, I just moved this function. So, left as-is for now.

~~~

13. src/backend/replication/logical/relation.c - LogicalRepUsableIndex

For some reason, I feel this function should be called
FindLogicalRepUsableIndex (or similar), because it seems more
consistent with the others which might return the Oid or might return
InvalidOid...

Makes sense, changed

~~~

14.

+ /*
+ * Index scans are disabled, use sequential scan. Note that we do allow
+ * index scans when there is a primary key or unique index replica
+ * identity. That is the legacy behavior so we hesitate to move this check
+ * above.
+ */

Perhaps a slight rephrasing of that comment?

SUGGESTION
If index scans are disabled, use a sequential scan.

Note that we still allowed index scans above when there is a primary
key or unique index replica identity, but that is the legacy behaviour
(even when enable_indexscan is false), so we hesitate to move this
enable_indexscan check to be done earlier in this function.

~~~

Sounds good, changed

15.

+ * If we had a primary key or relation identity with a unique index,
+ * we would have already found a valid oid. At this point, the remote
+ * relation has replica identity full and we have at least one local
+ * index defined.

"would have already found a valid oid." -> "would have already found
and returned that oid."

Done

======

16. src/backend/replication/logical/worker.c - usable_indexoid_internal

+/*
+ * Decide whether we can pick an index for the relinfo (e.g., the relation)
+ * we're actually deleting/updating from. If it is a child partition of
+ * edata->targetRelInfo, find the index on the partition.
+ *
+ * Note that if the corresponding relmapentry has InvalidOid usableIndexOid,
+ * the function returns InvalidOid. In that case, the tuple is used via
+ * sequential execution.
+ */
+static Oid
+usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)

I am not sure this is the right place to be saying that last sentence
("In that case, the tuple is used via sequential execution.") because
it's up to the *calling* code to decide what to do if InvalidOid is
returned

Right, for now this is true, but could change in the future. Removed.

======

17. src/include/replication/logicalrelation.h

@ -31,20 +32,40 @@ typedef struct LogicalRepRelMapEntry
Relation localrel; /* relcache entry (NULL when closed) */
AttrMap *attrmap; /* map of local attributes to remote ones */
bool updatable; /* Can apply updates/deletes? */
+ Oid usableIndexOid; /* which index to use? (Invalid when no index
+ * used) */

SUGGESTION (for the comment)
which index to use, or InvalidOid if none

makes sense

~~~

18.

+/*
+ * Partition map (LogicalRepPartMap)
+ *
+ * When a partitioned table is used as replication target, replicated
+ * operations are actually performed on its leaf partitions, which requires
+ * the partitions to also be mapped to the remote relation. Parent's entry
+ * (LogicalRepRelMapEntry) cannot be used as-is for all partitions, because
+ * individual partitions may have different attribute numbers, which means
+ * attribute mappings to remote relation's attributes must be maintained
+ * separately for each partition.
+ */
+typedef struct LogicalRepPartMapEntry

Something feels not quite right using the (unchanged) comment about
the Partition map which was removed from where it was originally in
relation.c.

The reason I am unsure is that this comment is still referring to the
"LogicalRepPartMap", which is not here but is declared static in
relation.c. Maybe the quick/easy fix would be to just change the first
line to say: "Partition map (see LogicalRepPartMap in relation.c)".
OTOH, I'm not sure if some part of this comment still needs to be left
in relation.c (??)

Hmm, I agree that we need some extra comments pointing where this is used (I followed something similar to your suggestion).

However, I also think that it is nicer to keep this comment here because that seems more common in the code-base that the comments are on the MapEntry, not on the Map itself, no?

Thanks,

Onder

Attachment

v10_0001_use_index_on_subs_when_pub_rep_ident_full.patch

pgsql-hackers by date:

From: Önder Kalacı
Date: 01 September 2022, 06:23:00
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Junwang Zhao
Date: 01 September 2022, 06:38:01
Subject: Re: [PATCH v1] [doc] polish the comments of reloptions

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

Attachment

Previous

Next