Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Date | |
Msg-id | CAHut+PvDhmnqz0mH8d4x_K_nYZ50b_fxh=R1C0a+1hNC5vJ8gg@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Önder Kalacı <onderkalaci@gmail.com>) |
List | pgsql-hackers |
Here are some review comments for patch v23. ====== General 1. IIUC the previous logic for checking "cost" comparisons and selecting the "cheapest" strategy is no longer present in the latest patch. In that case, I think there are some leftover stale comments that need changing. For example, 1a. Commit message: "let the planner sub-modules compare the costs of index versus sequential scan and choose the cheapest." ~ 1b. Commit message: "Finally, pick the cheapest `Path` among." ~ 1c. FindLogicalRepUsableIndex function: + * We are looking for one more opportunity for using an index. If + * there are any indexes defined on the local relation, try to pick + * the cheapest index. ====== doc/src/sgml/logical-replication.sgml If replica identity "full" is used, indexes can be used on the subscriber side for seaching the rows. The index should be btree, non-partial and have at least one column reference (e.g., should not consists of only expressions). If there are no suitable indexes, the search on the subscriber side is very inefficient and should only be used as a fallback if no other solution is possible 2a. Fixed typo "seaching", and minor rewording. SUGGESTION When replica identity "full" is specified, indexes can be used on the subscriber side for searching the rows. These indexes should be btree, non-partial and have at least one column reference (e.g., should not consist of only expressions). If there are no such suitable indexes, the search on the subscriber side can be very inefficient, therefore replica identity "full" should only be used as a fallback if no other solution is possible. ~ 2b. I know you are just following some existing text here, but IMO this should probably refer to replica identity <literal>FULL</literal> instead of "full". ====== src/backend/executor/execReplication.c 3. IdxIsRelationIdentityOrPK +/* + * Given a relation and OID of an index, returns true if + * the index is relation's primary key's index or + * relaton's replica identity index. + * + * Returns false otherwise. + */ +static bool +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid) +{ + Assert(OidIsValid(idxoid)); + + if (RelationGetReplicaIndex(rel) == idxoid || + RelationGetPrimaryKeyIndex(rel) == idxoid) + return true; + + return false; 3a. Comment typo "relaton" ~ 3b. Code could be written using single like below if you wish (but see #2c) return RelationGetReplicaIndex(rel) == idxoid || RelationGetPrimaryKeyIndex(rel) == idxoid; ~ 3c. Actually, RelationGetReplicaIndex and RelationGetPrimaryKeyIndex implementations are very similar so it seemed inefficient to be calling both of them. IMO it might be better to just make a new relcache function IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid). This implementation will be similar to those others, but now you need only to call the workhorse RelationGetIndexList *one* time. ~~~ 4. RelationFindReplTupleByIndex bool found; + TypeCacheEntry **eq = NULL; /* only used when the index is not repl. ident + * or pkey */ + bool idxIsRelationIdentityOrPK; If you change the comment to say "RI" instead of "repl. Ident" then it can all fit on one line, which would be an improvement. ====== src/backend/replication/logical/relation.c 5. #include "replication/logicalrelation.h" #include "replication/worker_internal.h" +#include "optimizer/cost.h" #include "utils/inval.h" Can that #include be added in alphabetical order like the others or not? ~~~ 6. logicalrep_partition_open + /* + * Finding a usable index is an infrequent task. It occurs when an + * operation is first performed on the relation, or after invalidation of + * of the relation cache entry (such as ANALYZE or CREATE/DROP index on + * the relation). + */ + entry->usableIndexOid = FindLogicalRepUsableIndex(partrel, remoterel); + Typo "of of the relation" ~~~ 7. FindUsableIndexForReplicaIdentityFull +static Oid +FindUsableIndexForReplicaIdentityFull(Relation localrel) +{ + MemoryContext usableIndexContext; + MemoryContext oldctx; + Oid usableIndex; + Oid idxoid; + List *indexlist; + ListCell *lc; + Relation indexRelation; + IndexInfo *indexInfo; + bool is_btree; + bool is_partial; + bool is_only_on_expression; It looks like some of these variables are only used within the scope of the foreach loop, so I think that is where they should be declared. ~~~ 8. + usableIndex = InvalidOid; Might as well do that assignment at the declaration. ~~~ 9. FindLogicalRepUsableIndex + /* + * Simple case, we already have a primary key or a replica identity index. + * + * Note that we do not use index scans below when enable_indexscan is + * false. Allowing primary key or replica identity even when index scan is + * disabled is the legacy behaviour. So we hesitate to move the below + * enable_indexscan check to be done earlier in this function. + */ + idxoid = GetRelationIdentityOrPK(localrel); + if (OidIsValid(idxoid)) + return idxoid; + + /* If index scans are disabled, use a sequential scan */ + if (!enable_indexscan) + return InvalidOid; ~ IMO that "Note" really belongs with the if (!enable)indexscan) more like this: SUGGESTION /* * Simple case, we already have a primary key or a replica identity index. */ idxoid = GetRelationIdentityOrPK(localrel); if (OidIsValid(idxoid)) return idxoid; /* * If index scans are disabled, use a sequential scan. * * Note we hesitate to move this check to earlier in this function * because allowing primary key or replica identity even when index scan * is disabled is the legacy behaviour. */ if (!enable_indexscan) return InvalidOid; ====== src/backend/replication/logical/worker.c 10. get_usable_indexoid +/* + * Decide whether we can pick an index for the relinfo (e.g., the relation) + * we're actually deleting/updating from. If it is a child partition of + * edata->targetRelInfo, find the index on the partition. + * + * Note that if the corresponding relmapentry has invalid usableIndexOid, + * the function returns InvalidOid. + */ "(e.g., the relation)" --> "(i.e. the relation)" ------ Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: