Home > mailing lists

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From	Peter Smith
Subject	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date	March 8, 2023 06:39:26
Msg-id	CAHut+Pvbo-AL9x4GdXGaTnq=_xUOeaDvyTuk6sHhXZE8v+mZGA@mail.gmail.com Whole thread Raw
In response to	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Önder Kalacı <onderkalaci@gmail.com>)
Responses	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List	pgsql-hackers

Tree view

Here are some review comments for v35-0001

======
General

1.
Saying the index "should" or "should not" do this or that sounds like
it is still OK but just not recommended. TO remove this ambigity IMO
most of the "should" ought to be changed to "must" because IIUC this
patch will simply not consider indexes which do not obey all your
rules.

This comment applies to a few places (search for "should")

e.g.1. - Commit Message
e.g.2. - /* There should always be at least one attribute for the index scan. */
e.g.3. - The function comment for
FindUsableIndexForReplicaIdentityFull(Relation localrel)

======
doc/src/sgml/logical-replication.sgml

2.
A published table must have a “replica identity” configured in order
to be able to replicate UPDATE and DELETE operations, so that
appropriate rows to update or delete can be identified on the
subscriber side. By default, this is the primary key, if there is one.
Another unique index (with certain additional requirements) can also
be set to be the replica identity. If the table does not have any
suitable key, then it can be set to replica identity “full”, which
means the entire row becomes the key. When replica identity “full” is
specified, indexes can be used on the subscriber side for searching
the rows. Candidate indexes must be btree, non-partial, and have at
least one column reference (i.e. cannot consist of only expressions).
These restrictions on the non-unique index properties adheres some of
the restrictions that are enforced for primary keys. Internally, we
follow a similar approach for supporting index scans within logical
replication scope. If there are no such suitable indexes, the search
on the subscriber side can be very inefficient, therefore replica
identity “full” should only be used as a fallback if no other solution
is possible. If a replica identity other than “full” is set on the
publisher side, a replica identity comprising the same or fewer
columns must also be set on the subscriber side. See REPLICA IDENTITY
for details on how to set the replica identity. If a table without a
replica identity is added to a publication that replicates UPDATE or
DELETE operations then subsequent UPDATE or DELETE operations will
cause an error on the publisher. INSERT operations can proceed
regardless of any replica identity.

~

"adheres some of" --> "adhere to some of" ?

======
src/backend/executor/execReplication.c

3. build_replindex_scan_key

  {
  Oid operator;
  Oid opfamily;
  RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);

These variable declarations might look tidier if you kept all the Oids together.

======
src/backend/replication/logical/relation.c

4. logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It occurs when an
+ * operation is first performed on the relation, or after invalidation of
+ * the relation cache entry (such as ANALYZE or CREATE/DROP index on the
+ * relation).
+ *
+ * We also prefer to run this code on the oldctx such that we do not
+ * leak anything in the LogicalRepPartMapContext (hence
+ * CacheMemoryContext).
+ */
+ entry->localindexoid = FindLogicalRepLocalIndex(partrel, remoterel)

"such that" --> "so that" ?

~~~

5. IsIndexUsableForReplicaIdentityFull

+bool
+IsIndexUsableForReplicaIdentityFull(IndexInfo *indexInfo)
+{
+ bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ bool is_partial = (indexInfo->ii_Predicate != NIL);
+ bool is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
+
+ if (is_btree && !is_partial && !is_only_on_expression)
+ {
+ return true;
+ }
+
+ return false;
+}

SUGGESTION (no need for 2 returns here)
return is_btree && !is_partial && !is_only_on_expression;

======
src/backend/replication/logical/worker.c

6. FindReplTupleInLocalRel

static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,
LogicalRepRelation *remoterel,
Oid localidxoid,
TupleTableSlot *remoteslot,
TupleTableSlot **localslot)
{
bool found;

/*
* Regardless of the top-level operation, we're performing a read here, so
* check for SELECT privileges.
*/
TargetPrivilegesCheck(localrel, ACL_SELECT);

*localslot = table_slot_create(localrel, &estate->es_tupleTable);

Assert(OidIsValid(localidxoid) ||
   (remoterel->replident == REPLICA_IDENTITY_FULL));

if (OidIsValid(localidxoid))
found = RelationFindReplTupleByIndex(localrel, localidxoid,
LockTupleExclusive,
remoteslot, *localslot);
else
found = RelationFindReplTupleSeq(localrel, LockTupleExclusive,
remoteslot, *localslot);

return found;
}

~

Since that 'found' variable is not used, you might as well remove the
if/else and simplify the code.

SUGGESTION
static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,
LogicalRepRelation *remoterel,
Oid localidxoid,
TupleTableSlot *remoteslot,
TupleTableSlot **localslot)
{
/*
* Regardless of the top-level operation, we're performing a read here, so
* check for SELECT privileges.
*/
TargetPrivilegesCheck(localrel, ACL_SELECT);

*localslot = table_slot_create(localrel, &estate->es_tupleTable);

Assert(OidIsValid(localidxoid) ||
   (remoterel->replident == REPLICA_IDENTITY_FULL));

if (OidIsValid(localidxoid))
return RelationFindReplTupleByIndex(localrel, localidxoid,
LockTupleExclusive,
remoteslot, *localslot);

return RelationFindReplTupleSeq(localrel, LockTupleExclusive,
remoteslot, *localslot);

~~~

7. apply_handle_tuple_routing

@@ -2890,6 +2877,7 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
  TupleConversionMap *map;
  MemoryContext oldctx;
  LogicalRepRelMapEntry *part_entry = NULL;
+
  AttrMap    *attrmap = NULL;

  /* ModifyTableState is needed for ExecFindPartition(). */
The added whitespace seems unrelated to this patch.


======
src/include/replication/logicalrelation.h

8.
@@ -31,6 +32,7 @@ typedef struct LogicalRepRelMapEntry
  Relation localrel; /* relcache entry (NULL when closed) */
  AttrMap    *attrmap; /* map of local attributes to remote ones */
  bool updatable; /* Can apply updates/deletes? */
+ Oid localindexoid; /* which index to use, or InvalidOid if none */

Indentation is not correct for that new member comment.


------
Kind Regards,
Peter Smith.
Fujitsu Australia

pgsql-hackers by date:

From: Amit Kapila
Date: 08 March 2023, 06:15:32
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "Anton A. Melnikov"
Date: 08 March 2023, 06:43:47
Subject: Re: odd buildfarm failure - "pg_ctl: control file appears to be corrupt"

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

Previous

Next