RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From kuroda.hayato@fujitsu.com
Subject RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id TYAPR01MB58663E624B35CD2C5BEC0595F5549@TYAPR01MB5866.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List pgsql-hackers
Dear Önder:

Thank you for updating patch! 
Your documentation seems OK, and I could not find any other places to be added

Followings are my comments.

====
01 relation.c - general

Many files are newly included.
I was not sure but some codes related with planner may be able to move to src/backend/optimizer/plan.
How do you and any other one think?

02 relation.c - FindLogicalRepUsableIndex

```
+/*
+ * Returns an index oid if we can use an index for the apply side. If not,
+ * returns InvalidOid.
+ */
+static Oid
+FindLogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel)
```

I grepped files, but I cannot find the word "apply side". How about "subscriber" instead?

03 relation.c - FindLogicalRepUsableIndex

```
+       /* Simple case, we already have an identity or pkey */
+       idxoid = GetRelationIdentityOrPK(localrel);
+       if (OidIsValid(idxoid))
+               return idxoid;
+
+       /*
+        * If index scans are disabled, use a sequential scan.
+        *
+        * Note that we still allowed index scans above when there is a primary
+        * key or unique index replica identity, but that is the legacy behaviour
+        * (even when enable_indexscan is false), so we hesitate to move this
+        * enable_indexscan check to be done earlier in this function.
+        */
+       if (!enable_indexscan)
+               return InvalidOid;
```

a. 
I think "identity or pkey" should be "replica identity key or primary key" or "RI or PK"

b. 
Later part should be at around GetRelationIdentityOrPK.


04 relation.c - FindUsableIndexForReplicaIdentityFull

```
+       MemoryContext usableIndexContext;
...
+       usableIndexContext = AllocSetContextCreate(CurrentMemoryContext,
+                                                                                          "usableIndexContext",
+                                                                                          ALLOCSET_DEFAULT_SIZES);
```

I grepped other sources, and I found that the name like "tmpcxt" is used for the temporary MemoryContext.

05 relation.c - SuitablePathsForRepIdentFull

```
+                       indexRelation = index_open(idxoid, AccessShareLock);
+                       indexInfo = BuildIndexInfo(indexRelation);
+                       is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+                       is_partial = (indexInfo->ii_Predicate != NIL);
+                       is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+                       index_close(indexRelation, NoLock);
```

Why the index is closed with NoLock? AccessShareLock is acquired, so shouldn't same lock be released?


06 relation.c - GetCheapestReplicaIdentityFullPath

IIUC a query like "SELECT tbl WHERE attr1 = $1 AND attr2 = $2 ... AND attrN = $N" is emulated, right?
you can write explicitly it as comment

07 relation.c - GetCheapestReplicaIdentityFullPath

```
+               Path       *path = (Path *) lfirst(lc);
+               Oid                     idxoid = GetIndexOidFromPath(path);
+
+               if (!OidIsValid(idxoid))
+               {
+                       /* Unrelated Path, skip */
+                       suitableIndexList = lappend(suitableIndexList, path);
+               }
```

I was not clear about here. IIUC in the function we want to extract "good" scan plan and based on that the cheapest one
ischosen. 
 
GetIndexOidFromPath() seems to return InvalidOid when the input path is not index scan, so why is it appended to the
suitablelist?
 


===
08 worker.c - usable_indexoid_internal

I think this is not "internal" function, such name should be used for like "apply_handle_commit" -
"apply_handle_commit_internal",or "apply_handle_insert" - "apply_handle_insert_internal".
 
How about "get_usable_index" or something?

09 worker.c - usable_indexoid_internal

```
+       Oid                     targetrelid = targetResultRelInfo->ri_RelationDesc->rd_rel->oid;
+       Oid                     localrelid = relinfo->ri_RelationDesc->rd_id;
+
+       if (targetrelid != localrelid)
```

I think these lines are very confusable.
IIUC targetrelid is corresponded to the "parent", and localrelid is corresponded to the "child", right?
How about changing name to "partitionedoid" and "leadoid" or something?

===
10 032_subscribe_use_index.pl

```
# create tables pub and sub
$node_publisher->safe_psql('postgres',
    "CREATE TABLE test_replica_id_full (x int)");
$node_publisher->safe_psql('postgres',
    "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
$node_subscriber->safe_psql('postgres',
    "CREATE TABLE test_replica_id_full (x int)");
$node_subscriber->safe_psql('postgres',
    "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");
```

In many places same table is defined, altered as "REPLICA IDENTITY FULL", and index is created.
Could you combine them into function?

11 032_subscribe_use_index.pl

```
# wait until the index is used on the subscriber
$node_subscriber->poll_query_until(
    'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full_0 updates one row via index";
```

In many places this check is done. Could you combine them into function?

12 032_subscribe_use_index.pl

```
# create pub/sub
$node_publisher->safe_psql('postgres',
    "CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full");
$node_subscriber->safe_psql('postgres',
    "CREATE SUBSCRIPTION tap_sub_rep_full CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION
tap_pub_rep_full"
);
```

Same as above

13 032_subscribe_use_index.pl

```
# cleanup pub
$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_rep_full");
$node_publisher->safe_psql('postgres', "DROP TABLE test_replica_id_full");
# cleanup sub
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub_rep_full");
$node_subscriber->safe_psql('postgres', "DROP TABLE test_replica_id_full");
```

Same as above

14 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX

```
# make sure that the subscriber has the correct data
my $result = $node_subscriber->safe_psql('postgres',
    "SELECT sum(x) FROM test_replica_id_full");
is($result, qq(212), 'ensure subscriber has the correct data at the end of the test');

$node_subscriber->poll_query_until(
    'postgres', q{select sum(x)=212 AND count(*)=21 AND count(DISTINCT x)=20 FROM test_replica_id_full;}
) or die "ensure subscriber has the correct data at the end of the test";
```

I think first one is not needed.


15 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS

```
# insert some initial data within the range 0-20
$node_publisher->safe_psql('postgres',
    "INSERT INTO test_replica_id_full SELECT i%20 FROM generate_series(0,1000)i;"
);
```

I think data is within the range 0-19.
(There are some mistakes)

===
16 test/subscription/meson.build

Your test 't/032_subscribe_use_index.pl' must be added in the 'tests' for meson build system.
(I checked on my env, and your test works well)

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Extend win32 error codes to errno mapping in win32error.c
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: A doubt about a newly added errdetail