RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers
From | kuroda.hayato@fujitsu.com |
---|---|
Subject | RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Date | |
Msg-id | TYAPR01MB58663E624B35CD2C5BEC0595F5549@TYAPR01MB5866.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Önder Kalacı <onderkalaci@gmail.com>) |
Responses |
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
|
List | pgsql-hackers |
Dear Önder: Thank you for updating patch! Your documentation seems OK, and I could not find any other places to be added Followings are my comments. ==== 01 relation.c - general Many files are newly included. I was not sure but some codes related with planner may be able to move to src/backend/optimizer/plan. How do you and any other one think? 02 relation.c - FindLogicalRepUsableIndex ``` +/* + * Returns an index oid if we can use an index for the apply side. If not, + * returns InvalidOid. + */ +static Oid +FindLogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel) ``` I grepped files, but I cannot find the word "apply side". How about "subscriber" instead? 03 relation.c - FindLogicalRepUsableIndex ``` + /* Simple case, we already have an identity or pkey */ + idxoid = GetRelationIdentityOrPK(localrel); + if (OidIsValid(idxoid)) + return idxoid; + + /* + * If index scans are disabled, use a sequential scan. + * + * Note that we still allowed index scans above when there is a primary + * key or unique index replica identity, but that is the legacy behaviour + * (even when enable_indexscan is false), so we hesitate to move this + * enable_indexscan check to be done earlier in this function. + */ + if (!enable_indexscan) + return InvalidOid; ``` a. I think "identity or pkey" should be "replica identity key or primary key" or "RI or PK" b. Later part should be at around GetRelationIdentityOrPK. 04 relation.c - FindUsableIndexForReplicaIdentityFull ``` + MemoryContext usableIndexContext; ... + usableIndexContext = AllocSetContextCreate(CurrentMemoryContext, + "usableIndexContext", + ALLOCSET_DEFAULT_SIZES); ``` I grepped other sources, and I found that the name like "tmpcxt" is used for the temporary MemoryContext. 05 relation.c - SuitablePathsForRepIdentFull ``` + indexRelation = index_open(idxoid, AccessShareLock); + indexInfo = BuildIndexInfo(indexRelation); + is_btree = (indexInfo->ii_Am == BTREE_AM_OID); + is_partial = (indexInfo->ii_Predicate != NIL); + is_only_on_expression = IndexOnlyOnExpression(indexInfo); + index_close(indexRelation, NoLock); ``` Why the index is closed with NoLock? AccessShareLock is acquired, so shouldn't same lock be released? 06 relation.c - GetCheapestReplicaIdentityFullPath IIUC a query like "SELECT tbl WHERE attr1 = $1 AND attr2 = $2 ... AND attrN = $N" is emulated, right? you can write explicitly it as comment 07 relation.c - GetCheapestReplicaIdentityFullPath ``` + Path *path = (Path *) lfirst(lc); + Oid idxoid = GetIndexOidFromPath(path); + + if (!OidIsValid(idxoid)) + { + /* Unrelated Path, skip */ + suitableIndexList = lappend(suitableIndexList, path); + } ``` I was not clear about here. IIUC in the function we want to extract "good" scan plan and based on that the cheapest one ischosen. GetIndexOidFromPath() seems to return InvalidOid when the input path is not index scan, so why is it appended to the suitablelist? === 08 worker.c - usable_indexoid_internal I think this is not "internal" function, such name should be used for like "apply_handle_commit" - "apply_handle_commit_internal",or "apply_handle_insert" - "apply_handle_insert_internal". How about "get_usable_index" or something? 09 worker.c - usable_indexoid_internal ``` + Oid targetrelid = targetResultRelInfo->ri_RelationDesc->rd_rel->oid; + Oid localrelid = relinfo->ri_RelationDesc->rd_id; + + if (targetrelid != localrelid) ``` I think these lines are very confusable. IIUC targetrelid is corresponded to the "parent", and localrelid is corresponded to the "child", right? How about changing name to "partitionedoid" and "leadoid" or something? === 10 032_subscribe_use_index.pl ``` # create tables pub and sub $node_publisher->safe_psql('postgres', "CREATE TABLE test_replica_id_full (x int)"); $node_publisher->safe_psql('postgres', "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;"); $node_subscriber->safe_psql('postgres', "CREATE TABLE test_replica_id_full (x int)"); $node_subscriber->safe_psql('postgres', "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)"); ``` In many places same table is defined, altered as "REPLICA IDENTITY FULL", and index is created. Could you combine them into function? 11 032_subscribe_use_index.pl ``` # wait until the index is used on the subscriber $node_subscriber->poll_query_until( 'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';} ) or die "Timed out while waiting for check subscriber tap_sub_rep_full_0 updates one row via index"; ``` In many places this check is done. Could you combine them into function? 12 032_subscribe_use_index.pl ``` # create pub/sub $node_publisher->safe_psql('postgres', "CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full"); $node_subscriber->safe_psql('postgres', "CREATE SUBSCRIPTION tap_sub_rep_full CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub_rep_full" ); ``` Same as above 13 032_subscribe_use_index.pl ``` # cleanup pub $node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_rep_full"); $node_publisher->safe_psql('postgres', "DROP TABLE test_replica_id_full"); # cleanup sub $node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub_rep_full"); $node_subscriber->safe_psql('postgres', "DROP TABLE test_replica_id_full"); ``` Same as above 14 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX ``` # make sure that the subscriber has the correct data my $result = $node_subscriber->safe_psql('postgres', "SELECT sum(x) FROM test_replica_id_full"); is($result, qq(212), 'ensure subscriber has the correct data at the end of the test'); $node_subscriber->poll_query_until( 'postgres', q{select sum(x)=212 AND count(*)=21 AND count(DISTINCT x)=20 FROM test_replica_id_full;} ) or die "ensure subscriber has the correct data at the end of the test"; ``` I think first one is not needed. 15 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS ``` # insert some initial data within the range 0-20 $node_publisher->safe_psql('postgres', "INSERT INTO test_replica_id_full SELECT i%20 FROM generate_series(0,1000)i;" ); ``` I think data is within the range 0-19. (There are some mistakes) === 16 test/subscription/meson.build Your test 't/032_subscribe_use_index.pl' must be added in the 'tests' for meson build system. (I checked on my env, and your test works well) Best Regards, Hayato Kuroda FUJITSU LIMITED
pgsql-hackers by date: