Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CACawEhW1q2Qx53JFDoObccP9d4HuTJxb6dA+b4qmHnJjFh68hg@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
Hi Peter, all


src/backend/replication/logical/relation.c

1. FindUsableIndexForReplicaIdentityFull

+static Oid
+FindUsableIndexForReplicaIdentityFull(Relation localrel)
+{
+ List    *indexlist = RelationGetIndexList(localrel);
+ Oid usableIndex = InvalidOid;
+ ListCell   *lc;
+
+ foreach(lc, indexlist)
+ {
+ Oid idxoid = lfirst_oid(lc);
+ bool isUsableIndex;
+ Relation indexRelation = index_open(idxoid, AccessShareLock);
+ IndexInfo  *indexInfo = BuildIndexInfo(indexRelation);
+
+ isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);
+
+ index_close(indexRelation, AccessShareLock);
+
+ if (isUsableIndex)
+ {
+ /* we found one eligible index, don't need to continue */
+ usableIndex = idxoid;
+ break;
+ }
+ }
+
+ return usableIndex;
+}

This comment is not functional -- if you prefer the code as-is, then
ignore this comment.

But, personally I would:
- Move some of that code from the declarations. I feel it would be
better if the index_open/index_close were both in the code-body
instead of half in declarations and half not.
- Remove the 'usableIndex' variable, and just return directly.
- Shorten all the long names (and use consistent 'idx' instead of
sometimes 'idx' and sometimes 'index')

SUGGESTION (YMMV)

static Oid
FindUsableIndexForReplicaIdentityFull(Relation localrel)
{
List    *idxlist = RelationGetIndexList(localrel);
ListCell   *lc;

foreach(lc, idxlist)
{
Oid idxoid = lfirst_oid(lc);
bool isUsableIdx;
Relation idxRel;
IndexInfo  *idxInfo;

idxRel = index_open(idxoid, AccessShareLock);
idxInfo = BuildIndexInfo(idxRel);
isUsableIdx = IsIndexUsableForReplicaIdentityFull(idxInfo);
index_close(idxRel, AccessShareLock);

/* Return the first eligible index found */
if (isUsableIdx)
return idxoid;
}

return InvalidOid;
}

applied your suggestion. I think it made it slightly easier to follow.
 

======
.../subscription/t/032_subscribe_use_index.pl

2. SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES

2a.
# Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
#
# This test ensures that after CREATE INDEX, the subscriber can automatically
# use one of the indexes (provided that it fulfils the requirements).
# Similarly, after DROP index, the subscriber can automatically switch to
# sequential scan


The last sentence is missing full-stop.


fixed
 
~

2b.
# now, create index and see that the index is used
$node_subscriber->safe_psql('postgres',
    "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");

Don't say "and see that the index is used". Yes, that is what this
whole test is doing, but it is not what the psql following this
comment is doing.

 true


~

2c.
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
$node_publisher->wait_for_catchup($appname);


# wait until the index is used on the subscriber

The double blank lines here should be single.

~

fixed, 
 

2d.
# now, the update could either use the test_replica_id_full_idx or
# test_replica_id_full_idy index it is not possible for user to control
# which index to use

This sentence should break at "it".

Aso "user" --> "the user"
SUGGESTION
# now, the update could either use the test_replica_id_full_idx or
# test_replica_id_full_idy index; it is not possible for the user to control
# which index to use


looks good, thanks
 
~

2e.
# let's also test dropping test_replica_id_full_idy and
# hence use test_replica_id_full_idx


I think you ought to have dropped the other (first) index because we
already know that the first index had been used (from earlier), but we
are not 100% sure if the 'y' index has been chosen yet.

 make sense. Though in general it is hard to check pg_stat_all_indexes
for any of the indexes on this test, as we don't know the exact number
of tuples for each. Just wanted to explicitly note



~~~~

3. SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS

3a.
# deletes 20 rows
$node_publisher->safe_psql('postgres',
    "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");

# updates 20 rows
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = 100, y = '200' WHERE x IN (1, 2);");


"deletes" --> "delete"

"updates" --> "update"

Well, I thought the command deletes but I guess delete looks better
 

~~~

4. SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS

# updates 200 rows
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);");


"updates" --> "update"

"200 rows" ??? is that right --  20 maybe ???


I guess this is from an earlier version of the patch, I fixed these types
of errors.
 
~~~

5. SUBSCRIPTION USES INDEX ON PARTITIONED TABLES

5a.
# updates rows and moves between partitions
$node_publisher->safe_psql('postgres',
    "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");

"updates rows and moves between partitions" --> "update rows, moving
them to other partitions"


fixed, thanks
 
~

5b.
# deletes rows from different partitions


"deletes" --> "delete"


fixed, and searched for similar errors but couldn't see any more

~~~
 

6. SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS OR PARTIAL INDEX

6a.
# update 2 rows
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_1';");
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname =
'first_name_2' AND lastname = 'last_name_2';");

IMO 'Nan' seemed a curious name to assign as a test value, because it
seems like it has a special meaning but in reality, I don't think it
does. Even 'xxx' would be better.

changed to "no-name" as "xxx" also looks not so good 
 

~

6b.
# make sure none of the indexes is not used on the subscriber
$result = $node_subscriber->safe_psql('postgres',
    "select sum(idx_scan) from pg_stat_all_indexes where indexrelname
IN ('people_names_expr_only', 'people_names_partial')");
is($result, qq(0), 'ensure subscriber tap_sub_rep_full updates two
rows via seq. scan with index on expressions');

~

Looks like a typo in this comment: "none of the indexes is not used" ???

dropped "not"

 
~~~

7. SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS

7a.
# update 2 rows
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_1';");
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname =
'first_name_3' AND lastname = 'last_name_3';");

Same as #6a, 'Nan' seems like a strange test value to assign to a name.


similarly, changed to no-name
 
~~~

8. Some NULL values

$node_publisher->safe_psql('postgres',
    "INSERT INTO test_replica_id_full VALUES (1), (2), (3);");
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");

~

8a.
For some reason, this insert/update psql was not commented.


added some
 
~

8b.
Maybe this test data could be more obvious by explicitly inserting the NULLs?


Well, that's a bit hard. We merged a few tests for perf reasons. And, I merged this
test with "missing column" test. Now, the NULL values are triggered due to
missing column on the source.
 
~~~

9. Unique index that is not primary key or replica identity

9a.
Why are other "Testcase start" comments all uppercase but not this one?


fixed, there was one more
 
~~~

10. SUBSCRIPTION USES INDEX WITH PUB/SUB different data

Why is there a mixed case in this "Test case:" comment?

no specific reason, fixed
 

~~~

11. PUBLICATION LACKS THE COLUMN ON THE SUBS INDEX

11a.
# The subsriber has an additional column compared to publisher,
# and the index is on that column. We still pick the index scan
# on the subscriber even though it is practically similar to
# sequential scan

Typo "subsriber"

I guess I fixed this in a recent iteration, I cannot find it.
 

Missing full-stop on last sentence.

Similarly, probably merged into another test.

Still went over the all such explanations in the test, and ensured
we have the full stop
 

~

11b.
# make sure that the subscriber has the correct data
# we only deleted 1 row
$result = $node_subscriber->safe_psql('postgres',
    "SELECT sum(x) FROM test_replica_id_full");
is($result, qq(232), 'ensure subscriber has the correct data at the
end of the test');


Why does that say "deleted 1 row", when the previous operation was not a DELETE?

 
Probably due to merging multiple tests into one. Fixed now.


Again, thanks for thorough review. Attached v41. 

See the reason for 0001 patch in [1].


Onder KALACI. 


 
Attachment

pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Support logical replication of DDLs
Next
From: Bharath Rupireddy
Date:
Subject: Re: Combine pg_walinspect till_end_of_wal functions with others