Thread: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Hi hackers,

It is often not feasible to use `REPLICA IDENTITY FULL` on the publication, because it leads to a full table scan
per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` impractical for all but
a small number of use cases.

With this patch, I'm proposing the following change: if there is an index on the subscriber, use it
as long as the planner sub-module picks any index over a sequential scan.

Most of the logic on the subscriber side already exists in the code. The subscriber is already
capable of doing (unique) index scans. With this patch, we allow index scans to iterate over the
fetched tuples and only act when the tuples are equal. Those familiar with this part of the code will
recognize that the sequential scan code on the subscriber already uses the `tuples_equal()` function.
In short, the changes on the subscriber mostly combine parts of the (unique) index scan and
sequential scan code.

The decision on whether to use an index (and which one) is mostly delegated to planner infrastructure.
The idea is that on the subscriber we have all the columns. So, construct `Path`s with
restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ... AND col_n = $N`, and let
the planner sub-module -- `create_index_paths()` -- give us the relevant index `Path`s. On top of
that, add the sequential scan `Path` as well. Finally, pick the cheapest `Path` among them.
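The selection step can be sketched as follows (a Python toy model for illustration only; the actual patch does this in C with planner `Path` nodes, and the names and cost numbers below are made up):

```python
# Toy model of the selection step only: given already-costed candidate
# paths (as the planner's create_index_paths() plus a seqscan path would
# produce), pick the cheapest one. All names and costs are illustrative.

def pick_cheapest_path(index_paths, seqscan_path):
    """Return the candidate with the lowest total cost."""
    candidates = index_paths + [seqscan_path]
    return min(candidates, key=lambda p: p["total_cost"])

index_paths = [
    {"kind": "indexscan", "index": "i2", "total_cost": 8.3},
    {"kind": "indexscan", "index": "i3", "total_cost": 42.0},
]
seqscan = {"kind": "seqscan", "total_cost": 1500.0}

best = pick_cheapest_path(index_paths, seqscan)  # picks index i2 here
```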

From a performance point of view, there are a few things to note. First, the patch aims not to
change the behavior when a PRIMARY KEY or UNIQUE INDEX is used. Second, when REPLICA IDENTITY
FULL is set on the publisher and an index is used on the subscriber, the difference mostly comes down
to `index scan` vs. `sequential scan`. That's why it is hard to claim a specific amount of improvement:
it mostly depends on the data size, the index, and the data distribution.

Still, below I try to showcase the potential improvements using an index on the subscriber,
`pgbench_accounts(bid)`. With the index, replication catches up in about 5 seconds.
When the index is dropped, replication takes about 300 seconds.

// init source db
pgbench -i -s 100 -p 5432 postgres
psql -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;" -p 5432 postgres
psql -c "CREATE INDEX i1 ON pgbench_accounts(aid);" -p 5432 postgres
psql -c "ALTER TABLE pgbench_accounts REPLICA IDENTITY FULL;" -p 5432 postgres
psql -c "CREATE PUBLICATION pub_test_1 FOR TABLE pgbench_accounts;" -p 5432 postgres

// init target db, drop existing primary key
pgbench -i -p 9700 postgres
psql -c "truncate pgbench_accounts;" -p 9700 postgres
psql -c "ALTER TABLE pgbench_accounts DROP CONSTRAINT pgbench_accounts_pkey;" -p 9700 postgres
psql -c "CREATE SUBSCRIPTION sub_test_1 CONNECTION 'host=localhost port=5432 user=onderkalaci dbname=postgres' PUBLICATION pub_test_1;" -p 9700 postgres

// create one index, even on a low cardinality column
psql -c "CREATE INDEX i2 ON pgbench_accounts(bid);" -p 9700 postgres

// now, run some pgbench tests and observe replication
pgbench -t 500 -b tpcb-like -p 5432 postgres


What do hackers think about this change?

Thanks,
Onder Kalaci
Developing the Citus extension for PostgreSQL
On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi hackers,
>
>
> It is often not feasible to use `REPLICA IDENTITY FULL` on the publication, because it leads to full table scan
> per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` impracticable -- probably other
> than some small number of use cases.
>

IIUC, this proposal is to optimize cases where users can't have a
unique/primary key for a relation on the subscriber and those
relations receive lots of updates or deletes?

> With this patch, I'm proposing the following change: If there is an index on the subscriber, use the index
> as long as the planner sub-modules picks any index over sequential scan.
>
> Majority of the logic on the subscriber side has already existed in the code. The subscriber is already
> capable of doing (unique) index scans. With this patch, we are allowing the index to iterate over the
> tuples fetched and only act when tuples are equal. The ones familiar with this part of the code could
> realize that the sequential scan code on the subscriber already implements the `tuples_equal()` function.
> In short, the changes on the subscriber are mostly combining parts of (unique) index scan and
> sequential scan codes.
>
> The decision on whether to use an index (or which index) is mostly derived from planner infrastructure.
> The idea is that on the subscriber we have all the columns. So, construct all the `Path`s with the
> restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ... AND col_n = $N`. Finally, let
> the planner sub-module -- `create_index_paths()` -- to give us the relevant index `Path`s. On top of
> that adds the sequential scan `Path` as well. Finally, pick the cheapest `Path` among.
>
> From the performance point of view, there are few things to note. First, the patch aims not to
> change the behavior when PRIMARY KEY or UNIQUE INDEX is used. Second, when REPLICA IDENTITY
> IS FULL on the publisher and an index is used on the subscriber, the difference mostly comes down
> to `index scan` vs `sequential scan`. That's why it is hard to claim a certain number of improvements.
> It mostly depends on the data size, index and the data distribution.
>

It seems that in favorable cases it will improve performance but we
should consider unfavorable cases as well. Two things that come to
mind in that regard are (a) while choosing index/seq. scan paths, the
patch doesn't account for cost for tuples_equal() which needs to be
performed for index scans, (b) it appears to me that the patch decides
which index to use the first time it opens the rel (or if the rel gets
invalidated) on subscriber and then for all consecutive operations it
uses the same index. It is quite possible that after some more
operations on the table, using the same index will actually be
costlier than a sequence scan or some other index scan.

--
With Regards,
Amit Kapila.



On Mon, Jul 18, 2022 at 11:59 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> > Hi hackers,
> >
> >
> > It is often not feasible to use `REPLICA IDENTITY FULL` on the publication, because it leads to full table scan
> > per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` impracticable -- probably other
> > than some small number of use cases.
> >
>
> IIUC, this proposal is to optimize cases where users can't have a
> unique/primary key for a relation on the subscriber and those
> relations receive lots of updates or deletes?
>
> > With this patch, I'm proposing the following change: If there is an index on the subscriber, use the index
> > as long as the planner sub-modules picks any index over sequential scan.
> >
> > Majority of the logic on the subscriber side has already existed in the code. The subscriber is already
> > capable of doing (unique) index scans. With this patch, we are allowing the index to iterate over the
> > tuples fetched and only act when tuples are equal. The ones familiar with this part of the code could
> > realize that the sequential scan code on the subscriber already implements the `tuples_equal()` function.
> > In short, the changes on the subscriber are mostly combining parts of (unique) index scan and
> > sequential scan codes.
> >
> > The decision on whether to use an index (or which index) is mostly derived from planner infrastructure.
> > The idea is that on the subscriber we have all the columns. So, construct all the `Path`s with the
> > restrictions on all columns, such as `col_1 = $1 AND col_2 = $2 ... AND col_n = $N`. Finally, let
> > the planner sub-module -- `create_index_paths()` -- to give us the relevant index `Path`s. On top of
> > that adds the sequential scan `Path` as well. Finally, pick the cheapest `Path` among.
> >
> > From the performance point of view, there are few things to note. First, the patch aims not to
> > change the behavior when PRIMARY KEY or UNIQUE INDEX is used. Second, when REPLICA IDENTITY
> > IS FULL on the publisher and an index is used on the subscriber, the difference mostly comes down
> > to `index scan` vs `sequential scan`. That's why it is hard to claim a certain number of improvements.
> > It mostly depends on the data size, index and the data distribution.
> >
>
> It seems that in favorable cases it will improve performance but we
> should consider unfavorable cases as well. Two things that come to
> mind in that regard are (a) while choosing index/seq. scan paths, the
> patch doesn't account for cost for tuples_equal() which needs to be
> performed for index scans, (b) it appears to me that the patch decides
> which index to use the first time it opens the rel (or if the rel gets
> invalidated) on subscriber and then for all consecutive operations it
> uses the same index. It is quite possible that after some more
> operations on the table, using the same index will actually be
> costlier than a sequence scan or some other index scan.
>

Point (a) won't matter because we perform tuples_equal both for
sequence and index scans. So, we can ignore point (a).

--
With Regards,
Amit Kapila.



Hi, thanks for your reply.

Amit Kapila <amit.kapila16@gmail.com> wrote on Mon, Jul 18, 2022 at 08:29:
On Tue, Jul 12, 2022 at 7:07 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi hackers,
>
>
> It is often not feasible to use `REPLICA IDENTITY FULL` on the publication, because it leads to full table scan
> per tuple change on the subscription. This makes `REPLICA IDENTITY FULL` impracticable -- probably other
> than some small number of use cases.
>

IIUC, this proposal is to optimize cases where users can't have a
unique/primary key for a relation on the subscriber and those
relations receive lots of updates or deletes?

Yes, that is right. 

In a similar vein, I see this patch as useful for reducing the "use a primary key/unique index" requirement to "use any index" for reasonably performant logical replication with updates/deletes.
 

It seems that in favorable cases it will improve performance but we
should consider unfavorable cases as well. Two things that come to
mind in that regard are (a) while choosing index/seq. scan paths, the
patch doesn't account for cost for tuples_equal() which needs to be
performed for index scans, (b) it appears to me that the patch decides
which index to use the first time it opens the rel (or if the rel gets
invalidated) on subscriber and then for all consecutive operations it
uses the same index. It is quite possible that after some more
operations on the table, using the same index will actually be
costlier than a sequence scan or some other index scan

Regarding (b), yes that is a concern I share. And, I was actually considering sending another patch regarding this.

Currently, I can see two options and would be happy to hear your take on them (or maybe another idea?):

- Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, CacheRegisterRelcacheCallback helps us re-create the cache entries. In this case, as far as I can see, we need a callback that is called when the table is ANALYZEd, because that is when the statistics change, and that is the time when picking a new index makes sense.
However, that seems like adding another dimension to this patch, which I can try, but I also see that committing becomes even harder. So, please see the next idea as well.

- Ask users to manually pick the index they want to use: Currently, the main complexity of the patch comes from the planner-related code. In fact, if you look at the logical replication related changes, those are relatively modest. If we drop the feature where Postgres picks the index, and instead provide a user interface to set the index per table in the subscription, we can probably have an easier patch to review & test. For example, we could add an `ALTER SUBSCRIPTION sub ALTER TABLE t USE INDEX i` type of API. This also needs some coding, but probably much simpler than the current code. And, obviously, this raises the question of whether users can pick the right index. Probably not always, but at least that seems like a good start for using this performance improvement.

Thoughts?

Thanks,
Onder Kalaci

 
 
On Tue, Jul 19, 2022 at 1:46 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi, thanks for your reply.
>
> Amit Kapila <amit.kapila16@gmail.com> wrote on Mon, Jul 18, 2022 at 08:29:
>>
>> >
>>
>> IIUC, this proposal is to optimize cases where users can't have a
>> unique/primary key for a relation on the subscriber and those
>> relations receive lots of updates or deletes?
>
>
> Yes, that is right.
>
> In a similar perspective, I see this patch useful for reducing the "use primary key/unique index" requirement to "use any index" for a reasonably performant logical replication with updates/deletes.
>

Agreed. BTW, have you seen any such requirements from users where this
will be useful for them?

>>
>>
>> It seems that in favorable cases it will improve performance but we
>> should consider unfavorable cases as well. Two things that come to
>> mind in that regard are (a) while choosing index/seq. scan paths, the
>> patch doesn't account for cost for tuples_equal() which needs to be
>> performed for index scans, (b) it appears to me that the patch decides
>> which index to use the first time it opens the rel (or if the rel gets
>> invalidated) on subscriber and then for all consecutive operations it
>> uses the same index. It is quite possible that after some more
>> operations on the table, using the same index will actually be
>> costlier than a sequence scan or some other index scan
>
>
> Regarding (b), yes that is a concern I share. And, I was actually considering sending another patch regarding this.
>
> Currently, I can see two options and happy to hear your take on these (or maybe another idea?)
>
> - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to re-create the cache entries. In this case, as far as I can see, we need a callback that is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a new index makes sense.
> However, that seems like adding another dimension to this patch, which I can try but also see that committing becomes even harder.
>

This idea sounds worth investigating. I see that this will require
more work but OTOH, we can't allow the existing system to regress
especially because depending on workload it might regress badly. We
can create a patch for this atop the base patch for easier review/test
but I feel we need some way to address this point.

> So, please see the next idea as well.
>
> - Ask users to manually pick the index they want to use: Currently, the main complexity of the patch comes with the planner related code. In fact, if you look into the logical replication related changes, those are relatively modest changes. If we can drop the feature that Postgres picks the index, and provide a user interface to set the indexes per table in the subscription, we can probably have an easier patch to review & test. For example, we could add `ALTER SUBSCRIPTION sub ALTER TABLE t USE INDEX i` type of an API. This also needs some coding, but probably much simpler than the current code. And, obviously, this pops up the question of can users pick the right index?
>

I think picking the right index is one point and another is what if
the subscription has many tables (say 10K or more), doing it for
individual tables per subscription won't be fun. Also, users need to
identify which tables belong to a particular subscription, now, users
can find the same via pg_subscription_rel or some other way but doing
this won't be straightforward for users. So, my inclination would be
to pick the right index automatically rather than getting the input
from the user.

Now, your point related to planner code in the patch bothers me as
well but I haven't studied the patch in detail to provide any
alternatives at this stage. Do you have any other ideas to make it
simpler or solve this problem in some other way?

--
With Regards,
Amit Kapila.



On Mon, Jul 18, 2022 at 8:29 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> IIUC, this proposal is to optimize cases where users can't have a
> unique/primary key for a relation on the subscriber and those
> relations receive lots of updates or deletes?

I think this patch optimizes for all non-trivial cases of update/delete replication (e.g. >1000 rows in the table, >1000 rows per hour updated) without a primary key. For instance, it's quite common to have a large append-mostly events table without a primary key (e.g. because of partitioning, or insertion speed), which will still have occasional batch updates/deletes.

Imagine an update of a table or partition with 1 million rows and a typical scan speed of 1M rows/sec. An update on the whole table takes maybe 1-2 seconds. Replicating the update using a sequential scan per row can take on the order of ~12 days ≈ 1M seconds.
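The back-of-envelope arithmetic behind those numbers (illustrative only):

```python
# Each replicated tuple change triggers one sequential scan of the whole
# table, so total work is quadratic in the row count.
rows = 1_000_000                  # rows updated
scan_speed = 1_000_000            # rows/sec a sequential scan can read
seconds_per_change = rows / scan_speed      # one full scan per change: 1 sec
total_seconds = rows * seconds_per_change   # 1,000,000 seconds
days = total_seconds / 86_400               # roughly 11.6 days
```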

The current implementation makes using REPLICA IDENTITY FULL a huge liability and impractical for scenarios where you want to replicate an arbitrary set of user-defined tables, such as upgrades, migrations, and shard moves. We generally have to recommend that users tolerate update/delete errors in such scenarios.

If the apply worker can use an index, the data migration tool can tactically create one on a high cardinality column, which would practically always be better than doing a sequential scan for non-trivial workloads.

cheers,
Marco
Hi,
 
> - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to re-create the cache entries. In this case, as far as I can see, we need a callback that is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a new index makes sense.
> However, that seems like adding another dimension to this patch, which I can try but also see that committing becomes even harder.
>

This idea sounds worth investigating. I see that this will require
more work but OTOH, we can't allow the existing system to regress
especially because depending on workload it might regress badly.

Just to note, in case that is not clear: this patch avoids (or at least, barring bugs, aims to avoid) changing the behavior of existing systems with a PRIMARY KEY or UNIQUE index. In those cases, we still use the relevant indexes.
 
We
can create a patch for this atop the base patch for easier review/test
but I feel we need some way to address this point.


Another idea could be to re-calculate the index choice, say, after N updates/deletes on the table. We may consider using a subscription parameter for getting N -- with a good default -- or even hard-coding it. I think the cost of re-calculating should really be pretty small compared to the other things happening during logical replication. So, a sane default might work?

If you think the above doesn't work, I can try to work on a separate patch which adds something like "analyze invalidation callback".

 
>
> - Ask users to manually pick the index they want to use: Currently, the main complexity of the patch comes with the planner related code. In fact, if you look into the logical replication related changes, those are relatively modest changes. If we can drop the feature that Postgres picks the index, and provide a user interface to set the indexes per table in the subscription, we can probably have an easier patch to review & test. For example, we could add `ALTER SUBSCRIPTION sub ALTER TABLE t USE INDEX i` type of an API. This also needs some coding, but probably much simpler than the current code. And, obviously, this pops up the question of can users pick the right index?
>

I think picking the right index is one point and another is what if
the subscription has many tables (say 10K or more), doing it for
individual tables per subscription won't be fun. Also, users need to
identify which tables belong to a particular subscription, now, users
can find the same via pg_subscription_rel or some other way but doing
this won't be straightforward for users. So, my inclination would be
to pick the right index automatically rather than getting the input
from the user.

Yes, all makes sense.
 

Now, your point related to planner code in the patch bothers me as
well but I haven't studied the patch in detail to provide any
alternatives at this stage. Do you have any other ideas to make it
simpler or solve this problem in some other way?


One idea I tried earlier was to go over the existing indexes on the table, then get the IndexInfo via BuildIndexInfo(), and then try to find a good heuristic to pick an index. In the end, I felt like that was doing a sub-optimal job, requiring a similar amount of code to the current patch, while still using similar infrastructure.
 
My conclusion for that route was that I should either use a very simple heuristic (like picking the index with the most columns) and accept a suboptimal index pick, OR use a complex heuristic with a reasonable index pick. And the latter approach converged to the planner code in the patch. Do you think the former approach is acceptable?
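For concreteness, the "very simple heuristic" variant would look roughly like this (illustrative Python; not code from either patch version):

```python
# Pick the index covering the most columns; ties are broken arbitrarily,
# which is one reason this heuristic can pick a suboptimal index.
def pick_index_most_columns(indexes):
    # indexes: list of (index_name, [column, ...]) pairs
    return max(indexes, key=lambda ix: len(ix[1]))

best = pick_index_most_columns([
    ("i1", ["aid"]),
    ("i2", ["bid", "aid"]),   # chosen here: two columns
])
```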

Thanks,
Onder
On Wed, Jul 20, 2022 at 8:19 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>> > - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to re-create the cache entries. In this case, as far as I can see, we need a callback that is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a new index makes sense.
>> > However, that seems like adding another dimension to this patch, which I can try but also see that committing becomes even harder.
>> >
>>
>> This idea sounds worth investigating. I see that this will require
>> more work but OTOH, we can't allow the existing system to regress
>> especially because depending on workload it might regress badly.
>
>
> Just to note if that is not clear: This patch avoids (or at least aims to avoid assuming no bugs) changing the behavior of the existing systems with PRIMARY KEY or UNIQUE index. In that case, we still use the relevant indexes.
>
>>
>> We
>> can create a patch for this atop the base patch for easier review/test
>> but I feel we need some way to address this point.
>>
>
> One another idea could be to re-calculate the index, say after N updates/deletes for the table. We may consider using subscription_parameter for getting N -- with a good default, or even hard-code into the code. I think the cost of re-calculating should really be pretty small compared to the other things happening during logical replication. So, a sane default might work?
>

One difficulty in deciding the value of N for the user or choosing a
default would be that we need to probably consider the local DML
operations on the table as well.

> If you think the above doesn't work, I can try to work on a separate patch which adds something like "analyze invalidation callback".
>

I suggest we should give this a try and if this turns out to be
problematic or complex then we can think of using some heuristic as
you are suggesting above.

>
>>
>>
>> Now, your point related to planner code in the patch bothers me as
>> well but I haven't studied the patch in detail to provide any
>> alternatives at this stage. Do you have any other ideas to make it
>> simpler or solve this problem in some other way?
>>
>
> One idea I tried earlier was to go over the existing indexes and on the table, then get the IndexInfo via BuildIndexInfo(). And then, try to find a good heuristic to pick an index. In the end, I felt like that is doing a sub-optimal job, requiring a similar amount of code of the current patch, and still using the similar infrastructure.
>
> My conclusion for that route was I should either use a very simple heuristic (like pick the index with the most columns) and have a suboptimal index pick,
>

Not only that, but say all indexes have the same number of columns; then we would probably need to either pick the first such index or use some other heuristic.

>
> OR use a complex heuristic with a reasonable index pick. And, the latter approach converged to the planner code in the patch. Do you think the former approach is acceptable?
>

In this regard, I was thinking about the cases in which a sequential scan can be
better than an index scan (assuming one is available). I think if
a certain column has a lot of duplicates (for example, a boolean
column), then doing a sequential scan is probably better. Now,
considering this, even though your other approach sounds simpler, it
could lead to unpredictable results. So, I think the latter approach
is preferable.

BTW, do we want to consider partial indexes for the scan in this
context? I mean, a partial index may not contain all rows, so how
would it be usable?

Few comments:
===============
1.
static List *
+CreateReplicaIdentityFullPaths(Relation localrel)
{
...
+ /*
+ * Rather than doing all the pushups that would be needed to use
+ * set_baserel_size_estimates, just do a quick hack for rows and width.
+ */
+ rel->rows = rel->tuples;

Is it a good idea to set rows without any selectivity estimation?
Won't this always set the entire rows in a relation? Also, if we don't
want to use set_baserel_size_estimates(), how will we compute
baserestrictcost which will later be used in the costing of paths (for
example, costing of seqscan path (cost_seqscan) uses it)?

In general, I think it will be better to consider calling some
top-level planner functions even for paths. Can we consider using
make_one_rel() instead of building individual paths? On similar lines,
in function PickCheapestIndexPathIfExists(), can we use
set_cheapest()?

2.
@@ -57,9 +60,6 @@ build_replindex_scan_key(ScanKey skey, Relation rel,
Relation idxrel,
  int2vector *indkey = &idxrel->rd_index->indkey;
  bool hasnulls = false;

- Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
-    RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));

You have removed this assertion but there is a comment ("This is not
generic routine, it expects the idxrel to be replication identity of a
rel and meet all limitations associated with that.") atop this
function which either needs to be changed/removed and probably we
should think if the function needs some change after removing that
restriction.

--
With Regards,
Amit Kapila.



Hi,

>
> One another idea could be to re-calculate the index, say after N updates/deletes for the table. We may consider using subscription_parameter for getting N -- with a good default, or even hard-code into the code. I think the cost of re-calculating should really be pretty small compared to the other things happening during logical replication. So, a sane default might work?
>

One difficulty in deciding the value of N for the user or choosing a
default would be that we need to probably consider the local DML
operations on the table as well.


Fair enough, it is not easy to find a good default.
 
> If you think the above doesn't work, I can try to work on a separate patch which adds something like "analyze invalidation callback".
>

I suggest we should give this a try and if this turns out to be
problematic or complex then we can think of using some heuristic as
you are suggesting above.

Alright, I'll try this and respond shortly back.
 
>
> OR use a complex heuristic with a reasonable index pick. And, the latter approach converged to the planner code in the patch. Do you think the former approach is acceptable?
>

In this regard, I was thinking in which cases a sequence scan can be
better than the index scan (considering one is available). I think if
a certain column has a lot of duplicates (for example, a column has a
boolean value) then probably doing a sequence scan is better. Now,
considering this even though your other approach sounds simpler but
could lead to unpredictable results. So, I think the latter approach
is preferable.

Yes, it makes sense. I also considered this during the development of the patch, but forgot to mention it. :)
 

BTW, do we want to consider partial indexes for the scan in this
context? I mean it may not have data of all rows so how that would be
usable?


As far as I can see, check_index_predicates() never picks a partial index for the baserestrictinfos we create in CreateReplicaIdentityFullPaths(). The reason is that we have roughly the following call stack:

check_index_predicates()
  -> predicate_implied_by()
    -> predicate_implied_by_recurse()
      -> predicate_implied_by_simple_clause()
        -> operator_predicate_proof()

And, inside operator_predicate_proof(), the proof is never going to succeed, because we push `Param`s into the baserestrictinfos, whereas the index predicates are always `Const`s.
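As a toy illustration of that argument (this is a simplified model, not `operator_predicate_proof()` itself): a clause whose comparison value is an unresolved `Param` can never prove a partial-index predicate whose value is a `Const`.

```python
# Simplified model: a restriction "col = <value>" implies an index
# predicate "col = <const>" only if the value is known and equal.
# An unresolved Param is modeled as None.
def clause_implies_predicate(clause_value, predicate_const):
    if clause_value is None:      # Param: value unknown at planning time
        return False              # cannot prove anything about it
    return clause_value == predicate_const

param_case = clause_implies_predicate(None, 42)   # Param vs Const: never implied
const_case = clause_implies_predicate(42, 42)     # Const vs Const: can be implied
```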

If we want to make it even more explicit, I can filter out `Path`s with partial indexes. But that seems redundant to me. For now, I pushed the commit with an assertion that we never pick partial indexes and also added a test. 

If you think it is better to explicitly filter out partial indexes, I can do that as well. 

 
Few comments:
===============
1.
static List *
+CreateReplicaIdentityFullPaths(Relation localrel)
{
...
+ /*
+ * Rather than doing all the pushups that would be needed to use
+ * set_baserel_size_estimates, just do a quick hack for rows and width.
+ */
+ rel->rows = rel->tuples;

Is it a good idea to set rows without any selectivity estimation?
Won't this always set the entire rows in a relation? Also, if we don't
want to use set_baserel_size_estimates(), how will we compute
baserestrictcost which will later be used in the costing of paths (for
example, costing of seqscan path (cost_seqscan) uses it)?

In general, I think it will be better to consider calling some
top-level planner functions even for paths. Can we consider using
make_one_rel() instead of building individual paths?

Thanks, this looks like a good suggestion/simplification. I wanted to use the least amount of code possible, and make_one_rel() does either exactly what I need or slightly more, which is great.

Note that make_one_rel() also follows the same call stack that I noted above. So, I cannot spot any problems with partial indexes. Maybe am I missing something here?
 
On similar lines,
in function PickCheapestIndexPathIfExists(), can we use
set_cheapest()?


Yes, make_one_rel() + set_cheapest() sounds better. Changed.
 
2.
@@ -57,9 +60,6 @@ build_replindex_scan_key(ScanKey skey, Relation rel,
Relation idxrel,
  int2vector *indkey = &idxrel->rd_index->indkey;
  bool hasnulls = false;

- Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
-    RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));

You have removed this assertion but there is a comment ("This is not
generic routine, it expects the idxrel to be replication identity of a
rel and meet all limitations associated with that.") atop this
function which either needs to be changed/removed and probably we
should think if the function needs some change after removing that
restriction.


Ack, I can see your point. I think, for example, we should skip index attributes that are not simple column references. And probably whatever other restrictions a PRIMARY KEY has should apply here as well.

I'll read some more Postgres code & test before pushing a revision for this part. In the meantime, if you have any suggestions/pointers for me to look into, please note here.

Attached v2 of the patch addressing some of the comments you had. I'll work on the remaining ones shortly.

Thanks,
Onder


Attachment
Hi,

2.
@@ -57,9 +60,6 @@ build_replindex_scan_key(ScanKey skey, Relation rel,
Relation idxrel,
  int2vector *indkey = &idxrel->rd_index->indkey;
  bool hasnulls = false;

- Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
-    RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));

You have removed this assertion but there is a comment ("This is not
generic routine, it expects the idxrel to be replication identity of a
rel and meet all limitations associated with that.") atop this
function which either needs to be changed/removed and probably we
should think if the function needs some change after removing that
restriction.


Ack, I can see your point. I think, for example, we should skip index attributes that are not simple column references. And probably whatever other restrictions a PRIMARY KEY has should apply here as well.

Primary keys require:
- Unique: We don't need uniqueness; that's the point of this patch.
- Valid index: Should not be an issue in this case, because the planner would not pick an invalid index anyway.
- Non-partial index: As discussed earlier in this thread, I really don't see any problems with partial indexes for this use-case. Please let me know if there is anything I miss.
- Deferrable/Immediate: As far as I can see, there are no such concepts for regular indexes, so this does not apply here.
- Indexes with no expressions: This is the point where we require some minor changes inside/around `build_replindex_scan_key`. Previously, indexes on expressions could not be replica indexes; with this patch they can. However, the expressions cannot be used for filtering the tuples because of the way we create the restrictinfos. We essentially create `WHERE col_1 = $1 AND col_2 = $2 ... AND col_n = $n` for the columns with equality operators available. In the case of indexes on expressions, the planner would never pick such indexes with these restrictions. I changed `build_replindex_scan_key` to reflect that, added a new assert, and pushed tests with the following schema, making sure the code behaves as expected:

CREATE TABLE people (firstname text, lastname text);
CREATE INDEX people_names_expr_only ON people ((firstname || ' ' || lastname));
CREATE INDEX people_names_expr_and_columns ON people ((firstname || ' ' || lastname), firstname, lastname);
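For reference, the planner behavior this relies on can be observed with a plain query sketch (illustrative only; the patch itself builds the equivalent restrictions internally):

```sql
-- The restrictions the patch builds are plain column equalities, so an
-- index whose only key is the expression cannot serve them, while one
-- that also contains the columns themselves can be considered.
EXPLAIN (COSTS OFF)
    SELECT * FROM people
    WHERE firstname = 'alice' AND lastname = 'smith';
-- people_names_expr_and_columns may be considered (its second and third
-- keys are the plain columns); people_names_expr_only never matches.
```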

Also did similar tests with indexes on jsonb fields. Does that address your concerns regarding indexes with expressions?

I'll work on one of the other open items in the thread (e.g., analyze invalidation callback) separately.

Thanks,
Onder KALACI

Attachment
Hi,

As far as I can see, the following is the answer to the only remaining open discussion in this thread. Let me know if anything is missed.

(b) it appears to me that the patch decides
>> which index to use the first time it opens the rel (or if the rel gets
>> invalidated) on subscriber and then for all consecutive operations it
>> uses the same index. It is quite possible that after some more
>> operations on the table, using the same index will actually be
>> costlier than a sequence scan or some other index scan
>
>
> Regarding (b), yes that is a concern I share. And, I was actually considering sending another patch regarding this.
>
> Currently, I can see two options and happy to hear your take on these (or maybe another idea?)
>
> - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the CacheRegisterRelcacheCallback helps us to re-create the cache entries. In this case, as far as I can see, we need a callback that is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a new index makes sense.
> However, that seems like adding another dimension to this patch, which I can try but also see that committing becomes even harder.
>

This idea sounds worth investigating. I see that this will require
more work but OTOH, we can't allow the existing system to regress
especially because depending on workload it might regress badly. We
can create a patch for this atop the base patch for easier review/test
but I feel we need some way to address this point.


It turns out that we already invalidate the relevant entries in LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates any of the statistics in pg_class.

The call-stack for analyze is roughly:
do_analyze_rel()
   -> vac_update_relstats()
     -> heap_inplace_update()
         -> if needs to apply any statistical change
             -> CacheInvalidateHeapTuple()

And, we register for those invalidations already:
logicalrep_relmap_init() / logicalrep_partmap_init()
   -> CacheRegisterRelcacheCallback()



Added a test which triggers this behavior. The test is as follows:
- Create two indexes on the target, on column_a and column_b
- Initially load data such that the column_a has a high cardinality
- Show that we use the index on column_a
- Load more data such that the column_b has higher cardinality
- ANALYZE on the target table
- Show that we use the index on column_b afterwards
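The subscriber-side setup for that scenario looks roughly like the following (table and index names are illustrative; the index choice itself happens inside the apply worker, not in SQL):

```sql
-- Two candidate indexes on the replication target:
CREATE TABLE target (column_a int, column_b int);
CREATE INDEX target_a_idx ON target (column_a);
CREATE INDEX target_b_idx ON target (column_b);

-- Step 1: column_a has high cardinality, so the apply worker
-- picks target_a_idx after the first index selection.
INSERT INTO target SELECT i, i % 5 FROM generate_series(1, 10000) i;
ANALYZE target;

-- Step 2: load data so column_b becomes the higher-cardinality column.
INSERT INTO target SELECT i % 5, i FROM generate_series(1, 100000) i;
ANALYZE target;  -- updates pg_class stats -> relcache invalidation ->
                 -- the apply worker re-runs index selection and should
                 -- now pick target_b_idx
```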

Thanks,
Onder KALACI
 
Attachment
On Mon, Aug 1, 2022 at 9:52 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi,
>
> As far as I can see, the following is the answer to the only remaining open discussion in this thread. Let me know if
> anything is missed.
>
>> (b) it appears to me that the patch decides
>> >> which index to use the first time it opens the rel (or if the rel gets
>> >> invalidated) on subscriber and then for all consecutive operations it
>> >> uses the same index. It is quite possible that after some more
>> >> operations on the table, using the same index will actually be
>> >> costlier than a sequence scan or some other index scan
>> >
>> >
>> > Regarding (b), yes that is a concern I share. And, I was actually considering sending another patch regarding
this.
>> >
>> > Currently, I can see two options and happy to hear your take on these (or maybe another idea?)
>> >
>> > - Add a new class of invalidation callbacks: Today, if we do ALTER TABLE or CREATE INDEX on a table, the
>> > CacheRegisterRelcacheCallback helps us to re-create the cache entries. In this case, as far as I can see, we need a
>> > callback that is called when table "ANALYZE"d, because that is when the statistics change. That is the time picking a
>> > new index makes sense.
>> > However, that seems like adding another dimension to this patch, which I can try but also see that committing
>> > becomes even harder.
>> >
>>
>> This idea sounds worth investigating. I see that this will require
>> more work but OTOH, we can't allow the existing system to regress
>> especially because depending on workload it might regress badly. We
>> can create a patch for this atop the base patch for easier review/test
>> but I feel we need some way to address this point.
>>
>
> It turns out that we already invalidate the relevant entries in LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or
> VACUUM) updates any of the statistics in pg_class.
>
> The call-stack for analyze is roughly:
> do_analyze_rel()
>    -> vac_update_relstats()
>      -> heap_inplace_update()
>          -> if needs to apply any statistical change
>              -> CacheInvalidateHeapTuple()
>

Yeah, it appears that this will work but I see that we don't update
here for inherited stats, how does it work for such cases?

--
With Regards,
Amit Kapila.



On Fri, Jul 22, 2022 at 9:45 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>>
>>
>> BTW, do we want to consider partial indexes for the scan in this
>> context? I mean it may not have data of all rows so how that would be
>> usable?
>>
>
> As far as I can see, check_index_predicates() never picks a partial index for the baserestrictinfos we create in
> CreateReplicaIdentityFullPaths(). The reason is that we have roughly the following call stack:
>
> -check_index_predicates
>  --predicate_implied_by
> ---predicate_implied_by_recurse
> ----predicate_implied_by_simple_clause
> -----operator_predicate_proof
>
> And, inside operator_predicate_proof(), there is never going to be an equality. Because, we push `Param`s to the
> baserestrictinfos whereas the index predicates are always `Const`.
>

I agree that the way currently baserestrictinfos are formed by patch,
it won't select the partial path, and chances are that that will be
true in future as well but I think it is better to be explicit in this
case to avoid creating a dependency between two code paths.

Few other comments:
==================
1. Why is it a good idea to choose the index selected even for the
bitmap path (T_BitmapHeapScan or T_BitmapIndexScan)? We use index scan
during update/delete, so not sure how we can conclude to use index for
bitmap paths.

2. The index info is built even on insert, so workload, where there
are no updates/deletes or those are not published then this index
selection work will go waste. Will it be better to do it at first
update/delete? One can say that it is not worth the hassle as anyway
it will be built the first time we perform an operation on the
relation or after the relation gets invalidated. If we think so, then
probably adding a comment could be useful.

3.
+my $synced_query =
+  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
...
...
+# wait for initial table synchronization to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";

You can avoid such instances in the test by using the new
infrastructure added in commit 0c20dd33db.

4.
  LogicalRepRelation *remoterel = &root->remoterel;
+
  Oid partOid = RelationGetRelid(partrel);

Spurious line addition.

--
With Regards,
Amit Kapila.



Hi,

Thanks for the feedback, see my reply below.

>
> It turns out that we already invalidate the relevant entries in LogicalRepRelMap/LogicalRepPartMap when "ANALYZE" (or VACUUM) updates any of the statistics in pg_class.
>
> The call-stack for analyze is roughly:
> do_analyze_rel()
>    -> vac_update_relstats()
>      -> heap_inplace_update()
>          -> if needs to apply any statistical change
>              -> CacheInvalidateHeapTuple()
>
Yeah, it appears that this will work but I see that we don't update
here for inherited stats, how does it work for such cases?

There, the expansion of the relation list to partitions happens one level above on the call stack. So, the call stack looks like the following:

autovacuum_do_vac_analyze() (or ExecVacuum)
  -> vacuum()
     -> expand_vacuum_rel()
          -> rel_list=parent+children partitions
      -> for rel in rel_list
          ->analyze_rel()
              ->do_analyze_rel
                  ... (and the same call stack as above)

I also added one variation of a similar test for partitioned tables, which I earlier added for non-partitioned tables as well:

Added a test which triggers this behavior. The test is as follows:
- Create two indexes on the target, on column_a and column_b
- Initially load data such that the column_a has a high cardinality
- Show that we use the index on column_a on a child table 
- Load more data such that the column_b has higher cardinality
- ANALYZE on the parent table
- Show that we use the index on column_b afterwards on the child table
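A minimal partitioned setup matching that test could look like this (names illustrative; the point is that ANALYZE on the parent cascades to the leaf partitions via expand_vacuum_rel(), so the per-partition pg_class stats are updated and the apply worker's cache entries for the partitions get invalidated):

```sql
CREATE TABLE target (column_a int, column_b int)
    PARTITION BY RANGE (column_a);
CREATE TABLE target_p1 PARTITION OF target DEFAULT;

-- Partitioned indexes are created on each partition as well:
CREATE INDEX ON target (column_a);
CREATE INDEX ON target (column_b);

-- ANALYZE on the parent also analyzes the leaf partitions, which is
-- what triggers the relcache invalidation and index re-selection
-- for target_p1 in the apply worker:
ANALYZE target;
```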

My answer for the above assumes that your question is regarding what happens if you ANALYZE on a partitioned table. If your question is something different, please let me know.


>> BTW, do we want to consider partial indexes for the scan in this
>> context? I mean it may not have data of all rows so how that would be
>> usable?
>>
>
> As far as I can see, check_index_predicates() never picks a partial index for the baserestrictinfos we create in CreateReplicaIdentityFullPaths(). The reason is that we have roughly the following call stack:
>
> -check_index_predicates
>  --predicate_implied_by
> ---predicate_implied_by_recurse
> ----predicate_implied_by_simple_clause
> -----operator_predicate_proof
>
> And, inside operator_predicate_proof(), there is never going to be an equality. Because, we push `Param`s to the baserestrictinfos whereas the index predicates are always `Const`.
>

I agree that the way currently baserestrictinfos are formed by patch,
it won't select the partial path, and chances are that that will be
true in future as well but I think it is better to be explicit in this
case to avoid creating a dependency between two code paths.


Yes, it makes sense. So, I changed the Assert into a function where we filter out partial indexes and indexes only on expressions, so that we do not create such dependencies between the planner and this code.

If one day the planner supports using column values on indexes with expressions, this code would simply not use the optimization until we make some improvements in this code path. I think that is a fair trade-off for now.

Few other comments:
==================
1. Why is it a good idea to choose the index selected even for the
bitmap path (T_BitmapHeapScan or T_BitmapIndexScan)? We use index scan
during update/delete, so not sure how we can conclude to use index for
bitmap paths.

In our case, during update/delete we are searching for a single tuple on the target. And, it seems like using an index is probably going to be cheaper for finding the single tuple. In general, I thought we should use an index if the planner ever decides to use it with the given restrictions.

Also, for the majority of the use-cases, I think we'd probably expect an index on a column with high cardinality -- hence use index scan. So, bitmap index scans are probably not going to be that much common.

Still, I don't see a problem with using such indexes. Of course, it is possible that I might be missing something. Do you have any specific concerns in this area? 
 

2. The index info is built even on insert, so workload, where there
are no updates/deletes or those are not published then this index
selection work will go waste. Will it be better to do it at first
update/delete? One can say that it is not worth the hassle as anyway
it will be built the first time we perform an operation on the
relation or after the relation gets invalidated.

With the current approach, the index (re)calculation is coupled with the (in)validation of the relevant cache entries. So I'd argue that, for the simplicity of the code, we can afford this small overhead. According to my local measurements, especially for large tables, the index oid calculation is mostly insignificant compared to the rest of the steps. Does that sound OK to you?

 
If we think so, then
probably adding a comment could be useful.

Yes, that is useful if you are OK with the above, added.
 
3.
+my $synced_query =
+  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
...
...
+# wait for initial table synchronization to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";

You can avoid such instances in the test by using the new
infrastructure added in commit 0c20dd33db.

 Cool, applied changes.


4.
  LogicalRepRelation *remoterel = &root->remoterel;
+
  Oid partOid = RelationGetRelid(partrel);

Spurious line addition.

 
Fixed; went over the code and couldn't find any others.
 

Attaching v5 of the patch, which reflects the review on this email along with a few minor test improvements.

Thanks,
Onder

Attachment

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From
"osumi.takamichi@fujitsu.com"
Date:
On Tuesday, August 9, 2022 12:59 AM Önder Kalacı <onderkalaci@gmail.com> wrote:
> Attaching v5 of the patch which reflects the review on this email, also few
> minor test improvements.
Hi,


Thank you for the updated patch.
FYI, I noticed that v5 causes a cfbot failure in [1].
Could you please fix it in the next version?


[19:44:38.420] execReplication.c: In function ‘RelationFindReplTupleByIndex’:
[19:44:38.420] execReplication.c:186:24: error: ‘eq’ may be used uninitialized in this function
[-Werror=maybe-uninitialized]
[19:44:38.420]   186 |   if (!indisunique && !tuples_equal(outslot, searchslot, eq))
[19:44:38.420]       |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[19:44:38.420] cc1: all warnings being treated as errors



[1] - https://cirrus-ci.com/task/6544573026533376



Best Regards,
    Takamichi Osumi


Hi,
 

FYI, I noticed that v5 causes a cfbot failure in [1].
Could you please fix it in the next version?

Thanks for letting me know!



[19:44:38.420] execReplication.c: In function ‘RelationFindReplTupleByIndex’:
[19:44:38.420] execReplication.c:186:24: error: ‘eq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
[19:44:38.420]   186 |   if (!indisunique && !tuples_equal(outslot, searchslot, eq))
[19:44:38.420]       |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[19:44:38.420] cc1: all warnings being treated as errors


It is kind of interesting that the compiler cannot see that `eq` is only used when !indisunique. Anyway, I've now sent v6 with a slight refactor that avoids the warning.

Thanks,
Onder KALACI  

  
Attachment
On Mon, Aug 8, 2022 at 9:29 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi,
>
> Thanks for the feedback, see my reply below.
>
>> >
>> > It turns out that we already invalidate the relevant entries in LogicalRepRelMap/LogicalRepPartMap when "ANALYZE"
>> > (or VACUUM) updates any of the statistics in pg_class.
>> >
>> > The call-stack for analyze is roughly:
>> > do_analyze_rel()
>> >    -> vac_update_relstats()
>> >      -> heap_inplace_update()
>> >          -> if needs to apply any statistical change
>> >              -> CacheInvalidateHeapTuple()
>> >
>>
>> Yeah, it appears that this will work but I see that we don't update
>> here for inherited stats, how does it work for such cases?
>
>
> There, the expansion of the relation list to partitions happens one level above on the call stack. So, the call stack
> looks like the following:
>
> autovacuum_do_vac_analyze() (or ExecVacuum)
>   -> vacuum()
>      -> expand_vacuum_rel()
>           -> rel_list=parent+children partitions
>       -> for rel in rel_list
>           ->analyze_rel()
>               ->do_analyze_rel
>                   ... (and the same call stack as above)
>
> I also added one variation of a similar test for partitioned tables, which I earlier added for non-partitioned tables
> as well:
>
>> Added a test which triggers this behavior. The test is as follows:
>> - Create two indexes on the target, on column_a and column_b
>> - Initially load data such that the column_a has a high cardinality
>> - Show that we use the index on column_a on a child table
>> - Load more data such that the column_b has higher cardinality
>> - ANALYZE on the parent table
>> - Show that we use the index on column_b afterwards on the child table
>
>
> My answer for the above assumes that your question is regarding what happens if you ANALYZE on a partitioned table.
> If your question is something different, please let me know.
>

I was talking about inheritance cases, something like:
create table tbl1 (a int);
create table tbl1_part1 (b int) inherits (tbl1);
create table tbl1_part2 (c int) inherits (tbl1);

What we do in such cases is documented as: "if the table being
analyzed has inheritance children, ANALYZE gathers two sets of
statistics: one on the rows of the parent table only, and a second
including rows of both the parent table and all of its children. This
second set of statistics is needed when planning queries that process
the inheritance tree as a whole. The child tables themselves are not
individually analyzed in this case."

Now, the point I was worried about was what if the changes in child
tables (*_part1, *_part2) are much more than in tbl1? In such cases,
we may not invalidate child rel entries, so how will logical
replication behave for updates/deletes on child tables? There may not
be any problem here but it is better to do some analysis of such cases
to see how it behaves.
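The documented behavior is easy to confirm directly (a hypothetical check; only pg_class is inspected here, the apply-worker invalidation follows from it):

```sql
CREATE TABLE tbl1 (a int);
CREATE TABLE tbl1_part1 (b int) INHERITS (tbl1);
INSERT INTO tbl1_part1 (a, b) SELECT i, i FROM generate_series(1, 1000) i;

ANALYZE tbl1;  -- gathers parent-only and whole-tree statistics,
               -- but does not analyze the children individually

SELECT relname, reltuples FROM pg_class
WHERE relname IN ('tbl1', 'tbl1_part1');
-- tbl1_part1's reltuples stays untouched until tbl1_part1 itself is
-- analyzed, so its relcache entry is not invalidated either
```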

>>
>> >> BTW, do we want to consider partial indexes for the scan in this
>> >> context? I mean it may not have data of all rows so how that would be
>> >> usable?
>> >>
>> >
>
>> Few other comments:
>> ==================
>> 1. Why is it a good idea to choose the index selected even for the
>> bitmap path (T_BitmapHeapScan or T_BitmapIndexScan)? We use index scan
>> during update/delete, so not sure how we can conclude to use index for
>> bitmap paths.
>
>
> In our case, during update/delete we are searching for a single tuple on the target. And, it seems like using an
> index is probably going to be cheaper for finding the single tuple. In general, I thought we should use an index if the
> planner ever decides to use it with the given restrictions.
>

What about the case where the index has a lot of duplicate values? We
may need to retrieve multiple tuples in such cases.

> Also, for the majority of the use-cases, I think we'd probably expect an index on a column with high cardinality --
> hence use index scan. So, bitmap index scans are probably not going to be that much common.
>

You are probably right here but I don't think we can make such
assumptions. I think the safest way to avoid any regression here is to
choose an index when the planner selects an index scan. We can always
extend it later to bitmap scans if required. We can add a comment
indicating the same.

> Still, I don't see a problem with using such indexes. Of course, it is possible that I might be missing something. Do
> you have any specific concerns in this area?
>
>>
>>
>> 2. The index info is built even on insert, so workload, where there
>> are no updates/deletes or those are not published then this index
>> selection work will go waste. Will it be better to do it at first
>> update/delete? One can say that it is not worth the hassle as anyway
>> it will be built the first time we perform an operation on the
>> relation or after the relation gets invalidated.
>
>
> With the current approach, the index (re)-calculation is coupled with (in)validation of the relevant cache entries.
> So, I'd argue for the simplicity of the code, we could afford to waste this small overhead? According to my local
> measurements, especially for large tables, the index oid calculation is mostly insignificant compared to the rest of
> the steps. Does that sound OK to you?
>
>
>>
>> If we think so, then
>> probably adding a comment could be useful.
>
>
> Yes, that is useful if you are OK with the above, added.
>

*
+ /*
+ * For insert-only workloads, calculating the index is not necessary.
+ * As the calculation is not expensive, we are fine to do here (instead
+ * of during first update/delete processing).
+ */

I think here, instead of talking about cost, we should mention that it
is quite an infrequent operation, i.e., performed only the first time we
perform an operation on the relation or after invalidation. This is
because I think the cost is relative.

*
+
+ /*
+ * Although currently it is not possible for planner to pick a
+ * partial index or indexes only on expressions,

It may be better to expand this comment by describing a bit why it is
not possible in our case. You might want to give the function
reference where it is decided.

--
With Regards,
Amit Kapila.



Hi,

I'm a little late to catch up with your comments, but here are my replies:

> My answer for the above assumes that your question is regarding what happens if you ANALYZE on a partitioned table. If your question is something different, please let me know.
>

I was talking about inheritance cases, something like:
create table tbl1 (a int);
create table tbl1_part1 (b int) inherits (tbl1);
create table tbl1_part2 (c int) inherits (tbl1);

What we do in such cases is documented as: "if the table being
analyzed has inheritance children, ANALYZE gathers two sets of
statistics: one on the rows of the parent table only, and a second
including rows of both the parent table and all of its children. This
second set of statistics is needed when planning queries that process
the inheritance tree as a whole. The child tables themselves are not
individually analyzed in this case."

Oh, I haven't considered inherited tables. That seems right, the statistics of the children are not updated when the parent is analyzed. 
 
Now, the point I was worried about was what if the changes in child
tables (*_part1, *_part2) are much more than in tbl1? In such cases,
we may not invalidate child rel entries, so how will logical
replication behave for updates/deletes on child tables? There may not
be any problem here but it is better to do some analysis of such cases
to see how it behaves.

I also haven't observed any specific issues. In the end, when the user (or autovacuum) does ANALYZE on the child, that is when the statistics are updated for the child. Although I do not have much experience with inherited tables, this sounds like the expected behavior?

I also pushed a test covering inherited tables. First, a basic test on the parent. Then, show that updates on the parent can also use indexes of the children. Also, after an ANALYZE on the child, we can re-calculate the index and use the index with a higher cardinality column.


> Also, for the majority of the use-cases, I think we'd probably expect an index on a column with high cardinality -- hence use index scan. So, bitmap index scans are probably not going to be that much common.
>

You are probably right here but I don't think we can make such
assumptions. I think the safest way to avoid any regression here is to
choose an index when the planner selects an index scan. We can always
extend it later to bitmap scans if required. We can add a comment
indicating the same.


Alright, I got rid of the bitmap scans. 

Though, it caused a few of the new tests to fail. I think because of the data size/distribution, the planner picks bitmap scans. To keep the tests consistent and small, I set `enable_bitmapscan = off` for this new test file. Does that sound OK to you? Or should we change the tests to make sure they genuinely use index scans?

*
+ /*
+ * For insert-only workloads, calculating the index is not necessary.
+ * As the calculation is not expensive, we are fine to do here (instead
+ * of during first update/delete processing).
+ */

I think here, instead of talking about cost, we should mention that it
is quite an infrequent operation, i.e., performed only the first time we
perform an operation on the relation or after invalidation. This is
because I think the cost is relative.

Changed, does that look better? 

+
+ /*
+ * Although currently it is not possible for planner to pick a
+ * partial index or indexes only on expressions,

It may be better to expand this comment by describing a bit why it is
not possible in our case. You might want to give the function
reference where it is decided.

Makes sense; added some more information.

Thanks,
Onder 
 
Attachment

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From
"shiy.fnst@fujitsu.com"
Date:
On Sat, Aug 20, 2022 7:02 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> Hi,
> 
> I'm a little late to catch up with your comments, but here are my replies:

Thanks for your patch. Here are some comments.

1.
In FilterOutNotSuitablePathsForReplIdentFull(), is "nonPartialIndexPathList" a
good name for the list? Indexes only on expressions are also filtered.

+static List *
+FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
+{
+    ListCell   *lc;
+    List *nonPartialIndexPathList = NIL;

2.
+typedef struct LogicalRepPartMapEntry
+{
+    Oid            partoid;        /* LogicalRepPartMap's key */
+    LogicalRepRelMapEntry relmapentry;
+    Oid            usableIndexOid; /* which index to use? (Invalid when no index
+                                 * used) */
+} LogicalRepPartMapEntry;

For partition tables, is it possible to use relmapentry->usableIndexOid to mark
which index to use? Which means we don't need to add "usableIndexOid" to
LogicalRepPartMapEntry.

3.
It looks we should change the comment for FindReplTupleInLocalRel() in this
patch.

/*
 * Try to find a tuple received from the publication side (in 'remoteslot') in
 * the corresponding local relation using either replica identity index,
 * primary key or if needed, sequential scan.
 *
 * Local tuple, if found, is returned in '*localslot'.
 */
static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,

4.
@@ -2030,16 +2017,19 @@ apply_handle_delete_internal(ApplyExecutionData *edata,
 {
     EState       *estate = edata->estate;
     Relation    localrel = relinfo->ri_RelationDesc;
-    LogicalRepRelation *remoterel = &edata->targetRel->remoterel;
+    LogicalRepRelMapEntry *targetRel = edata->targetRel;
+    LogicalRepRelation *remoterel = &targetRel->remoterel;
     EPQState    epqstate;
     TupleTableSlot *localslot;

Do we need this change? I didn't see any place to use the variable targetRel
afterwards.

5.
+        if (!AttributeNumberIsValid(mainattno))
+        {
+            /*
+             * There are two cases to consider. First, if the index is a primary or
+             * unique key, we cannot have any indexes with expressions. So, at this
+             * point we are sure that the index we deal is not these.
+             */
+            Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+                   RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
+
+            /*
+             * For a non-primary/unique index with an expression, we are sure that
+             * the expression cannot be used for replication index search. The
+             * reason is that we create relevant index paths by providing column
+             * equalities. And, the planner does not pick expression indexes via
+             * column equality restrictions in the query.
+             */
+            continue;
+        }

Is it possible that it is a usable index with an expression? I think indexes
with an expression have been filtered out in
FilterOutNotSuitablePathsForReplIdentFull(). If it can't be a usable index with
an expression, maybe we shouldn't use "continue" here.

6.
In the following case, I got a result which is different from HEAD, could you
please look into it?

-- publisher
CREATE TABLE test_replica_id_full (x int); 
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL; 
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

-- subscriber
CREATE TABLE test_replica_id_full (x int, y int); 
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y); 
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

-- publisher
INSERT INTO test_replica_id_full VALUES (1);
UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;

The data in subscriber:
on HEAD:
postgres=# select * from test_replica_id_full ;
 x | y
---+---
 2 |
(1 row)

After applying the patch:
postgres=# select * from test_replica_id_full ;
 x | y
---+---
 1 |
(1 row)

Regards,
Shi yu

Hi,

Thanks for the review!



1.
In FilterOutNotSuitablePathsForReplIdentFull(), is "nonPartialIndexPathList" a
good name for the list? Indexes only on expressions are also filtered.

+static List *
+FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
+{
+       ListCell   *lc;
+       List *nonPartialIndexPathList = NIL;


Yes, true. Initially we were only filtering out the partial indexes. Now changed to suitableIndexList; does that look right?
 
2.
+typedef struct LogicalRepPartMapEntry
+{
+       Oid                     partoid;                /* LogicalRepPartMap's key */
+       LogicalRepRelMapEntry relmapentry;
+       Oid                     usableIndexOid; /* which index to use? (Invalid when no index
+                                                                * used) */
+} LogicalRepPartMapEntry;

For partition tables, is it possible to use relmapentry->usableIndexOid to mark
which index to use? That way we don't need to add "usableIndexOid" to
LogicalRepPartMapEntry.


My intention was to make this explicit, so that it is clear that partitions can own their own indexes.

But I tried your suggested refactor, which looks good. So, I changed it.

Also, I realized that I do not have a test where the partition has an index (not inherited from the parent), which I also added now.
 
3.
It looks we should change the comment for FindReplTupleInLocalRel() in this
patch.

/*
 * Try to find a tuple received from the publication side (in 'remoteslot') in
 * the corresponding local relation using either replica identity index,
 * primary key or if needed, sequential scan.
 *
 * Local tuple, if found, is returned in '*localslot'.
 */
static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,


I made a small change, just adding "index". Do you expect a larger change?
 
4.
@@ -2030,16 +2017,19 @@ apply_handle_delete_internal(ApplyExecutionData *edata,
 {
        EState     *estate = edata->estate;
        Relation        localrel = relinfo->ri_RelationDesc;
-       LogicalRepRelation *remoterel = &edata->targetRel->remoterel;
+       LogicalRepRelMapEntry *targetRel = edata->targetRel;
+       LogicalRepRelation *remoterel = &targetRel->remoterel;
        EPQState        epqstate;
        TupleTableSlot *localslot;

Do we need this change? I didn't see any place to use the variable targetRel
afterwards.

Indeed, it is not needed; changed it back.


5.
+               if (!AttributeNumberIsValid(mainattno))
+               {
+                       /*
+                        * There are two cases to consider. First, if the index is a primary or
+                        * unique key, we cannot have any indexes with expressions. So, at this
+                        * point we are sure that the index we deal is not these.
+                        */
+                       Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+                                  RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
+
+                       /*
+                        * For a non-primary/unique index with an expression, we are sure that
+                        * the expression cannot be used for replication index search. The
+                        * reason is that we create relevant index paths by providing column
+                        * equalities. And, the planner does not pick expression indexes via
+                        * column equality restrictions in the query.
+                        */
+                       continue;
+               }

Is it possible that it is a usable index with an expression? I think indexes
with an expression have been filtered in
FilterOutNotSuitablePathsForReplIdentFull(). If it can't be a usable index with
an expression, maybe we shouldn't use "continue" here.

Ok, I think there are some confusing comments in the code, which I updated. Also, added one more explicit Assert to make the code a little more readable.

We can support indexes involving expressions but not indexes consisting only of expressions. FilterOutNotSuitablePathsForReplIdentFull() filters out the latter; see IndexOnlyOnExpression().

So, for example, if we have an index as below, we are skipping the expression while building the index scan keys:

CREATE INDEX people_names ON people (firstname, lastname, (id || '_' || sub_id));

We can consider removing `continue`, but that'd mean we should also adjust the following code block to handle indexprs. To me, that seems like an edge case to implement at this point, given that such an index is probably not common. Do you think I should try to use the indexprs as well while building the scan key?

I'm mostly trying to keep the complexity small. If you suggest this limitation should be lifted, I can give it a shot. The limitation I leave here can be stated in a single sentence: the index on the subscriber can only use simple column references.


6.
In the following case, I got a result which is different from HEAD, could you
please look into it?

-- publisher
CREATE TABLE test_replica_id_full (x int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

-- subscriber
CREATE TABLE test_replica_id_full (x int, y int);
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

-- publisher
INSERT INTO test_replica_id_full VALUES (1);
UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;

The data in subscriber:
on HEAD:
postgres=# select * from test_replica_id_full ;
 x | y
---+---
 2 |
(1 row)

After applying the patch:
postgres=# select * from test_replica_id_full ;
 x | y
---+---
 1 |
(1 row)


Oops, good catch. It seems we forgot to have:

skey[scankey_attoff].sk_flags |= SK_SEARCHNULL; 

On HEAD, the index used for this purpose could only be the primary key or a unique index on NOT NULL columns. Now we do allow NULL values and need to search for them. Added that (and your test) to the updated patch.

As a semi-related note, tuples_equal() treats (NULL = NULL) as true. I have not changed that, and it seems right in this context. Do you see any issues with that?

Also, I realized that the functions in execReplication.c expect only btree indexes, so I skip other access methods as well. If that makes sense, I can work on a follow-up patch once this is merged, to lift some of the limitations mentioned here.

Thanks,
Onder

 
Here are some review comments for the patch v8-0001:

======

1. Commit message

1a.
Majority of the logic on the subscriber side has already existed in the code.

SUGGESTION
The majority of the logic on the subscriber side already exists in the code.

~

1b.
Second, when REPLICA IDENTITY IS FULL on the publisher and an index is
used on the subscriber...

SUGGESTION
Second, when REPLICA IDENTITY FULL is on the publisher and an index is
used on the subscriber...

~

1c.
Still, below I try to show case the potential improvements using an
index on the subscriber
`pgbench_accounts(bid)`. With the index, the replication catches up
around ~5 seconds.
When the index is dropped, the replication takes around ~300 seconds.

"show case" -> "showcase"

~

1d.
In the above text, what was meant by "catches up around ~5 seconds"?
e.g. Did it mean *improves* by ~5 seconds, or *takes* ~5 seconds?

~

1e.
// create one indxe, even on a low cardinality column

typo "indxe"

======

2. GENERAL

2a.
There are lots of single-line comments that start lowercase, but by
convention, I think they should start uppercase.

e.g. + /* we should always use at least one attribute for the index scan */
e.g. + /* we might not need this if the index is unique */
e.g. + /* avoid expensive equality check if index is unique */
e.g. + /* unrelated Path, skip */
e.g. + /* simple case, we already have an identity or pkey */
e.g. + /* indexscans are disabled, use seq. scan */
e.g. + /* target is a regular table */

~~

2b.
There are some excess blank lines between the functions. By convention,
I think 1 blank line is normal, but here there are sometimes 2.

~~

2c.
There are some new function comments which include their function name
in the comment. It seemed unnecessary.

e.g. GetCheapestReplicaIdentityFullPath
e.g. FindUsableIndexForReplicaIdentityFull
e.g. LogicalRepUsableIndex

======

3. src/backend/executor/execReplication.c - build_replindex_scan_key

- int attoff;
+ int index_attoff;
+ int scankey_attoff;
  bool isnull;
  Datum indclassDatum;
  oidvector  *opclass;
  int2vector *indkey = &idxrel->rd_index->indkey;
- bool hasnulls = false;
-
- Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
-    RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));

  indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple,
  Anum_pg_index_indclass, &isnull);
  Assert(!isnull);
  opclass = (oidvector *) DatumGetPointer(indclassDatum);
+ scankey_attoff = 0;

Maybe just assign scankey_attoff = 0 at the declaration?

~~~

4.

+ /*
+ * There are two cases to consider. First, if the index is a primary or
+ * unique key, we cannot have any indexes with expressions. So, at this
+ * point we are sure that the index we deal is not these.
+ */

"we deal" -> "we are dealing with" ?

~~~

5.

+ /*
+ * For a non-primary/unique index with an additional expression, do
+ * not have to continue at this point. However, the below code
+ * assumes the index scan is only done for simple column references.
+ */
+ continue;

Is this one of those comments that ought to have a "XXX" prefix as a
note for the future?

~~~

6.

- int pkattno = attoff + 1;
...
  /* Initialize the scankey. */
- ScanKeyInit(&skey[attoff],
- pkattno,
+ ScanKeyInit(&skey[scankey_attoff],
+ index_attoff + 1,
  BTEqualStrategyNumber,

Wondering if it would have been simpler if you just did:
int pkattno = index_attoff + 1;

~~~

7.

- skey[attoff].sk_flags |= SK_ISNULL;
+ skey[scankey_attoff].sk_flags |= SK_ISNULL;
+ skey[scankey_attoff].sk_flags |= SK_SEARCHNULL;

SUGGESTION
skey[scankey_attoff].sk_flags |= (SK_ISNULL | SK_SEARCHNULL);

~~~

8. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex

@@ -128,28 +171,44 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
  TransactionId xwait;
  Relation idxrel;
  bool found;
+ TypeCacheEntry **eq;
+ bool indisunique;
+ int scankey_attoff;

  /* Open the index. */
  idxrel = index_open(idxoid, RowExclusiveLock);
+ indisunique = idxrel->rd_index->indisunique;
+
+ /* we might not need this if the index is unique */
+ eq = NULL;

Maybe just default assign eq = NULL in the declaration?

~~~

9.

+ scan = index_beginscan(rel, idxrel, &snap,
+    scankey_attoff, 0);

Unnecessary wrapping?

~~~

10.

+ /* we only need to allocate once */
+ if (eq == NULL)
+ eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);

But shouldn't you also free this 'eq' before the function returns, to
prevent leaking memory?

======

11. src/backend/replication/logical/relation.c - logicalrep_rel_open

+ /*
+ * Finding a usable index is an infrequent operation, it is performed
+ * only when first time an operation is performed on the relation or
+ * after invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */

SUGGESTION (minor rewording)
Finding a usable index is an infrequent task. It is performed only
when an operation is first performed on the relation, or after
invalidation of the relation cache entry (e.g., such as ANALYZE).

~~~

12. src/backend/replication/logical/relation.c - logicalrep_partition_open

Same as comment #11 above.

~~~

13. src/backend/replication/logical/relation.c - GetIndexOidFromPath

+static
+Oid
+GetIndexOidFromPath(Path *path)

Typically I think 'static Oid' should be on one line.

~~~

14.

+ switch (path->pathtype)
+ {
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ {
+ IndexPath  *index_sc = (IndexPath *) path;
+ indexOid = index_sc->indexinfo->indexoid;
+
+ break;
+ }
+
+ default:
+ indexOid = InvalidOid;
+ }

Is there any point in using a switch statement when there is only one
functional code block?

Why not just do:

if (path->pathtype == T_IndexScan || path->pathtype == T_IndexOnlyScan)
{
...
}

return InvalidOid;

~~~

15. src/backend/replication/logical/relation.c - IndexOnlyOnExpression

+ * Returns true if the given index consist only of expressions such as:
+ * CREATE INDEX idx ON table(foo(col));

"consist" -> "consists"

~~~

16.

+IndexOnlyOnExpression(IndexInfo *indexInfo)
+{
+ int i=0;
+ for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)

Don't initialise 'i' twice.

~~~

17.

+ AttrNumber attnum = indexInfo->ii_IndexAttrNumbers[i];
+ if (AttributeNumberIsValid(attnum))
+ return false;
+
+ }

Spurious blank line

~~~

18. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+/*
+ * Iterates over the input path list, and returns another path list
+ * where paths with non-btree indexes, partial indexes or
+ * indexes on only expressions are eliminated from the list.
+ */

"path list, and" -> "path list and"

~~~

19.

+ if (!OidIsValid(indexOid))
+ {
+ /* unrelated Path, skip */
+ suitableIndexList = lappend(suitableIndexList, path);
+ continue;
+ }
+
+ indexRelation = index_open(indexOid, AccessShareLock);
+ indexInfo = BuildIndexInfo(indexRelation);
+ is_btree_index = (indexInfo->ii_Am == BTREE_AM_OID);
+ is_partial_index = (indexInfo->ii_Predicate != NIL);
+ is_index_only_on_expression = IndexOnlyOnExpression(indexInfo);
+ index_close(indexRelation, NoLock);
+
+ if (!is_btree_index || is_partial_index || is_index_only_on_expression)
+ continue;

Maybe better to change this logic using if/else, changing the last
condition so that you can avoid having any of those 'continue'
statements in this loop.

~~~

20. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+/*
+ * GetCheapestReplicaIdentityFullPath generates all the possible paths
+ * for the given subscriber relation, assuming that the source relation
+ * is replicated via REPLICA IDENTITY FULL.
+ *
+ * The function assumes that all the columns will be provided during
+ * the execution phase, given that REPLICA IDENTITY FULL gurantees
+ * that.
+ */

20a.
typo "gurantees"

~

20b.
The function comment neglects to say that after getting all these
paths the final function return is the cheapest one that it found.

~~~

21.

+ for (attno = 0; attno < RelationGetNumberOfAttributes(localrel); attno++)
+ {
+ Form_pg_attribute attr = TupleDescAttr(localrel->rd_att, attno);
+
+ if (attr->attisdropped)
+ {
+ continue;
+ }
+ else
+ {
+ Expr    *eq_op;

Maybe simplify by just removing the 'else' or instead just reverse the
condition of the 'if'.

~~~

22.

+ /*
+ * A sequential scan has could have been dominated by
+ * by an index scan during make_one_rel(). We should always
+ * have a sequential scan before set_cheapest().
+ */

"has could have been" -> "could have been"

~~~

23. src/backend/replication/logical/relation.c - LogicalRepUsableIndex

+static Oid
+LogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel)
+{
+ Oid idxoid;
+
+ /*
+ * We never need index oid for partitioned tables, always rely on leaf
+ * partition's index.
+ */
+ if (localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return InvalidOid;
+
+ /* simple case, we already have an identity or pkey */
+ idxoid = GetRelationIdentityOrPK(localrel);
+ if (OidIsValid(idxoid))
+ return idxoid;
+
+ /* indexscans are disabled, use seq. scan */
+ if (!enable_indexscan)
+ return InvalidOid;

I thought the (!enable_indexscan) fast exit perhaps should be done
first, or at least before calling GetRelationIdentityOrPK.

======

24. src/backend/replication/logical/worker.c - apply_handle_delete_internal

@@ -2034,12 +2021,14 @@ apply_handle_delete_internal(ApplyExecutionData *edata,
  EPQState epqstate;
  TupleTableSlot *localslot;
  bool found;
+ Oid usableIndexOid = usable_indexoid_internal(edata, relinfo);

  EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
  ExecOpenIndices(relinfo, false);

- found = FindReplTupleInLocalRel(estate, localrel, remoterel,
- remoteslot, &localslot);
+
+ found = FindReplTupleInLocalRel(estate, localrel, usableIndexOid,
+ remoterel, remoteslot, &localslot);

24a.
Excess blank line above FindReplTupleInLocalRel call.

~

24b.
This code is almost same in function handle_update_internal(), except
the wrapping of the params is different. Better to keep everything
consistent looking.

~~~

25. src/backend/replication/logical/worker.c - usable_indexoid_internal

+/*
+ * Decide whether we can pick an index for the relinfo (e.g., the relation)
+ * we're actually deleting/updating from. If it is a child partition of
+ * edata->targetRelInfo, find the index on the partition.
+ */
+static Oid
+usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)

I'm not sure whether this can return InvalidOid. The function comment
should clarify it.

~~~

26.

I might be mistaken, but somehow I feel this function can be
simplified. e.g. If you have a var 'relmapentry' and let the normal
table use the initial value of that. Then I think you only need to
test for the partitioned table and reassign that var as appropriate.
It also removes the need for having 'usableIndexOid' var.

FOR EXAMPLE,

static Oid
usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)
{
ResultRelInfo *targetResultRelInfo = edata->targetRelInfo;
LogicalRepRelMapEntry *relmapentry = edata->targetRel;
Oid targetrelid = targetResultRelInfo->ri_RelationDesc->rd_rel->oid;
Oid localrelid = relinfo->ri_RelationDesc->rd_id;

if (targetrelid != localrelid)
{
/*
* Target is a partitioned table, so find relmapentry of the partition.
*/
TupleConversionMap *map = relinfo->ri_RootToPartitionMap;
AttrMap    *attrmap = map ? map->attrMap : NULL;
LogicalRepPartMapEntry *part_entry =
logicalrep_partition_open(relmapentry, relinfo->ri_RelationDesc,
attrmap);

Assert(targetResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
   RELKIND_PARTITIONED_TABLE);

relmapentry = &part_entry->relmapentry;
}
return relmapentry->usableIndexOid;
}

~~~

27.

+ /*
+ * Target is a partitioned table, get the index oid the partition.
+ */

SUGGESTION
Target is a partitioned table, so get the index oid of the partition.

or (see the example of comment @26)

~~~

28. src/backend/replication/logical/worker.c - FindReplTupleInLocalRel

@@ -2093,12 +2125,11 @@ FindReplTupleInLocalRel(EState *estate,
Relation localrel,

  *localslot = table_slot_create(localrel, &estate->es_tupleTable);

I think this might have been existing functionality...

The comment says "* Local tuple, if found, is returned in
'*localslot'." But the code is unconditionally doing
table_slot_create() before it even knows if a tuple was found or not.
So what about when it is NOT found - in that case shouldn't there be
some cleanup of that (unused?) table slot that got unconditionally
created?

~~~

29. src/backend/replication/logical/worker.c - apply_handle_tuple_routing

@@ -2202,13 +2233,17 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
  * suitable partition.
  */
  {
+ LogicalRepRelMapEntry *entry;
  TupleTableSlot *localslot;
  ResultRelInfo *partrelinfo_new;
  bool found;

+ entry = &part_entry->relmapentry;

Maybe just do this assignment at the entry declaration time?

~~~

30.

  /* Get the matching local tuple from the partition. */
  found = FindReplTupleInLocalRel(estate, partrel,
- &part_entry->remoterel,
+ part_entry->relmapentry.usableIndexOid,
+ &entry->remoterel,
  remoteslot_part, &localslot);

Why not use the new 'entry' var just assigned instead of repeating
part_entry->relmapentry?

SUGGESTION
found = FindReplTupleInLocalRel(estate, partrel,
entry->usableIndexOid,
&entry->remoterel,
remoteslot_part, &localslot);

~~~

31.

+ slot_modify_data(remoteslot_part, localslot, entry,
  newtup);

Unnecessary wrapping.

======

32. src/include/replication/logicalrelation.h

+typedef struct LogicalRepPartMapEntry
+{
+ Oid partoid; /* LogicalRepPartMap's key */
+ LogicalRepRelMapEntry relmapentry;
+} LogicalRepPartMapEntry;

IIUC this struct has been moved from relation.c to here. But I think
there was a large comment about this struct which maybe needs to be
moved with it (see the original relation.c).

/*
 * Partition map (LogicalRepPartMap)
 *
 * When a partitioned table is used as replication target, replicated
 * operations are actually performed on its leaf partitions, which requires
 * the partitions to also be mapped to the remote relation.  Parent's entry
 * (LogicalRepRelMapEntry) cannot be used as-is for all partitions, because
 * individual partitions may have different attribute numbers, which means
 * attribute mappings to remote relation's attributes must be maintained
 * separately for each partition.
 */

======

33. .../subscription/t/032_subscribe_use_index.pl

Typo "MULTIPILE"

This typo occurs several times...

e.g. # Testcase start: SUBSCRIPTION USES INDEX MODIFIES MULTIPILE ROWS
e.g. # Testcase end: SUBSCRIPTION USES INDEX MODIFIES MULTIPILE ROWS
e.g. # Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPILE COLUMNS
e.g. # Testcase end: SUBSCRIPTION USES INDEX MODIFIES MULTIPILE ROWS

~~~

34.

# Basic test where the subscriber uses index
# and only touches multiple rows

What does "only ... multiple" mean?

This occurs several times also.

~~~

35.

+# wait for initial table synchronization to finish
+$node_subscriber->wait_for_subscription_sync;
+$node_subscriber->wait_for_subscription_sync;
+$node_subscriber->wait_for_subscription_sync;

That triple wait looks unusual. Is it deliberate?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Hi Peter, all

Thanks for the detailed review!


1. Commit message

1a.
Majority of the logic on the subscriber side has already existed in the code.

1b.
Second, when REPLICA IDENTITY IS FULL on the publisher and an index is
used on the subscriber...


1c.
Still, below I try to show case the potential improvements using an
index on the subscriber
`pgbench_accounts(bid)`. With the index, the replication catches up
around ~5 seconds.
When the index is dropped, the replication takes around ~300 seconds.

"show case" -> "showcase"

Applied your suggestions to 1a/1b/1c/

~

1d.
In the above text, what was meant by "catches up around ~5 seconds"?
e.g. Did it mean *improves* by ~5 seconds, or *takes* ~5 seconds?


It "takes" 5 seconds to replicate all the changes. To be specific, I execute 'SELECT sum(abalance) FROM pgbench_accounts' on the subscriber and measure the time until all the changes are replicated. I use the same query on the publisher to check what the result should be once replication is done.

I updated the relevant text, does that look better?
 
~

1e.
// create one indxe, even on a low cardinality column

typo "indxe"

======

fixed.

Also, I realized that some of the comments on the commit message are stale, updated those as well.

 

2. GENERAL

2a.
There are lots of single-line comments that start lowercase, but by
convention, I think they should start uppercase.

e.g. + /* we should always use at least one attribute for the index scan */
e.g. + /* we might not need this if the index is unique */
e.g. + /* avoid expensive equality check if index is unique */
e.g. + /* unrelated Path, skip */
e.g. + /* simple case, we already have an identity or pkey */
e.g. + /* indexscans are disabled, use seq. scan */
e.g. + /* target is a regular table */

~~

Thanks for noting this, I didn't realize that there is a strict convention on this. Applied all of your suggestions, and caught one more such case.

Is there documentation where such conventions are listed? I couldn't find any.
 

2b.
There are some excess blank lines between the functions. By convention,
I think 1 blank line is normal, but here there are sometimes 2.

~~

Updated as well.
 

2c.
There are some new function comments which include their function name
in the comment. It seemed unnecessary.

e.g. GetCheapestReplicaIdentityFullPath
e.g. FindUsableIndexForReplicaIdentityFull
e.g. LogicalRepUsableIndex

======

Fixed this as well.
 

3. src/backend/executor/execReplication.c - build_replindex_scan_key

- int attoff;
+ int index_attoff;
+ int scankey_attoff;
  bool isnull;
  Datum indclassDatum;
  oidvector  *opclass;
  int2vector *indkey = &idxrel->rd_index->indkey;
- bool hasnulls = false;
-
- Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
-    RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));

  indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple,
  Anum_pg_index_indclass, &isnull);
  Assert(!isnull);
  opclass = (oidvector *) DatumGetPointer(indclassDatum);
+ scankey_attoff = 0;

Maybe just assign scankey_attoff = 0 at the declaration?


Again, lack of coding convention knowledge :/ My observation is that variables are often not assigned at declaration. But, changed this one.
 
~~~

4.

+ /*
+ * There are two cases to consider. First, if the index is a primary or
+ * unique key, we cannot have any indexes with expressions. So, at this
+ * point we are sure that the index we deal is not these.
+ */

"we deal" -> "we are dealing with" ?

makes sense
 
~~~

5.

+ /*
+ * For a non-primary/unique index with an additional expression, do
+ * not have to continue at this point. However, the below code
+ * assumes the index scan is only done for simple column references.
+ */
+ continue;

Is this one of those comments that ought to have a "XXX" prefix as a
note for the future?

Makes sense
 

~~~

6.

- int pkattno = attoff + 1;
...
  /* Initialize the scankey. */
- ScanKeyInit(&skey[attoff],
- pkattno,
+ ScanKeyInit(&skey[scankey_attoff],
+ index_attoff + 1,
  BTEqualStrategyNumber,

Wondering if it would have been simpler if you just did:
int pkattno = index_attoff + 1;


The index is not necessarily the primary key at this point, that's why I removed it. 

There are already 3 variables in the same function (index_attoff, scankey_attoff and table_attno), which are hard to avoid. But this one seemed OK to avoid, mostly to improve readability. Do you think it is better with the additional variable? Either way, I think we need a better name, as "pk" is not relevant anymore.
 

~~~

7.

- skey[attoff].sk_flags |= SK_ISNULL;
+ skey[scankey_attoff].sk_flags |= SK_ISNULL;
+ skey[scankey_attoff].sk_flags |= SK_SEARCHNULL;

SUGGESTION
skey[scankey_attoff].sk_flags |= (SK_ISNULL | SK_SEARCHNULL);


looks good, changed
 
~~~

8. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex

@@ -128,28 +171,44 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
  TransactionId xwait;
  Relation idxrel;
  bool found;
+ TypeCacheEntry **eq;
+ bool indisunique;
+ int scankey_attoff;

  /* Open the index. */
  idxrel = index_open(idxoid, RowExclusiveLock);
+ indisunique = idxrel->rd_index->indisunique;
+
+ /* we might not need this if the index is unique */
+ eq = NULL;

Maybe just default assign eq = NULL in the declaration?


Again, I wasn't sure if it is OK regarding the coding convention to assign during the declaration. Changed now.
 
~~~

9.

+ scan = index_beginscan(rel, idxrel, &snap,
+    scankey_attoff, 0);

Unnecessary wrapping?


Seems so, changed
 
~~~

10.

+ /* we only need to allocate once */
+ if (eq == NULL)
+ eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);

But shouldn't you also free this 'eq' before the function returns, to
prevent leaking memory?


Two notes here. First, this is allocated inside ApplyMessageContext, which seems to be reset per tuple change. So, that seems like a good boundary to keep this allocation in memory.

Second, RelationFindReplTupleSeq() doesn't free the same allocation at a very similar call site. That's why I decided not to pfree. Do you see a strong reason to pfree at this point? Then we should probably change that for RelationFindReplTupleSeq() as well.

 
======

11. src/backend/replication/logical/relation.c - logicalrep_rel_open

+ /*
+ * Finding a usable index is an infrequent operation, it is performed
+ * only when first time an operation is performed on the relation or
+ * after invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */

SUGGESTION (minor rewording)
Finding a usable index is an infrequent task. It is performed only
when an operation is first performed on the relation, or after
invalidation of the relation cache entry (e.g., such as ANALYZE).

~~~

makes sense, applied
 
12. src/backend/replication/logical/relation.c - logicalrep_partition_open

Same as comment #11 above.

 
done

 
~~~

13. src/backend/replication/logical/relation.c - GetIndexOidFromPath

+static
+Oid
+GetIndexOidFromPath(Path *path)

Typically I think 'static Oid' should be on one line.

done 


~~~

14.

+ switch (path->pathtype)
+ {
+ case T_IndexScan:
+ case T_IndexOnlyScan:
+ {
+ IndexPath  *index_sc = (IndexPath *) path;
+ indexOid = index_sc->indexinfo->indexoid;
+
+ break;
+ }
+
+ default:
+ indexOid = InvalidOid;
+ }

Is there any point in using a switch statement when there is only one
functional code block?

Why not just do:

if (path->pathtype == T_IndexScan || path->pathtype == T_IndexOnlyScan)
{
...
}

return InvalidOid;

~~~

Good point. In the first iterations of the patch we also had bitmap scans here; now the switch is redundant, so I applied your suggestion.
 

15. src/backend/replication/logical/relation.c - IndexOnlyOnExpression

+ * Returns true if the given index consist only of expressions such as:
+ * CREATE INDEX idx ON table(foo(col));

"consist" -> "consists"

~~~

fixed
 

16.

+IndexOnlyOnExpression(IndexInfo *indexInfo)
+{
+ int i=0;
+ for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)

Don't initialise 'i' twice.

~~~

fixed 
 

17.

+ AttrNumber attnum = indexInfo->ii_IndexAttrNumbers[i];
+ if (AttributeNumberIsValid(attnum))
+ return false;
+
+ }

Spurious blank line

~~~

fixed
 

18. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+/*
+ * Iterates over the input path list, and returns another path list
+ * where paths with non-btree indexes, partial indexes or
+ * indexes on only expressions are eliminated from the list.
+ */

"path list, and" -> "path list and"

~~~

fixed
 

19.

+ if (!OidIsValid(indexOid))
+ {
+ /* unrelated Path, skip */
+ suitableIndexList = lappend(suitableIndexList, path);
+ continue;
+ }
+
+ indexRelation = index_open(indexOid, AccessShareLock);
+ indexInfo = BuildIndexInfo(indexRelation);
+ is_btree_index = (indexInfo->ii_Am == BTREE_AM_OID);
+ is_partial_index = (indexInfo->ii_Predicate != NIL);
+ is_index_only_on_expression = IndexOnlyOnExpression(indexInfo);
+ index_close(indexRelation, NoLock);
+
+ if (!is_btree_index || is_partial_index || is_index_only_on_expression)
+ continue;

Maybe better to change this logic using if/else, changing the last
condition so that you can avoid having any of those 'continue'
statements in this loop.

Yes, it makes sense. It is good to avoid `continue` in the loop.
 

~~~

20. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+/*
+ * GetCheapestReplicaIdentityFullPath generates all the possible paths
+ * for the given subscriber relation, assuming that the source relation
+ * is replicated via REPLICA IDENTITY FULL.
+ *
+ * The function assumes that all the columns will be provided during
+ * the execution phase, given that REPLICA IDENTITY FULL gurantees
+ * that.
+ */

20a.
typo "gurantees"

~
 
Fixed, for future patches I'll do a more thorough review on these myself. Sorry for all these typos & convention errors!
 
20b.
The function comment neglects to say that after getting all these
paths the final function return is the cheapest one that it found.

~~~

Improved the comment a bit
 

21.

+ for (attno = 0; attno < RelationGetNumberOfAttributes(localrel); attno++)
+ {
+ Form_pg_attribute attr = TupleDescAttr(localrel->rd_att, attno);
+
+ if (attr->attisdropped)
+ {
+ continue;
+ }
+ else
+ {
+ Expr    *eq_op;

Maybe simplify by just removing the 'else' or instead just reverse the
condition of the 'if'.

~~~

I like the second suggestion more, as the `!attr->attisdropped` code block has local declarations, so keeping them local to that block seems easier to follow.
 

22.

+ /*
+ * A sequential scan has could have been dominated by
+ * by an index scan during make_one_rel(). We should always
+ * have a sequential scan before set_cheapest().
+ */

"has could have been" -> "could have been"

~~~

An interesting grammar I had :) Fixed 
 

23. src/backend/replication/logical/relation.c - LogicalRepUsableIndex

+static Oid
+LogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel)
+{
+ Oid idxoid;
+
+ /*
+ * We never need index oid for partitioned tables, always rely on leaf
+ * partition's index.
+ */
+ if (localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+ return InvalidOid;
+
+ /* simple case, we already have an identity or pkey */
+ idxoid = GetRelationIdentityOrPK(localrel);
+ if (OidIsValid(idxoid))
+ return idxoid;
+
+ /* indexscans are disabled, use seq. scan */
+ if (!enable_indexscan)
+ return InvalidOid;

I thought the (!enable_indexscan) fast exit perhaps should be done
first, or at least before calling GetRelationIdentityOrPK.

This is actually a point where I need some more feedback. On HEAD, even if the index scan is disabled, we use the index. For this one, (a) I didn't want to change the behavior for existing users, and (b) I want to have a way to disable this feature, and enable_indexscan seems like a good one.

Do you think I should dare to move it above GetRelationIdentityOrPK()? Or, maybe I just need more comments? I improved the comment, and it would be nice to hear your thoughts on this.


======

24. src/backend/replication/logical/worker.c - apply_handle_delete_internal

@@ -2034,12 +2021,14 @@ apply_handle_delete_internal(ApplyExecutionData *edata,
  EPQState epqstate;
  TupleTableSlot *localslot;
  bool found;
+ Oid usableIndexOid = usable_indexoid_internal(edata, relinfo);

  EvalPlanQualInit(&epqstate, estate, NULL, NIL, -1);
  ExecOpenIndices(relinfo, false);

- found = FindReplTupleInLocalRel(estate, localrel, remoterel,
- remoteslot, &localslot);
+
+ found = FindReplTupleInLocalRel(estate, localrel, usableIndexOid,
+ remoterel, remoteslot, &localslot);

24a.
Excess blank line above FindReplTupleInLocalRel call.

Fixed 
~

24b.
This code is almost the same as in function handle_update_internal(), except
the wrapping of the params is different. Better to keep everything
consistent looking.


Hmm, I have not changed how they look because they have one variable difference (&relmapentry->remoterel vs remoterel), which requires the indentation to be slightly different. So, I either need a new variable or keep them as-is?

 
~~~

25. src/backend/replication/logical/worker.c - usable_indexoid_internal

+/*
+ * Decide whether we can pick an index for the relinfo (e.g., the relation)
+ * we're actually deleting/updating from. If it is a child partition of
+ * edata->targetRelInfo, find the index on the partition.
+ */
+static Oid
+usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)

I'm not sure if this can maybe return InvalidOid? The function comment
should clarify it.


Improved the comment
 
~~~

26.

I might be mistaken, but somehow I feel this function can be
simplified. e.g. If you have a var 'relmapentry' and let the normal
table use the initial value of that. Then I think you only need to
test for the partitioned table and reassign that var as appropriate.
It also removes the need for having 'usableIndexOid' var.

FOR EXAMPLE,

static Oid
usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)
{
ResultRelInfo *targetResultRelInfo = edata->targetRelInfo;
LogicalRepRelMapEntry *relmapentry = edata->targetRel;
Oid targetrelid = targetResultRelInfo->ri_RelationDesc->rd_rel->oid;
Oid localrelid = relinfo->ri_RelationDesc->rd_id;

if (targetrelid != localrelid)
{
/*
* Target is a partitioned table, so find relmapentry of the partition.
*/
TupleConversionMap *map = relinfo->ri_RootToPartitionMap;
AttrMap    *attrmap = map ? map->attrMap : NULL;
LogicalRepPartMapEntry *part_entry =
logicalrep_partition_open(relmapentry, relinfo->ri_RelationDesc,
attrmap);

Assert(targetResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
   RELKIND_PARTITIONED_TABLE);

relmapentry = part_entry->relmapentry;
}
return relmapentry->usableIndexOid;
}

~~~

True, that simplifies the function, applied.
 

27.

+ /*
+ * Target is a partitioned table, get the index oid the partition.
+ */

SUGGESTION
Target is a partitioned table, so get the index oid of the partition.

or (see the example of comment @26)


Applied
 
~~~

28. src/backend/replication/logical/worker.c - FindReplTupleInLocalRel

@@ -2093,12 +2125,11 @@ FindReplTupleInLocalRel(EState *estate,
Relation localrel,

  *localslot = table_slot_create(localrel, &estate->es_tupleTable);

I think this might have been existing functionality...

The comment says "* Local tuple, if found, is returned in
'*localslot'." But the code is unconditionally doing
table_slot_create() before it even knows if a tuple was found or not.
So what about when it is NOT found - in that case shouldn't there be
some cleaning up that (unused?) table slot that got unconditionally
created?


This sounds accurate. But I guess it may not have been considered critical, as we are operating in the ApplyMessageContext? That is going to be freed once a single tuple is dispatched.

I have a slight preference not to do it in this patch, but if you think otherwise let me know.
 
~~~

29. src/backend/replication/logical/worker.c - apply_handle_tuple_routing

@@ -2202,13 +2233,17 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
  * suitable partition.
  */
  {
+ LogicalRepRelMapEntry *entry;
  TupleTableSlot *localslot;
  ResultRelInfo *partrelinfo_new;
  bool found;

+ entry = &part_entry->relmapentry;

Maybe just do this assignment at the entry declaration time?


done
 
~~~

30.

  /* Get the matching local tuple from the partition. */
  found = FindReplTupleInLocalRel(estate, partrel,
- &part_entry->remoterel,
+ part_entry->relmapentry.usableIndexOid,
+ &entry->remoterel,
  remoteslot_part, &localslot);
Why not use the new 'entry' var just assigned instead of repeating
part_entry->relmapentry?

SUGGESTION
found = FindReplTupleInLocalRel(estate, partrel,
entry->usableIndexOid,
&entry->remoterel,
remoteslot_part, &localslot);

~~~

Yes, looks better, changed
 
31.

+ slot_modify_data(remoteslot_part, localslot, entry,
  newtup);

Unnecessary wrapping.

======

I think I have not changed this, but fixed anyway
 

32. src/include/replication/logicalrelation.h

+typedef struct LogicalRepPartMapEntry
+{
+ Oid partoid; /* LogicalRepPartMap's key */
+ LogicalRepRelMapEntry relmapentry;
+} LogicalRepPartMapEntry;

IIUC this struct has been moved from relation.c to here. But I think
there was a large comment about this struct which maybe needs to be
moved with it (see the original relation.c).

/*
 * Partition map (LogicalRepPartMap)
 *
 * When a partitioned table is used as replication target, replicated
 * operations are actually performed on its leaf partitions, which requires
 * the partitions to also be mapped to the remote relation.  Parent's entry
 * (LogicalRepRelMapEntry) cannot be used as-is for all partitions, because
 * individual partitions may have different attribute numbers, which means
 * attribute mappings to remote relation's attributes must be maintained
 * separately for each partition.
 */

======
Oh, seems so, moved.
 

33. .../subscription/t/032_subscribe_use_index.pl

Typo "MULTIPILE"

This typo occurs several times...

e.g. # Testcase start: SUBSCRIPTION USES INDEX MODIFIES MULTIPILE ROWS
e.g. # Testcase end: SUBSCRIPTION USES INDEX MODIFIES MULTIPILE ROWS
e.g. # Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPILE COLUMNS
e.g. # Testcase end: SUBSCRIPTION USES INDEX MODIFIES MULTIPILE ROWS

~~~

 
Yep :/ Fixed now
 
34.

# Basic test where the subscriber uses index
# and only touches multiple rows

What does "only ... multiple" mean?

This occurs several times also.


Ah, in the earlier iterations, the tests were updating/deleting 1 row. Lately, I changed it to multiple rows, just to have more coverage. I guess the discrepancy is because of that. Updated now. 
 
~~~

35.

+# wait for initial table synchronization to finish
+$node_subscriber->wait_for_subscription_sync;
+$node_subscriber->wait_for_subscription_sync;
+$node_subscriber->wait_for_subscription_sync;

That triple wait looks unusual. Is it deliberate?

Ah, not really. Removed.

Thanks,
Onder
 
Attachment
Hi Onder,

Since you ask me several questions [1], this post is just for answering those.

I have looked again at the latest v9 patch, but I will post my review
comments for that separately.


On Thu, Aug 25, 2022 at 7:09 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>> 1d.
>> In above text, what was meant by "catches up around ~5 seconds"?
>> e.g. Did it mean *improves* by ~5 seconds, or *takes* ~5 seconds?
>>
>
> It "takes" 5 seconds to replicate all the changes. To be specific, I execute 'SELECT sum(abalance) FROM
> pgbench_accounts' on the subscriber, and try to measure the time until all the changes are replicated. I use the
> same query on the publisher to check what the query result should be when replication is done.
>
> I updated the relevant text, does that look better?

Yes.

>> 2. GENERAL
>>
>> 2a.
>> There are lots of single-line comments that start lowercase, but by
>> convention, I think they should start uppercase.
>>
>> e.g. + /* we should always use at least one attribute for the index scan */
>> e.g. + /* we might not need this if the index is unique */
>> e.g. + /* avoid expensive equality check if index is unique */
>> e.g. + /* unrelated Path, skip */
>> e.g. + /* simple case, we already have an identity or pkey */
>> e.g. + /* indexscans are disabled, use seq. scan */
>> e.g. + /* target is a regular table */
>>
>> ~~
>
>
> Thanks for noting this, I didn't realize that there is a strict requirement on this. Updated all of your suggestions,
> and realized one more such case.
>
> Is there documentation where such conventions are listed? I couldn't find any.

I don’t know of any strict requirements, but I did think it was the
more common practice to make the comments look like proper sentences.
However, when I tried to prove that by counting the single-line
comments in PG code it seems to be split almost 50:50
lowercase/uppercase, so I guess you should just do whatever is most
sensible or is most consistent with the surrounding code ….

Counts for single line /* */ comments:
regex ^\s*\/\*\s[a-z]+.*\*\/$  = 18222 results
regex ^\s*\/\*\s[A-Z]+.*\*\/$ = 20252 results

>> 3. src/backend/executor/execReplication.c - build_replindex_scan_key
>>
>> - int attoff;
>> + int index_attoff;
>> + int scankey_attoff;
>>   bool isnull;
>>   Datum indclassDatum;
>>   oidvector  *opclass;
>>   int2vector *indkey = &idxrel->rd_index->indkey;
>> - bool hasnulls = false;
>> -
>> - Assert(RelationGetReplicaIndex(rel) == RelationGetRelid(idxrel) ||
>> -    RelationGetPrimaryKeyIndex(rel) == RelationGetRelid(idxrel));
>>
>>   indclassDatum = SysCacheGetAttr(INDEXRELID, idxrel->rd_indextuple,
>>   Anum_pg_index_indclass, &isnull);
>>   Assert(!isnull);
>>   opclass = (oidvector *) DatumGetPointer(indclassDatum);
>> + scankey_attoff = 0;
>>
>> Maybe just assign scankey_attoff = 0 at the declaration?
>>
>
> Again, lack of coding convention knowledge :/ My observation is that it is often not assigned during the declaration.
> But, changed this one.
>

I don’t know of any convention. Probably this is just my own
preference to keep the simple default assignments with the declaration
to reduce the LOC. YMMV.

>>
>> 6.
>>
>> - int pkattno = attoff + 1;
>> ...
>>   /* Initialize the scankey. */
>> - ScanKeyInit(&skey[attoff],
>> - pkattno,
>> + ScanKeyInit(&skey[scankey_attoff],
>> + index_attoff + 1,
>>   BTEqualStrategyNumber,
>> Wondering if it would have been simpler if you just did:
>> int pkattno = index_attoff + 1;
>
>
>
> The index is not necessarily the primary key at this point, that's why I removed it.
>
> There are already 3 variables in the same function (index_attoff, scankey_attoff and table_attno), which are hard to
> avoid. But, this one seemed ok to avoid, mostly to simplify the readability. Do you think it is better with the
> additional variable? Still, I think we need a better name, as "pk" is not relevant anymore.
>

Your code is fine. Leave it as-is.

>> 8. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex
>>
>> @@ -128,28 +171,44 @@ RelationFindReplTupleByIndex(Relation rel, Oid idxoid,
>>   TransactionId xwait;
>>   Relation idxrel;
>>   bool found;
>> + TypeCacheEntry **eq;
>> + bool indisunique;
>> + int scankey_attoff;
>>
>>   /* Open the index. */
>>   idxrel = index_open(idxoid, RowExclusiveLock);
>> + indisunique = idxrel->rd_index->indisunique;
>> +
>> + /* we might not need this if the index is unique */
>> + eq = NULL;
>>
>> Maybe just default assign eq = NULL in the declaration?
>>
>
> Again, I wasn't sure if it is OK regarding the coding convention to assign during the declaration. Changed now.
>

Same as #3.

>> 10.
>>
>> + /* we only need to allocate once */
>> + if (eq == NULL)
>> + eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);
>>
>> But shouldn't you also free this 'eq' before the function returns, to
>> prevent leaking memory?
>>
>
> Two notes here. First, this is allocated inside ApplyMessageContext, which seems to be reset per tuple change. So,
> that seems like a good boundary to keep this allocation in memory.
>

OK, fair enough. Is it worth adding a comment to say that or not?

> Second, RelationFindReplTupleSeq() doesn't free the same allocation at a very similar call stack. That's why
> I decided not to pfree. Do you see a strong reason to pfree at this point? Then we should probably change that for
> RelationFindReplTupleSeq() as well.
>
>>
>> 23. src/backend/replication/logical/relation.c - LogicalRepUsableIndex
>>
>> +static Oid
>> +LogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel)
>> +{
>> + Oid idxoid;
>> +
>> + /*
>> + * We never need index oid for partitioned tables, always rely on leaf
>> + * partition's index.
>> + */
>> + if (localrel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
>> + return InvalidOid;
>> +
>> + /* simple case, we already have an identity or pkey */
>> + idxoid = GetRelationIdentityOrPK(localrel);
>> + if (OidIsValid(idxoid))
>> + return idxoid;
>> +
>> + /* indexscans are disabled, use seq. scan */
>> + if (!enable_indexscan)
>> + return InvalidOid;
>>
>> I thought the (!enable_indexscan) fast exit perhaps should be done
>> first, or at least before calling GetRelationIdentityOrPK.
>
>
> This is actually a point where I need some more feedback. On HEAD, even if the index scan is disabled, we use the
> index. For this one, (a) I didn't want to change the behavior for existing users, and (b) I want to have a way to
> disable this feature, and enable_indexscan seems like a good one.
>
> Do you think I should dare to move it above GetRelationIdentityOrPK()? Or, maybe I just need more comments? I
> improved the comment, and it would be nice to hear your thoughts on this.

I agree with you it is maybe best not to cause any changes in
behaviour. If the behaviour is unwanted then it should be changed
independently of this patch anyhow.

>> 24b.
>> This code is almost same in function handle_update_internal(), except
>> the wrapping of the params is different. Better to keep everything
>> consistent looking.
>>
>
> Hmm, I have not changed how they look because they have one variable difference (&relmapentry->remoterel vs
> remoterel), which requires the indentation to be slightly different. So, I either need a new variable or keep them
> as-is?

OK. Keep code as-is.

>>
>> 28. src/backend/replication/logical/worker.c - FindReplTupleInLocalRel
>>
>> @@ -2093,12 +2125,11 @@ FindReplTupleInLocalRel(EState *estate,
>> Relation localrel,
>>
>>   *localslot = table_slot_create(localrel, &estate->es_tupleTable);
>>
>> I think this might have been existing functionality...
>>
>> The comment says "* Local tuple, if found, is returned in
>> '*localslot'." But the code is unconditionally doing
>> table_slot_create() before it even knows if a tuple was found or not.
>> So what about when it is NOT found - in that case shouldn't there be
>> some cleaning up that (unused?) table slot that got unconditionally
>> created?
>>
>
> This sounds accurate. But I guess it may not have been considered critical, as we are operating in the
> ApplyMessageContext? That is going to be freed once a single tuple is dispatched.
>
> I have a slight preference not to do it in this patch, but if you think otherwise let me know.

I agree. Maybe this is not even a leak worth bothering about if it is
only in the short-lived ApplyMessageContext like you say. Anyway,
AFAIK this was already in existing code, so a fix (if any) would
belong in a different patch to this one.

>> 31.
>>
>> + slot_modify_data(remoteslot_part, localslot, entry,
>>   newtup);
>>
>> Unnecessary wrapping.
>>
>> ======
>
>
> I think I have not changed this, but fixed anyway

Hmm - I don't see that you changed this, but anyway I guess you
shouldn't be fixing wrapping problems unless this patch caused them.

------
[1] https://www.postgresql.org/message-id/CACawEhXbw%3D%3DK02v3%3DnHFEAFJqegx0b4r2J%2BFtXtKFkJeE6R95Q%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



Here are some review comments for the patch v9-0001:

======

1. Commit message

1a.
With this patch, I'm proposing the following change: If there is an
index on the subscriber, use the index as long as the planner
sub-modules picks any index over sequential scan. The index should be
a btree index, not a partital index. Finally, the index should have at
least one column reference (e.g., cannot consists of only
expressions).

SUGGESTION
With this patch, I'm proposing the following change: If there is any
index on the subscriber, let the planner sub-modules compare the costs
of index versus sequential scan and choose the cheapest. The index
should be a btree index, not a partial index, and it should have at
least one column reference (e.g., cannot consist of only expressions).

~

1b.
The Majority of the logic on the subscriber side exists in the code.

"exists" -> "already exists"

~

1c.
psql -c "truncate pgbench_accounts;" -p 9700 postgres

"truncate" -> "TRUNCATE"

~

1d.
Try to wrap this message text at 80 char width.

======

2. src/backend/replication/logical/relation.c - logicalrep_rel_open

+ /*
+ * Finding a usable index is an infrequent task. It is performed
+ * when an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ entry->usableIndexOid = LogicalRepUsableIndex(entry->localrel, remoterel);

Seemed a bit odd to say "performed" 2x in the same sentence.

"It is performed when..." -> "It occurs when...” (?)

~~~

3. src/backend/replication/logical/relation.c - logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It is performed
+ * when an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ part_entry->relmapentry.usableIndexOid =
+ LogicalRepUsableIndex(partrel, remoterel);

3a.
Same as comment #2 above.

~

3b.
The jumping between 'part_entry' and 'entry' is confusing. Since
'entry' is already assigned to be &part_entry->relmapentry can't you
use that here?

SUGGESTION
entry->usableIndexOid = LogicalRepUsableIndex(partrel, remoterel);

~~~

4. src/backend/replication/logical/relation.c - GetIndexOidFromPath

+/*
+ * Returns a valid index oid if the input path is an index path.
+ * Otherwise, return invalid oid.
+ */
+static Oid
+GetIndexOidFromPath(Path *path)

Perhaps make this function comment more consistent with others (like
GetRelationIdentityOrPK, LogicalRepUsableIndex) and refer to
InvalidOid.

SUGGESTION
/*
 * Returns a valid index oid if the input path is an index path.
 *
 * Otherwise, returns InvalidOid.
 */

~~~

5. src/backend/replication/logical/relation.c - IndexOnlyOnExpression

+bool
+IndexOnlyOnExpression(IndexInfo *indexInfo)
+{
+ int i;
+ for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_IndexAttrNumbers[i];
+ if (AttributeNumberIsValid(attnum))
+ return false;
+ }
+
+ return true;
+}

5a.
Add a blank line after those declarations.

~

5b.
AFAIK the C99 style for loop declarations should be OK [1] for new
code, so declaring like below would be cleaner:

for (int i = 0; ...

~~~

6. src/backend/replication/logical/relation.c -
FilterOutNotSuitablePathsForReplIdentFull

+/*
+ * Iterates over the input path list and returns another path list
+ * where paths with non-btree indexes, partial indexes or
+ * indexes on only expressions are eliminated from the list.
+ */
+static List *
+FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)

"are eliminated from the list." -> "have been removed."

~~~

7.

+ foreach(lc, pathlist)
+ {
+ Path    *path = (Path *) lfirst(lc);
+ Oid indexOid = GetIndexOidFromPath(path);
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;
+
+ if (!OidIsValid(indexOid))
+ {
+ /* Unrelated Path, skip */
+ suitableIndexList = lappend(suitableIndexList, path);
+ }
+ else
+ {
+ indexRelation = index_open(indexOid, AccessShareLock);
+ indexInfo = BuildIndexInfo(indexRelation);
+ is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ is_partial = (indexInfo->ii_Predicate != NIL);
+ is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+ index_close(indexRelation, NoLock);
+
+ if (is_btree && !is_partial && !is_only_on_expression)
+ suitableIndexList = lappend(suitableIndexList, path);
+ }
+ }

I think most of those variables are only used in the "else" block so
maybe it's better to declare them at that scope.

+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;

~~~

8. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+ * Indexes that consists of only expressions (e.g.,
+ * no simple column references on the index) are also
+ * eliminated with a similar reasoning.

"consists" -> "consist"

"with a similar reasoning" -> "with similar reasoning"

~~~

9.

+ * We also eliminate non-btree indexes, which could be relaxed
+ * if needed. If we allow non-btree indexes, we should adjust
+ * RelationFindReplTupleByIndex() to support such indexes.

This looks like another of those kinds of comments that should have
"XXX" prefix as a note to the future.

~~~

10. src/backend/replication/logical/relation.c -
FindUsableIndexForReplicaIdentityFull

+/*
+ * Returns an index oid if the planner submodules picks index scans
+ * over sequential scan.

10a
"picks" -> "pick"

~

10b.
Maybe this should also say ", otherwise returns InvalidOid" (?)

~~~

11.

+FindUsableIndexForReplicaIdentityFull(Relation localrel)
+{
+ MemoryContext usableIndexContext;
+ MemoryContext oldctx;
+ Path *cheapest_total_path;
+ Oid indexOid;

In the following function, and in the one after that, you've named the
index Oid as 'idxoid' (not 'indexOid'). IMO it's better to use
consistent naming everywhere.

~~~

12. src/backend/replication/logical/relation.c - GetRelationIdentityOrPK

12a.
I wondered what is the benefit of having this function. IIUC it is
only called from one place (LogicalRepUsableIndex) and IMO the code
would probably be easier if you just inline this logic in that
function...

~

12b.
+/*
+ * Get replica identity index or if it is not defined a primary key.
+ *
+ * If neither is defined, returns InvalidOid
+ */

If you want to keep the function for some reason (e.g. see #12a) then
I thought the function comment could be better.

SUGGESTION
/*
 * Returns OID of the relation's replica identity index, or OID of the
 * relation's primary key index.
 *
 * If neither is defined, returns InvalidOid.
 */

~~~

13. src/backend/replication/logical/relation.c - LogicalRepUsableIndex

For some reason, I feel this function should be called
FindLogicalRepUsableIndex (or similar), because it seems more
consistent with the others which might return the Oid or might return
InvalidOid...

~~~

14.

+ /*
+ * Index scans are disabled, use sequential scan. Note that we do allow
+ * index scans when there is a primary key or unique index replica
+ * identity. That is the legacy behavior so we hesitate to move this check
+ * above.
+ */

Perhaps a slight rephrasing of that comment?

SUGGESTION
If index scans are disabled, use a sequential scan.

Note that we still allowed index scans above when there is a primary
key or unique index replica identity, but that is the legacy behaviour
(even when enable_indexscan is false), so we hesitate to move this
enable_indexscan check to be done earlier in this function.

~~~

15.

+ * If we had a primary key or relation identity with a unique index,
+ * we would have already found a valid oid. At this point, the remote
+ * relation has replica identity full and we have at least one local
+ * index defined.

"would have already found a valid oid." -> "would have already found
and returned that oid."

======

16. src/backend/replication/logical/worker.c - usable_indexoid_internal

+/*
+ * Decide whether we can pick an index for the relinfo (e.g., the relation)
+ * we're actually deleting/updating from. If it is a child partition of
+ * edata->targetRelInfo, find the index on the partition.
+ *
+ * Note that if the corresponding relmapentry has InvalidOid usableIndexOid,
+ * the function returns InvalidOid. In that case, the tuple is used via
+ * sequential execution.
+ */
+static Oid
+usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)

I am not sure this is the right place to be saying that last sentence
("In that case, the tuple is used via sequential execution.") because
it's up to the *calling* code to decide what to do if InvalidOid is
returned.

======

17. src/include/replication/logicalrelation.h

@ -31,20 +32,40 @@ typedef struct LogicalRepRelMapEntry
  Relation localrel; /* relcache entry (NULL when closed) */
  AttrMap    *attrmap; /* map of local attributes to remote ones */
  bool updatable; /* Can apply updates/deletes? */
+ Oid usableIndexOid; /* which index to use? (Invalid when no index
+ * used) */

SUGGESTION (for the comment)
which index to use, or InvalidOid if none

~~~

18.

+/*
+ * Partition map (LogicalRepPartMap)
+ *
+ * When a partitioned table is used as replication target, replicated
+ * operations are actually performed on its leaf partitions, which requires
+ * the partitions to also be mapped to the remote relation.  Parent's entry
+ * (LogicalRepRelMapEntry) cannot be used as-is for all partitions, because
+ * individual partitions may have different attribute numbers, which means
+ * attribute mappings to remote relation's attributes must be maintained
+ * separately for each partition.
+ */
+typedef struct LogicalRepPartMapEntry

Something feels not quite right using the (unchanged) comment about
the Partition map which was removed from where it was originally in
relation.c.

The reason I am unsure is that this comment is still referring to the
"LogicalRepPartMap", which is not here but is declared static in
relation.c. Maybe the quick/easy fix would be to just change the first
line to say: "Partition map (see LogicalRepPartMap in relation.c)".
OTOH, I'm not sure if some part of this comment still needs to be left
in relation.c (??)


------
[1] https://www.postgresql.org/docs/devel/source-conventions.html

Kind Regards,
Peter Smith.
Fujitsu Australia



Hi Peter,

Thanks for the reviews! I'll reply to both of your reviews separately. 


>> 10.
>>
>> + /* we only need to allocate once */
>> + if (eq == NULL)
>> + eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);
>>
>> But shouldn't you also free this 'eq' before the function returns, to
>> prevent leaking memory?
>>
>
> Two notes here. First, this is allocated inside ApplyMessageContext, which seems to be reset per tuple change. So, that seems like a good boundary to keep this allocation in memory.
>

OK, fair enough. Is it worth adding a comment to say that or not?

Yes, sounds good. Added a one-sentence comment; I'll push it along with my other changes in v10.
 

Thanks,
Onder
Hi again,


======

1. Commit message

1a.
With this patch, I'm proposing the following change: If there is an
index on the subscriber, use the index as long as the planner
sub-modules picks any index over sequential scan. The index should be
a btree index, not a partital index. Finally, the index should have at
least one column reference (e.g., cannot consists of only
expressions).

SUGGESTION
With this patch, I'm proposing the following change: If there is any
index on the subscriber, let the planner sub-modules compare the costs
of index versus sequential scan and choose the cheapest. The index
should be a btree index, not a partial index, and it should have at
least one column reference (e.g., cannot consist of only expressions).


makes sense.
 
~

1b.
The Majority of the logic on the subscriber side exists in the code.

"exists" -> "already exists"

fixed 

~

1c.
psql -c "truncate pgbench_accounts;" -p 9700 postgres

"truncate" -> "TRUNCATE"

fixed 


~

1d.
Try to wrap this message text at 80 char width.
 
fixed
 

======

2. src/backend/replication/logical/relation.c - logicalrep_rel_open

+ /*
+ * Finding a usable index is an infrequent task. It is performed
+ * when an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ entry->usableIndexOid = LogicalRepUsableIndex(entry->localrel, remoterel);

Seemed a bit odd to say "performed" 2x in the same sentence.

"It is performed when..." -> "It occurs when...” (?)


fixed
 
~~~

3. src/backend/replication/logical/relation.c - logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It is performed
+ * when an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ part_entry->relmapentry.usableIndexOid =
+ LogicalRepUsableIndex(partrel, remoterel);

3a.
Same as comment #2 above.

done
 

~

3b.
The jumping between 'part_entry' and 'entry' is confusing. Since
'entry' is already assigned to be &part_entry->relmapentry can't you
use that here?

SUGGESTION
entry->usableIndexOid = LogicalRepUsableIndex(partrel, remoterel);

Yes, sure it makes sense.
 
~~~

4. src/backend/replication/logical/relation.c - GetIndexOidFromPath

+/*
+ * Returns a valid index oid if the input path is an index path.
+ * Otherwise, return invalid oid.
+ */
+static Oid
+GetIndexOidFromPath(Path *path)

Perhaps make this function comment more consistent with others (like
GetRelationIdentityOrPK, LogicalRepUsableIndex) and refer to
InvalidOid.

SUGGESTION
/*
 * Returns a valid index oid if the input path is an index path.
 *
 * Otherwise, returns InvalidOid.
 */

sounds good
 
~~~

5. src/backend/replication/logical/relation.c - IndexOnlyOnExpression

+bool
+IndexOnlyOnExpression(IndexInfo *indexInfo)
+{
+ int i;
+ for (i = 0; i < indexInfo->ii_NumIndexKeyAttrs; i++)
+ {
+ AttrNumber attnum = indexInfo->ii_IndexAttrNumbers[i];
+ if (AttributeNumberIsValid(attnum))
+ return false;
+ }
+
+ return true;
+}

5a.
Add a blank line after those declarations.

 
Done, also went over all the functions and ensured we don't have this anymore
 
~

5b.
AFAIK the C99 style for loop declarations should be OK [1] for new
code, so declaring like below would be cleaner:

for (int i = 0; ...

Done 
~~~

6. src/backend/replication/logical/relation.c -
FilterOutNotSuitablePathsForReplIdentFull

+/*
+ * Iterates over the input path list and returns another path list
+ * where paths with non-btree indexes, partial indexes or
+ * indexes on only expressions are eliminated from the list.
+ */
+static List *
+FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)

"are eliminated from the list." -> "have been removed."

Done
 
~~~

7.

+ foreach(lc, pathlist)
+ {
+ Path    *path = (Path *) lfirst(lc);
+ Oid indexOid = GetIndexOidFromPath(path);
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;
+
+ if (!OidIsValid(indexOid))
+ {
+ /* Unrelated Path, skip */
+ suitableIndexList = lappend(suitableIndexList, path);
+ }
+ else
+ {
+ indexRelation = index_open(indexOid, AccessShareLock);
+ indexInfo = BuildIndexInfo(indexRelation);
+ is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ is_partial = (indexInfo->ii_Predicate != NIL);
+ is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+ index_close(indexRelation, NoLock);
+
+ if (is_btree && !is_partial && !is_only_on_expression)
+ suitableIndexList = lappend(suitableIndexList, path);
+ }
+ }

I think most of those variables are only used in the "else" block so
maybe it's better to declare them at that scope.

+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;


Makes sense
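[Editorial aside, not part of the original mail: the three suitability tests in the loop above are easy to model. A rough Python sketch follows; the names and structure are illustrative only — the real check inspects a C-level IndexInfo.]

```python
from dataclasses import dataclass, field

@dataclass
class IndexInfo:
    """Toy stand-in for the C-level IndexInfo (field names illustrative)."""
    access_method: str                  # e.g. "btree", "hash"
    is_partial: bool                    # does the index have a WHERE predicate?
    key_attnums: list = field(default_factory=list)  # 0 marks an expression key

def index_only_on_expression(info: IndexInfo) -> bool:
    # Mirrors IndexOnlyOnExpression(): true when no key is a plain column
    # (AttributeNumberIsValid() in C boils down to "attnum != 0").
    return not any(attnum != 0 for attnum in info.key_attnums)

def is_suitable_for_repl_ident_full(info: IndexInfo) -> bool:
    # The patch keeps an index path only when all three conditions hold,
    # matching "is_btree && !is_partial && !is_only_on_expression" above.
    return (info.access_method == "btree"
            and not info.is_partial
            and not index_only_on_expression(info))
```

So a partial btree index, or an index consisting only of expressions, would be filtered out, while a plain btree index on a column survives.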
 
~~~

8. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+ * Indexes that consists of only expressions (e.g.,
+ * no simple column references on the index) are also
+ * eliminated with a similar reasoning.

"consists" -> "consist"

"with a similar reasoning" -> "with similar reasoning"

fixed 
~~~

9.

+ * We also eliminate non-btree indexes, which could be relaxed
+ * if needed. If we allow non-btree indexes, we should adjust
+ * RelationFindReplTupleByIndex() to support such indexes.

This looks like another of those kinds of comments that should have
"XXX" prefix as a note to the future.

added
 

~~~

10. src/backend/replication/logical/relation.c -
FindUsableIndexForReplicaIdentityFull

+/*
+ * Returns an index oid if the planner submodules picks index scans
+ * over sequential scan.

10a
"picks" -> "pick"


done 
 
~

10b.
Maybe this should also say ", otherwise returns InvalidOid" (?)


Makes sense, added similar to above suggestion
 
~~~

11.

+FindUsableIndexForReplicaIdentityFull(Relation localrel)
+{
+ MemoryContext usableIndexContext;
+ MemoryContext oldctx;
+ Path *cheapest_total_path;
+ Oid indexOid;

In the following function, and in the one after that, you've named the
index Oid as 'idxoid' (not 'indexOid'). IMO it's better to use
consistent naming everywhere.

 Ok, existing functions use idxoid, switched to that.

~~~

12. src/backend/replication/logical/relation.c - GetRelationIdentityOrPK

12a.
I wondered what is the benefit of having this function. IIUC it is
only called from one place (LogicalRepUsableIndex) and IMO the code
would probably be easier if you just inline this logic in that
function...


I just moved that from src/backend/replication/logical/worker.c, so probably better not to remove it in this patch?

Tbh, I like the simplicity it provides.
  
~

12b.
+/*
+ * Get replica identity index or if it is not defined a primary key.
+ *
+ * If neither is defined, returns InvalidOid
+ */

If you want to keep the function for some reason (e.g. see #12a) then
I thought the function comment could be better.

SUGGESTION
/*
 * Returns OID of the relation's replica identity index, or OID of the
 * relation's primary key index.
 *
 * If neither is defined, returns InvalidOid.
 */


As I noted, I just moved this function. So, left as-is for now.
 
~~~

13. src/backend/replication/logical/relation.c - LogicalRepUsableIndex

For some reason, I feel this function should be called
FindLogicalRepUsableIndex (or similar), because it seems more
consistent with the others which might return the Oid or might return
InvalidOid...


Makes sense, changed
 
~~~

14.

+ /*
+ * Index scans are disabled, use sequential scan. Note that we do allow
+ * index scans when there is a primary key or unique index replica
+ * identity. That is the legacy behavior so we hesitate to move this check
+ * above.
+ */

Perhaps a slight rephrasing of that comment?

SUGGESTION
If index scans are disabled, use a sequential scan.

Note that we still allowed index scans above when there is a primary
key or unique index replica identity, but that is the legacy behaviour
(even when enable_indexscan is false), so we hesitate to move this
enable_indexscan check to be done earlier in this function.

Sounds good, changed

~~~

15.

+ * If we had a primary key or relation identity with a unique index,
+ * we would have already found a valid oid. At this point, the remote
+ * relation has replica identity full and we have at least one local
+ * index defined.

"would have already found a valid oid." -> "would have already found
and returned that oid."

Done
 

======

16. src/backend/replication/logical/worker.c - usable_indexoid_internal

+/*
+ * Decide whether we can pick an index for the relinfo (e.g., the relation)
+ * we're actually deleting/updating from. If it is a child partition of
+ * edata->targetRelInfo, find the index on the partition.
+ *
+ * Note that if the corresponding relmapentry has InvalidOid usableIndexOid,
+ * the function returns InvalidOid. In that case, the tuple is used via
+ * sequential execution.
+ */
+static Oid
+usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)

I am not sure this is the right place to be saying that last sentence
("In that case, the tuple is used via sequential execution.") because
it's up to the *calling* code to decide what to do if InvalidOid is
returned

 Right, for now this is true, but could change in the future. Removed.


======

17. src/include/replication/logicalrelation.h

@ -31,20 +32,40 @@ typedef struct LogicalRepRelMapEntry
  Relation localrel; /* relcache entry (NULL when closed) */
  AttrMap    *attrmap; /* map of local attributes to remote ones */
  bool updatable; /* Can apply updates/deletes? */
+ Oid usableIndexOid; /* which index to use? (Invalid when no index
+ * used) */

SUGGESTION (for the comment)
which index to use, or InvalidOid if none

makes sense
 

~~~

18.

+/*
+ * Partition map (LogicalRepPartMap)
+ *
+ * When a partitioned table is used as replication target, replicated
+ * operations are actually performed on its leaf partitions, which requires
+ * the partitions to also be mapped to the remote relation.  Parent's entry
+ * (LogicalRepRelMapEntry) cannot be used as-is for all partitions, because
+ * individual partitions may have different attribute numbers, which means
+ * attribute mappings to remote relation's attributes must be maintained
+ * separately for each partition.
+ */
+typedef struct LogicalRepPartMapEntry

Something feels not quite right using the (unchanged) comment about
the Partition map which was removed from where it was originally in
relation.c.

The reason I am unsure is that this comment is still referring to the
"LogicalRepPartMap", which is not here but is declared static in
relation.c. Maybe the quick/easy fix would be to just change the first
line to say: "Partition map (see LogicalRepPartMap in relation.c)".
OTOH, I'm not sure if some part of this comment still needs to be left
in relation.c (??)

Hmm, I agree that we need some extra comments pointing where this is used (I followed something similar to your suggestion).

However, I also think that it is nicer to keep this comment here, because it seems more common in the code-base for the comments to be on the MapEntry, not on the Map itself, no?

Thanks,
Onder


On Sat, Aug 20, 2022 at 4:32 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> I'm a little late to catch up with your comments, but here are my replies:
>
>> > My answer for the above assumes that your question is regarding what happens if you ANALYZE on a partitioned table. If your question is something different, please let me know.
>> >
>>
>> I was talking about inheritance cases, something like:
>> create table tbl1 (a int);
>> create table tbl1_part1 (b int) inherits (tbl1);
>> create table tbl1_part2 (c int) inherits (tbl1);
>>
>> What we do in such cases is documented as: "if the table being
>> analyzed has inheritance children, ANALYZE gathers two sets of
>> statistics: one on the rows of the parent table only, and a second
>> including rows of both the parent table and all of its children. This
>> second set of statistics is needed when planning queries that process
>> the inheritance tree as a whole. The child tables themselves are not
>> individually analyzed in this case."
>
>
> Oh, I haven't considered inherited tables. That seems right, the statistics of the children are not updated when the parent is analyzed.
>
>>
>> Now, the point I was worried about was what if the changes in child
>> tables (*_part1, *_part2) are much more than in tbl1? In such cases,
>> we may not invalidate child rel entries, so how will logical
>> replication behave for updates/deletes on child tables? There may not
>> be any problem here but it is better to do some analysis of such cases
>> to see how it behaves.
>
>
> I also haven't observed any specific issues. In the end, when the user (or autovacuum) does ANALYZE on the child, it is when the statistics are updated for the child.
>

Right, I also think that should be the behavior but I have not
verified it. However, I think it should be easy to verify if
autovacuum updates the stats for child tables when we operate on only
one of such tables and whether that will invalidate the cache for our
case.

> Although I do not have much experience with inherited tables, this sounds like the expected behavior?
>
> I also pushed a test covering inherited tables. First, a basic test on the parent. Then, show that updates on the parent can also use indexes of the children. Also, after an ANALYZE on the child, we can re-calculate the index and use the index with a higher cardinality column.
>
>>
>> > Also, for the majority of the use-cases, I think we'd probably expect an index on a column with high cardinality -- hence use index scan. So, bitmap index scans are probably not going to be that common.
>> >
>>
>> You are probably right here but I don't think we can make such
>> assumptions. I think the safest way to avoid any regression here is to
>> choose an index when the planner selects an index scan. We can always
>> extend it later to bitmap scans if required. We can add a comment
>> indicating the same.
>>
>
> Alright, I got rid of the bitmap scans.
>
> Though, it caused a few of the new tests to fail. I think because of the data size/distribution, the planner picks bitmap scans. To make the tests consistent and small, I added `enable_bitmapscan to off` for this new test file. Does that sound ok to you? Or, should we change the tests to make sure they genuinely use index scans?
>

That sounds okay to me.

--
With Regards,
Amit Kapila.



Hi,


>
> Oh, I haven't considered inherited tables. That seems right, the statistics of the children are not updated when the parent is analyzed.
>
>>
>> Now, the point I was worried about was what if the changes in child
>> tables (*_part1, *_part2) are much more than in tbl1? In such cases,
>> we may not invalidate child rel entries, so how will logical
>> replication behave for updates/deletes on child tables? There may not
>> be any problem here but it is better to do some analysis of such cases
>> to see how it behaves.
>
>
> I also haven't observed any specific issues. In the end, when the user (or autovacuum) does ANALYZE on the child, it is when the statistics are updated for the child.
>

Right, I also think that should be the behavior but I have not
verified it. However, I think it should be easy to verify if
autovacuum updates the stats for child tables when we operate on only
one of such tables and whether that will invalidate the cache for our
case.


I already added a regression test for this with the title: # Testcase start: SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE - INHERITED TABLE 

I realized that the comments on the test case were confusing, and clarified those. Attached the new version also rebased onto the master branch.

Thanks,
Onder

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Önder,

Thank you for proposing a good feature. I'm also interested in the patch,
so I started to review it. The following are my initial comments.

===
For execRelation.c

01. RelationFindReplTupleByIndex()

```
        /* Start an index scan. */
        InitDirtySnapshot(snap);
-       scan = index_beginscan(rel, idxrel, &snap,
-                                                  IndexRelationGetNumberOfKeyAttributes(idxrel),
-                                                  0);
 
        /* Build scan key. */
-       build_replindex_scan_key(skey, rel, idxrel, searchslot);
+       scankey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);
 
+       scan = index_beginscan(rel, idxrel, &snap, scankey_attoff, 0);
```

I think "/* Start an index scan. */" should be just above index_beginscan().

===
For worker.c

02. usable_indexoid_internal()

```
+ * Note that if the corresponding relmapentry has InvalidOid usableIndexOid,
+ * the function returns InvalidOid.
+ */
+static Oid
+usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)
```

"InvalidOid usableIndexOid" should be "invalid usableIndexOid,"

03. check_relation_updatable()

```
         * We are in error mode so it's fine this is somewhat slow. It's better to
         * give user correct error.
         */
-       if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
+       if (OidIsValid(rel->usableIndexOid))
        {
```

Shouldn't we change the above comment, then? The check is no longer slow.

===
For relation.c

04. GetCheapestReplicaIdentityFullPath()

```
+static Path *
+GetCheapestReplicaIdentityFullPath(Relation localrel)
+{
+       PlannerInfo *root;
+       Query      *query;
+       PlannerGlobal *glob;
+       RangeTblEntry *rte;
+       RelOptInfo *rel;
+       int     attno;
+       RangeTblRef *rt;
+       List *joinList;
+       Path *seqScanPath;
```

I think the part that constructs the dummy planner state can be moved to
another function because that part is not meaningful here.
Especially lines 824-846 can.


===
For 032_subscribe_use_index.pl

05. general

```
+# insert some initial data within the range 0-1000
+$node_publisher->safe_psql('postgres',
+       "INSERT INTO test_replica_id_full SELECT i%20 FROM generate_series(0,1000)i;"
+);
```

It seems that the range of the initial data is [0, 19].
The same mistake-candidates are found in many places.

06. general

```
+# updates 1000 rows
+$node_publisher->safe_psql('postgres',
+       "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
```

Only 50 tuples are modified here.
The same mistake-candidates are found in many places.
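[Editorial aside, not part of the original mail: both counts above are easy to sanity-check. A quick Python sketch mirroring the test's `i%20` INSERT and the `WHERE x = 15` UPDATE predicate:]

```python
# Mirror of "INSERT INTO test_replica_id_full SELECT i%20 FROM
# generate_series(0,1000) i": generate_series is inclusive on both ends,
# so 1001 rows are inserted, but the stored values only span 0..19.
values = [i % 20 for i in range(0, 1001)]

assert len(values) == 1001                    # rows inserted
assert (min(values), max(values)) == (0, 19)  # actual initial data range
assert values.count(15) == 50                 # rows hit by "WHERE x = 15"
```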

07. general

```
+# we check if the index is used or not
+$node_subscriber->poll_query_until(
+       'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname =
'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full_3 updates 200 rows via index";    
```
The query will be re-executed until the index scan has happened, but that
is not what the comment says. How about changing it to "we wait until the
index is used on the subscriber side" or something?
The same comment is found in many places.
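[Editorial aside, not part of the original mail: the poll-until-true pattern these TAP tests rely on can be sketched generically. A hypothetical helper, not PostgreSQL's Perl implementation:]

```python
import time

def poll_until(probe, timeout=180.0, interval=0.1):
    """Re-run probe() until it returns True or the timeout expires.

    Rough model of the TAP test's poll_query_until(): the caller's query
    (e.g. checking pg_stat_all_indexes.idx_scan) plays the role of probe.
    Returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if probe():
            return True
        time.sleep(interval)
    return False
```

A caller would pass the equivalent of the `idx_scan` query as `probe` and die on a False return, much like the test's `or die "Timed out ..."` does.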

08. test related with ANALYZE

```
+# Testcase start: SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE - PARTITIONED TABLE
+# ====================================================================
```

"Testcase start:" should be "Testcase end:" here.

09. general

In some tests the results are confirmed, but in other tests they are not.
I think you can make sure the results are as expected in every case, if
there is no particular reason not to.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Here are some review comments for the latest v10 patch.

(Mostly these are just nitpick wording/comments etc)

======

1. Commit message

It is often not feasible to use `REPLICA IDENTITY FULL` on the publication
because it leads to full table scan per tuple change on the subscription.
This makes `REPLICA IDENTITY FULL` impracticable -- probably other than
some small number of use cases.

~

The "often not feasible" part seems repeated by the "impracticable" part.

SUGGESTION
Using `REPLICA IDENTITY FULL` on the publication leads to a full table
scan per tuple change on the subscription. This makes `REPLICA
IDENTITY FULL` impracticable -- probably other than some small number
of use cases.

~~~

2.

The Majority of the logic on the subscriber side already exists in
the code.

"Majority" -> "majority"

~~~

3.

The ones familiar
with this part of the code could realize that the sequential scan
code on the subscriber already implements the `tuples_equal()`
function.

SUGGESTION
Anyone familiar with this part of the code might recognize that...

~~~

4.

In short, the changes on the subscriber is mostly
combining parts of (unique) index scan and sequential scan codes.

"is mostly" -> "are mostly"

~~~

5.

From the performance point of view, there are few things to note.

"are few" -> "are a few"

======

6. src/backend/executor/execReplication.c - build_replindex_scan_key

+static int
 build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
  TupleTableSlot *searchslot)
 {
- int attoff;
+ int index_attoff;
+ int scankey_attoff = 0;

Should it be called 'skey_attoff' for consistency with the param 'skey'?

~~~

7.

  Oid operator;
  Oid opfamily;
  RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);

Maybe the 'optype' should be adjacent to the other Oid opXXX
declarations just to keep them all together?

~~~

8.

+ if (!AttributeNumberIsValid(table_attno))
+ {
+ IndexInfo *indexInfo PG_USED_FOR_ASSERTS_ONLY;
+
+ /*
+ * There are two cases to consider. First, if the index is a primary or
+ * unique key, we cannot have any indexes with expressions. So, at this
+ * point we are sure that the index we are dealing with is not these.
+ */
+ Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+    RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
+
+ /*
+ * At this point, we are also sure that the index is not consisting
+ * of only expressions.
+ */
+#ifdef USE_ASSERT_CHECKING
+ indexInfo = BuildIndexInfo(idxrel);
+ Assert(!IndexOnlyOnExpression(indexInfo));
+#endif

I was a bit confused by the comment. IIUC the code has already called
FilterOutNotSuitablePathsForReplIdentFull at some point prior, so all
the unwanted indexes are already filtered out. Therefore these
assertions are there for no reason other than sanity checking that
fact, right? If my understanding is correct, perhaps a simpler single
comment is possible:

SUGGESTION (or something like this)
This attribute is an expression, however
FilterOutNotSuitablePathsForReplIdentFull was called earlier during
[...] and the indexes comprising only expressions have already been
eliminated. We sanity check this now. Furthermore, because primary key
and unique key indexes can't include expressions we also sanity check
the index is neither of those kinds.

~~~

9.
- return hasnulls;
+ /* We should always use at least one attribute for the index scan */
+ Assert (scankey_attoff > 0);

SUGGESTION
There should always be at least one attribute for the index scan.

~~~

10. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex

ScanKeyData skey[INDEX_MAX_KEYS];
IndexScanDesc scan;
SnapshotData snap;
TransactionId xwait;
Relation idxrel;
bool found;
TypeCacheEntry **eq = NULL; /* only used when the index is not unique */
bool indisunique;
int scankey_attoff;

10a.
Should 'scankey_attoff' be called 'skey_attoff' for consistency with
the 'skey' array?

~

10b.
Also, it might be tidier to declare the 'skey_attoff' adjacent to the 'skey'.

======

11. src/backend/replication/logical/relation.c

For LogicalRepPartMap, I was wondering if it should keep a small
comment to xref back to the long comment which was moved to
logicalrelation.h

e.g.
/* Refer to the LogicalRepPartMapEntry comment in logicalrelation.h */

~~~

12. src/backend/replication/logical/relation.c - logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It occurs when
+ * an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ entry->usableIndexOid = FindLogicalRepUsableIndex(entry->localrel, remoterel);
  entry->localrelvalid = true;

Should there be a blank line between those assignments? (just for
consistency with the other code of this patch in a later function that
does exactly the same assignments).

~~~

13. src/backend/replication/logical/relation.c -
FilterOutNotSuitablePathsForReplIdentFull

Not sure about this function name. Maybe should be something like
'FilterOutUnsuitablePathsForReplIdentFull', or just
'SuitablePathsForReplIdentFull'

~~~

14.

+ else
+ {
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;

Is that another var that could be renamed 'idxoid' like all the others?

~~~

15. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+ typentry = lookup_type_cache(attr->atttypid,
+ TYPECACHE_EQ_OPR_FINFO);

Seems unnecessary wrapping.

~~~

15.

+ /*
+ * Currently it is not possible for planner to pick a
+ * partial index or indexes only on expressions. We
+ * still want to be explicit and eliminate such
+ * paths proactively.
...
...
+ */

This large comment seems unusually skinny. Needs pgindent.

~~~

16. src/backend/replication/logical/worker.c - check_relation_updatable

@@ -1753,7 +1738,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
  * We are in error mode so it's fine this is somewhat slow. It's better to
  * give user correct error.
  */
- if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
+ if (OidIsValid(rel->usableIndexOid))

The original comment about it being "somewhat slow" does not seem
relevant anymore because it is no longer calling a function in this
condition.

~~~

17. src/backend/replication/logical/worker.c - usable_indexoid_internal

+ relmapentry = &(part_entry->relmapentry);

The parentheses seem overkill, and code is not written like this
elsewhere in the same patch.

~~~

18. src/backend/replication/logical/worker.c - apply_handle_tuple_routing

@@ -2202,13 +2225,15 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
  * suitable partition.
  */
  {
+ LogicalRepRelMapEntry *entry = &part_entry->relmapentry;

I think elsewhere in the patch the same variable is called
'relmapentry' (which seems a bit better than just 'entry')

======

19. .../subscription/t/032_subscribe_use_index.pl

+# ANALYZING child will change the index used on child_1 and going to
use index_on_child_1_b
+$node_subscriber->safe_psql('postgres', "ANALYZE child_1");

19a.
"ANALYZING child" ? Should that be worded differently? There is
nothing named 'child' that I could see.

~

19b.
"and going to use" ? wording ? "which will be used for " ??

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Hi Hayato Kuroda, 

Thanks for the review, please see my reply below:


===
For execRelation.c

01. RelationFindReplTupleByIndex()

```
        /* Start an index scan. */
        InitDirtySnapshot(snap);
-       scan = index_beginscan(rel, idxrel, &snap,
-                                                  IndexRelationGetNumberOfKeyAttributes(idxrel),
-                                                  0);

        /* Build scan key. */
-       build_replindex_scan_key(skey, rel, idxrel, searchslot);
+       scankey_attoff = build_replindex_scan_key(skey, rel, idxrel, searchslot);

+       scan = index_beginscan(rel, idxrel, &snap, scankey_attoff, 0);
```

I think "/* Start an index scan. */" should be just above index_beginscan().

moved there
 

===
For worker.c

02. usable_indexoid_internal()

```
+ * Note that if the corresponding relmapentry has InvalidOid usableIndexOid,
+ * the function returns InvalidOid.
+ */
+static Oid
+usable_indexoid_internal(ApplyExecutionData *edata, ResultRelInfo *relinfo)
```

"InvalidOid usableIndexOid" should be "invalid usableIndexOid,"

makes sense, updated
 

03. check_relation_updatable()

```
         * We are in error mode so it's fine this is somewhat slow. It's better to
         * give user correct error.
         */
-       if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
+       if (OidIsValid(rel->usableIndexOid))
        {
```

Shouldn't we change the above comment, then? The check is no longer slow.

Hmm, I hadn't fully grasped this comment earlier. So you are suggesting that "slow" here refers to the additional function call "GetRelationIdentityOrPK"? If so, yes, I'll update that.
 

===
For relation.c

04. GetCheapestReplicaIdentityFullPath()

```
+static Path *
+GetCheapestReplicaIdentityFullPath(Relation localrel)
+{
+       PlannerInfo *root;
+       Query      *query;
+       PlannerGlobal *glob;
+       RangeTblEntry *rte;
+       RelOptInfo *rel;
+       int     attno;
+       RangeTblRef *rt;
+       List *joinList;
+       Path *seqScanPath;
```

I think the part that constructs the dummy planner state can be moved to
another function because that part is not meaningful here.
Especially lines 824-846 can.


Makes sense, simplified the function. Though, it is always hard to pick good names for these kinds of helper functions. I picked GenerateDummySelectPlannerInfoForRelation(), does that sound good to you as well? 
 

===
For 032_subscribe_use_index.pl

05. general

```
+# insert some initial data within the range 0-1000
+$node_publisher->safe_psql('postgres',
+       "INSERT INTO test_replica_id_full SELECT i%20 FROM generate_series(0,1000)i;"
+);
```

It seems that the range of the initial data is [0, 19].
The same mistake-candidates are found in many places.

Ah, several copy & paste errors. Fixed (hopefully) all.
 

06. general

```
+# updates 1000 rows
+$node_publisher->safe_psql('postgres',
+       "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
```

Only 50 tuples are modified here.
The same mistake-candidates are found in many places.

Alright, yes there were several wrong comments in the tests. I went over the tests once more to fix those and improve comments.
 

07. general

```
+# we check if the index is used or not
+$node_subscriber->poll_query_until(
+       'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full_3 updates 200 rows via index";
```
The query will be re-executed until the index scan has happened, but that is not what the comment says.
How about changing it to "we wait until the index is used on the subscriber side" or something?
The same comment is found in many places.

Makes sense, updated
 

08. test related with ANALYZE

```
+# Testcase start: SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE - PARTITIONED TABLE
+# ====================================================================
```

"Testcase start:" should be "Testcase end:" here.

thanks, fixed
 

09. general

In some tests the results are confirmed, but in other tests they are not.
I think you can make sure the results are as expected in every case, if there is no particular reason not to.


Alright, yes I also don't see a reason not to do that. Added to all cases.


I'll attach the patch with the next email as I also want to incorporate the other comments. Hope this is not going to be confusing.

Thanks,
Onder
Hi Peter,

Thanks again for the review, see my comments below:



======

1. Commit message

It is often not feasible to use `REPLICA IDENTITY FULL` on the publication
because it leads to full table scan per tuple change on the subscription.
This makes `REPLICA IDENTITY FULL` impracticable -- probably other than
some small number of use cases.

~

The "often not feasible" part seems repeated by the "impracticable" part.
 
SUGGESTION
Using `REPLICA IDENTITY FULL` on the publication leads to a full table
scan per tuple change on the subscription. This makes `REPLICA
IDENTITY FULL` impracticable -- probably other than some small number
of use cases.

~~~

Sure, this is easier to follow, updated.
 

2.

The Majority of the logic on the subscriber side already exists in
the code.

"Majority" -> "majority"

 
fixed
 
~~~

3.

The ones familiar
with this part of the code could realize that the sequential scan
code on the subscriber already implements the `tuples_equal()`
function.

SUGGESTION
Anyone familiar with this part of the code might recognize that...

~~~

Yes, this is better, applied
 

4.

In short, the changes on the subscriber is mostly
combining parts of (unique) index scan and sequential scan codes.

"is mostly" -> "are mostly"

~~~


applied
 
5.

From the performance point of view, there are few things to note.

"are few" -> "are a few"


applied
 
======

6. src/backend/executor/execReplication.c - build_replindex_scan_key

+static int
 build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
  TupleTableSlot *searchslot)
 {
- int attoff;
+ int index_attoff;
+ int scankey_attoff = 0;

Should it be called 'skey_attoff' for consistency with the param 'skey'?


That looks better, updated
 
~~~

7.

  Oid operator;
  Oid opfamily;
  RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);

Maybe the 'optype' should be adjacent to the other Oid opXXX
declarations just to keep them all together?

I do not have any preference on this. Although I do not see such a strong pattern in the code, I have no objection to doing so; changed.

~~~

8.

+ if (!AttributeNumberIsValid(table_attno))
+ {
+ IndexInfo *indexInfo PG_USED_FOR_ASSERTS_ONLY;
+
+ /*
+ * There are two cases to consider. First, if the index is a primary or
+ * unique key, we cannot have any indexes with expressions. So, at this
+ * point we are sure that the index we are dealing with is not these.
+ */
+ Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+    RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
+
+ /*
+ * At this point, we are also sure that the index is not consisting
+ * of only expressions.
+ */
+#ifdef USE_ASSERT_CHECKING
+ indexInfo = BuildIndexInfo(idxrel);
+ Assert(!IndexOnlyOnExpression(indexInfo));
+#endif

I was a bit confused by the comment. IIUC the code has already called
FilterOutNotSuitablePathsForReplIdentFull at some point prior, so all
the unwanted indexes are already filtered out. Therefore these
assertions are there for no reason other than sanity checking that
fact, right? If my understanding is correct, perhaps a simpler single
comment is possible:

Yes, these are for sanity check
 

SUGGESTION (or something like this)
This attribute is an expression, however
FilterOutNotSuitablePathsForReplIdentFull was called earlier during
[...] and the indexes comprising only expressions have already been
eliminated. We sanity check this now. Furthermore, because primary key
and unique key indexes can't include expressions we also sanity check
the index is neither of those kinds.

~~~

I agree that we can improve comments here. I incorporated your suggestion as well. 
 

9.
- return hasnulls;
+ /* We should always use at least one attribute for the index scan */
+ Assert (scankey_attoff > 0);

SUGGESTION
There should always be at least one attribute for the index scan.

applied
 

~~~

10. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex

ScanKeyData skey[INDEX_MAX_KEYS];
IndexScanDesc scan;
SnapshotData snap;
TransactionId xwait;
Relation idxrel;
bool found;
TypeCacheEntry **eq = NULL; /* only used when the index is not unique */
bool indisunique;
int scankey_attoff;

10a.
Should 'scankey_attoff' be called 'skey_attoff' for consistency with
the 'skey' array?

Yes, it makes sense as you suggested on build_replindex_scan_key

~

10b.
Also, it might be tidier to declare the 'skey_attoff' adjacent to the 'skey'.

moved 

======

11. src/backend/replication/logical/relation.c

For LogicalRepPartMap, I was wondering if it should keep a small
comment to xref back to the long comment which was moved to
logicalreplication.h

e.g.
/* Refer to the LogicalRepPartMapEntry comment in logicalrelation.h */

Could work, added. We already have the xref the other way around (LogicalRepPartMapEntry -> LogicalRepPartMap).


~~~

12. src/backend/replication/logical/relation.c - logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It occurs when
+ * an operation is first performed on the relation, or after
+ * invalidation of the relation cache entry (e.g., such as ANALYZE).
+ */
+ entry->usableIndexOid = FindLogicalRepUsableIndex(entry->localrel, remoterel);
  entry->localrelvalid = true;

Should there be a blank line between those assignments? (just for
consistency with the other code of this patch in a later function that
does exactly the same assignments).

done
 

~~~

13. src/backend/replication/logical/relation.c -
FilterOutNotSuitablePathsForReplIdentFull

Not sure about this function name. Maybe should be something like
'FilterOutUnsuitablePathsForReplIdentFull', or just
'SuitablePathsForReplIdentFull'

~~~

I think I'll go with a slight modification of your suggestion: SuitablePathsForRepIdentFull 

14.

+ else
+ {
+ Relation indexRelation;
+ IndexInfo *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;

Is that another var that could be renamed 'idxoid' like all the others?

seems so, updated
 
~~~

15. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+ typentry = lookup_type_cache(attr->atttypid,
+ TYPECACHE_EQ_OPR_FINFO);

Seems unnecessary wrapping.

fixed
 
~~~

15.

+ /*
+ * Currently it is not possible for planner to pick a
+ * partial index or indexes only on expressions. We
+ * still want to be explicit and eliminate such
+ * paths proactively.
...
...
+ */

This large comment seems unusually skinny. Needs pgindent.


Ok, it has been a while since I last ran pgindent. I have now run it and this comment is fixed as well.
 
~~~

16. src/backend/replication/logical/worker.c - check_relation_updatable

@@ -1753,7 +1738,7 @@ check_relation_updatable(LogicalRepRelMapEntry *rel)
  * We are in error mode so it's fine this is somewhat slow. It's better to
  * give user correct error.
  */
- if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
+ if (OidIsValid(rel->usableIndexOid))

The original comment about it being "somewhat slow" does not seem
relevant anymore because it is no longer calling a function in this
condition.


Fixed (also a similar comment raised in another review)
 
~~~

17. src/backend/replication/logical/worker.c - usable_indexoid_internal

+ relmapentry = &(part_entry->relmapentry);

The parentheses seem overkill, and code is not written like this
elsewhere in the same patch.

true, no need, removed the parentheses 


~~~

18. src/backend/replication/logical/worker.c - apply_handle_tuple_routing

@@ -2202,13 +2225,15 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
  * suitable partition.
  */
  {
+ LogicalRepRelMapEntry *entry = &part_entry->relmapentry;

I think elsewhere in the patch the same variable is called
'relmapentry' (which seems a bit better than just 'entry')


true, it is used as relmapentry in other place(s), and in this context entry is confusing. So, changed to relmapentry.
 
======

19. .../subscription/t/032_subscribe_use_index.pl

+# ANALYZING child will change the index used on child_1 and going to
use index_on_child_1_b
+$node_subscriber->safe_psql('postgres', "ANALYZE child_1");

19a.
"ANALYZING child" ? Should that be worded differently? There is
nothing named 'child' that I could see.


Do you mean it should be "child_1"? That is the name of the table. I updated the comment, let me know if it is still confusing.

~

19b.
"and going to use" ? wording ? "which will be used for " ??


I reworded the comment as below, is that better?

# ANALYZING child_1 will change the index used on the table and
# UPDATE/DELETEs on the subscriber are going to use index_on_child_1_b 


I also attached v11 of the patch.

Thanks,
Onder Kalaci

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "kuroda.hayato@fujitsu.com"
Dear Önder,

Thanks for updating the patch! I will check it later.
Currently I just reply to your comments.

> Hmm, I couldn't realize this comment earlier. So you suggest "slow" here refers to the additional function call
> "GetRelationIdentityOrPK"? If so, yes I'll update that.
 

Yes I meant to say that, because functions will be called like:

GetRelationIdentityOrPK() -> RelationGetPrimaryKeyIndex() -> RelationGetIndexList() -> ..

and according to the comments the last one seems to do the heavy lifting.


> Makes sense, simplified the function. Though, it is always hard to pick good names for these kinds of helper
> functions. I picked GenerateDummySelectPlannerInfoForRelation(), does that sound good to you as well?
 

I could not find any better naming than yours. 

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


I had quick look at the latest v11-0001 patch differences from v10.

Here are some initial comments:

======

1. Commit message

It looks like some small mistake happened. You wrote [1] that my
previous review comments about the commit message were fixed, but it
seems the v11 commit message is unchanged since v10.

======

2. src/backend/replication/logical/relation.c -
GenerateDummySelectPlannerInfoForRelation

+/*
+ * This is not a generic function, helper function for
+ * GetCheapestReplicaIdentityFullPath. The function creates
+ * a dummy PlannerInfo for the given relationId as if the
+ * relation is queried with SELECT command.
+ */
+static PlannerInfo *
+GenerateDummySelectPlannerInfoForRelation(Oid relationId)

"generic function, helper function" -> "generic function. It is a
helper function"


------
[1] https://www.postgresql.org/message-id/CACawEhXnTcXBOTofptkgSBOyD81Pohd7MSfFaW0SKo-0oKrCJg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



I've gone through the v11-0001 patch in more detail.

Here are some more review comments (nothing functional I think -
mostly just wording)

======

1. src/backend/executor/execReplication.c - build_replindex_scan_key

- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * This is not generic routine, it expects the idxrel to be an index
+ * that planner would choose if the searchslot includes all the columns
+ * (e.g., REPLICA IDENTITY FULL on the source).
  */
-static bool
+static int
 build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
  TupleTableSlot *searchslot)


(I know this is not caused by your patch but maybe fix it at the same time?)

"This is not generic routine, it expects..." -> "This is not a generic
routine - it expects..."

~~~

2.

+ IndexInfo  *indexInfo PG_USED_FOR_ASSERTS_ONLY;
+
+ /*
+ * This attribute is an expression, and
+ * SuitablePathsForRepIdentFull() was called earlier while the
+ * index for subscriber is selected. There, the indexes comprising
+ * *only* expressions have already been eliminated.
+ *
+ * We sanity check this now.
+ */
+#ifdef USE_ASSERT_CHECKING
+ indexInfo = BuildIndexInfo(idxrel);
+ Assert(!IndexOnlyOnExpression(indexInfo));
+#endif

2a.
"while the index for subscriber is selected..." -> "when the index for
the subscriber was selected...”

~

2b.
Because there is only one declaration in this code block you could
simplify this a bit if you wanted to.

SUGGESTION
/*
 * This attribute is an expression, and
 * SuitablePathsForRepIdentFull() was called earlier while the
 * index for subscriber is selected. There, the indexes comprising
 * *only* expressions have already been eliminated.
 *
 * We sanity check this now.
 */
#ifdef USE_ASSERT_CHECKING
IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
Assert(!IndexOnlyOnExpression(indexInfo));
#endif

~~~

3. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex

+ /* Start an index scan. */
+ scan = index_beginscan(rel, idxrel, &snap, skey_attoff, 0);
 retry:
  found = false;

It might be better to have a blank line before that ‘retry’ label,
like in the original code.

======

4. src/backend/replication/logical/relation.c

+/* see LogicalRepPartMapEntry for details in logicalrelation.h */
 static HTAB *LogicalRepPartMap = NULL;

Personally, I'd word that something like:
"/* For LogicalRepPartMap details see LogicalRepPartMapEntry in
logicalrelation.h */"

but YMMV.

~~~

5. src/backend/replication/logical/relation.c -
GenerateDummySelectPlannerInfoForRelation

+/*
+ * This is not a generic function, helper function for
+ * GetCheapestReplicaIdentityFullPath. The function creates
+ * a dummy PlannerInfo for the given relationId as if the
+ * relation is queried with SELECT command.
+ */
+static PlannerInfo *
+GenerateDummySelectPlannerInfoForRelation(Oid relationId)

(mentioned this one in my previous post)

"This is not a generic function, helper function" -> "This is not a
generic function. It is a helper function"

~~~

6. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+/*
+ * Generate all the possible paths for the given subscriber relation,
+ * for the cases that the source relation is replicated via REPLICA
+ * IDENTITY FULL. The function returns the cheapest Path among the
+ * eligible paths, see SuitablePathsForRepIdentFull().
+ *
+ * The function guarantees to return a path, because it adds sequential
+ * scan path if needed.
+ *
+ * The function assumes that all the columns will be provided during
+ * the execution phase, given that REPLICA IDENTITY FULL guarantees
+ * that.
+ */
+static Path *
+GetCheapestReplicaIdentityFullPath(Relation localrel)


"for the cases that..." -> "for cases where..."

~~~

7.

+ /*
+ * Currently it is not possible for planner to pick a partial index or
+ * indexes only on expressions. We still want to be explicit and eliminate
+ * such paths proactively.

"for planner..." -> "for the planner..."

======

8. .../subscription/t/032_subscribe_use_index.pl - general

8a.
(remove the 'we')
"# we wait until..." -> "# wait until..." X many occurrences

~

8b.
(remove the 'we')
"# we show that..." -> “# show that..." X many occurrences

~~~

9.

There is inconsistent wording for some of your test case start/end comments

9a.
e.g.
start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
end: SUBSCRIPTION USES INDEX MODIFIES MULTIPLE ROWS

~

9b.
e.g.
start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
end: SUBSCRIPTION USES INDEX MODIFIES MULTIPLE ROWS

~~~

10.

I did not really understand the point of having special subscription names
tap_sub_rep_full_0
tap_sub_rep_full_2
tap_sub_rep_full_3
tap_sub_rep_full_4
etc...

Since you drop/recreate these for each test case can't they just be
called 'tap_sub_rep_full'?

~~~

11. SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS

+# updates 200 rows
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");

The comment says update but this is doing delete


~~~

12. SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS

+# cleanup sub
+$node_subscriber->safe_psql('postgres',
+ "DROP SUBSCRIPTION tap_sub_rep_full_4");

Unusual wrapping?

~~~

13. SUBSCRIPTION USES INDEX ON PARTITIONED TABLES

+# updates rows and moves between partitions
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");

The comment says update but SQL says delete

~~~

14. SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS

+# update 1 row and delete 1 row using index_b, so index_a still has 2 idx_scan
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'index_a';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full_0 updates two rows via index scan with index on high
cardinality column-3";
+

The comment seems misplaced. Doesn't it belong on the lines above this
where the update/delete is being done?

~~~

15. SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE - INHERITED TABLE

+# ANALYZING child will change the index used on child_1 and going to
use index_on_child_1_b
+$node_subscriber->safe_psql('postgres', "ANALYZE child_1");

Should the comment say 'child_1' instead of child?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Hi Peter,

Thanks for the quick response.


1. Commit message

It looks like some small mistake happened. You wrote [1] that my
previous review comments about the commit message were fixed, but it
seems the v11 commit message is unchanged since v10.


Oops, yes you are right, I forgot to push the commit message changes. I'll incorporate all these suggestions in v12.

 
======

2. src/backend/replication/logical/relation.c -
GenerateDummySelectPlannerInfoForRelation

+/*
+ * This is not a generic function, helper function for
+ * GetCheapestReplicaIdentityFullPath. The function creates
+ * a dummy PlannerInfo for the given relationId as if the
+ * relation is queried with SELECT command.
+ */
+static PlannerInfo *
+GenerateDummySelectPlannerInfoForRelation(Oid relationId)

"generic function, helper function" -> "generic function. It is a
helper function"

 
Fixed.

I'll attach the changes in the next email with v12.

Thanks,
Onder 
Hi Peter,

 

1. src/backend/executor/execReplication.c - build_replindex_scan_key

- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * This is not generic routine, it expects the idxrel to be an index
+ * that planner would choose if the searchslot includes all the columns
+ * (e.g., REPLICA IDENTITY FULL on the source).
  */
-static bool
+static int
 build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
  TupleTableSlot *searchslot)


(I know this is not caused by your patch but maybe fix it at the same time?)

"This is not generic routine, it expects..." -> "This is not a generic
routine - it expects..."


Fixed
 

2.

+ IndexInfo  *indexInfo PG_USED_FOR_ASSERTS_ONLY;
+
+ /*
+ * This attribute is an expression, and
+ * SuitablePathsForRepIdentFull() was called earlier while the
+ * index for subscriber is selected. There, the indexes comprising
+ * *only* expressions have already been eliminated.
+ *
+ * We sanity check this now.
+ */
+#ifdef USE_ASSERT_CHECKING
+ indexInfo = BuildIndexInfo(idxrel);
+ Assert(!IndexOnlyOnExpression(indexInfo));
+#endif

2a.
"while the index for subscriber is selected..." -> "when the index for
the subscriber was selected...”


fixed
 
~

2b.
Because there is only one declaration in this code block you could
simplify this a bit if you wanted to.

SUGGESTION
/*
 * This attribute is an expression, and
 * SuitablePathsForRepIdentFull() was called earlier while the
 * index for subscriber is selected. There, the indexes comprising
 * *only* expressions have already been eliminated.
 *
 * We sanity check this now.
 */
#ifdef USE_ASSERT_CHECKING
IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
Assert(!IndexOnlyOnExpression(indexInfo));
#endif


Makes sense, there is no reason to declare it above.
 
~~~

3. src/backend/executor/execReplication.c - RelationFindReplTupleByIndex

+ /* Start an index scan. */
+ scan = index_beginscan(rel, idxrel, &snap, skey_attoff, 0);
 retry:
  found = false;

It might be better to have a blank line before that ‘retry’ label,
like in the original code.

agreed, fixed
 

======

4. src/backend/replication/logical/relation.c

+/* see LogicalRepPartMapEntry for details in logicalrelation.h */
 static HTAB *LogicalRepPartMap = NULL;

Personally, I'd word that something like:
"/* For LogicalRepPartMap details see LogicalRepPartMapEntry in
logicalrelation.h */"

but YMMV.

I also don't have any strong opinions on that, updated to your suggestion.
 

~~~

5. src/backend/replication/logical/relation.c -
GenerateDummySelectPlannerInfoForRelation

+/*
+ * This is not a generic function, helper function for
+ * GetCheapestReplicaIdentityFullPath. The function creates
+ * a dummy PlannerInfo for the given relationId as if the
+ * relation is queried with SELECT command.
+ */
+static PlannerInfo *
+GenerateDummySelectPlannerInfoForRelation(Oid relationId)

(mentioned this one in my previous post)

"This is not a generic function, helper function" -> "This is not a
generic function. It is a helper function"

Yes, applied. 
 

~~~

6. src/backend/replication/logical/relation.c -
GetCheapestReplicaIdentityFullPath

+/*
+ * Generate all the possible paths for the given subscriber relation,
+ * for the cases that the source relation is replicated via REPLICA
+ * IDENTITY FULL. The function returns the cheapest Path among the
+ * eligible paths, see SuitablePathsForRepIdentFull().
+ *
+ * The function guarantees to return a path, because it adds sequential
+ * scan path if needed.
+ *
+ * The function assumes that all the columns will be provided during
+ * the execution phase, given that REPLICA IDENTITY FULL guarantees
+ * that.
+ */
+static Path *
+GetCheapestReplicaIdentityFullPath(Relation localrel)


"for the cases that..." -> "for cases where..."


sounds good 
 
~~~

7.

+ /*
+ * Currently it is not possible for planner to pick a partial index or
+ * indexes only on expressions. We still want to be explicit and eliminate
+ * such paths proactively.

"for planner..." -> "for the planner..."

fixed 
 

======

8. .../subscription/t/032_subscribe_use_index.pl - general

8a.
(remove the 'we')
"# we wait until..." -> "# wait until..." X many occurrences

~

8b.
(remove the 'we')
"# we show that..." -> “# show that..." X many occurrences

Ok, removed all "we"s in the test 

~~~

9.

There is inconsistent wording for some of your test case start/end comments

9a.
e.g.
start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
end: SUBSCRIPTION USES INDEX MODIFIES MULTIPLE ROWS

~

9b.
e.g.
start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
end: SUBSCRIPTION USES INDEX MODIFIES MULTIPLE ROWS


thanks, fixed all

 
~~~

10.

I did not really understand the point of having special subscription names
tap_sub_rep_full_0
tap_sub_rep_full_2
tap_sub_rep_full_3
tap_sub_rep_full_4
etc...

Since you drop/recreate these for each test case can't they just be
called 'tap_sub_rep_full'?


There is no special reason for that, updated all to tap_sub_rep_full. 

I think I initially made it that way in order to distinguish certain error messages in the tests, but we already have unique messages regardless of the subscription name.
 
~~~

11. SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS

+# updates 200 rows
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");

The comment says update but this is doing delete


fixed
 

~~~

12. SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS

+# cleanup sub
+$node_subscriber->safe_psql('postgres',
+ "DROP SUBSCRIPTION tap_sub_rep_full_4");

Unusual wrapping?

Fixed 
 

~~~

13. SUBSCRIPTION USES INDEX ON PARTITIONED TABLES

+# updates rows and moves between partitions
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");

The comment says update but SQL says delete


fixed
 
~~~

14. SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS

+# update 1 row and delete 1 row using index_b, so index_a still has 2 idx_scan
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'index_a';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full_0 updates two rows via index scan with index on high
cardinality column-3";
+

The comment seems misplaced. Doesn't it belong on the lines above this
where the update/delete is being done?

 
Yes, it seems so. Moved.
 
~~~

15. SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE - INHERITED TABLE

+# ANALYZING child will change the index used on child_1 and going to
use index_on_child_1_b
+$node_subscriber->safe_psql('postgres', "ANALYZE child_1");

Should the comment say 'child_1' instead of child?

------

Seems better, changed.

Thanks for the reviews, attached v12.

Onder Kalaci

 
Hi Onder,

Thanks for addressing all my previous feedback. I checked the latest
v12-0001, and have no more comments at this time.

One last thing - do you think there is any need to mention this
behaviour in the pgdocs, or is OK just to be a hidden performance
improvement?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "kuroda.hayato@fujitsu.com"
> One last thing - do you think there is any need to mention this
> behaviour in the pgdocs, or is OK just to be a hidden performance
> improvement?

FYI - I put my opinion.
We have following sentence in the logical-replication.sgml:

```
...
If the table does not have any suitable key, then it can be set
   to replica identity <quote>full</quote>, which means the entire row becomes
   the key.  This, however, is very inefficient and should only be used as a
   fallback if no other solution is possible.
...
```

Here the word "very inefficient" may mean that sequential scans will be executed every time.
I think some descriptions can be added around here.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "wangw.fnst@fujitsu.com"
On Tue, Sep 20, 2022 at 18:30 Önder Kalacı <onderkalaci@gmail.com> wrote:
> Thanks for the reviews, attached v12.

Thanks for your patch. Here is a question and a comment:

1. In the function GetCheapestReplicaIdentityFullPath.
+    if (rel->pathlist == NIL)
+    {
+        /*
+         * A sequential scan could have been dominated by by an index scan
+         * during make_one_rel(). We should always have a sequential scan
+         * before set_cheapest().
+         */
+        Path       *seqScanPath = create_seqscan_path(root, rel, NULL, 0);
+
+        add_path(rel, seqScanPath);
+    }

This is a question I'm not sure about:
Do we need this part to add a sequential scan?

I think in our case, the sequential scan seems to have been added by the
function make_one_rel (see function set_plain_rel_pathlist). If I am missing
something, please let me know. BTW, there is a typo in the above comment: `by by`.

2. In the file execReplication.c.
+#ifdef USE_ASSERT_CHECKING
+#include "catalog/index.h"
+#endif
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
+#ifdef USE_ASSERT_CHECKING
+#include "replication/logicalrelation.h"
+#endif

I think it's fine to only add `logicalrelation.h` here, because `index.h` has
been added by `logicalrelation.h`.

Regards,
Wang wei

Hi Peter, Kuroda

On Wed, Sep 21, 2022 at 04:21, kuroda.hayato@fujitsu.com <kuroda.hayato@fujitsu.com> wrote:
> One last thing - do you think there is any need to mention this
> behaviour in the pgdocs, or is OK just to be a hidden performance
> improvement?

FYI - I put my opinion.
We have following sentence in the logical-replication.sgml:

```
...
If the table does not have any suitable key, then it can be set
   to replica identity <quote>full</quote>, which means the entire row becomes
   the key.  This, however, is very inefficient and should only be used as a
   fallback if no other solution is possible.
...
```

Here the word "very inefficient" may mean that sequential scans will be executed every time.
I think some descriptions can be added around here.

Making a small edit in that file makes sense. I'll attach v13 in the next email that also includes this change.

Also, do you think this is a good time for me to mark the patch "Ready for committer" in the commit fest? I'm not sure when the state should be changed and by whom, but it seems I can change it. I couldn't find any documentation on how that process should work.

Thanks!
Hi Wang wei,

1. In the function GetCheapestReplicaIdentityFullPath.
+       if (rel->pathlist == NIL)
+       {
+               /*
+                * A sequential scan could have been dominated by by an index scan
+                * during make_one_rel(). We should always have a sequential scan
+                * before set_cheapest().
+                */
+               Path       *seqScanPath = create_seqscan_path(root, rel, NULL, 0);
+
+               add_path(rel, seqScanPath);
+       }

This is a question I'm not sure about:
Do we need this part to add sequential scan?

I think in our case, the sequential scan seems to have been added by the
function make_one_rel (see function set_plain_rel_pathlist).

Yes, the sequential scan is added during make_one_rel. 
 
If I am missing
something, please let me know. BTW, there is a typo in above comment: `by by`.

As the comment mentions, the sequential scan could have been dominated & removed by an index scan; see add_path():

> * We also remove from the rel's pathlist any old paths that are dominated
> * by new_path --- that is, new_path is cheaper, at least as well ordered,
> * generates no more rows, requires no outer rels not required by the old
> * path, and is no less parallel-safe.

Still, I agree that the comment could be improved, so I pushed an improved version.
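For readers unfamiliar with add_path()'s pruning, the effect can be sketched with a toy, cost-only model (a hypothetical illustration; the real add_path() also compares sort order, row counts, required outer rels, and parallel safety, and `SimplePath`/`add_simple_path` are made-up names):

```c
#include <stddef.h>

/* Toy stand-in for a planner Path: only the total cost is modeled. */
typedef struct SimplePath
{
    const char *name;
    double      total_cost;
} SimplePath;

/*
 * Cost-only model of add_path(): reject the new path if an existing one
 * is at least as cheap; otherwise it dominates every old path, so the
 * old paths are dropped.  This is how a sequential scan path can vanish
 * from the pathlist once a cheaper index scan path has been added.
 */
static size_t
add_simple_path(SimplePath *list, size_t n, SimplePath new_path)
{
    for (size_t i = 0; i < n; i++)
        if (list[i].total_cost <= new_path.total_cost)
            return n;           /* new path is dominated; list unchanged */

    /* In this cost-only model the new path dominates every old one. */
    list[0] = new_path;
    return 1;
}
```

Here, adding a sequential scan path (cost 100) and then an index scan path (cost 10) leaves only the index path, which is why the patch re-adds a sequential scan path before set_cheapest() when none survived.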


2. In the file execReplication.c.
+#ifdef USE_ASSERT_CHECKING
+#include "catalog/index.h"
+#endif
 #include "commands/trigger.h"
 #include "executor/executor.h"
 #include "executor/nodeModifyTable.h"
 #include "nodes/nodeFuncs.h"
 #include "parser/parse_relation.h"
 #include "parser/parsetree.h"
+#ifdef USE_ASSERT_CHECKING
+#include "replication/logicalrelation.h"
+#endif

I think it's fine to only add `logicalrelation.h` here, because `index.h` has
been added by `logicalrelation.h`.


Makes sense, removed. Thanks.

Attached v13.


On Thu, Sep 22, 2022 at 9:44 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Also, do you think is this a good time for me to mark the patch "Ready for committer" in the commit fest? Not sure
> when and who should change the state, but it seems I can change. I couldn't find any documentation on how that
> process should work.
>

Normally, the reviewers mark it as "Ready for committer".

--
With Regards,
Amit Kapila.



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "kuroda.hayato@fujitsu.com"
Dear Önder:

Thank you for updating the patch!
Your documentation seems OK, and I could not find any other places where it should be added.

The following are my comments.

====
01 relation.c - general

Many files are newly included.
I am not sure, but maybe some of the planner-related code could be moved to src/backend/optimizer/plan.
What do you and others think?

02 relation.c - FindLogicalRepUsableIndex

```
+/*
+ * Returns an index oid if we can use an index for the apply side. If not,
+ * returns InvalidOid.
+ */
+static Oid
+FindLogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel)
```

I grepped the files, but I cannot find the phrase "apply side" anywhere else. How about "subscriber" instead?

03 relation.c - FindLogicalRepUsableIndex

```
+       /* Simple case, we already have an identity or pkey */
+       idxoid = GetRelationIdentityOrPK(localrel);
+       if (OidIsValid(idxoid))
+               return idxoid;
+
+       /*
+        * If index scans are disabled, use a sequential scan.
+        *
+        * Note that we still allowed index scans above when there is a primary
+        * key or unique index replica identity, but that is the legacy behaviour
+        * (even when enable_indexscan is false), so we hesitate to move this
+        * enable_indexscan check to be done earlier in this function.
+        */
+       if (!enable_indexscan)
+               return InvalidOid;
```

a. 
I think "identity or pkey" should be "replica identity key or primary key" or "RI or PK".

b. 
The latter part should be placed near GetRelationIdentityOrPK.


04 relation.c - FindUsableIndexForReplicaIdentityFull

```
+       MemoryContext usableIndexContext;
...
+       usableIndexContext = AllocSetContextCreate(CurrentMemoryContext,
+                                                                                          "usableIndexContext",
+                                                                                          ALLOCSET_DEFAULT_SIZES);
```

I grepped other sources, and found that a name like "tmpcxt" is used for temporary MemoryContexts.

05 relation.c - SuitablePathsForRepIdentFull

```
+                       indexRelation = index_open(idxoid, AccessShareLock);
+                       indexInfo = BuildIndexInfo(indexRelation);
+                       is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+                       is_partial = (indexInfo->ii_Predicate != NIL);
+                       is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+                       index_close(indexRelation, NoLock);
```

Why is the index closed with NoLock? AccessShareLock was acquired, so shouldn't the same lock be released?


06 relation.c - GetCheapestReplicaIdentityFullPath

IIUC a query like "SELECT * FROM tbl WHERE attr1 = $1 AND attr2 = $2 ... AND attrN = $N" is emulated, right?
You could state this explicitly in a comment.
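To make the emulated query's shape concrete, a small sketch that just renders the restriction clause as text (purely illustrative and hypothetical: the patch builds these restrictions as planner expression nodes, not SQL strings, and `emulated_where_clause` is a made-up name):

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Render the WHERE clause that GetCheapestReplicaIdentityFullPath
 * effectively plans for: one equality restriction per column, i.e.
 * "attr1 = $1 AND attr2 = $2 ... AND attrN = $N".
 */
static void
emulated_where_clause(const char *attnames[], int natts,
                      char *buf, size_t buflen)
{
    size_t      used = 0;

    buf[0] = '\0';
    /* Append "attname = $n", joined with " AND ", stopping if buf fills. */
    for (int i = 0; i < natts && used < buflen; i++)
        used += (size_t) snprintf(buf + used, buflen - used,
                                  "%s%s = $%d",
                                  (i > 0) ? " AND " : "", attnames[i], i + 1);
}
```

With columns (a, b, c) this yields "a = $1 AND b = $2 AND c = $3", i.e. an equality restriction on every column, as REPLICA IDENTITY FULL guarantees all column values are available.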

07 relation.c - GetCheapestReplicaIdentityFullPath

```
+               Path       *path = (Path *) lfirst(lc);
+               Oid                     idxoid = GetIndexOidFromPath(path);
+
+               if (!OidIsValid(idxoid))
+               {
+                       /* Unrelated Path, skip */
+                       suitableIndexList = lappend(suitableIndexList, path);
+               }
```

This was not clear to me. IIUC in the function we want to extract the "good" scan paths, and based on those the cheapest one is chosen.

GetIndexOidFromPath() seems to return InvalidOid when the input path is not an index scan, so why is it appended to the suitable list?
 


===
08 worker.c - usable_indexoid_internal

I think this is not an "internal" function; such names are used for pairs like "apply_handle_commit" -
"apply_handle_commit_internal", or "apply_handle_insert" - "apply_handle_insert_internal".

How about "get_usable_index" or something?

09 worker.c - usable_indexoid_internal

```
+       Oid                     targetrelid = targetResultRelInfo->ri_RelationDesc->rd_rel->oid;
+       Oid                     localrelid = relinfo->ri_RelationDesc->rd_id;
+
+       if (targetrelid != localrelid)
```

I think these lines are very confusing.
IIUC targetrelid corresponds to the "parent", and localrelid corresponds to the "child", right?
How about changing the names to "partitionedoid" and "leadoid" or something?

===
10 032_subscribe_use_index.pl

```
# create tables pub and sub
$node_publisher->safe_psql('postgres',
    "CREATE TABLE test_replica_id_full (x int)");
$node_publisher->safe_psql('postgres',
    "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
$node_subscriber->safe_psql('postgres',
    "CREATE TABLE test_replica_id_full (x int)");
$node_subscriber->safe_psql('postgres',
    "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");
```

In many places same table is defined, altered as "REPLICA IDENTITY FULL", and index is created.
Could you combine them into a function?

11 032_subscribe_use_index.pl

```
# wait until the index is used on the subscriber
$node_subscriber->poll_query_until(
    'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full_0 updates one row via index";
```

In many places this check is done. Could you combine them into a function?

12 032_subscribe_use_index.pl

```
# create pub/sub
$node_publisher->safe_psql('postgres',
    "CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full");
$node_subscriber->safe_psql('postgres',
    "CREATE SUBSCRIPTION tap_sub_rep_full CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION
tap_pub_rep_full"
);
```

Same as above

13 032_subscribe_use_index.pl

```
# cleanup pub
$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_rep_full");
$node_publisher->safe_psql('postgres', "DROP TABLE test_replica_id_full");
# cleanup sub
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub_rep_full");
$node_subscriber->safe_psql('postgres', "DROP TABLE test_replica_id_full");
```

Same as above

14 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX

```
# make sure that the subscriber has the correct data
my $result = $node_subscriber->safe_psql('postgres',
    "SELECT sum(x) FROM test_replica_id_full");
is($result, qq(212), 'ensure subscriber has the correct data at the end of the test');

$node_subscriber->poll_query_until(
    'postgres', q{select sum(x)=212 AND count(*)=21 AND count(DISTINCT x)=20 FROM test_replica_id_full;}
) or die "ensure subscriber has the correct data at the end of the test";
```

I think first one is not needed.


15 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS

```
# insert some initial data within the range 0-20
$node_publisher->safe_psql('postgres',
    "INSERT INTO test_replica_id_full SELECT i%20 FROM generate_series(0,1000)i;"
);
```

I think the data is within the range 0-19.
(There are some similar mistakes.)

===
16 test/subscription/meson.build

Your test 't/032_subscribe_use_index.pl' must be added in the 'tests' for meson build system.
(I checked on my env, and your test works well)

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Hi Hayato Kuroda,

Thanks for the review!


====
01 relation.c - general

Many files are newly included.
I was not sure, but some of the planner-related code may be able to move to src/backend/optimizer/plan.
What do you and others think?


My thinking on those functions is that they should probably stay in src/backend/replication/logical/relation.c. My main motivation is that those functions are so much tailored to the purposes of this file that I cannot see any use-case for these functions in any other context.

Still, at some point, I considered maybe doing something similar to src/backend/executor/execReplication.c, where I create a new file, say, src/backend/optimizer/plan/planReplication.c or such as you noted. I'm a bit torn on this. 

Does anyone have any strong opinions for moving to src/backend/optimizer/plan/planReplication.c? (or another name)
 
02 relation.c - FindLogicalRepUsableIndex

```
+/*
+ * Returns an index oid if we can use an index for the apply side. If not,
+ * returns InvalidOid.
+ */
+static Oid
+FindLogicalRepUsableIndex(Relation localrel, LogicalRepRelation *remoterel)
```

I grepped files, but I cannot find the word "apply side". How about "subscriber" instead?

Yes, it makes sense. I guess I made up "apply side" because there is the concept of an "apply worker". But, yes, "subscriber" sounds better; updated.
 

03 relation.c - FindLogicalRepUsableIndex

```
+       /* Simple case, we already have an identity or pkey */
+       idxoid = GetRelationIdentityOrPK(localrel);
+       if (OidIsValid(idxoid))
+               return idxoid;
+
+       /*
+        * If index scans are disabled, use a sequential scan.
+        *
+        * Note that we still allowed index scans above when there is a primary
+        * key or unique index replica identity, but that is the legacy behaviour
+        * (even when enable_indexscan is false), so we hesitate to move this
+        * enable_indexscan check to be done earlier in this function.
+        */
+       if (!enable_indexscan)
+               return InvalidOid;
```

a.
I think "identity or pkey" should be "replica identity key or primary key" or "RI or PK"

Looking into other places, it seems "replica identity index" is favored over "replica identity key". So, I used that term.

You can see this pattern in RelationGetReplicaIndex()
 

b.
Later part should be at around GetRelationIdentityOrPK.

Hmm, I cannot follow this comment. Can you please clarify?
 


04 relation.c - FindUsableIndexForReplicaIdentityFull

```
+       MemoryContext usableIndexContext;
...
+       usableIndexContext = AllocSetContextCreate(CurrentMemoryContext,
+                                                                                          "usableIndexContext",
+                                                                                          ALLOCSET_DEFAULT_SIZES);
```

I grepped other sources, and I found that names like "tmpcxt" are used for temporary MemoryContexts.

I think there are also several contexts that are named more specifically, such as new_pdcxt, perTupCxt, anl_context, cluster_context and many others.

So, I think it is better to have specific names, no?
 

05 relation.c - SuitablePathsForRepIdentFull

```
+                       indexRelation = index_open(idxoid, AccessShareLock);
+                       indexInfo = BuildIndexInfo(indexRelation);
+                       is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+                       is_partial = (indexInfo->ii_Predicate != NIL);
+                       is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+                       index_close(indexRelation, NoLock);
```

Why is the index closed with NoLock? AccessShareLock is acquired, so shouldn't the same lock be released?

Hmm, yes you are right. Keeping the lock seems unnecessary and wrong. It could actually have prevented dropping an index. However, given that RelationFindReplTupleByIndex() also closes this index at the end, the apply worker releases the lock. Hence, no problem observed.

Anyway, I'm still changing it to release the lock.

Also note that as soon as any index is dropped on the relation, the cache is invalidated and suitable indexes are re-calculated. That's why it seems fine to release the lock. 
 


06 relation.c - GetCheapestReplicaIdentityFullPath

IIUC a query like "SELECT tbl WHERE attr1 = $1 AND attr2 = $2 ... AND attrN = $N" is emulated, right?
you can write it explicitly as a comment


The inlined comment in the function has a similar comment. Is that clear enough?

```
/*
 * Generate restrictions for all columns in the form of col_1 = $1 AND
 * col_2 = $2 ...
 */
```
 
07 relation.c - GetCheapestReplicaIdentityFullPath

```
+               Path       *path = (Path *) lfirst(lc);
+               Oid                     idxoid = GetIndexOidFromPath(path);
+
+               if (!OidIsValid(idxoid))
+               {
+                       /* Unrelated Path, skip */
+                       suitableIndexList = lappend(suitableIndexList, path);
+               }
```

I was not clear about this part. IIUC in the function we want to extract the "good" scan plans, and based on that the cheapest one is chosen.
GetIndexOidFromPath() seems to return InvalidOid when the input path is not an index scan, so why is it appended to the suitable list?


It could be the sequential scan that we fall back to. However, we already add the sequential scan at the end of the function. So, actually you are right, there is no need to keep any other paths here. Adjusted the comments.
 

===
08 worker.c - usable_indexoid_internal

I think this is not an "internal" function; such a name should be used for pairs like "apply_handle_commit" - "apply_handle_commit_internal", or "apply_handle_insert" - "apply_handle_insert_internal".
How about "get_usable_index" or something?

Yeah, you are right. I use this function inside functions ending with _internal, but this one is clearly not an internal function. I used get_usable_indexoid().
 

09 worker.c - usable_indexoid_internal

```
+       Oid                     targetrelid = targetResultRelInfo->ri_RelationDesc->rd_rel->oid;
+       Oid                     localrelid = relinfo->ri_RelationDesc->rd_id;
+
+       if (targetrelid != localrelid)
```

I think these lines are very confusing.
IIUC targetrelid corresponds to the "parent", and localrelid corresponds to the "child", right?
How about changing the names to "partitionedoid" and "leafoid" or something?

We do not know whether targetrelid is definitely a "parent". But, if that is a parent, this function fetches the relevant partition's usableIndexOid. So, I'm not convinced that "parent" is a good choice.

Though, I agree that we can improve the code a bit. I now use targetrelkind and dropped localrelid to check whether the target is a partitioned table. Is this better?
 
 

===
10 032_subscribe_use_index.pl

```
# create tables pub and sub
$node_publisher->safe_psql('postgres',
        "CREATE TABLE test_replica_id_full (x int)");
$node_publisher->safe_psql('postgres',
        "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
$node_subscriber->safe_psql('postgres',
        "CREATE TABLE test_replica_id_full (x int)");
$node_subscriber->safe_psql('postgres',
        "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");
```

In many places same table is defined, altered as "REPLICA IDENTITY FULL", and index is created.
Could you combine them into a function?

Well, I'm not sure if it is worth the complexity. There are only 4 usages of the same table, and these are all pretty simple statements, and all other tests seem to have a similar pattern. I have not seen any tests where these simple statements are done in a function even if they are repeated. I'd rather keep it so that this doesn't lead to other style discussions?
 

11 032_subscribe_use_index.pl

```
# wait until the index is used on the subscriber
$node_subscriber->poll_query_until(
        'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full_0 updates one row via index";
```

In many places this check is done. Could you combine them into a function?

I'm a little confused. Isn't that already inside a function (e.g., poll_query_until) ? Can you please clarify this suggestion a bit more? 
 

12 032_subscribe_use_index.pl

```
# create pub/sub
$node_publisher->safe_psql('postgres',
        "CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full");
$node_subscriber->safe_psql('postgres',
        "CREATE SUBSCRIPTION tap_sub_rep_full CONNECTION '$publisher_connstr application_name=$appname' PUBLICATION tap_pub_rep_full"
);
```

Same as above

Well, again, I'm not sure if it is worth moving these simple statements to functions as an improvement here. One might tell that it is better to see the statements explicitly on the test -- which almost all the tests do. I want to avoid introducing some unusual pattern to the tests.
 

13 032_subscribe_use_index.pl

```
# cleanup pub
$node_publisher->safe_psql('postgres', "DROP PUBLICATION tap_pub_rep_full");
$node_publisher->safe_psql('postgres', "DROP TABLE test_replica_id_full");
# cleanup sub
$node_subscriber->safe_psql('postgres', "DROP SUBSCRIPTION tap_sub_rep_full");
$node_subscriber->safe_psql('postgres', "DROP TABLE test_replica_id_full");
```

Same as above

Same as above :) 
 

14 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX

```
# make sure that the subscriber has the correct data
my $result = $node_subscriber->safe_psql('postgres',
        "SELECT sum(x) FROM test_replica_id_full");
is($result, qq(212), 'ensure subscriber has the correct data at the end of the test');

$node_subscriber->poll_query_until(
        'postgres', q{select sum(x)=212 AND count(*)=21 AND count(DISTINCT x)=20 FROM test_replica_id_full;}
) or die "ensure subscriber has the correct data at the end of the test";
```

I think first one is not needed.

I preferred to keep the second one because is($result, .. is needed for tests to show the progress while running.
 


15 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS

```
# insert some initial data within the range 0-20
$node_publisher->safe_psql('postgres',
        "INSERT INTO test_replica_id_full SELECT i%20 FROM generate_series(0,1000)i;"
);
```

I think the data is within the range 0-19.
(There are some similar mistakes.)

Yes, I fixed it all.

 

===
16 test/subscription/meson.build

Your test 't/032_subscribe_use_index.pl' must be added in the 'tests' for meson build system.
(I checked on my env, and your test works well)

 
Oh, I didn't know about this, thanks!

Attached v14.

Thanks,
Onder
 

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Önder,

Thank you for updating the patch! At first I replied to your comments.

> My thinking on those functions is that they should probably stay
> in src/backend/replication/logical/relation.c. My main motivation is that
> those functions are so much tailored to the purposes of this file that I
> cannot see any use-case for these functions in any other context.

I was not sure what it should be, but I agree that the functions will not be used from other parts.

> Hmm, I cannot follow this comment. Can you please clarify?

In your patch:

```
+       /* Simple case, we already have a primary key or a replica identity index */
+       idxoid = GetRelationIdentityOrPK(localrel);
+       if (OidIsValid(idxoid))
+               return idxoid;
+
+       /*
+        * If index scans are disabled, use a sequential scan.
+        *
+        * Note that we still allowed index scans above when there is a primary
+        * key or unique index replica identity, but that is the legacy behaviour
+        * (even when enable_indexscan is false), so we hesitate to move this
+        * enable_indexscan check to be done earlier in this function.
+        */ 
```

And the paragraph "Note that we..." should be above GetRelationIdentityOrPK().
Future readers will read the function from top to bottom,
and when they read around GetRelationIdentityOrPK() they may be confused.

> So, I think it is better to have specific names, no?

OK.

> The inlined comment in the function has a similar comment. Is that clear
> enough?
> /* * Generate restrictions for all columns in the form of col_1 = $1 AND *
> col_2 = $2 ... */

Actually I missed it, but I still think that the whole of the emulated SQL should be clarified.

> Though, I agree that we can improve the code a bit. I now
> use targetrelkind and dropped localrelid to check whether the target is a
> partitioned table. Is this better?

Great improvement. Genius!

> Well, I'm not sure if it is worth the complexity. There are only 4 usages
> of the same table, and these are all pretty simple statements, and all
> other tests seem to have a similar pattern. I have not seen any tests where
> these simple statements are done in a function even if they are repeated.
> I'd rather keep it so that this doesn't lead to other style discussions?

If other tests do not combine such parts, it's OK.
My motivation for these comments was to reduce the number of lines in the test code.

> Oh, I didn't know about this, thanks!

Now the meson test system runs your test. OK.


And followings are the comments for v14. They are mainly about comments.

===
01. relation.c - logicalrep_rel_open

```
+               /*
+                * Finding a usable index is an infrequent task. It occurs when an
+                * operation is first performed on the relation, or after invalidation
+                * of the relation cache entry (e.g., such as ANALYZE).
+                */
+               entry->usableIndexOid = FindLogicalRepUsableIndex(entry->localrel, remoterel);
```

I thought you can mention CREATE INDEX in the comment.

According to your analysis [1], the relation cache will be invalidated if users do CREATE INDEX.
At that time the hash entry will be removed (logicalrep_relmap_invalidate_cb) and the "usable" index
will be checked again.

~~~
02. relation.c - logicalrep_partition_open

```
+       /*
+        * Finding a usable index is an infrequent task. It occurs when an
+        * operation is first performed on the relation, or after invalidation of
+        * the relation cache entry (e.g., such as ANALYZE).
+        */
+       entry->usableIndexOid = FindLogicalRepUsableIndex(partrel, remoterel);
+
```

Same as above

~~~
03. relation.c - GetIndexOidFromPath

```
+       if (path->pathtype == T_IndexScan || path->pathtype == T_IndexOnlyScan)
+       {
+               IndexPath  *index_sc = (IndexPath *) path;
+
+               return index_sc->indexinfo->indexoid;
+       }
```

I thought Assert(OidIsValid(indexoid)) may be added here. Or is it quite trivial?

~~~
04. relation.c - IndexOnlyOnExpression

This method just returns "yes" or "no", so the name of the method should start with "Has" or "Is".

~~~
05. relation.c - SuitablePathsForRepIdentFull

```
+/*
+ * Iterates over the input path list and returns another
+ * path list that includes index [only] scans where paths
+ * with non-btree indexes, partial indexes or
+ * indexes on only expressions have been removed.
+ */
```

These lines seem to be wrapped at around 60 columns. Could you expand them to around 80?

~~~
06. relation.c - SuitablePathsForRepIdentFull

```
+                       Relation        indexRelation;
+                       IndexInfo  *indexInfo;
+                       bool            is_btree;
+                       bool            is_partial;
+                       bool            is_only_on_expression;
+
+                       indexRelation = index_open(idxoid, AccessShareLock);
+                       indexInfo = BuildIndexInfo(indexRelation);
+                       is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+                       is_partial = (indexInfo->ii_Predicate != NIL);
+                       is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+                       index_close(indexRelation, AccessShareLock);
+
+                       if (is_btree && !is_partial && !is_only_on_expression)
+                               suitableIndexList = lappend(suitableIndexList, path);
```

Please add a comment like "eliminating not suitable path" or something.

~~~
07. relation.c - GenerateDummySelectPlannerInfoForRelation

```
+/*
+ * This is not a generic function. It is a helper function
+ * for GetCheapestReplicaIdentityFullPath. The function
+ * creates a dummy PlannerInfo for the given relationId
+ * as if the relation is queried with SELECT command.
+ */
```

These lines seem to be wrapped at around 60 columns. Could you expand them to around 80?

~~~
08. relation.c - FindLogicalRepUsableIndex

```
+/*
+ * Returns an index oid if we can use an index for subscriber . If not,
+ * returns InvalidOid.
+ */
```

"subscriber ." should be "subscriber.", blank is not needed.

~~~
09. worker.c - get_usable_indexoid

```
+               Assert(targetResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+                          RELKIND_PARTITIONED_TABLE);
```

I thought this assertion seems to be unnecessary, because it is exactly the same as the condition of the if-statement.

[1] https://www.postgresql.org/message-id/CACawEhXgP_Kj_1iyNAp16MYos4Anrtz%2BOZVtj2z-QOPGdPCt_A%40mail.gmail.com


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Hi Kuroda Hayato,


In your patch:

```
+       /* Simple case, we already have a primary key or a replica identity index */
+       idxoid = GetRelationIdentityOrPK(localrel);
+       if (OidIsValid(idxoid))
+               return idxoid;
+
+       /*
+        * If index scans are disabled, use a sequential scan.
+        *
+        * Note that we still allowed index scans above when there is a primary
+        * key or unique index replica identity, but that is the legacy behaviour
+        * (even when enable_indexscan is false), so we hesitate to move this
+        * enable_indexscan check to be done earlier in this function.
+        */
```

And the paragraph "Note that we..." should be above GetRelationIdentityOrPK().
Future readers will read the function from top to bottom,
and when they read around GetRelationIdentityOrPK() they may be confused.


Ah, makes sense, now I applied your feedback (with some different wording).
 

> The inlined comment in the function has a similar comment. Is that clear
> enough?
> /* * Generate restrictions for all columns in the form of col_1 = $1 AND *
> col_2 = $2 ... */

Actually I missed it, but I still think that the whole of the emulated SQL should be clarified.

Alright, it makes sense. I added the emulated SQL to the function comment of GetCheapestReplicaIdentityFullPath.
 
And followings are the comments for v14. They are mainly about comments.

===
01. relation.c - logicalrep_rel_open

```
+               /*
+                * Finding a usable index is an infrequent task. It occurs when an
+                * operation is first performed on the relation, or after invalidation
+                * of the relation cache entry (e.g., such as ANALYZE).
+                */
+               entry->usableIndexOid = FindLogicalRepUsableIndex(entry->localrel, remoterel);
```

I thought you can mention CREATE INDEX in the comment.

According to your analysis [1] the relation cache will be invalidated if users do CREATE INDEX
At that time the hash entry will be removed (logicalrep_relmap_invalidate_cb) and "usable" index
will be checked again.

Yes, that is right. I think it makes sense to mention that as well. In fact, I also decided to add such a test.

I realized that all tests use ANALYZE for re-calculation of the index. Now, I added an explicit test that uses CREATE/DROP index to re-calculate the index.
 
see # Testcase start: SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX.



~~~
02. relation.c - logicalrep_partition_open

```
+       /*
+        * Finding a usable index is an infrequent task. It occurs when an
+        * operation is first performed on the relation, or after invalidation of
+        * the relation cache entry (e.g., such as ANALYZE).
+        */
+       entry->usableIndexOid = FindLogicalRepUsableIndex(partrel, remoterel);
+
```

Same as above

 
done
 
~~~
03. relation.c - GetIndexOidFromPath

```
+       if (path->pathtype == T_IndexScan || path->pathtype == T_IndexOnlyScan)
+       {
+               IndexPath  *index_sc = (IndexPath *) path;
+
+               return index_sc->indexinfo->indexoid;
+       }
```

I thought Assert(OidIsValid(indexoid)) may be added here. Or is it quite trivial?

Looking at the PG code, I couldn't see any place that asserts this information. It seems like fundamental information that is never invalid.

Btw, even if it returns InvalidOid for some reason, we'd not be crashing. We'd only be unable to use any indexes and would fall back to a seq. scan.
 

~~~
04. relation.c - IndexOnlyOnExpression

This method just returns "yes" or "no", so the name of the method should start with "Has" or "Is".

Yes, it seems like that is a common convention.
 
~~~
05. relation.c - SuitablePathsForRepIdentFull

```
+/*
+ * Iterates over the input path list and returns another
+ * path list that includes index [only] scans where paths
+ * with non-btree indexes, partial indexes or
+ * indexes on only expressions have been removed.
+ */
```

These lines seem to be wrapped at around 60 columns. Could you expand them to around 80?

done
 

~~~
06. relation.c - SuitablePathsForRepIdentFull

```
+                       Relation        indexRelation;
+                       IndexInfo  *indexInfo;
+                       bool            is_btree;
+                       bool            is_partial;
+                       bool            is_only_on_expression;
+
+                       indexRelation = index_open(idxoid, AccessShareLock);
+                       indexInfo = BuildIndexInfo(indexRelation);
+                       is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+                       is_partial = (indexInfo->ii_Predicate != NIL);
+                       is_only_on_expression = IndexOnlyOnExpression(indexInfo);
+                       index_close(indexRelation, AccessShareLock);
+
+                       if (is_btree && !is_partial && !is_only_on_expression)
+                               suitableIndexList = lappend(suitableIndexList, path);
```

Please add a comment like "eliminating not suitable path" or something.

done
 

~~~
07. relation.c - GenerateDummySelectPlannerInfoForRelation

```
+/*
+ * This is not a generic function. It is a helper function
+ * for GetCheapestReplicaIdentityFullPath. The function
+ * creates a dummy PlannerInfo for the given relationId
+ * as if the relation is queried with SELECT command.
+ */
```

These lines seem to be wrapped at around 60 columns. Could you expand them to around 80?

done
 

~~~
08. relation.c - FindLogicalRepUsableIndex

```
+/*
+ * Returns an index oid if we can use an index for subscriber . If not,
+ * returns InvalidOid.
+ */
```

"subscriber ." should be "subscriber.", blank is not needed.

fixed
 

~~~
09. worker.c - get_usable_indexoid

```
+               Assert(targetResultRelInfo->ri_RelationDesc->rd_rel->relkind ==
+                          RELKIND_PARTITIONED_TABLE);
```

I thought this assertion seems to be unnecessary, because it is exactly the same as the condition of the if-statement.

Yes, the refactor we made in the previous iteration made this assertion obsolete as you noted.
 
Attached v15, thanks for the reviews.

Thanks,
Onder KALACI

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Önder,

Thanks for updating the patch! I checked yours and it is almost good.
The following are just cosmetic comments.

===
01. relation.c - GetCheapestReplicaIdentityFullPath

```
     * The reason that the planner would not pick partial indexes and indexes
     * with only expressions based on the way currently baserestrictinfos are
     * formed (e.g., col_1 = $1 ... AND col_N = $2).
```

Is "col_N = $2" a typo? I think it should be "col_N = $N" or "attr1 = $1 ... AND attrN = $N".

===
02. 032_subscribe_use_index.pl

If a table has a primary key on the subscriber, it will be used even if enable_indexscan is false (legacy behavior).
Should we test it?

~~~
03. 032_subscribe_use_index.pl -  SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX

I think this test seems to be not trivial, so could you write down the motivation?

~~~
04. 032_subscribe_use_index.pl -  SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX

```
# wait until the index is created
$node_subscriber->poll_query_until(
    'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full_0 updates one row via index";
```

CREATE INDEX is a synchronous behavior, right? If so we don't have to wait here.
...And the comment in case of die may be wrong.
(There are some cases like this)

~~~
05. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS

```
# Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
#
# Basic test where the subscriber uses index
# and touches 50 rows with UPDATE
```

"touches 50 rows with UPDATE" -> "updates 50 rows", per other tests.

~~~
06. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE

I think this test seems to be not trivial, so could you write down the motivation?
(Same as Re-calclate)

~~~
07. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE

```
# show that index_b is not used
$node_subscriber->poll_query_until(
    'postgres', q{select idx_scan=0 from pg_stat_all_indexes where indexrelname = 'index_b';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on
highcardinality column-2";
 
```

I think we don't have to wait here, is() should be used instead. 
poll_query_until() should be used only when idx_scan>0 is checked.
(There are some cases like this)

~~~
08. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX ON PARTITIONED TABLES

```
# make sure that the subscriber has the correct data
$node_subscriber->poll_query_until(
    'postgres', q{select sum(user_id+value_1+value_2)=550070 AND count(DISTINCT(user_id,value_1, value_2))=981 from
users_table_part;}
) or die "ensure subscriber has the correct data at the end of the test";
```

I think we can replace it with wait_for_catchup() and is()...
Moreover, we don't have to wait here because in the line above we wait until the index is used on the subscriber.
(There are some cases like this)


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Hi Kuroda Hayato,


===
01. relation.c - GetCheapestReplicaIdentityFullPath

```
         * The reason that the planner would not pick partial indexes and indexes
         * with only expressions based on the way currently baserestrictinfos are
         * formed (e.g., col_1 = $1 ... AND col_N = $2).
```

Is "col_N = $2" a typo? I think it should be "col_N = $N" or "attr1 = $1 ... AND attrN = $N".


Yes, it is a typo, fixed now.
 
===
02. 032_subscribe_use_index.pl

If a table has a primary key on the subscriber, it will be used even if enable_indexscan is false (legacy behavior).
Should we test it?


Yes, good idea. I added two tests: one tests that we cannot use regular indexes when index scan is disabled, and another tests that we do use the replica identity index when index scan is disabled. This is useful to make sure that anyone who changes the behavior can see the impact.
 
~~~
03. 032_subscribe_use_index.pl -  SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX

I think this test seems to be not trivial, so could you write down the motivation?

makes sense, done
 

~~~
04. 032_subscribe_use_index.pl -  SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX

```
# wait until the index is created
$node_subscriber->poll_query_until(
        'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full_0 updates one row via index";
```

CREATE INDEX is a synchronous behavior, right? If so we don't have to wait here.
...And the comment in case of die may be wrong.
(There are some cases like this)

It is not about CREATE INDEX being async. It is about pg_stat_all_indexes being async. If we do not wait, the tests become flaky, because sometimes the update has not been reflected in the view immediately.


When using the statistics to monitor collected data, it is important to realize that the information does not update instantaneously. Each individual server process transmits new statistical counts to the collector just before going idle; so a query or transaction still in progress does not affect the displayed totals. Also, the collector itself emits a new report at most once per PGSTAT_STAT_INTERVAL milliseconds (500 ms unless altered while building the server). So the displayed information lags behind actual activity. However, current-query information collected by track_activities is always up-to-date.

 

~~~
05. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS

```
# Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
#
# Basic test where the subscriber uses index
# and touches 50 rows with UPDATE
```

"touches 50 rows with UPDATE" -> "updates 50 rows", per other tests.

fixed
 
~~~
06. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE

I think this test is not trivial, so could you write down the motivation?
(Same as re-calculate.)

sure, done
 

~~~
07. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE

```
# show that index_b is not used
$node_subscriber->poll_query_until(
        'postgres', q{select idx_scan=0 from pg_stat_all_indexes where indexrelname = 'index_b';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-2";
```

I think we don't have to wait here, is() should be used instead.
poll_query_until() should be used only when idx_scan>0 is checked.
(There are some cases like this)

Yes, makes sense
 

~~~
08. 032_subscribe_use_index.pl - SUBSCRIPTION USES INDEX ON PARTITIONED TABLES

```
# make sure that the subscriber has the correct data
$node_subscriber->poll_query_until(
        'postgres', q{select sum(user_id+value_1+value_2)=550070 AND count(DISTINCT(user_id,value_1, value_2))=981 from users_table_part;}
) or die "ensure subscriber has the correct data at the end of the test";
```


Ah, for this case, we already have is() checks for the same results; this is just a leftover from earlier iterations.
 
I think we can replace it with wait_for_catchup() and is()...
Moreover, we don't have to wait here because in the line above we wait until the index is used on the subscriber.
(There are some cases like this.)

Fixed a few more such cases.

Thanks for the review! Attached v16.

Onder KALACI 
Attachment

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Önder,

Thank you for updating the patch!

> It is not about CREATE INDEX being async. It is about pg_stat_all_indexes
> being async. If we do not wait, the tests become flaky, because sometimes
> the update has not been reflected in the view immediately.

Makes sense, I forgot how the stats collector works...

Followings are comments for v16. Only for test codes.

~~~
01. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE

```
# show that index_b is not used
$node_subscriber->poll_query_until(
    'postgres', q{select idx_scan=0 from pg_stat_all_indexes where indexrelname = 'index_b';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-2";
 
```

poll_query_until() still remains here; it should be replaced with is().


02. 032_subscribe_use_index.pl - SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN

```
# show that the unique index on replica identity is used even when enable_indexscan=false
$result = $node_subscriber->safe_psql('postgres',
    "select idx_scan from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx'");
is($result, qq(0), 'ensure subscriber has not used index with enable_indexscan=false');
```

Is the comment wrong? The index test_replica_id_full_idx is not used here.


03. 032_subscribe_use_index.pl - SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN

```
$node_publisher->safe_psql('postgres',
    "ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX test_replica_id_full_unique;");
```

I was not sure why ALTER TABLE REPLICA IDENTITY USING INDEX was done on the publisher side.
IIUC this feature works when REPLICA IDENTITY FULL is specified on the publisher,
so it might not need to be altered here. If so, an index does not have to be defined on the publisher either.

04. 032_subscribe_use_index.pl - SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN

```
$node_subscriber->poll_query_until(
    'postgres', q{select (idx_scan=1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_unique'}
) or die "Timed out while waiting ensuring subscriber used unique index as replica identity even with enable_indexscan=false'";
```

03 comment should be added here.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Hi,




~~~
01. 032_subscribe_use_index.pl - SUBSCRIPTION CAN UPDATE THE INDEX IT USES AFTER ANALYZE

```
# show that index_b is not used
$node_subscriber->poll_query_until(
        'postgres', q{select idx_scan=0 from pg_stat_all_indexes where indexrelname = 'index_b';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-2";
```

poll_query_until() still remains here; it should be replaced with is().



Updated 

02. 032_subscribe_use_index.pl - SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN

```
# show that the unique index on replica identity is used even when enable_indexscan=false
$result = $node_subscriber->safe_psql('postgres',
        "select idx_scan from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx'");
is($result, qq(0), 'ensure subscriber has not used index with enable_indexscan=false');
```

Is the comment wrong? The index test_replica_id_full_idx is not used here.


Yeah, the comment is wrong. It is a copy & paste error from the other test. Fixed now
 


03. 032_subscribe_use_index.pl - SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN

```
$node_publisher->safe_psql('postgres',
        "ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX test_replica_id_full_unique;");
```

I was not sure why ALTER TABLE REPLICA IDENTITY USING INDEX was done on the publisher side.
IIUC this feature works when REPLICA IDENTITY FULL is specified on the publisher,
so it might not need to be altered here. If so, an index does not have to be defined on the publisher either.

 
Yes, it is not strictly necessary, but it is often the case that both the subscriber and the publisher have similar schemas when a unique index/pkey is used. For example, see t/028_row_filter.pl, where we follow this pattern.

Still, I manually tried it without the index on the publisher (i.e., replica identity full), and it works as expected. But given that the majority of the tests already follow that approach and this test focuses on enable_indexscan, I think I'll keep it as is - unless it is confusing?
 
04. 032_subscribe_use_index.pl - SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN

```
$node_subscriber->poll_query_until(
        'postgres', q{select (idx_scan=1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_unique'}
) or die "Timed out while waiting ensuring subscriber used unique index as replica identity even with enable_indexscan=false'";
```

03 comment should be added here.

Yes, done that as well.


Attached v17 now. Thanks for the review! 
Attachment

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From
"kuroda.hayato@fujitsu.com"
Date:
Dear Önder,

Thanks for updating the patch!

I think what you say is reasonable.
I have no further comments now. Thanks for updating so quickly.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From
"shiy.fnst@fujitsu.com"
Date:
On Wed, Aug 24, 2022 12:25 AM Önder Kalacı <onderkalaci@gmail.com> wrote:
> Hi,
> 
> Thanks for the review!
> 

Thanks for your reply.

> 
> >
> > 1.
> > In FilterOutNotSuitablePathsForReplIdentFull(), is
> > "nonPartialIndexPathList" a
> > good name for the list? Indexes on only expressions are also be filtered.
> >
> > +static List *
> > +FilterOutNotSuitablePathsForReplIdentFull(List *pathlist)
> > +{
> > +       ListCell   *lc;
> > +       List *nonPartialIndexPathList = NIL;
> >
> >
> Yes, true. We only started filtering the non-partial ones first. Now
> changed to *suitableIndexList*, does that look right?
> 

That looks ok to me.

> 
> 
> > 3.
> > It looks we should change the comment for FindReplTupleInLocalRel() in this
> > patch.
> >
> > /*
> >  * Try to find a tuple received from the publication side (in
> > 'remoteslot') in
> >  * the corresponding local relation using either replica identity index,
> >  * primary key or if needed, sequential scan.
> >  *
> >  * Local tuple, if found, is returned in '*localslot'.
> >  */
> > static bool
> > FindReplTupleInLocalRel(EState *estate, Relation localrel,
> >
> >
> I made a small change, just adding "index". Do you expect a larger change?
> 
> 

I think that's sufficient.

> 
> 
> > 5.
> > +               if (!AttributeNumberIsValid(mainattno))
> > +               {
> > +                       /*
> > +                        * There are two cases to consider. First, if the
> > index is a primary or
> > +                        * unique key, we cannot have any indexes with
> > expressions. So, at this
> > +                        * point we are sure that the index we deal is not
> > these.
> > +                        */
> > +                       Assert(RelationGetReplicaIndex(rel) !=
> > RelationGetRelid(idxrel) &&
> > +                                  RelationGetPrimaryKeyIndex(rel) !=
> > RelationGetRelid(idxrel));
> > +
> > +                       /*
> > +                        * For a non-primary/unique index with an
> > expression, we are sure that
> > +                        * the expression cannot be used for replication
> > index search. The
> > +                        * reason is that we create relevant index paths
> > by providing column
> > +                        * equalities. And, the planner does not pick
> > expression indexes via
> > +                        * column equality restrictions in the query.
> > +                        */
> > +                       continue;
> > +               }
> >
> > Is it possible that it is a usable index with an expression? I think
> > indexes
> > with an expression has been filtered in
> > FilterOutNotSuitablePathsForReplIdentFull(). If it can't be a usable index
> > with
> > an expression, maybe we shouldn't use "continue" here.
> >
> 
> 
> 
> Ok, I think there are some confusing comments in the code, which I updated.
> Also, added one more explicit Assert to make the code a little more
> readable.
> 
> We can support indexes involving expressions but not indexes that are only
> consisting of expressions. FilterOutNotSuitablePathsForReplIdentFull()
> filters out the latter, see IndexOnlyOnExpression().
> 
> So, for example, if we have an index as below, we are skipping the
> expression while building the index scan keys:
> 
> CREATE INDEX people_names ON people (firstname, lastname, (id || '_' ||
> sub_id));
> 
> We can consider removing `continue`, but that'd mean we should also adjust
> the following code-block to handle indexprs. To me, that seems like an edge
> case to implement at this point, given such an index is probably not
> common. Do you think I should try to use the indexprs as well while
> building the scan key?
> 
> I'm mostly trying to keep the complexity small. If you suggest this
> limitation should be lifted, I can give it a shot. I think the limitation I
> leave here is with a single sentence: *The index on the subscriber can only
> use simple column references.  *
> 

Thanks for your explanation. I get it and think it's OK.
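The rule discussed above — build equality scan keys only for simple column references and skip expression attributes — can be sketched as below. This is an illustrative Python model, not the actual C code: the function name `build_scan_keys`, the attno-list representation (0 marking an expression column, mirroring pg_index.indkey), and the tuple layout are all assumptions made for the sketch.

```python
def build_scan_keys(index_attnos, remote_tuple):
    """For each index column, emit an equality scan key if the column is a
    simple reference to a table attribute; skip expression columns
    (marked with attno == 0, as pg_index.indkey does)."""
    scan_keys = []
    skipped_expressions = 0
    for index_col, attno in enumerate(index_attnos):
        if attno == 0:
            # Expression column: no column equality can drive it, so it is
            # skipped while building the scan keys.
            skipped_expressions += 1
            continue
        scan_keys.append((index_col, remote_tuple[attno - 1]))
    return scan_keys, skipped_expressions

# CREATE INDEX people_names ON people (firstname, lastname, (id || '_' || sub_id))
# maps to attnos [1, 2, 0] for a table (firstname, lastname, id, sub_id):
keys, skipped = build_scan_keys([1, 2, 0], ("alice", "smith", 7, 3))
```

The first two index columns yield scan keys; the third (the expression) is counted but skipped, matching the `continue` discussed in the review.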

> > 6.
> > In the following case, I got a result which is different from HEAD, could
> > you
> > please look into it?
> >
> > -- publisher
> > CREATE TABLE test_replica_id_full (x int);
> > ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
> > CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
> >
> > -- subscriber
> > CREATE TABLE test_replica_id_full (x int, y int);
> > CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x,y);
> > CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres
> > port=5432' PUBLICATION tap_pub_rep_full;
> >
> > -- publisher
> > INSERT INTO test_replica_id_full VALUES (1);
> > UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;
> >
> > The data in subscriber:
> > on HEAD:
> > postgres=# select * from test_replica_id_full ;
> >  x | y
> > ---+---
> >  2 |
> > (1 row)
> >
> > After applying the patch:
> > postgres=# select * from test_replica_id_full ;
> >  x | y
> > ---+---
> >  1 |
> > (1 row)
> >
> >
> Oops, good catch. It seems we forgot to have:
> 
> skey[scankey_attoff].sk_flags |= SK_SEARCHNULL;
> 
> On HEAD, the index used for this purpose could only be the primary key or a
> unique key on NOT NULL columns. Now we do allow NULL values, and need to
> search for them. I added that (and your test) to the updated patch.
> 
> As a semi-related note, tuples_equal() decides `true` for (NULL = NULL). I
> have not changed that, and it seems right in this context. Do you see any
> issues with that?
> 
> Also, I realized that the functions in execReplication.c expect only
> btree indexes, so I skipped the others as well. If that makes sense, I can
> work on a follow-up patch, after this one is merged, to remove some of the
> limitations mentioned here.

Thanks for fixing it and updating the patch; I didn't see any issues with it.
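On the (NULL = NULL) point above: tuples_equal() treating two NULLs as a match corresponds to SQL's IS NOT DISTINCT FROM rather than plain equality. A minimal Python sketch of that semantics, with None standing in for SQL NULL; the real function uses per-type equality operators looked up in the type cache, so these function names and shapes are illustrative only:

```python
def datums_equal(a, b):
    """NULL-aware equality: NULL matches NULL (IS NOT DISTINCT FROM),
    unlike plain SQL '=' where NULL = NULL yields NULL."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b

def tuples_equal(t1, t2):
    """Two tuples match when every column pair is NULL-aware equal."""
    return len(t1) == len(t2) and all(datums_equal(a, b) for a, b in zip(t1, t2))
```

Under this rule a replicated row (NULL, 1) matches a local row (NULL, 1), which is the behavior the thread agrees is right in this context.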

Here are some comments on v17 patch.

1. 
-LogicalRepRelMapEntry *
+LogicalRepPartMapEntry *
 logicalrep_partition_open(LogicalRepRelMapEntry *root,
                           Relation partrel, AttrMap *map)
 {

Is there any reason to change the return type of logicalrep_partition_open()? It
seems ok without this change.

2. 

+         * of the relation cache entry (e.g., such as ANALYZE or
+         * CREATE/DROP index on the relation).

"e.g." and "such as" mean the same. I think we should remove one of them.

3.
+$node_subscriber->poll_query_until(
+    'postgres', q{select (idx_scan = 2) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for'check subscriber tap_sub_rep_full deletes one row via index";
+

+$node_subscriber->poll_query_until(
+    'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idy';}
+) or die "Timed out while waiting for'check subscriber tap_sub_rep_full deletes one row via index";


"for'check" -> "for check"

3.
+$node_subscriber->safe_psql('postgres',
+    "SELECT pg_reload_conf();");
+
+# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
+# ====================================================================
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
+

"Testcase start" in the comment should be "Testcase end".

4.
There seems to be a problem in the following scenario, which results in
inconsistent data between publisher and subscriber.

-- publisher
CREATE TABLE test_replica_id_full (x int, y int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

-- subscriber
CREATE TABLE test_replica_id_full (x int, y int);
CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(x);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

-- publisher
INSERT INTO test_replica_id_full VALUES (NULL,1);
INSERT INTO test_replica_id_full VALUES (NULL,2);
INSERT INTO test_replica_id_full VALUES (NULL,3);
update test_replica_id_full SET x=1 where y=2;

The data in publisher:
postgres=# select * from test_replica_id_full order by y;
 x | y
---+---
   | 1
 1 | 2
   | 3
(3 rows)

The data in subscriber:
postgres=# select * from test_replica_id_full order by y;
 x | y
---+---
   | 2
 1 | 2
   | 3
(3 rows)

There is no such problem on master branch.


Regards,
Shi yu

Hi,

Thanks for the review!


Here are some comments on v17 patch.

1.
-LogicalRepRelMapEntry *
+LogicalRepPartMapEntry *
 logicalrep_partition_open(LogicalRepRelMapEntry *root,
                                                  Relation partrel, AttrMap *map)
 {

Is there any reason to change the return type of logicalrep_partition_open()? It
seems ok without this change.

I think you are right; I probably needed that in some of my earlier iterations of the patch, but now it seems redundant. Reverted to the original version.
 

2.

+                * of the relation cache entry (e.g., such as ANALYZE or
+                * CREATE/DROP index on the relation).

"e.g." and "such as" mean the same. I think we should remove one of them.

fixed
 

3.
+$node_subscriber->poll_query_until(
+       'postgres', q{select (idx_scan = 2) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for'check subscriber tap_sub_rep_full deletes one row via index";
+

+$node_subscriber->poll_query_until(
+       'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idy';}
+) or die "Timed out while waiting for'check subscriber tap_sub_rep_full deletes one row via index";


"for'check" -> "for check"

fixed
 

3.
+$node_subscriber->safe_psql('postgres',
+       "SELECT pg_reload_conf();");
+
+# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
+# ====================================================================
+
+$node_subscriber->stop('fast');
+$node_publisher->stop('fast');
+

"Testcase start" in the comment should be "Testcase end".


fixed
 
4.
There seems to be a problem in the following scenario, which results in
inconsistent data between publisher and subscriber.

-- publisher
CREATE TABLE test_replica_id_full (x int, y int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

-- subscriber
CREATE TABLE test_replica_id_full (x int, y int);
CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(x);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

-- publisher
INSERT INTO test_replica_id_full VALUES (NULL,1);
INSERT INTO test_replica_id_full VALUES (NULL,2);
INSERT INTO test_replica_id_full VALUES (NULL,3);
update test_replica_id_full SET x=1 where y=2;

The data in publisher:
postgres=# select * from test_replica_id_full order by y;
 x | y
---+---
   | 1
 1 | 2
   | 3
(3 rows)

The data in subscriber:
postgres=# select * from test_replica_id_full order by y;
 x | y
---+---
   | 2
 1 | 2
   | 3
(3 rows)

There is no such problem on master branch.


Uff, this is the second problem reported regarding NULL values for this patch (both by you). v18 contains the fix for the problem. It turns out that my idea of treating all unique indexes (pkey, replica identity, and regular unique indexes) the same proved to be wrong. The former two require all the involved columns to be NOT NULL; the latter does not.

This caused RelationFindReplTupleByIndex() to skip tuples_equal() for regular unique indexes (i.e., non-pkey/replica-identity). Hence, the first NULL value was considered the matching tuple. Instead, we should do a full tuple equality check (i.e., tuples_equal()). This is what v18 does. I also added the above scenario as a test.

I think we can probably skip tuples_equal() for unique indexes that consist of only NOT NULL columns. However, that seems like an over-optimization. If you have such a unique index, why not create a primary key anyway? That's why I don't see much value in complicating the code for that use case.
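The shape of the v18 fix described above — an index hit on a nullable unique index is only a candidate, which must still pass a full tuple comparison — can be sketched as follows. The dict-based index and the function name `find_repl_tuple` are illustrative assumptions, not the real RelationFindReplTupleByIndex() code:

```python
def find_repl_tuple(index, key, remote_tuple):
    """Look up candidates by index key, but accept one only if the whole
    tuple matches (stand-in for the tuples_equal() check the fix adds).
    A nullable unique index can yield several rows for a NULL key, so an
    index hit alone does not prove a match."""
    for local_tuple in index.get(key, []):
        if local_tuple == remote_tuple:
            return local_tuple
    return None

# Rows (x, y) from the reported scenario, "indexed" on nullable column x:
rows_by_x = {None: [(None, 1), (None, 2), (None, 3)]}
match = find_repl_tuple(rows_by_x, None, (None, 2))
```

Without the full comparison, the lookup would stop at the first NULL-keyed row, (None, 1), and update the wrong tuple — exactly the divergence Shi yu reported.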

Thanks for the review & testing. I'll focus more on NULL values in my own testing as well. Still, I wanted to push my changes so that you can also have a look if possible.

Attached v18.

Onder KALACI

 
Attachment

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From
"wangw.fnst@fujitsu.com"
Date:
On Fri, Sep 23, 2022 at 0:14 AM Önder Kalacı <onderkalaci@gmail.com> wrote:
> Hi Wang wei,

Thanks for updating the patch and your reply.

> > 1. In the function GetCheapestReplicaIdentityFullPath.
> > +    if (rel->pathlist == NIL)
> > +    {
> > +        /*
> > +         * A sequential scan could have been dominated by by an index
> > scan
> > +         * during make_one_rel(). We should always have a sequential
> > scan
> > +         * before set_cheapest().
> > +         */
> > +        Path       *seqScanPath = create_seqscan_path(root, rel, NULL,
> > 0);
> > +
> > +        add_path(rel, seqScanPath);
> > +    }
> >
> > This is a question I'm not sure about:
> > Do we need this part to add sequential scan?
> >
> > I think in our case, the sequential scan seems to have been added by the
> > function make_one_rel (see function set_plain_rel_pathlist).
> 
> Yes, the sequential scan is added during make_one_rel.
> 
> > If I am missing something, please let me know. BTW, there is a typo in
> > above comment: `by by`.
> 
> As the comment mentions, the sequential scan could have been dominated &
> removed by index scan, see add_path():
> 
> *We also remove from the rel's pathlist any old paths that are dominated
> *  by new_path --- that is, new_path is cheaper, at least as well ordered,
> *  generates no more rows, requires no outer rels not required by the old
> *  path, and is no less parallel-safe.
> 
> Still, I agree that the comment could be improved, which I pushed.

Oh, sorry I didn't realize this part of the logic. Thanks for sharing this.
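The add_path() dominance rule quoted above, reduced to total cost alone, can be sketched as below. This is a deliberate simplification: the real add_path() also compares sort order, row counts, required outer rels, and parallel safety. It illustrates why a sequential scan path may already be gone from rel->pathlist by the time GetCheapestReplicaIdentityFullPath() looks for it:

```python
def add_path(pathlist, new_path):
    """Keep only non-dominated paths; here 'dominates' is just
    lower-or-equal total cost (a simplification of the real rule)."""
    if any(p["cost"] <= new_path["cost"] for p in pathlist):
        return pathlist  # an existing path dominates new_path: reject it
    # new_path survives; evict any old path it dominates
    return [p for p in pathlist if p["cost"] < new_path["cost"]] + [new_path]

paths = []
paths = add_path(paths, {"name": "seqscan", "cost": 100.0})
paths = add_path(paths, {"name": "indexscan", "cost": 10.0})
# The cheaper index path has evicted the sequential scan path.
```

This is why the patch re-adds a sequential scan path when rel->pathlist turns out to be missing one before set_cheapest().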

And I have another confusion about function GetCheapestReplicaIdentityFullPath:
If rel->pathlist is NIL, could we return NULL directly from this function, and
then set idxoid to InvalidOid in function FindUsableIndexForReplicaIdentityFull
in that case?

===

Here are some comments for test file  032_subscribe_use_index.pl on v18 patch:

1.
```
+# Basic test where the subscriber uses index
+# and only updates 1 row for and deletes
+# 1 other row
```
There seems to be an extra "for" here.

2. Typos for subscription name in the error messages.
tap_sub_rep_full_0 -> tap_sub_rep_full

3. Typo in comments
```
+# use the newly created index (provided that it fullfils the requirements).
```
fullfils -> fulfils

4. Some extra single quotes at the end of the error message ('").
For example:
```
# wait until the index is used on the subscriber
$node_subscriber->poll_query_until(
    'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates 200 rows via index'";
```

5. The column names in the error message appear to be a typo.
```
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-1";
...
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-3";
...
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-4";
```
It seems that we need to do the following change: 'column-3' -> 'column-1' and
'column-4' -> 'column-2'.
Or we could use the column names directly like this: 'column-1' -> 'column a',
'column_3' -> 'column a' and 'column_4' -> 'column b'.

6. DELETE action is missing from the error message.
```
+# 2 rows from first command, another 2 from the second command
+# overall index_on_child_1_a is used 4 times
+$node_subscriber->poll_query_until(
+    'postgres', q{select idx_scan=4 from pg_stat_all_indexes where indexrelname = 'index_on_child_1_a';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates child_1 table'";
```
I think we execute both UPDATE and DELETE for child_1 here. Could we add DELETE
action to this error message?

7. Table name in the error message.
```
# check if the index is used even when the index has NULL values
$node_subscriber->poll_query_until(
    'postgres', q{select idx_scan=2 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates parent table'";
```
It seems to be "test_replica_id_full" here instead of "parent'".

Regards,
Wang wei

Hi Wang, all


And I have another confusion about function GetCheapestReplicaIdentityFullPath:
If rel->pathlist is NIL, could we return NULL directly from this function, and
then set idxoid to InvalidOid in function FindUsableIndexForReplicaIdentityFull
in that case?


We could, but then we would need to move some other checks elsewhere. I find the current flow easier to follow, where everything happens via cheapest_total_path, which is a natural field for this purpose.

Do you have a strong opinion on this?
 
===

Here are some comments for test file  032_subscribe_use_index.pl on v18 patch:

1.
```
+# Basic test where the subscriber uses index
+# and only updates 1 row for and deletes
+# 1 other row
```
There seems to be an extra "for" here.

 Fixed


2. Typos for subscription name in the error messages.
tap_sub_rep_full_0 -> tap_sub_rep_full


Fixed
 
3. Typo in comments
```
+# use the newly created index (provided that it fullfils the requirements).
```
fullfils -> fulfils


Fixed
 
4. Some extra single quotes at the end of the error message ('").
For example:
```
# wait until the index is used on the subscriber
$node_subscriber->poll_query_until(
        'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates 200 rows via index'";
```

All fixed, thanks

 

5. The column names in the error message appear to be a typo.
```
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-1";
...
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-3";
...
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates two rows via index scan with index on high cardinality column-4";
```
It seems that we need to do the following change: 'column-3' -> 'column-1' and
'column-4' -> 'column-2'.
Or we could use the column names directly like this: 'column-1' -> 'column a',
'column_3' -> 'column a' and 'column_4' -> 'column b'.

I think the latter is easier to follow, thanks.
 

6. DELETE action is missing from the error message.
```
+# 2 rows from first command, another 2 from the second command
+# overall index_on_child_1_a is used 4 times
+$node_subscriber->poll_query_until(
+       'postgres', q{select idx_scan=4 from pg_stat_all_indexes where indexrelname = 'index_on_child_1_a';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates child_1 table'";
```
I think we execute both UPDATE and DELETE for child_1 here. Could we add DELETE
action to this error message?


makes sense, added
 
7. Table name in the error message.
```
# check if the index is used even when the index has NULL values
$node_subscriber->poll_query_until(
        'postgres', q{select idx_scan=2 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates parent table'";
```
It seems to be "test_replica_id_full" here instead of "parent'".
fixed as well. 


Attached v19.

Thanks,
Onder KALACI

Attachment

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From
"shiy.fnst@fujitsu.com"
Date:
On Wed, Oct 19, 2022 12:05 AM Önder Kalacı <onderkalaci@gmail.com>  wrote:
> 
> Attached v19.
> 

Thanks for updating the patch. Here are some comments on v19.

1.
In execReplication.c:

+    TypeCacheEntry **eq = NULL; /* only used when the index is not unique */

Maybe the comment here should be changed. Now it is used when the index is not
a primary key or replica identity index.

2.
+# wait until the index is created
+$node_subscriber->poll_query_until(
+    'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates one row via index";

The message doesn't seem right; should it be changed to "Timed out while
waiting for creating index test_replica_id_full_idx"?

3.
+# now, ingest more data and create index on column y which has higher cardinality
+# then create an index on column y so that future commands uses the index on column
+$node_publisher->safe_psql('postgres',
+    "INSERT INTO test_replica_id_full SELECT 50, i FROM generate_series(0,3100)i;");

The comment says "create (an) index on column y" twice; maybe it can be changed
to:

now, ingest more data and create index on column y which has higher cardinality,
so that future commands will use the index on column y

4.
+# deletes 200 rows
+$node_publisher->safe_psql('postgres',
+    "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+    'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates 200 rows via index";

It would be better to call wait_for_catchup() after DELETE. (And some other
places in this file.)
Besides, the "updates" in the message should be "deletes".

5.
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+    'postgres', q{select sum(idx_scan)=10 from pg_stat_all_indexes where indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates partitioned table";

Maybe we should say "updates partitioned table with index" in this message.

Regards,
Shi yu

Hi Shi yu, all


In execReplication.c:

+       TypeCacheEntry **eq = NULL; /* only used when the index is not unique */

Maybe the comment here should be changed. Now it is used when the index is not
a primary key or replica identity index.


makes sense, updated
 
2.
+# wait until the index is created
+$node_subscriber->poll_query_until(
+       'postgres', q{select count(*)=1 from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates one row via index";

The message doesn't seem right,  should it be changed to "Timed out while
waiting for creating index test_replica_id_full_idx"?

yes, updated
 

3.
+# now, ingest more data and create index on column y which has higher cardinality
+# then create an index on column y so that future commands uses the index on column
+$node_publisher->safe_psql('postgres',
+       "INSERT INTO test_replica_id_full SELECT 50, i FROM generate_series(0,3100)i;");

The comment says "create (an) index on column y" twice; maybe it can be changed
to:

now, ingest more data and create index on column y which has higher cardinality,
so that future commands will use the index on column y


fixed
 
4.
+# deletes 200 rows
+$node_publisher->safe_psql('postgres',
+       "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+       'postgres', q{select (idx_scan = 200) from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates 200 rows via index";

It would be better to call wait_for_catchup() after DELETE. (And some other
places in this file.)

Hmm, I cannot follow this easily. 

Why do you think wait_for_catchup() should be called? In general, I tried to follow a pattern where we call poll_query_until() so that we are sure all the changes have been replicated via the index, and then do an additional check with `is($result, ..` to verify the correctness of the data.

One alternative could be to use wait_for_catchup() and then have multiple `is($result, ..` checks for both pg_stat_all_indexes and the correctness of the data.

One minor advantage I see with the current approach is that every `is($result, ..` adds one step to the test. So, if I used `is($result, ..` for the pg_stat_all_indexes queries, I'd be adding multiple steps for a single test. It felt more natural/common to test roughly once with `is($result, ..` in each test, or at least not to add additional ones for the pg_stat_all_indexes checks.

 
Besides, the "updates" in the message should be "deletes".


fixed
 
5.
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+       'postgres', q{select sum(idx_scan)=10 from pg_stat_all_indexes where indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber tap_sub_rep_full updates partitioned table";

Maybe we should say "updates partitioned table with index" in this message.


Fixed

Attached v20.

Thanks!

Onder KALACI
Hi hackers,

I rebased the changes onto the current master branch, applied pgindent suggestions and also made a few minor style changes.

Also, I tested the patch in combination with a few new PG 15 features (such as row/column filters in logical replication, NULLS NOT DISTINCT indexes, etc.), as well as some things I hadn't tested before, such as publish_via_partition_root.

I have not added those tests to the regression tests as the existing tests of this patch are already bulky and I don't see a specific reason to add all combinations. Still, if anyone thinks that it is a good idea to add more tests, I can do that. For reference, here are the tests that I did manually: More Replication Index Tests (github.com)

Attached v21.

Onder KALACI
  


Önder Kalacı <onderkalaci@gmail.com> wrote on Fri, 21 Oct 2022 at 14:14:
Hi,

On 2022-11-11 17:16:36 +0100, Önder Kalacı wrote:
> I rebased the changes to the current master branch, reflected pg_indent
> suggestions and also made a few minor style changes.

Needs another rebase, I think:

https://cirrus-ci.com/task/5592444637544448

[05:44:22.102] FAILED: src/backend/postgres_lib.a.p/replication_logical_worker.c.o 
[05:44:22.102] ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include -I../src/include -Isrc/include/storage
-Isrc/include/utils -Isrc/include/catalog -Isrc/include/nodes -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64
-Wall -Winvalid-pch -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes
-Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation
-Wno-stringop-truncation -fPIC -pthread -DBUILDING_DLL -MD -MQ
src/backend/postgres_lib.a.p/replication_logical_worker.c.o -MF
src/backend/postgres_lib.a.p/replication_logical_worker.c.o.d -o
src/backend/postgres_lib.a.p/replication_logical_worker.c.o -c ../src/backend/replication/logical/worker.c
 
[05:44:22.102] ../src/backend/replication/logical/worker.c: In function ‘get_usable_indexoid’:
[05:44:22.102] ../src/backend/replication/logical/worker.c:2101:36: error: ‘ResultRelInfo’ has no member named
‘ri_RootToPartitionMap’
[05:44:22.102]  2101 |   TupleConversionMap *map = relinfo->ri_RootToPartitionMap;
[05:44:22.102]       |                                    ^~

Greetings,

Andres Freund



Hi, 

Thanks for the heads-up.


Needs another rebase, I think:

https://cirrus-ci.com/task/5592444637544448

[05:44:22.102] FAILED: src/backend/postgres_lib.a.p/replication_logical_worker.c.o
[05:44:22.102] ccache cc -Isrc/backend/postgres_lib.a.p -Isrc/include -I../src/include -Isrc/include/storage -Isrc/include/utils -Isrc/include/catalog -Isrc/include/nodes -fdiagnostics-color=always -pipe -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -g -fno-strict-aliasing -fwrapv -fexcess-precision=standard -D_GNU_SOURCE -Wmissing-prototypes -Wpointer-arith -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -Wdeclaration-after-statement -Wno-format-truncation -Wno-stringop-truncation -fPIC -pthread -DBUILDING_DLL -MD -MQ src/backend/postgres_lib.a.p/replication_logical_worker.c.o -MF src/backend/postgres_lib.a.p/replication_logical_worker.c.o.d -o src/backend/postgres_lib.a.p/replication_logical_worker.c.o -c ../src/backend/replication/logical/worker.c
[05:44:22.102] ../src/backend/replication/logical/worker.c: In function ‘get_usable_indexoid’:
[05:44:22.102] ../src/backend/replication/logical/worker.c:2101:36: error: ‘ResultRelInfo’ has no member named ‘ri_RootToPartitionMap’
[05:44:22.102]  2101 |   TupleConversionMap *map = relinfo->ri_RootToPartitionMap;
[05:44:22.102]       |                                    ^~


Yes, it seems commit fb958b5da86da69651f6fb9f540c2cfb1346cdc5 broke the build and commit a61b1f74823c9c4f79c95226a461f1e7a367764b broke the tests, but the fixes were trivial. All tests pass again.

Attached v22.

Onder KALACI
Önder Kalacı <onderkalaci@gmail.com> writes:
> Attached v22.

I took a very brief look through this.  I'm not too pleased with
this whole line of development TBH.  It seems to me that the core
design of execReplication.c and related code is "let's build our
own half-baked executor and much-less-than-half-baked planner,
because XXX".  (I'm not too sure what XXX was, really, but apparently
somebody managed to convince people that that is a sane and
maintainable design.)  Now this patch has decided that it *will*
use the real planner, or at least portions of it in some cases.
If we're going to do that ISTM we ought to replace all the existing
not-really-a-planner logic, but this has not done that; instead
we have a large net addition to the already very duplicative
replication code, with weird restrictions because it doesn't want
to make changes to the half-baked executor.

I think we should either live within the constraints set by this
overarching design, or else nuke execReplication.c from orbit and
start using the real planner and executor.  Perhaps the foreign
key enforcement mechanisms could be a model --- although if you
don't want to buy into using SPI as well, you probably should look
at Amit L's work at [1].

Also ... maybe I am missing something, but is REPLICA IDENTITY FULL
sanely defined in the first place?  It looks to me that
RelationFindReplTupleSeq assumes without proof that there is a unique
full-tuple match, but that is only reasonable to assume if there is at
least one unique index (and maybe not even then, if nulls are involved).
If there is a unique index, why can't that be chosen as replica identity?
If there isn't, what semantics are we actually providing?

What I'm thinking about is that maybe REPLICA IDENTITY FULL should be
defined as "the subscriber can pick any unique index to match on,
and is allowed to fail if the table has none".  Or if "fail" is a bridge
too far for you, we could fall back to the existing seqscan logic.
But thumbing through the existing indexes to find a non-expression unique
index wouldn't require invoking the full planner.  Any candidate index
would result in a plan estimated to fetch just one row, so there aren't
likely to be serious cost differences anyway.
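Tom's "thumb through the existing indexes" idea could, purely for illustration, be expressed as a catalog query (the table name is hypothetical, and this is a sketch of the criteria, not the patch's actual C code):

```sql
-- Find a unique, valid, non-expression, non-partial index on a
-- hypothetical table without invoking the planner.
SELECT c.relname AS candidate_index
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
WHERE i.indrelid = 'some_table'::regclass
  AND i.indisunique
  AND i.indisvalid
  AND i.indexprs IS NULL   -- no expression columns
  AND i.indpred IS NULL    -- not a partial index
LIMIT 1;
```

Any index returned by such a check would, as noted above, be estimated to fetch just one row, so costing between candidates should rarely matter.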

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CA+HiwqG5e8pk8s7+7zhr1Nc_PGyhEdM5f=pHkMOdK1RYWXfJsg@mail.gmail.com



Hi,

Thank you for the useful comments!


I took a very brief look through this.  I'm not too pleased with
this whole line of development TBH.  It seems to me that the core
design of execReplication.c and related code is "let's build our
own half-baked executor and much-less-than-half-baked planner,
because XXX".  (I'm not too sure what XXX was, really, but apparently
somebody managed to convince people that that is a sane and
maintainable design.) 

This provided me with a broad perspective on the whole of execReplication.c.
Before your comment, I had not thought about why there is dedicated execution
logic for logical replication.

I tried to read the initial commit that added execReplication.c
(665d1fad99e7b11678b0d5fa24d2898424243cd6) and the main relevant mail thread
(PostgreSQL: Re: Logical Replication WIP), but I couldn't find any references to
this decision. Maybe I'm missing something?

Regarding the planner: as far as I can tell, before my patch there was probably no
need for any planner infrastructure. The reason seems to be that logical replication
needs either a sequential scan for REPLICA IDENTITY FULL or an index scan for the
primary key / unique index. I'm not suggesting that we shouldn't use the planner at
all, just trying to understand the design choices that were made earlier.


Now this patch has decided that it *will*
use the real planner, or at least portions of it in some cases.
If we're going to do that ISTM we ought to replace all the existing
not-really-a-planner logic, but this has not done that; instead
we have a large net addition to the already very duplicative
replication code, with weird restrictions because it doesn't want
to make changes to the half-baked executor.

That sounds like one good perspective on the restrictions this patch adds.
From my perspective, I wanted to fit into the existing execReplication.c, which only
works for primary keys / unique keys. And, if you look closely, the restrictions I
suggest are actually the same as (or similar to) the restrictions of REPLICA
IDENTITY ... USING INDEX. I hope/assume this is no surprise to you and not too
hard to explain to users.
 

I think we should either live within the constraints set by this
overarching design, or else nuke execReplication.c from orbit and
start using the real planner and executor.  Perhaps the foreign
key enforcement mechanisms could be a model --- although if you
don't want to buy into using SPI as well, you probably should look
at Amit L's work at [1].

This sounds like a good long-term plan to me. Are you also suggesting doing that
before this patch?

I think such a change is a non-trivial / XL project, which I could likely not
achieve by myself in a reasonable time frame.
 

Also ... maybe I am missing something, but is REPLICA IDENTITY FULL
sanely defined in the first place?  It looks to me that
RelationFindReplTupleSeq assumes without proof that there is a unique
full-tuple match, but that is only reasonable to assume if there is at
least one unique index (and maybe not even then, if nulls are involved).

In general, RelationFindReplTupleSeq is ok not to match any tuples. So, I'm not sure
if uniqueness is required?

Even if there are multiple matches, RelationFindReplTupleSeq does only one change at
a time. My understanding is that if there are multiple matches on the source, they are
generated as different messages, and each message triggers RelationFindReplTupleSeq.

 
If there is a unique index, why can't that be chosen as replica identity?
If there isn't, what semantics are we actually providing?

I'm not sure I can fully follow this question. In this patch, I'm trying to allow non-unique
indexes to be used in the subscription. And, the target could have multiple indexes.

So, the semantics are that we automatically allow users to use non-unique
indexes on the subscription side even if the replica identity is full on the source.

The reasons that (a) we use the planner and (b) we do not ask users which index
to use are that it would be very inconvenient for any user to pick among
multiple indexes on the subscription.

If there is a unique index, the expectation is that the user would pick 
REPLICA IDENTITY .. USING INDEX or just make it the primary key. 
In those cases, this patch would not interfere with the existing logic. 


What I'm thinking about is that maybe REPLICA IDENTITY FULL should be
defined as "the subscriber can pick any unique index to match on,
and is allowed to fail if the table has none".  Or if "fail" is a bridge
too far for you, we could fall back to the existing seqscan logic.
But thumbing through the existing indexes to find a non-expression unique
index wouldn't require invoking the full planner.  Any candidate index
would result in a plan estimated to fetch just one row, so there aren't
likely to be serious cost differences.

Again, maybe I'm missing something in your comments, but this patch deals with
non-unique indexes. That's why we rely on the planner to pick the optimal index
among what we have on the subscription. (In my first iteration of this patch,
I decided to pick the index without the planner, but then it seemed much nicer
to rely on the planner, for the obvious reason that it picks the right index.)

For example, if you have a unique index on the subscription, the planner already
picks that. But still, if you can afford a unique index, you are better off using
REPLICA IDENTITY .. USING INDEX or just a primary key. I gave this example to
explain one edge case that many devs might think of.

Lastly, any (auto)ANALYZE on the target table recalculates the candidate index on
the subscription. So, hopefully we never stay too far behind with the statistics,
and keep having a good index to use.

Thanks,
Onder KALACI


Hi,

On 2023-01-07 13:50:04 -0500, Tom Lane wrote:
> I think we should either live within the constraints set by this
> overarching design, or else nuke execReplication.c from orbit and
> start using the real planner and executor.  Perhaps the foreign
> key enforcement mechanisms could be a model --- although if you
> don't want to buy into using SPI as well, you probably should look
> at Amit L's work at [1].

I don't think using the full executor for every change is feasible from an
overhead perspective. But it might make sense to bail out to using the full
executor in a bunch of non-fastpath paths.

I think this is basically similar to COPY not using the full executor.

But that doesn't mean that all of this has to be open coded in
execReplication.c. Abstracting pieces so that COPY, logical rep and perhaps
even nodeModifyTable.c can share code makes sense.


> Also ... maybe I am missing something, but is REPLICA IDENTITY FULL
> sanely defined in the first place?  It looks to me that
> RelationFindReplTupleSeq assumes without proof that there is a unique
> full-tuple match, but that is only reasonable to assume if there is at
> least one unique index (and maybe not even then, if nulls are involved).

If the table definitions match between publisher and standby, it doesn't matter
which tuple is updated when all columns are used to match. Since there's
nothing distinguishing two rows with all columns being equal, it doesn't
matter which we update.

Greetings,

Andres Freund



Andres Freund <andres@anarazel.de> writes:
> On 2023-01-07 13:50:04 -0500, Tom Lane wrote:
>> Also ... maybe I am missing something, but is REPLICA IDENTITY FULL
>> sanely defined in the first place?  It looks to me that
>> RelationFindReplTupleSeq assumes without proof that there is a unique
>> full-tuple match, but that is only reasonable to assume if there is at
>> least one unique index (and maybe not even then, if nulls are involved).

> If the table definition match between publisher and standby, it doesn't matter
> which tuple is updated, if all columns are used to match. Since there's
> nothing distinguishing two rows with all columns being equal, it doesn't
> matter which we update.

Yeah, but the point here is precisely that they might *not* match;
for example there could be extra columns in the subscriber's table.
This may be largely a documentation problem, though --- I think my
beef is mainly that there's nothing in our docs explaining the
semantic pitfalls of FULL, we only say "it's slow".
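The pitfall could be sketched with a hypothetical two-node setup (all object names and the connection string are illustrative, not from the patch):

```sql
-- Publisher: no unique index exists, so REPLICA IDENTITY FULL is used.
CREATE TABLE t (a int, b int);
ALTER TABLE t REPLICA IDENTITY FULL;
CREATE PUBLICATION pub FOR TABLE t;

-- Subscriber: the same replicated columns plus a local-only column.
CREATE TABLE t (a int, b int, note text);
CREATE SUBSCRIPTION sub CONNECTION 'host=... dbname=...' PUBLICATION pub;

-- If the subscriber holds (1, 1, 'x') and (1, 1, 'y'), a publisher-side
--   UPDATE t SET b = 2 WHERE a = 1 AND b = 1;
-- matches either subscriber row on the replicated columns (a, b), so
-- which row's local "note" ends up next to the new b = 2 is not
-- deterministic.
```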

Anyway, to get back to the point at hand: if we do have a REPLICA IDENTITY
FULL situation then we can make use of any unique index over a subset of
the transmitted columns, and if there's more than one candidate index
it's unlikely to matter which one we pick.  Given your comment I guess
we have to also compare the non-indexed columns, so we can't completely
convert the FULL case to the straight index case.  But still it doesn't
seem to me to be appropriate to use the planner to find a suitable index.

            regards, tom lane



On Mon, Jan 9, 2023 at 11:37 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, to get back to the point at hand: if we do have a REPLICA IDENTITY
> FULL situation then we can make use of any unique index over a subset of
> the transmitted columns, and if there's more than one candidate index
> it's unlikely to matter which one we pick.  Given your comment I guess
> we have to also compare the non-indexed columns, so we can't completely
> convert the FULL case to the straight index case.  But still it doesn't
> seem to me to be appropriate to use the planner to find a suitable index.

The main purpose of REPLICA IDENTITY FULL seems to be to enable logical replication for tables that may have duplicates and therefore cannot have a unique index that can be used as a replica identity.

For those tables the user currently needs to choose between update/delete erroring (bad) or doing a sequential scan on the apply side per updated/deleted tuple (often worse). This issue currently prevents a lot of automation around logical replication, because users need to decide whether and when they are willing to accept partial downtime. The current REPLICA IDENTITY FULL implementation can work in some cases, but applying the effects of an update that affected a million rows through a million sequential scans will certainly not end well.
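The scenario might look like this (hypothetical names; a sketch of the workload, not a benchmark):

```sql
-- A table with legitimate duplicates: no unique index is possible,
-- so REPLICA IDENTITY FULL is the only way to replicate UPDATE/DELETE.
CREATE TABLE events (user_id int, payload text);
ALTER TABLE events REPLICA IDENTITY FULL;

-- One publisher-side statement touching many tuples...
UPDATE events SET payload = 'migrated' WHERE user_id < 1000000;
-- ...is decoded into one change per tuple, and today each change costs a
-- full sequential scan on the apply side. With the proposed patch, a
-- plain subscriber-side index can serve those lookups instead:
CREATE INDEX events_user_id_idx ON events (user_id);  -- on the subscriber
```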

This patch solves the problem by allowing the apply side to pick a non-unique index to find any matching tuple instead of always using a sequential scan, but that either requires some planning/costing logic to avoid picking a lousy index, or allowing the user to manually preselect the index to use, which is less convenient.

An alternative might be to construct prepared statements and use the regular planner. If applied uniformly, that would also be nice from the extensibility point of view, since there is currently no way for an extension to augment the apply side. However, I assume the current approach of using low-level functions in the common case was chosen for performance reasons.

I suppose the options are:
1. use regular planner uniformly
2. use regular planner only when there's no replica identity (or configurable?)
3. only use low-level functions
4. keep using sequential scans for every single updated row
5. introduce a hidden logical row identifier in the heap that is guaranteed unique within a table and can be used as a replica identity when no unique index exists

Any thoughts?

cheers,
Marco
Hi Marco, Tom,

> But still it doesn't seem to me to be appropriate to use the planner to find a suitable index.

As Marco noted, here we are trying to pick an index that is non-unique. We could pick the index based on information extracted from pg_index (or such), but that would be a premature selection. Before sending the patch to pgsql-hackers, I initially tried to find a suitable index with such an approach.

But then I still ended up using costing functions (and some other low-level functions). Overall, it felt like the planner is the module that makes this decision best. Why would we invent another immature way of doing this? With that reasoning, I ended up using the related planner functions directly.

However, I assume the current approach of using low-level functions in the common case was chosen for performance reasons.

That's partially the reason. If you look at the patch, we use the planner (or the low-level functions) infrequently: it is only called when the logical replication relation cache is rebuilt. As far as I can see, that happens with (auto) ANALYZE, DDL, etc. I expect these to be infrequent operations. Still, I wanted to make sure we do not create too much overhead even if there are frequent invalidations.

The main reason for using the low-level functions over the planner itself is to have some more control over the decision. For example, due to the execution limitations, we currently cannot allow an index that consists only of expressions (similar to the pkey restriction). With the current approach, we can easily filter those out.

Also, another minor reason is that if we used the planner, we'd get a PlannedStmt back, and it felt weird to dig the chosen index back out of a PlannedStmt. In the current patch, we iterate over Paths, which seems more intuitive to me.


I suppose the options are:
1. use regular planner uniformly
2. use regular planner only when there's no replica identity (or configurable?)
3. only use low-level functions
4. keep using sequential scans for every single updated row
5. introduce a hidden logical row identifier in the heap that is guaranteed unique within a table and can be used as a replica identity when no unique index exists
 
One other option I considered was to explicitly ask the user for an index on the subscriber side when REPLICA IDENTITY is FULL. But that is a pretty hard choice for any user; even the planner sometimes fails to pick the right index :) Also, it is probably controversial to change any of the APIs for this purpose?

I'd be happy to hear from more experienced hackers on the trade-offs of the options above, and I'd be open to working on that if there is a clear winner. For me, (3) is a decent solution to the problem.

Thanks,
Onder

On Fri, Jan 27, 2023 at 6:32 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>>
>> I suppose the options are:
>> 1. use regular planner uniformly
>> 2. use regular planner only when there's no replica identity (or configurable?)
>> 3. only use low-level functions
>> 4. keep using sequential scans for every single updated row
>> 5. introduce a hidden logical row identifier in the heap that is guaranteed unique within a table and can be used as
a replica identity when no unique index exists
>
>
> One other option I considered was to ask the index explicitly on the subscriber side from the user when REPLICA
IDENTITY is FULL. But, it is a pretty hard choice for any user, even a planner sometimes fails to pick the right index
:) Also, it is probably controversial to change any of the APIs for this purpose? 
>

I agree that it won't be a very convenient option for the user but how
about along with asking for an index from the user (when the user
didn't provide an index), we also allow to make use of any unique
index over a subset of the transmitted columns, and if there's more
than one candidate index pick any one. Additionally, we can allow
disabling the use of an index scan for this particular case. If we are
too worried about API change for allowing users to specify the index
then we can do that later or as a separate patch.

> I'd be happy to hear from more experienced hackers on the trade-offs for the above, and I'd be open to work on that
if there is a clear winner. For me (3) is a decent solution for the problem.
>

From the discussion above it is not very clear that adding maintenance
costs in this area is worth it even though that can give better
results as far as this feature is concerned.

--
With Regards,
Amit Kapila.



Hi all,

Thanks for the feedback!

I agree that it won't be a very convenient option for the user but how
about along with asking for an index from the user (when the user
didn't provide an index), we also allow to make use of any unique
index over a subset of the transmitted columns,

Tbh, I cannot follow why you would use REPLICA IDENTITY FULL if you can already
create a unique index. Aren't you supposed to use REPLICA IDENTITY .. USING INDEX
in that case (if not simply the pkey)?

That seems like a potential expansion of this patch, but I don't consider it
essential. Given that it is hard to get even small commits in, I'd rather wait to
see what you think before making such a change.
 
and if there's more
than one candidate index pick any one. Additionally, we can allow
disabling the use of an index scan for this particular case. If we are
too worried about API change for allowing users to specify the index
then we can do that later or as a separate patch.


On v23, I dropped the planner support for picking the index. Instead, it simply
iterates over the indexes and picks the first one that is suitable. 
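The "first suitable one" rule could be illustrated with a catalog query (criteria inferred from the discussion above; the btree restriction is an assumption on my part, and the table name is taken from the thread's test scripts):

```sql
-- A sketch of "suitable": a valid index with at least one plain column
-- reference, i.e. not consisting only of expressions.
SELECT c.relname AS picked_index
FROM pg_index i
JOIN pg_class c  ON c.oid  = i.indexrelid
JOIN pg_am    am ON am.oid = c.relam
WHERE i.indrelid = 'test_replica_id_full'::regclass
  AND i.indisvalid
  AND am.amname = 'btree'              -- assumption, see lead-in
  AND EXISTS (SELECT 1
              FROM unnest(i.indkey::int2[]) AS k(attnum)
              WHERE k.attnum <> 0)     -- a non-expression key column
ORDER BY c.relname
LIMIT 1;
```

(In pg_index, an indkey entry of 0 marks an expression column, which is why the EXISTS clause demands at least one non-zero entry.)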

I'm currently thinking about how to enable users to override this decision.
One option I'm leaning towards is adding syntax like the following:

ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...

Though, that should probably be a separate patch. I'm going to work
on that, but still wanted to share v23, given that picking the index sounds
complementary, not strictly required at this point.

Thanks,
Onder


On Thu, Feb 2, 2023 at 2:03 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>> and if there's more
>> than one candidate index pick any one. Additionally, we can allow
>> disabling the use of an index scan for this particular case. If we are
>> too worried about API change for allowing users to specify the index
>> then we can do that later or as a separate patch.
>>
>
> On v23, I dropped the planner support for picking the index. Instead, it simply
> iterates over the indexes and picks the first one that is suitable.
>
> I'm currently thinking on how to enable users to override this decision.
> One option I'm leaning towards is to add a syntax like the following:
>
> ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
>
> Though, that should probably be a seperate patch. I'm going to work
> on that, but still wanted to share v23 given picking the index sounds
> complementary, not strictly required at this point.
>

I agree that it could be a separate patch. However, do you think we
need some way to disable picking the index scan? This is to avoid
cases where a sequential scan could be better, or do we think no
such case exists?

--
With Regards,
Amit Kapila.



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
From: shiy.fnst@fujitsu.com
On Thu, Feb 2, 2023 4:34 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>> and if there's more
>> than one candidate index pick any one. Additionally, we can allow
>> disabling the use of an index scan for this particular case. If we are
>> too worried about API change for allowing users to specify the index
>> then we can do that later or as a separate patch.
>>
>
> On v23, I dropped the planner support for picking the index. Instead, it simply
> iterates over the indexes and picks the first one that is suitable.
>
> I'm currently thinking on how to enable users to override this decision.
> One option I'm leaning towards is to add a syntax like the following:
>
> ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
>
> Though, that should probably be a seperate patch. I'm going to work
> on that, but still wanted to share v23 given picking the index sounds
> complementary, not strictly required at this point.
>

Thanks for your patch. Here are some comments.

1.
I noticed that get_usable_indexoid() is called in apply_handle_update_internal()
and apply_handle_delete_internal() to get the usable index. Could usableIndexOid
be a parameter of these two functions? We have already got the
LogicalRepRelMapEntry when calling them, and if we pass it down we can get
usableIndexOid without get_usable_indexoid(). Otherwise, for partitioned tables,
logicalrep_partition_open() is called in get_usable_indexoid(), and searching for
the entry via hash_search() will increase the cost.

2.
+             * This attribute is an expression, and
+             * SuitableIndexPathsForRepIdentFull() was called earlier when the
+             * index for subscriber was selected. There, the indexes
+             * comprising *only* expressions have already been eliminated.

The comment needs to be updated:
SuitableIndexPathsForRepIdentFull
->
FindUsableIndexForReplicaIdentityFull

3.

     /* Build scankey for every attribute in the index. */
-    for (attoff = 0; attoff < IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
+    for (index_attoff = 0; index_attoff < IndexRelationGetNumberOfKeyAttributes(idxrel);
+         index_attoff++)
     {

Should the comment be changed? Because we skip the attributes that are expressions.

4.
+            Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+                   RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));

Maybe we can call the new function idxIsRelationIdentityOrPK()?

Regards,
Shi Yu

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: shiy.fnst@fujitsu.com
On Mon, Feb 13, 2023 7:01 PM shiy.fnst@fujitsu.com <shiy.fnst@fujitsu.com> wrote:
> 
> On Thu, Feb 2, 2023 4:34 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> >>
> >> and if there's more
> >> than one candidate index pick any one. Additionally, we can allow
> >> disabling the use of an index scan for this particular case. If we are
> >> too worried about API change for allowing users to specify the index
> >> then we can do that later or as a separate patch.
> >>
> >
> > On v23, I dropped the planner support for picking the index. Instead, it simply
> > iterates over the indexes and picks the first one that is suitable.
> >
> > I'm currently thinking on how to enable users to override this decision.
> > One option I'm leaning towards is to add a syntax like the following:
> >
> > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> >
> > Though, that should probably be a separate patch. I'm going to work
> > on that, but still wanted to share v23 given picking the index sounds
> > complementary, not strictly required at this point.
> >
> 
> Thanks for your patch. Here are some comments.
> 

Hi,

Here are some comments on the test cases.

1. in test case "SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX"
+# now, ingest more data and create index on column y which has higher cardinality
+# so that the future commands use the index on column y
+$node_publisher->safe_psql('postgres',
+    "INSERT INTO test_replica_id_full SELECT 50, i FROM generate_series(0,3100)i;");
+$node_subscriber->safe_psql('postgres',
+    "CREATE INDEX test_replica_id_full_idy ON test_replica_id_full(y)");

We don't pick the cheapest index in the current patch, so should we modify this
part of the test?

BTW, the following comment in FindLogicalRepUsableIndex() needs to be changed,
too.

+         * We are looking for one more opportunity for using an index. If
+         * there are any indexes defined on the local relation, try to pick
+         * the cheapest index.


2. Is there any reason why we need the test case "SUBSCRIPTION USES INDEX WITH
DROPPED COLUMNS"? Has there been a problem related to dropped columns before?

3. in test case "SUBSCRIPTION USES INDEX ON PARTITIONED TABLES"
+# deletes rows and moves between partitions
+$node_publisher->safe_psql('postgres',
+    "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+    "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");

"moves between partitions" in the comment seems wrong.

4. in test case "SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS"
+# update 2 rows
+$node_publisher->safe_psql('postgres',
+    "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_1';");
+$node_publisher->safe_psql('postgres',
+    "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_2' AND lastname = 'last_name_2';");
+
+# make sure the index is not used on the subscriber
+$result = $node_subscriber->safe_psql('postgres',
+    "select idx_scan from pg_stat_all_indexes where indexrelname = 'people_names'");
+is($result, qq(0), 'ensure subscriber tap_sub_rep_full updates two rows via seq. scan with index on expressions');
+

I think it would be better to call wait_for_catchup() before the check because
we want to check that the index is NOT used. Otherwise the check may pass only
because the rows have not yet been updated on the subscriber.
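For example, something like this (a sketch only; the node and identifier names are
taken from the test as quoted above):

```perl
# Wait until the subscriber has applied the updates first, so a zero
# idx_scan really means "sequential scan was used", not "nothing
# happened yet".
$node_publisher->wait_for_catchup('tap_sub_rep_full');

$result = $node_subscriber->safe_psql('postgres',
    "select idx_scan from pg_stat_all_indexes where indexrelname = 'people_names'");
is($result, qq(0), 'index on expressions is not used');
```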

5. in test case "SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN"
+# show that index is not used even when enable_indexscan=false
+$result = $node_subscriber->safe_psql('postgres',
+    "select idx_scan from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx'");
+is($result, qq(0), 'ensure subscriber has not used index with enable_indexscan=false');

Should we remove the word "even" in the comment?

6. 
In each test case we re-create publications, subscriptions, and tables. Could we
create only one publication and one subscription at the beginning, and use them
in all test cases? I think this could save some time when running the test file.

Regards,
Shi Yu

Here are some review comments for patch v23.

======
General

1.
IIUC the previous logic for checking "cost" comparisons and selecting
the "cheapest" strategy is no longer present in the latest patch.

In that case, I think there are some leftover stale comments that need
changing. For example,

1a. Commit message:
"let the planner sub-modules compare the costs of index versus
sequential scan and choose the cheapest."

~

1b. Commit message:
"Finally, pick the cheapest `Path` among."

~

1c. FindLogicalRepUsableIndex function:
+ * We are looking for one more opportunity for using an index. If
+ * there are any indexes defined on the local relation, try to pick
+ * the cheapest index.

======
doc/src/sgml/logical-replication.sgml

If replica identity "full" is used, indexes can be used on the
subscriber side for seaching the rows. The index should be btree,
non-partial and have at least one column reference (e.g., should not
consists of only expressions). If there are no suitable indexes, the
search on the subscriber side is very inefficient and should only be
used as a fallback if no other solution is possible

2a.
Fixed typo "seaching", and minor rewording.

SUGGESTION
When replica identity "full" is specified, indexes can be used on the
subscriber side for searching the rows. These indexes should be btree,
non-partial and have at least one column reference (e.g., should not
consist of only expressions). If there are no such suitable indexes,
the search on the subscriber side can be very inefficient, therefore
replica identity "full" should only be used as a fallback if no other
solution is possible.

~

2b.
I know you are just following some existing text here, but IMO this
should probably refer to replica identity <literal>FULL</literal>
instead of "full".

======
src/backend/executor/execReplication.c

3. IdxIsRelationIdentityOrPK

+/*
+ * Given a relation and OID of an index, returns true if
+ * the index is relation's primary key's index or
+ * relaton's replica identity index.
+ *
+ * Returns false otherwise.
+ */
+static bool
+IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
+{
+ Assert(OidIsValid(idxoid));
+
+ if (RelationGetReplicaIndex(rel) == idxoid ||
+ RelationGetPrimaryKeyIndex(rel) == idxoid)
+ return true;
+
+ return false;

3a.
Comment typo "relaton"

~

3b.
Code could be written as a single statement like below if you wish (but see #3c)

return RelationGetReplicaIndex(rel) == idxoid ||
RelationGetPrimaryKeyIndex(rel) == idxoid;

~

3c.
Actually, RelationGetReplicaIndex and RelationGetPrimaryKeyIndex
implementations are very similar so it seemed inefficient to be
calling both of them. IMO it might be better to just make a new
relcache function IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid).
This implementation will be similar to those others, but now you need
only to call the workhorse RelationGetIndexList *one* time.

~~~

4. RelationFindReplTupleByIndex

  bool found;
+ TypeCacheEntry **eq = NULL; /* only used when the index is not repl. ident
+ * or pkey */
+ bool idxIsRelationIdentityOrPK;


If you change the comment to say "RI" instead of "repl. Ident" then it
can all fit on one line, which would be an improvement.


======
src/backend/replication/logical/relation.c

5.
 #include "replication/logicalrelation.h"
 #include "replication/worker_internal.h"
+#include "optimizer/cost.h"
 #include "utils/inval.h"

Can that #include be added in alphabetical order like the others or not?

~~~

6. logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It occurs when an
+ * operation is first performed on the relation, or after invalidation of
+ * of the relation cache entry (such as ANALYZE or CREATE/DROP index on
+ * the relation).
+ */
+ entry->usableIndexOid = FindLogicalRepUsableIndex(partrel, remoterel);
+

Typo "of of the relation"

~~~

7. FindUsableIndexForReplicaIdentityFull

+static Oid
+FindUsableIndexForReplicaIdentityFull(Relation localrel)
+{
+ MemoryContext usableIndexContext;
+ MemoryContext oldctx;
+ Oid usableIndex;
+ Oid idxoid;
+ List *indexlist;
+ ListCell   *lc;
+ Relation        indexRelation;
+ IndexInfo  *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;

It looks like some of these variables are only used within the scope
of the foreach loop, so I think that is where they should be declared.

~~~

8.
+ usableIndex = InvalidOid;

Might as well do that assignment at the declaration.

~~~

9. FindLogicalRepUsableIndex

+ /*
+ * Simple case, we already have a primary key or a replica identity index.
+ *
+ * Note that we do not use index scans below when enable_indexscan is
+ * false. Allowing primary key or replica identity even when index scan is
+ * disabled is the legacy behaviour. So we hesitate to move the below
+ * enable_indexscan check to be done earlier in this function.
+ */
+ idxoid = GetRelationIdentityOrPK(localrel);
+ if (OidIsValid(idxoid))
+ return idxoid;
+
+ /* If index scans are disabled, use a sequential scan */
+ if (!enable_indexscan)
+ return InvalidOid;

~

IMO that "Note" really belongs with the if (!enable_indexscan) check, more like this:

SUGGESTION
/*
* Simple case, we already have a primary key or a replica identity index.
*/
idxoid = GetRelationIdentityOrPK(localrel);
if (OidIsValid(idxoid))
return idxoid;

/*
* If index scans are disabled, use a sequential scan.
*
* Note we hesitate to move this check to earlier in this function
* because allowing primary key or replica identity even when index scan
* is disabled is the legacy behaviour.
*/
if (!enable_indexscan)
return InvalidOid;

======
src/backend/replication/logical/worker.c

10. get_usable_indexoid

+/*
+ * Decide whether we can pick an index for the relinfo (e.g., the relation)
+ * we're actually deleting/updating from. If it is a child partition of
+ * edata->targetRelInfo, find the index on the partition.
+ *
+ * Note that if the corresponding relmapentry has invalid usableIndexOid,
+ * the function returns InvalidOid.
+ */

"(e.g., the relation)" --> "(i.e. the relation)"

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: shiy.fnst@fujitsu.com
On Sat, Feb 4, 2023 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Thu, Feb 2, 2023 at 2:03 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> >>
> >> and if there's more
> >> than one candidate index pick any one. Additionally, we can allow
> >> disabling the use of an index scan for this particular case. If we are
> >> too worried about API change for allowing users to specify the index
> >> then we can do that later or as a separate patch.
> >>
> >
> > On v23, I dropped the planner support for picking the index. Instead, it simply
> > iterates over the indexes and picks the first one that is suitable.
> >
> > I'm currently thinking on how to enable users to override this decision.
> > One option I'm leaning towards is to add a syntax like the following:
> >
> > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> >
> > Though, that should probably be a separate patch. I'm going to work
> > on that, but still wanted to share v23 given picking the index sounds
> > complementary, not strictly required at this point.
> >
> 
> I agree that it could be a separate patch. However, do you think we
> need some way to disable picking the index scan? This is to avoid
> cases where sequence scan could be better or do we think there won't
> exist such a case?
> 

I think such a case exists. I tried the following cases based on v23 patch.

# Step 1.
Create publication, subscription and tables.
-- on publisher
create table tbl (a int);
alter table tbl replica identity full;
create publication pub for table tbl;

-- on subscriber
create table tbl (a int);
create index idx_a on tbl(a);
create subscription sub connection 'dbname=postgres port=5432' publication pub;

# Step 2.
Setup synchronous replication.

# Step 3.
Execute SQL query on publisher.

-- case 1 (All values are duplicated)
truncate tbl;
insert into tbl select 1 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 2
truncate tbl;
insert into tbl select i%3 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 3
truncate tbl;
insert into tbl select i%5 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 4
truncate tbl;
insert into tbl select i%10 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 5
truncate tbl;
insert into tbl select i%100 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 6
truncate tbl;
insert into tbl select i%1000 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 7 (No duplicate value)
truncate tbl;
insert into tbl select i from generate_series(0,10000)i;
update tbl set a=a+1;

# Result
The time to execute the update (the average of 3 runs is taken; the unit is
milliseconds):

+--------+---------+---------+
|        | patched |  master |
+--------+---------+---------+
| case 1 | 3933.68 | 1298.32 |
| case 2 | 1803.46 | 1294.42 |
| case 3 | 1380.82 | 1299.90 |
| case 4 | 1042.60 | 1300.20 |
| case 5 |  691.69 | 1297.51 |
| case 6 |  578.50 | 1300.69 |
| case 7 |  566.45 | 1302.17 |
+--------+---------+---------+

In cases 1~3, there is an overhead after applying the patch. In the other cases,
the patch improved the performance. The more duplicate values there are, the
greater the overhead after applying the patch.

Regards,
Shi Yu


On Wed, Feb 15, 2023 at 9:23 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Sat, Feb 4, 2023 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Feb 2, 2023 at 2:03 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> > >
> > > On v23, I dropped the planner support for picking the index. Instead, it simply
> > > iterates over the indexes and picks the first one that is suitable.
> > >
> > > I'm currently thinking on how to enable users to override this decision.
> > > One option I'm leaning towards is to add a syntax like the following:
> > >
> > > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> > >
> > > Though, that should probably be a separate patch. I'm going to work
> > > on that, but still wanted to share v23 given picking the index sounds
> > > complementary, not strictly required at this point.
> > >
> >
> > I agree that it could be a separate patch. However, do you think we
> > need some way to disable picking the index scan? This is to avoid
> > cases where sequence scan could be better or do we think there won't
> > exist such a case?
> >
>
> I think such a case exists. I tried the following cases based on v23 patch.
>
...
> # Result
> The time executing update (the average of 3 runs is taken, the unit is
> milliseconds):
>
> +--------+---------+---------+
> |        | patched |  master |
> +--------+---------+---------+
> | case 1 | 3933.68 | 1298.32 |
> | case 2 | 1803.46 | 1294.42 |
> | case 3 | 1380.82 | 1299.90 |
> | case 4 | 1042.60 | 1300.20 |
> | case 5 |  691.69 | 1297.51 |
> | case 6 |  578.50 | 1300.69 |
> | case 7 |  566.45 | 1302.17 |
> +--------+---------+---------+
>
> In cases 1~3, there is an overhead after applying the patch. In the other cases,
> the patch improved the performance. The more duplicate values there are, the
> greater the overhead after applying the patch.
>

I think this overhead seems to be mostly due to the need to perform
tuples_equal multiple times for duplicate values. I don't know if
there is any simple way to avoid this without using the planner stuff
as was used in the previous approach. So, this brings us to the
question of whether just providing a way to disable/enable the use of
index scan for such cases is sufficient or if we need any other way.

Tom, Andres, or others, do you have any suggestions on how to move
forward with this patch?

--
With Regards,
Amit Kapila.



FYI, I accidentally left this (v23) patch's TAP test
t/032_subscribe_use_index.pl still lurking even after removing all
other parts of this patch.

In this scenario, the t/032 test gets stuck (build of latest HEAD)

IIUC the patch is only meant to affect performance, so I expected this
032 test to work regardless of whether the rest of the patch is
applied.

Anyway, it hangs every time for me. I didn't dig into the cause, but
if it requires the patched code for this new test to pass, I thought it
indicated something wrong either with the test or something more
sinister the new test has exposed. Maybe I am mistaken.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



On Fri, Feb 17, 2023 at 5:57 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> FYI, I accidentally left this (v23) patch's TAP test
> t/032_subscribe_use_index.pl still lurking even after removing all
> other parts of this patch.
>
> In this scenario, the t/032 test gets stuck (build of latest HEAD)
>
> IIUC the patch is only meant to affect performance, so I expected this
> 032 test to work regardless of whether the rest of the patch is
> applied.
>
> Anyway,  it hangs every time for me. I didn't dig looking for the
> cause, but if it requires patched code for this new test to pass, I
> thought it indicates something wrong either with the test or something
> more sinister the new test has exposed. Maybe I am mistaken
>

Sorry, probably the above was a false alarm. After a long time
(minutes) the stuck test did eventually time out with:

t/032_subscribe_use_index.pl ....... # poll_query_until timed out
executing this query:
# select (idx_scan = 1) from pg_stat_all_indexes where indexrelname =
'test_replica_id_full_idx';
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
t/032_subscribe_use_index.pl ....... Dubious, test returned 29 (wstat
7424, 0x1d00)

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Hi Amit, all

On Wed, Feb 15, 2023 at 07:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 15, 2023 at 9:23 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Sat, Feb 4, 2023 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Feb 2, 2023 at 2:03 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> > >
> > > On v23, I dropped the planner support for picking the index. Instead, it simply
> > > iterates over the indexes and picks the first one that is suitable.
> > >
> > > I'm currently thinking on how to enable users to override this decision.
> > > One option I'm leaning towards is to add a syntax like the following:
> > >
> > > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> > >
> > > Though, that should probably be a separate patch. I'm going to work
> > > on that, but still wanted to share v23 given picking the index sounds
> > > complementary, not strictly required at this point.
> > >
> >
> > I agree that it could be a separate patch. However, do you think we
> > need some way to disable picking the index scan? This is to avoid
> > cases where sequence scan could be better or do we think there won't
> > exist such a case?
> >
>
> I think such a case exists. I tried the following cases based on v23 patch.
>
...
> # Result
> The time executing update (the average of 3 runs is taken, the unit is
> milliseconds):
>
> +--------+---------+---------+
> |        | patched |  master |
> +--------+---------+---------+
> | case 1 | 3933.68 | 1298.32 |
> | case 2 | 1803.46 | 1294.42 |
> | case 3 | 1380.82 | 1299.90 |
> | case 4 | 1042.60 | 1300.20 |
> | case 5 |  691.69 | 1297.51 |
> | case 6 |  578.50 | 1300.69 |
> | case 7 |  566.45 | 1302.17 |
> +--------+---------+---------+
>
> In cases 1~3, there is an overhead after applying the patch. In the other cases,
> the patch improved the performance. The more duplicate values there are, the
> greater the overhead after applying the patch.
>

I think this overhead seems to be mostly due to the need to perform
tuples_equal multiple times for duplicate values. I don't know if
there is any simple way to avoid this without using the planner stuff
as was used in the previous approach. So, this brings us to the
question of whether just providing a way to disable/enable the use of
index scan for such cases is sufficient or if we need any other way.

Tom, Andres, or others, do you have any suggestions on how to move
forward with this patch?


Thanks for the feedback and testing. Due to personal circumstances,
I could not reply to the thread in the last 2 weeks, but I'll be more active
going forward.

 I also agree that we should have a way to control the behavior.

I created another patch (v24_0001_optionally_disable_index.patch) which can be applied
on top of v23_0001_use_index_on_subs_when_pub_rep_ident_full.patch. 

The new patch adds a new subscription parameter, named enable_index_scan, for both
CREATE SUBSCRIPTION and ALTER SUBSCRIPTION. The setting is valid only when
REPLICA IDENTITY is full.

What do you think about such a patch to control the behavior? It does not give a
per-relation level of control, but it is still useful in many cases.
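For reference, usage would look roughly like this (option name as described above;
the exact syntax in the v24 patch is of course subject to review):

```sql
-- Proposed subscription option; only meaningful when the publisher
-- uses REPLICA IDENTITY FULL.
CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=5432'
    PUBLICATION pub WITH (enable_index_scan = false);

-- ...and it can be toggled later:
ALTER SUBSCRIPTION sub SET (enable_index_scan = true);
```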

(Note that I'll be working on the other feedback in the email thread; I wanted to send
this earlier to hear some early thoughts on v24_0001_optionally_disable_index.patch.)


Hi Peter, Amit, Shi Yu and all,

(I'm replying to multiple reviews in this single reply, hope that's fine.)


On Fri, Feb 17, 2023 at 5:57 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> FYI, I accidentally left this (v23) patch's TAP test
> t/032_subscribe_use_index.pl still lurking even after removing all
> other parts of this patch.
>
> In this scenario, the t/032 test gets stuck (build of latest HEAD)
>
> IIUC the patch is only meant to affect performance, so I expected this
> 032 test to work regardless of whether the rest of the patch is
> applied.
>
> Anyway,  it hangs every time for me. I didn't dig looking for the
> cause, but if it requires patched code for this new test to pass, I
> thought it indicates something wrong either with the test or something
> more sinister the new test has exposed. Maybe I am mistaken
>

Sorry, probably the above was a false alarm. After a long time
(minutes) the stuck test did eventually timeout with:
t/032_subscribe_use_index.pl ....... # poll_query_until timed out
executing this query:
# select (idx_scan = 1) from pg_stat_all_indexes where indexrelname =
'test_replica_id_full_idx';
# expecting this output:
# t
# last actual query output:
# f
# with stderr:
t/032_subscribe_use_index.pl ....... Dubious, test returned 29 (wstat
7424, 0x1d00)

I can tell that this is the expected behavior. The majority of the tests do the following:
- update/delete rows on the source
- check pg_stat_all_indexes on the target

So, given that HEAD does not use any indexes, it is expected that the tests would
wait until the poll_query_until timeout. That's why I do not see/expect any problems on
HEAD. I ran the test file after removing the poll_query_until checks for the index scan
counts, and all finished properly.


I think such a case exists. I tried the following cases based on v23 patch.

As I noted in the earlier reply, I created another patch, which optionally gives the ability to
disable index scans on the subscription level for the replica identity full case.

That is the second patch attached in this mail, named v25_0001_optionally_disable_index.patch.

Here are some review comments for patch v23.

Thanks Peter, see the following reply: 

======
General

1.
IIUC the previous logic for checking "cost" comparisons and selecting
the "cheapest" strategy is no longer present in the latest patch.
In that case, I think there are some leftover stale comments that need
changing. For example,
1a. Commit message:
"let the planner sub-modules compare the costs of index versus
sequential scan and choose the cheapest."
1b. Commit message:
"Finally, pick the cheapest `Path` among."
1c. FindLogicalRepUsableIndex function:
+ * We are looking for one more opportunity for using an index. If
+ * there are any indexes defined on the local relation, try to pick
+ * the cheapest index.

Makes sense, the commit message and function comments should reflect
the new logic. I went over the patch in detail looking for this.
 

======
doc/src/sgml/logical-replication.sgml
If replica identity "full" is used, indexes can be used on the
subscriber side for seaching the rows. The index should be btree,
non-partial and have at least one column reference (e.g., should not
consists of only expressions). If there are no suitable indexes, the
search on the subscriber side is very inefficient and should only be
used as a fallback if no other solution is possible

2a.
Fixed typo "seaching", and minor rewording.
SUGGESTION
When replica identity "full" is specified, indexes can be used on the
subscriber side for searching the rows. These indexes should be btree,
non-partial and have at least one column reference (e.g., should not
consist of only expressions). If there are no such suitable indexes,
the search on the subscriber side can be very inefficient, therefore
replica identity "full" should only be used as a fallback if no other
solution is possible.

I like your suggestion, updated

2b.
I know you are just following some existing text here, but IMO this
should probably refer to replica identity <literal>FULL</literal>
instead of "full".

I guess that works, I don't have any preference / knowledge on this.

======
src/backend/executor/execReplication.c
3. IdxIsRelationIdentityOrPK
+/*
+ * Given a relation and OID of an index, returns true if
+ * the index is relation's primary key's index or
+ * relaton's replica identity index.
+ *
+ * Returns false otherwise.
+ */
+static bool
+IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
+{
+ Assert(OidIsValid(idxoid));
+
+ if (RelationGetReplicaIndex(rel) == idxoid ||
+ RelationGetPrimaryKeyIndex(rel) == idxoid)
+ return true;
+
+ return false;
3a.
Comment typo "relaton"

fixed


3b.
Code could be written as a single statement like below if you wish (but see #3c)
return RelationGetReplicaIndex(rel) == idxoid ||
RelationGetPrimaryKeyIndex(rel) == idxoid;
~
3c.
Actually, RelationGetReplicaIndex and RelationGetPrimaryKeyIndex
implementations are very similar so it seemed inefficient to be
calling both of them. IMO it might be better to just make a new
relcache function IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid).
This implementation will be similar to those others, but now you need
only to call the workhorse RelationGetIndexList *one* time.
~~~

Regarding (3c), RelationGetIndexList is only called once, when !relation->rd_indexvalid.
So, merging the two functions does not seem necessary to me.
Also, I'd rather keep both functions, as I know that some extensions rely
on these functions separately.

Regarding (3b), it makes sense, applied.


4. RelationFindReplTupleByIndex
  bool found;
+ TypeCacheEntry **eq = NULL; /* only used when the index is not repl. ident
+ * or pkey */
+ bool idxIsRelationIdentityOrPK;
If you change the comment to say "RI" instead of "repl. Ident" then it
can all fit on one line, which would be an improvement.

Done, also changed pkey to PK as this seems to be used throughout the code. 


======
src/backend/replication/logical/relation.c
5.
 #include "replication/logicalrelation.h"
 #include "replication/worker_internal.h"
+#include "optimizer/cost.h"
 #include "utils/inval.h"
Can that #include be added in alphabetical order like the others or not?

Sure, it seems like I intended to do it, but made a small mistake :) 


6. logicalrep_partition_open
+ /*
+ * Finding a usable index is an infrequent task. It occurs when an
+ * operation is first performed on the relation, or after invalidation of
+ * of the relation cache entry (such as ANALYZE or CREATE/DROP index on
+ * the relation).
+ */
+ entry->usableIndexOid = FindLogicalRepUsableIndex(partrel, remoterel);
+
Typo "of of the relation"

Fixed

7. FindUsableIndexForReplicaIdentityFull
+static Oid
+FindUsableIndexForReplicaIdentityFull(Relation localrel)
+{
+ MemoryContext usableIndexContext;
+ MemoryContext oldctx;
+ Oid usableIndex;
+ Oid idxoid;
+ List *indexlist;
+ ListCell   *lc;
+ Relation        indexRelation;
+ IndexInfo  *indexInfo;
+ bool is_btree;
+ bool is_partial;
+ bool is_only_on_expression;
It looks like some of these variables are only used within the scope
of the foreach loop, so I think that is where they should be declared.

makes sense, done



8.
+ usableIndex = InvalidOid;
Might as well do that assignment at the declaration.

done

9. FindLogicalRepUsableIndex
+ /*
+ * Simple case, we already have a primary key or a replica identity index.
+ *
+ * Note that we do not use index scans below when enable_indexscan is
+ * false. Allowing primary key or replica identity even when index scan is
+ * disabled is the legacy behaviour. So we hesitate to move the below
+ * enable_indexscan check to be done earlier in this function.
+ */
+ idxoid = GetRelationIdentityOrPK(localrel);
+ if (OidIsValid(idxoid))
+ return idxoid;
+
+ /* If index scans are disabled, use a sequential scan */
+ if (!enable_indexscan)
+ return InvalidOid;
IMO that "Note" really belongs with the if (!enable_indexscan) check, more like this:
SUGGESTION
/*
* Simple case, we already have a primary key or a replica identity index.
*/
idxoid = GetRelationIdentityOrPK(localrel);
if (OidIsValid(idxoid))
return idxoid;
/*
* If index scans are disabled, use a sequential scan.
*
* Note we hesitate to move this check to earlier in this function
* because allowing primary key or replica identity even when index scan
* is disabled is the legacy behaviour.
*/
if (!enable_indexscan)
return InvalidOid;

makes sense, moved


======
src/backend/replication/logical/worker.c
10. get_usable_indexoid
+/*
+ * Decide whether we can pick an index for the relinfo (e.g., the relation)
+ * we're actually deleting/updating from. If it is a child partition of
+ * edata->targetRelInfo, find the index on the partition.
+ *
+ * Note that if the corresponding relmapentry has invalid usableIndexOid,
+ * the function returns InvalidOid.
+ */
"(e.g., the relation)" --> "(i.e. the relation)"

fixed


Thanks for your patch. Here are some comments.

Thanks Shi Yu, see my reply below  

1.
I noticed that get_usable_indexoid() is called in apply_handle_update_internal()
and apply_handle_delete_internal() to get the usable index. Could usableIndexOid
be a parameter of these two functions? Because we have got the
LogicalRepRelMapEntry when calling them and if we do so, we can get
usableIndexOid without get_usable_indexoid(). Otherwise for partitioned tables,
logicalrep_partition_open() is called in get_usable_indexoid() and searching
the entry via hash_search() will increase cost.

I think I cannot easily follow this comment. We call logicalrep_partition_open()
because if an update/delete is on a partitioned table, we should find the 
corresponding local index on the partition itself. edata->targetRel points to the
partitioned table, and we map it to the partition inside get_usable_indexoid().

Overall, I cannot see how we can avoid the call to logicalrep_partition_open().
Can you please explain a little further?

Note that logicalrep_partition_open() is cheap for the cases where there are no
invalidations (which is probably most of the time).


2.
+                        * This attribute is an expression, and
+                        * SuitableIndexPathsForRepIdentFull() was called earlier when the
+                        * index for subscriber was selected. There, the indexes
+                        * comprising *only* expressions have already been eliminated.
The comment looks need to be updated:
SuitableIndexPathsForRepIdentFull
->
FindUsableIndexForReplicaIdentityFull

Yes, updated. 

3.
        /* Build scankey for every attribute in the index. */
-       for (attoff = 0; attoff < IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
+       for (index_attoff = 0; index_attoff < IndexRelationGetNumberOfKeyAttributes(idxrel);
+                index_attoff++)
        {
Should the comment be changed? Because we skip the attributes that are expressions.

makes sense


4.
+                       Assert(RelationGetReplicaIndex(rel) != RelationGetRelid(idxrel) &&
+                                  RelationGetPrimaryKeyIndex(rel) != RelationGetRelid(idxrel));
Maybe we can call the new function idxIsRelationIdentityOrPK()?

Makes sense, becomes easier to understand.



Here are some comments on the test cases.

1. in test case "SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX"
+# now, ingest more data and create index on column y which has higher cardinality
+# so that the future commands use the index on column y
+$node_publisher->safe_psql('postgres',
+       "INSERT INTO test_replica_id_full SELECT 50, i FROM generate_series(0,3100)i;");
+$node_subscriber->safe_psql('postgres',
+       "CREATE INDEX test_replica_id_full_idy ON test_replica_id_full(y)");
We don't pick the cheapest index in the current patch, so should we modify this
part of the test?

I think I already changed that test. I kept it so that we still make sure that even if we
create/drop indexes, we do not break anything. I agree that the wording/comments were
stale.

Can you check if it looks better now?


BTW, the following comment in FindLogicalRepUsableIndex() need to be changed,
too.
+                * We are looking for one more opportunity for using an index. If
+                * there are any indexes defined on the local relation, try to pick
+                * the cheapest index.


makes sense, Peter also had a similar comment, fixed.


2. Is there any reasons why we need the test case "SUBSCRIPTION USES INDEX WITH
DROPPED COLUMNS"? Has there been a problem related to dropped columns before?

Not really, but dropped columns are tricky in general. As far as I know, those columns
continue to exist in pg_attribute, which might cause some edge cases. So, I wanted to
have coverage for that.


3. in test case "SUBSCRIPTION USES INDEX ON PARTITIONED TABLES"
+# deletes rows and moves between partitions
+$node_publisher->safe_psql('postgres',
+       "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+       "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
"moves between partitions" in the comment seems wrong.

Yes, probably copy & paste error from the UPDATE test 

4. in test case "SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS"
+# update 2 rows
+$node_publisher->safe_psql('postgres',
+       "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_1';");
+$node_publisher->safe_psql('postgres',
+       "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_2' AND lastname = 'last_name_2';");
+
+# make sure the index is not used on the subscriber
+$result = $node_subscriber->safe_psql('postgres',
+       "select idx_scan from pg_stat_all_indexes where indexrelname = 'people_names'");
+is($result, qq(0), 'ensure subscriber tap_sub_rep_full updates two rows via seq. scan with index on expressions');
+
I think it would be better to call wait_for_catchup() before the check because
we want to check the index is NOT used. Otherwise the check may pass because the
rows have not yet been updated on subscriber.

that's right, added 

5. in test case "SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN"
+# show that index is not used even when enable_indexscan=false
+$result = $node_subscriber->safe_psql('postgres',
+       "select idx_scan from pg_stat_all_indexes where indexrelname = 'test_replica_id_full_idx'");
+is($result, qq(0), 'ensure subscriber has not used index with enable_indexscan=false');
Should we remove the word "even" in the comment?

done 

6.
In each test case we re-create publications, subscriptions, and tables. Could we
create only one publication and one subscription at the beginning, and use them
in all test cases? I think this can save some time running the test file.
 
I'd rather keep it as-is for (a) simplicity and (b) consistency, as other test files seem to have similar patterns.

Do you feel strongly that we should change the test file? It could also make debugging
the tests harder.


Tom, Andres, or others, do you have any suggestions on how to move
forward with this patch?

Yes, happy to hear any feedback on the attached patch(es).


Thanks,
Onder KALACI

On Tue, Feb 21, 2023 at 5:25 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
Hi Amit, all

On Wed, Feb 15, 2023 at 7:37 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Feb 15, 2023 at 9:23 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Sat, Feb 4, 2023 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Feb 2, 2023 at 2:03 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> > >
> > > On v23, I dropped the planner support for picking the index. Instead, it simply
> > > iterates over the indexes and picks the first one that is suitable.
> > >
> > > I'm currently thinking on how to enable users to override this decision.
> > > One option I'm leaning towards is to add a syntax like the following:
> > >
> > > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> > >
> > > Though, that should probably be a separate patch. I'm going to work
> > > on that, but still wanted to share v23 given picking the index sounds
> > > complementary, not strictly required at this point.
> > >
> >
> > I agree that it could be a separate patch. However, do you think we
> > need some way to disable picking the index scan? This is to avoid
> > cases where sequence scan could be better or do we think there won't
> > exist such a case?
> >
>
> I think such a case exists. I tried the following cases based on v23 patch.
>
...
> # Result
> The time executing update (the average of 3 runs is taken, the unit is
> milliseconds):
>
> +--------+---------+---------+
> |        | patched |  master |
> +--------+---------+---------+
> | case 1 | 3933.68 | 1298.32 |
> | case 2 | 1803.46 | 1294.42 |
> | case 3 | 1380.82 | 1299.90 |
> | case 4 | 1042.60 | 1300.20 |
> | case 5 |  691.69 | 1297.51 |
> | case 6 |  578.50 | 1300.69 |
> | case 7 |  566.45 | 1302.17 |
> +--------+---------+---------+
>
> In case 1~3, there's an overhead after applying the patch. In other cases, the
> patch improved the performance. The more duplicate values there are, the greater the
> overhead after applying the patch.
>

I think this overhead seems to be mostly due to the need to perform
tuples_equal multiple times for duplicate values. I don't know if
there is any simple way to avoid this without using the planner stuff
as was used in the previous approach. So, this brings us to the
question of whether just providing a way to disable/enable the use of
index scan for such cases is sufficient or if we need any other way.

Tom, Andres, or others, do you have any suggestions on how to move
forward with this patch?


Thanks for the feedback and testing. Due to personal circumstances,
I could not reply to the thread in the last two weeks, but I'll be more active
going forward. 

 I also agree that we should have a way to control the behavior.

I created another patch (v24_0001_optionally_disable_index.patch) which can be applied
on top of v23_0001_use_index_on_subs_when_pub_rep_ident_full.patch. 

The new patch adds a new subscription parameter, enable_index_scan, for both CREATE
SUBSCRIPTION and ALTER SUBSCRIPTION. The setting takes effect only when REPLICA IDENTITY is full.

What do you think about such a patch to control the behavior? It does not give per-relation
control, but it is still useful for many cases.

(Note that I'll be working on the other feedback in the email thread; I wanted to send this earlier
to hear some early thoughts on v24_0001_optionally_disable_index.patch.)


On Tue, Feb 21, 2023 at 7:55 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>> I think this overhead seems to be mostly due to the need to perform
>> tuples_equal multiple times for duplicate values. I don't know if
>> there is any simple way to avoid this without using the planner stuff
>> as was used in the previous approach. So, this brings us to the
>> question of whether just providing a way to disable/enable the use of
>> index scan for such cases is sufficient or if we need any other way.
>>
>> Tom, Andres, or others, do you have any suggestions on how to move
>> forward with this patch?
>>
>
> Thanks for the feedback and testing. Due to personal circumstances,
> I could not reply the thread in the last 2 weeks, but I'll be more active
> going forward.
>
>  I also agree that we should have a way to control the behavior.
>
> I created another patch (v24_0001_optionally_disable_index.patch) which can be applied
> on top of v23_0001_use_index_on_subs_when_pub_rep_ident_full.patch.
>
> The new patch adds a new subscription_parameter for both CREATE and ALTER subscription
> named: enable_index_scan. The setting is valid only when REPLICA IDENTITY is full.
>
> What do you think about such a patch to control the behavior? It does not give a per-relation
> level of control, but still useful for many cases.
>

Wouldn't a table-level option like 'apply_index_scan' be better than a
subscription-level option with a default value as false? Anyway, the
bigger point is that we don't see a better way to proceed here than to
introduce some option to control this behavior.

I see this as a way to provide this feature for users but I would
prefer to proceed with this if we can get some more buy-in from senior
community members (at least one more committer) and some user(s) if
possible. So, I once again request others to chime in and share their
opinion.

--
With Regards,
Amit Kapila.



Hi Amit, all


Wouldn't a table-level option like 'apply_index_scan' be better than a
subscription-level option with a default value as false? Anyway, the
bigger point is that we don't see a better way to proceed here than to
introduce some option to control this behavior.

What would be a good API for adding such an option at the table level?
To be more specific, I cannot see any table-level sub/pub options in the docs.

My main motivation for doing it at the subscription level is that (a) it might be
too much work for users to control the behavior per table, and (b) I couldn't
find a good API for a table-level option, and inventing a new one seemed
like a big change.

Overall, I think it makes sense to disable the feature by default. It is
currently enabled by default, which is good for test coverage for now, but
let me disable it when I push the next version.
 

I see this as a way to provide this feature for users but I would
prefer to proceed with this if we can get some more buy-in from senior
community members (at least one more committer) and some user(s) if
possible. So, I once again request others to chime in and share their
opinion.

 
Agreed, it would be great to hear some other perspectives on this.

Thanks,
Onder

 
On Mon, Feb 27, 2023 at 12:35 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>
>> Wouldn't a table-level option like 'apply_index_scan' be better than a
>> subscription-level option with a default value as false? Anyway, the
>> bigger point is that we don't see a better way to proceed here than to
>> introduce some option to control this behavior.
>
>
> What would be a good API for adding such an option for table-level?
> To be more specific, I cannot see any table level sub/pub options in the docs.
>

I was thinking something along the lines of "Storage Parameters" [1]
for a table. See parameters like autovacuum_enabled that decide the
autovacuum behavior for a table. These can be set via CREATE/ALTER
TABLE commands.

[1] - https://www.postgresql.org/docs/devel/sql-createtable.html#SQL-CREATETABLE-STORAGE-PARAMETERS

--
With Regards,
Amit Kapila.



Hi,

On 2023-02-25 16:00:05 +0530, Amit Kapila wrote:
> On Tue, Feb 21, 2023 at 7:55 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >> I think this overhead seems to be mostly due to the need to perform
> >> tuples_equal multiple times for duplicate values.

I think more work needs to be done to determine the source of the
overhead. It's not clear to me why there'd be an increase in tuples_equal()
calls in the tests upthread.


> Wouldn't a table-level option like 'apply_index_scan' be better than a
> subscription-level option with a default value as false? Anyway, the
> bigger point is that we don't see a better way to proceed here than to
> introduce some option to control this behavior.

I don't think this should default to false. The quadratic apply performance
that sequential scans cause is a much bigger hazard for users than some apply
performance regression.


> I see this as a way to provide this feature for users but I would
> prefer to proceed with this if we can get some more buy-in from senior
> community members (at least one more committer) and some user(s) if
> possible. So, I once again request others to chime in and share their
> opinion.

I'd prefer not having an option until we figure out the cause of the
performance regression (reducing it to be small enough not to care). After
that, an option defaulting to using indexes. I don't think an option defaulting
to false makes sense.

I don't care whether it's subscription or relation level option.

Greetings,

Andres Freund



On Wed, Mar 1, 2023 at 12:09 AM Andres Freund <andres@anarazel.de> wrote:
>
> > I see this as a way to provide this feature for users but I would
> > prefer to proceed with this if we can get some more buy-in from senior
> > community members (at least one more committer) and some user(s) if
> > possible. So, I once again request others to chime in and share their
> > opinion.
>
> I'd prefer not having an option, because we figure out the cause of the
> performance regression (reducing it to be small enough to not care). After
> that an option defaulting to using indexes.
>

Sure, if we can reduce the regression to be small enough then we don't
need to keep the default as false; otherwise, we can still consider
keeping an option that defaults to using indexes, depending on the
investigation of the regression. Anyway, the main concern was whether it
is okay to have an option for this, which I think we have agreement
on; now I will continue my review.

--
With Regards,
Amit Kapila.



On Wed, Feb 22, 2023 at 7:54 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>

Few comments:
===============
1.
+   identity.  When replica identity <literal>FULL</literal> is specified,
+   indexes can be used on the subscriber side for searching the rows. These
+   indexes should be btree,

Why only btree and not others like a hash index? Also, there should be
some comments in FindUsableIndexForReplicaIdentityFull() to explain
the choices.

2.
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * This is not a generic routine - it expects the idxrel to be an index
+ * that planner would choose if the searchslot includes all the columns
+ * (e.g., REPLICA IDENTITY FULL on the source).
  */
-static bool
+static int

This comment is not clear to me. Which change here makes the
expectation like that? Which planner function/functionality are you
referring to here?

3.
+/*
+ * Given a relation and OID of an index, returns true if
+ * the index is relation's primary key's index or
+ * relation's replica identity index.

It seems the line length is a bit off in the above comments. There
could be a similar mismatch in other places. You might want to run
pgindent.

4.
+}
+
+
+/*
+ * Returns an index oid if there is an index that can be used

Spurious empty line.

5.
- /*
- * We are in error mode so it's fine this is somewhat slow. It's better to
- * give user correct error.
- */
- if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
+ /* Give user more precise error if possible. */
+ if (OidIsValid(rel->usableIndexOid))
  {
  ereport(ERROR,
  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

Is this change valid? I mean this could lead to the error "publisher
did not send replica identity column expected by the logical
replication target relation" when it should have given an error:
"logical replication target relation \"%s.%s\" has neither REPLICA
IDENTITY index nor PRIMARY ...


--
With Regards,
Amit Kapila.



Hi Andres, Amit, Shi Yu, all

On Tue, Feb 28, 2023 at 9:39 PM Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2023-02-25 16:00:05 +0530, Amit Kapila wrote:
> On Tue, Feb 21, 2023 at 7:55 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >> I think this overhead seems to be mostly due to the need to perform
> >> tuples_equal multiple times for duplicate values.

I think more work needs to be done to determine the source of the
overhead. It's not clear to me why there'd be an increase in tuples_equal()
calls in the tests upthread.


You are right; looking closely, in fact, most of the time we do far fewer
tuples_equal() calls with the index scan.
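To put rough numbers on this, here is a toy Python model (my own sketch, not code from the patch or the benchmark): with duplicates, an index scan only runs tuples_equal()-style comparisons on rows whose indexed key matches, while a sequential scan compares every row it passes. The row layout and counts are invented for illustration:

```python
# Toy model of how many tuples_equal()-style comparisons each strategy
# needs to find one row when REPLICA IDENTITY is FULL. Assumptions
# (mine, for illustration): rows are (key, payload) pairs, and the index
# returns exactly the rows whose key matches the search key.

def seqscan_comparisons(rows, target):
    """Sequential scan: compare every row until the full match is found."""
    for i, row in enumerate(rows, start=1):
        if row == target:
            return i
    return len(rows)

def indexscan_comparisons(rows, target):
    """Index scan: compare only rows whose indexed key matches."""
    candidates = [r for r in rows if r[0] == target[0]]
    for i, row in enumerate(candidates, start=1):
        if row == target:
            return i
    return len(candidates)

rows = [(i % 10, i) for i in range(10_000)]   # 1,000 duplicates per key
target = (7, 5_007)
print(seqscan_comparisons(rows, target))      # 5008
print(indexscan_comparisons(rows, target))    # 501
```

Even with heavy duplication, the index scan never compares more rows than the sequential scan for the same lookup, so comparison counts alone would not explain a regression.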

I've done some profiling with perf and created flame graphs for the apply worker, with the
test described above -- case 1 (all values are duplicated). I used the following commands:
- perf record -F 99 -p 122555 -g -- sleep 60
- perf script | ./stackcollapse-perf.pl > out.perf-folded
- ./flamegraph.pl out.perf-folded > perf_[index|seq]_scan.svg

I attached both flame graphs. I do not see anything specific regarding what the patch does, but
instead the difference mostly seems to come down to index scan vs sequential scan related
functions. As I continue to investigate, I thought it might be useful to share the flame graphs
so that more experienced hackers could comment on the difference.   

Regarding my own end-to-end tests: in some runs, the sequential scan is indeed faster for case-1. But
when I execute "update tbl set a=a+1;" 50 consecutive times and measure end-to-end performance, I see
much better results for the index scan; only case-1 is on par, mostly as I'd expect.

Case-1, running the update 50 times and waiting until all changes are applied
  • index scan: 2 minutes 36 seconds
  • sequential scan: 2 minutes 30 seconds
Case-2, running the update 50 times and waiting until all changes are applied
  • index scan: 1 minute 2 seconds
  • sequential scan: 2 minutes 30 seconds
Case-7, running the update 50 times and waiting until all changes are applied
  • index scan: 6 seconds
  • sequential scan: 2 minutes 26 seconds


> # Result
> The time executing update (the average of 3 runs is taken, the unit is
> milliseconds):

Shi Yu, would it be possible for you to re-run the tests with some more runs and share the average?
I suspect your test results have a very small sample size, and some outlier runs may be
skewing the average.
 
In my tests, I shared the total time, which is probably also fine.

Thanks,
Onder



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
From: "shiy.fnst@fujitsu.com"
On Wed, Mar 1, 2023 9:22 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> 
> Hi Andres, Amit, Shi Yu, all
> 
> On Tue, Feb 28, 2023 at 9:39 PM Andres Freund <andres@anarazel.de> wrote:
> Hi,
> 
> On 2023-02-25 16:00:05 +0530, Amit Kapila wrote:
> > On Tue, Feb 21, 2023 at 7:55 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> > >> I think this overhead seems to be mostly due to the need to perform
> > >> tuples_equal multiple times for duplicate values.
> 
> I think more work needs to be done to determine the source of the
> overhead. It's not clear to me why there'd be an increase in tuples_equal()
> calls in the tests upthread.
> 
> You are right, looking closely, in fact, we most of the time do much less 
> tuples_equal() with index scan.
> 
> I've done some profiling with perf, and created flame graphs for the apply worker, with the
> test described above: -- case 1 (All values are duplicated). I used the following commands:
> - perf record -F 99 -p 122555 -g -- sleep 60
> -  perf script | ./stackcollapse-perf.pl > out.perf-folded
> -  ./flamegraph.pl out.perf-folded > perf_[index|seq]_scan.svg
> 
> I attached both flame graphs. I do not see anything specific regarding what the patch does, but
> instead the difference mostly seems to come down to index scan vs sequential scan related
> functions. As I continue to investigate, I thought it might be useful to share the flame graphs
> so that more experienced hackers could comment on the difference.   
> 
> Regarding my own end-to-end tests: In some runs, the sequential scan is indeed faster for case-1. But, 
> when I execute update tbl set a=a+1; for 50 consecutive times, and measure end to end performance, I see
> much better results for index scan, only case-1 is on-par as mostly I'd expect.
> 
> Case-1, running the update 50 times and waiting all changes applied
> • index scan: 2minutes 36 seconds
> • sequential scan: 2minutes 30 seconds
> Case-2, running the update 50 times and waiting all changes applied
> • index scan: 1 minutes, 2 seconds
> • sequential scan: 2minutes 30 seconds
> Case-7, running the update 50 times and waiting all changes applied
> • index scan: 6 seconds
> • sequential scan: 2minutes 26seconds
> 
> 
> > # Result
> The time executing update (the average of 3 runs is taken, the unit is
> milliseconds):
> 
> Shi Yu, could it be possible for you to re-run the tests with some more runs, and share the average?
> I suspect maybe your test results have a very small pool size, and some runs are making
> the average slightly problematic.
>  
> In my tests, I shared the total time, which is probably also fine.
>

Thanks for your reply. I re-tested (based on
v25_0001_use_index_on_subs_when_pub_rep_ident_full.patch) and took the average
of 100 runs. The results are as follows. The unit is milliseconds.

case1
sequential scan: 1348.57
index scan: 3785.15

case2
sequential scan: 1350.26
index scan: 1754.01

case3
sequential scan: 1350.13
index scan: 1340.97

There was still some degradation in the first two cases. There are some differences
between our test results. Some information about my test is as follows.

a. Some parameters specified in postgresql.conf.
shared_buffers = 8GB
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off

b. Executed SQL.
I executed TRUNCATE and INSERT before each UPDATE. I am not sure if you did the
same, or just executed 50 consecutive UPDATEs. If the latter, there would be
lots of old tuples and this might have a bigger impact on the sequential scan. I
tried that case (executing 50 consecutive UPDATEs) and also saw that the
overhead is smaller than before.


Besides, I looked into the regression of this patch with `gprof`. Some results
are as follows. I think that with a single buffer lock, a sequential scan can
read multiple tuples (see heapgettup()), while an index scan can only fetch one
tuple per lock. So in case 1, which has lots of duplicate values and where more
tuples need to be scanned, the index scan takes longer.

- results of `gprof`
case1:
master
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  1.37      0.66     0.01   654312     0.00     0.00  LWLockAttemptLock
  0.00      0.73     0.00   573358     0.00     0.00  LockBuffer
  0.00      0.73     0.00    10014     0.00     0.06  heap_getnextslot

patched
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  9.70      1.27     0.36 50531459     0.00     0.00  LWLockAttemptLock
  3.23      2.42     0.12 100259200     0.00     0.00  LockBuffer
  6.20      1.50     0.23 50015101     0.00     0.00  heapam_index_fetch_tuple
  4.04      2.02     0.15 50015101     0.00     0.00  index_fetch_heap
  1.35      3.21     0.05    10119     0.00     0.00  index_getnext_slot

case7:
master
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  2.67      0.60     0.02   654582     0.00     0.00  LWLockAttemptLock
  0.00      0.75     0.00   573488     0.00     0.00  LockBuffer
  0.00      0.75     0.00    10014     0.00     0.06  heap_getnextslot

patched
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
  0.00      0.12     0.00   241979     0.00     0.00  LWLockAttemptLock
  0.00      0.12     0.00   180884     0.00     0.00  LockBuffer
  0.00      0.12     0.00    10101     0.00     0.00  heapam_index_fetch_tuple
  0.00      0.12     0.00    10101     0.00     0.00  index_fetch_heap
  0.00      0.12     0.00    10119     0.00     0.00  index_getnext_slot
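The buffer-lock asymmetry above (roughly 573 thousand LockBuffer calls for the sequential scan versus about 100 million for the index scan in case 1) can be sketched with a toy Python model. This is my own illustration, not PostgreSQL code, and the 100-tuples-per-page figure is an arbitrary assumption:

```python
# Toy model of LockBuffer() traffic: a sequential scan takes one
# heap-buffer lock per page and reads many tuples under it, while an
# index scan re-locks a heap buffer for every tuple it fetches via
# index_fetch_heap(). The page capacity below is an invented number.

TUPLES_PER_PAGE = 100

def seqscan_buffer_locks(tuples_scanned):
    # ceiling division: one lock per heap page visited
    return (tuples_scanned + TUPLES_PER_PAGE - 1) // TUPLES_PER_PAGE

def indexscan_buffer_locks(tuples_fetched):
    # one heap-buffer lock per fetched tuple (index-page locks ignored)
    return tuples_fetched

# Case 1: all values duplicated, so each lookup fetches ~10,000 tuples.
print(seqscan_buffer_locks(10_000))    # 100
print(indexscan_buffer_locks(10_000))  # 10000
```

Under this model the index scan pays roughly two orders of magnitude more lock traffic when every fetched tuple lives on a shared, densely packed heap, which is consistent with the gprof call counts.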

Regards,
Shi Yu

On Thu, Mar 2, 2023 at 1:37 PM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Wed, Mar 1, 2023 9:22 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> > > # Result
> > The time executing update (the average of 3 runs is taken, the unit is
> > milliseconds):
> >
> > Shi Yu, could it be possible for you to re-run the tests with some more runs, and share the average?
> > I suspect maybe your test results have a very small pool size, and some runs are making
> > the average slightly problematic.
> >
> > In my tests, I shared the total time, which is probably also fine.
> >
>
> Thanks for your reply, I re-tested (based on
> v25_0001_use_index_on_subs_when_pub_rep_ident_full.patch) and took the average
> of 100 runs. The results are as follows. The unit is milliseconds.
>
> case1
> sequential scan: 1348.57
> index scan: 3785.15
>
> case2
> sequential scan: 1350.26
> index scan: 1754.01
>
> case3
> sequential scan: 1350.13
> index scan: 1340.97
>
> There was still some degradation in the first two cases. There are some gaps in
> our test results. Some information about my test is as follows.
>
> a. Some parameters specified in postgresql.conf.
> shared_buffers = 8GB
> checkpoint_timeout = 30min
> max_wal_size = 20GB
> min_wal_size = 10GB
> autovacuum = off
>
> b. Executed SQL.
> I executed TRUNCATE and INSERT before each UPDATE. I am not sure if you did the
> same, or just executed 50 consecutive UPDATEs. If the latter one, there would be
> lots of old tuples and this might have a bigger impact on sequential scan. I
> tried this case (which executes 50 consecutive UPDATEs) and also saw that the
> overhead is smaller than before.
>
>
> Besides, I looked into the regression of this patch with `gprof`. Some results
> are as follows. I think with single buffer lock, sequential scan can scan
> multiple tuples (see heapgettup()), while index scan can only scan one tuple. So
> in case1, which has lots of duplicate values and more tuples need to be scanned,
> index scan takes longer time.
>
> - results of `gprof`
> case1:
> master
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>   1.37      0.66     0.01   654312     0.00     0.00  LWLockAttemptLock
>   0.00      0.73     0.00   573358     0.00     0.00  LockBuffer
>   0.00      0.73     0.00    10014     0.00     0.06  heap_getnextslot
>
> patched
>   %   cumulative   self              self     total
>  time   seconds   seconds    calls  ms/call  ms/call  name
>   9.70      1.27     0.36 50531459     0.00     0.00  LWLockAttemptLock
>   3.23      2.42     0.12 100259200     0.00     0.00  LockBuffer
>   6.20      1.50     0.23 50015101     0.00     0.00  heapam_index_fetch_tuple
>   4.04      2.02     0.15 50015101     0.00     0.00  index_fetch_heap
>   1.35      3.21     0.05    10119     0.00     0.00  index_getnext_slot
>

In the above profile, the number of calls to index_fetch_heap() and
heapam_index_fetch_tuple() explains the regression you
are seeing with the index scan. Because the update will generate dead
tuples in the same transaction and those dead tuples won't be removed,
we get those from the index and then need to perform
index_fetch_heap() to find out whether the tuple is dead or not. Now,
for a sequential scan we also need to scan those dead tuples, but there we
don't need to go back and forth between the index and the heap. I think we can
check once more with a larger number of tuples (say 20000, 50000, etc.)
for case-1.
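The dead-tuple effect can be sketched with a toy Python model (my own; the row and update counts are illustrative, and real visibility checks and HOT pruning make these upper bounds):

```python
# Toy model: each UPDATE leaves a dead tuple version behind (autovacuum
# is off in the benchmark), and each dead version still has an index
# entry, so an index scan pays one index_fetch_heap() per version just
# to discard the dead ones. A sequential scan also wades through dead
# versions, but reads them page-at-a-time. Numbers are illustrative.

def index_heap_fetches(dup_rows, prior_updates):
    # live version plus one dead version per prior update, per row
    return dup_rows * (prior_updates + 1)

def seqscan_page_reads(dup_rows, prior_updates, tuples_per_page=100):
    # same total versions, but one buffer read per page, not per version
    versions = dup_rows * (prior_updates + 1)
    return (versions + tuples_per_page - 1) // tuples_per_page

# After 50 consecutive updates of 10 duplicate rows:
print(index_heap_fetches(10, 50))   # 510
print(seqscan_page_reads(10, 50))   # 6
```

This is why repeating the UPDATE without an intervening TRUNCATE (or vacuum) inflates the index-scan numbers much faster than the sequential-scan numbers.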

--
With Regards,
Amit Kapila.



Hi,


Few comments:
===============
1.
+   identity.  When replica identity <literal>FULL</literal> is specified,
+   indexes can be used on the subscriber side for searching the rows. These
+   indexes should be btree,

Why only btree and not others like a hash index? Also, there should be
some comments in FindUsableIndexForReplicaIdentityFull() to explain
the choices.

I updated the comment(s).

For a more technical reference: we have these restrictions because we rely on
RelationFindReplTupleByIndex(), which is designed to handle PK/RI and is
written in a way that only expects indexes with these limitations.

In order to keep the changes as small as possible, I refrained from relaxing this
limitation for now. I'm definitely up for working on relaxing these
limitations, practically allowing more cases with non-unique indexes.
 

2.
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * This is not a generic routine - it expects the idxrel to be an index
+ * that planner would choose if the searchslot includes all the columns
+ * (e.g., REPLICA IDENTITY FULL on the source).
  */
-static bool
+static int

This comment is not clear to me. Which change here makes the
expectation like that? Which planner function/functionality are you
referring to here?

Oops, the planner-related comments are definitely stale. As you might
remember, in earlier iterations of this patch we had some
planner functions to pick indexes for us.

Anyway, I think even for that version of the patch, this comment was
wrong. Updated now; does that look better?
 

3.
+/*
+ * Given a relation and OID of an index, returns true if
+ * the index is relation's primary key's index or
+ * relation's replica identity index.

It seems the line length is a bit off in the above comments. There
could be a similar mismatch in other places. You might want to run
pgindent.

Makes sense; I ran pgindent. But it didn't fix this specific instance automatically,
so I changed it manually.
 

4.
+}
+
+
+/*
+ * Returns an index oid if there is an index that can be used

Spurious empty line.

fixed
 

5.
- /*
- * We are in error mode so it's fine this is somewhat slow. It's better to
- * give user correct error.
- */
- if (OidIsValid(GetRelationIdentityOrPK(rel->localrel)))
+ /* Give user more precise error if possible. */
+ if (OidIsValid(rel->usableIndexOid))
  {
  ereport(ERROR,
  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),

Is this change valid? I mean this could lead to the error "publisher
did not send replica identity column expected by the logical
replication target relation" when it should have given an error:
"logical replication target relation \"%s.%s\" has neither REPLICA
IDENTITY index nor PRIMARY ...


 Hmm, that's right, we'd get a wrong error message. 

I spent quite a bit of time trying to understand whether we'd
need anything additional regarding this function after this patch,
but my current understanding is that we should just leave the
check as-is. That is mainly because when REPLICA IDENTITY is
FULL, there is no need to check anything further (other than the
check at the bottom of the function).


Attached are both patches: the main patch, and the patch that
optionally disables the index scans. Let's discuss the necessity
of the second patch in the light of the data we collect with
some more tests.

Thanks,
Onder


Attachment
On Thu, Mar 2, 2023 at 3:00 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>>
>> Few comments:
>> ===============
>> 1.
>> +   identity.  When replica identity <literal>FULL</literal> is specified,
>> +   indexes can be used on the subscriber side for searching the rows. These
>> +   indexes should be btree,
>>
>> Why only btree and not others like a hash index? Also, there should be
>> some comments in FindUsableIndexForReplicaIdentityFull() to explain
>> the choices.
>
>
> I updated the comment(s).
>
> For a more technical reference, we have these restrictions, because we rely on
> RelationFindReplTupleByIndex() which is designed to handle PK/RI. And,
> RelationFindReplTupleByIndex() is written in a way that only expects
> indexes with these limitations.
>
> In order to keep the changes as small as possible, I refrained from relaxing this
> limitation for now. I'm definitely up to working on this for relaxing these
> limitations, and practically allowing more cases for non-unique indexes.
>

See, I think I understand why partial/expression indexes can't be
supported. It seems to me that the required tuple may not
satisfy the expression, and that won't work for our case. But what are
other limitations you see due to which we can't support other index
types for non-unique indexes? Is it just a matter of testing other
index types or there is something more to it, if so, we should add
comments so that they can be supported in the future if it is feasible
to do so.

>
> Attached are both patches: the main patch, and the patch that
> optionally disables the index scans.
>

Both the patches are numbered 0001. It would be better to number them
as 0001 and 0002.



--
With Regards,
Amit Kapila.



Hi Amit, Shi Yu


>
> b. Executed SQL.
> I executed TRUNCATE and INSERT before each UPDATE. I am not sure if you did the
> same, or just executed 50 consecutive UPDATEs. If the latter one, there would be
> lots of old tuples and this might have a bigger impact on sequential scan. I
> tried this case (which executes 50 consecutive UPDATEs) and also saw that the
> overhead is smaller than before.

Alright, I'll do the same: execute TRUNCATE/INSERT before each UPDATE.
 
In the above profile number of calls to index_fetch_heap(),
heapam_index_fetch_tuple() explains the reason for the regression you
are seeing with the index scan. Because the update will generate dead
tuples in the same transaction and those dead tuples won't be removed,
we get those from the index and then need to perform
index_fetch_heap() to find out whether the tuple is dead or not. Now,
for sequence scan also we need to scan those dead tuples but there we
don't need to do back-and-forth between index and heap.

Thanks for the insights, I think what you describe makes a lot of sense.
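For reference, the scenario can be reproduced along these lines (table name, row count, and the duplication pattern are illustrative, modeled on Shi Yu's case-1):

```sql
-- Publisher side: REPLICA IDENTITY FULL, no suitable unique index,
-- heavily duplicated values.
CREATE TABLE test_replica_id_full (x int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
INSERT INTO test_replica_id_full SELECT i % 2 FROM generate_series(1, 50000) i;

-- Consecutive UPDATEs without an intervening TRUNCATE/VACUUM accumulate dead
-- tuple versions. A subscriber-side index scan fetches each of them from the
-- heap (index_fetch_heap) only to discard them, whereas a sequential scan
-- reads them without the index-to-heap round trip.
UPDATE test_replica_id_full SET x = x + 1;
UPDATE test_replica_id_full SET x = x + 1;
```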

 
I think we can
once check with more number of tuples (say with 20000, 50000, etc.)
for case-1.


As we'd expect, this test made the performance regression more visible.

I quickly ran case-1 50 times with 50000 rows, as Shi Yu did, and got
the following results. I'm measuring end-to-end times for running the
whole set of commands:

seq_scan:    00 hr 24 min 42 sec
index_scan:  01 hr 04 min 54 sec


But I'm still not sure whether we should focus on this regression too
much. In the end, what we are talking about is a case (e.g., all or many
rows are duplicated) where using an index is not a good idea anyway. So,
I doubt users would have such indexes.


>  The quadratic apply performance
> the sequential scans cause, are a much bigger hazard for users than some apply
> performance regression.

Quoting Andres' note, I personally think that the regression for this case
is not a big concern. 

> I'd prefer not having an option, because we figure out the cause of the
> performance regression (reducing it to be small enough to not care). After
> that an option defaulting to using indexes. I don't think an option defaulting
> to false makes sense.

I think we figured out the cause of the performance regression. It is not small
enough to ignore for some scenarios like the above, but those scenarios seem like
synthetic test cases with little user-facing impact. Still, I think you are better
placed to comment on this.

If you consider this a significant issue, we could also pick up the second patch,
so that users could disable index scans for this unlikely scenario.
  
Thanks,
Onder

Hi,


 Is it just a matter of testing other
index types 

Yes, there is more to it. build_replindex_scan_key()
only works for btree indexes, as it uses BTEqualStrategyNumber.

I expect a few more limitations like that. I added a comment
in the code (see FindUsableIndexForReplicaIdentityFull).

or there is something more to it, if so, we should add
comments so that they can be supported in the future if it is feasible
to do so.

I really don't see any fundamental issues with expanding the
support for more index types; it is just some more coding & testing.

And I can (and am willing to) work on that as a follow-up. I explicitly
tried to keep this patch as small as possible.
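To make the current limitations concrete, here is a sketch of which subscriber-side indexes would and would not qualify under this version of the patch (table and index names are made up; the classification follows the requirements stated above):

```sql
CREATE TABLE tbl (a int, b text);

CREATE INDEX btree_idx ON tbl (a);              -- usable: plain btree
CREATE INDEX mixed_idx ON tbl (a, (b || 'x'));  -- usable: has a column reference
CREATE INDEX hash_idx  ON tbl USING hash (a);   -- not usable: non-btree
CREATE INDEX part_idx  ON tbl (a) WHERE a > 0;  -- not usable: partial
CREATE INDEX expr_idx  ON tbl ((a + 1));        -- not usable: expressions only
```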
 
>
> Attached are both patches: the main patch, and the patch that
> optionally disables the index scans.
>

Both the patches are numbered 0001. It would be better to number them
as 0001 and 0002.


Alright, attached v27_0001_use_index_on_subs_when_pub_rep_ident_full.patch and 
v27_0002_use_index_on_subs_when_pub_rep_ident_full.patch.

I also added one more test which Andres asked me on a private chat
(Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB different data).

Thanks,
Onder
Attachment
FYI,

After applying only the 0001 patch I received a TAP test error.

t/032_subscribe_use_index.pl ....... 1/? # Tests were run but no plan
was declared and done_testing() was not seen.
t/032_subscribe_use_index.pl ....... Dubious, test returned 29 (wstat
7424, 0x1d00)
All 1 subtests passed
t/100_bugs.pl ...................... ok


More details:

2023-03-03 12:45:45.382 AEDT [9931] 032_subscribe_use_index.pl LOG:
statement: CREATE INDEX test_replica_id_full_idx ON
test_replica_id_full(x)
2023-03-03 12:45:45.423 AEDT [9937] 032_subscribe_use_index.pl LOG:
statement: CREATE SUBSCRIPTION tap_sub_rep_full CONNECTION 'port=56538
host=/tmp/zWyRQnOa1a dbname=postgres application_name=tap_sub'
PUBLICATION tap_pub_rep_full WITH (enable_index_scan = false)
2023-03-03 12:45:45.423 AEDT [9937] 032_subscribe_use_index.pl ERROR:
unrecognized subscription parameter: "enable_index_scan"
2023-03-03 12:45:45.423 AEDT [9937] 032_subscribe_use_index.pl
STATEMENT:  CREATE SUBSCRIPTION tap_sub_rep_full CONNECTION
'port=56538 host=/tmp/zWyRQnOa1a dbname=postgres
application_name=tap_sub' PUBLICATION tap_pub_rep_full WITH
(enable_index_scan = false)
2023-03-03 12:45:45.532 AEDT [9834] LOG:  received immediate shutdown request
2023-03-03 12:45:45.533 AEDT [9834] LOG:  database system is shut down

~~

The patches 0001 and 0002 seem to have accidentally blended together;
AFAICT the error occurs because patch 0001 is testing something
that is not available until 0002.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



On Thu, Mar 2, 2023 at 6:50 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>>
>> In the above profile number of calls to index_fetch_heap(),
>> heapam_index_fetch_tuple() explains the reason for the regression you
>> are seeing with the index scan. Because the update will generate dead
>> tuples in the same transaction and those dead tuples won't be removed,
>> we get those from the index and then need to perform
>> index_fetch_heap() to find out whether the tuple is dead or not. Now,
>> for sequence scan also we need to scan those dead tuples but there we
>> don't need to do back-and-forth between index and heap.
>
>
> Thanks for the insights, I think what you describe makes a lot of sense.
>
...
...
>
> I think we figured out the cause of the performance regression. I think it is not small
> enough for some scenarios like the above. But those scenarios seem like synthetic
> test cases, with not much user impacting implications. Still, I think you are better suited
> to comment on this.
>
> If you consider that this is a significant issue,  we could consider the second patch as well
> such that for this unlikely scenario users could disable index scans.
>

I think we can't completely ignore this regression because the key
point of this patch is to pick one of the non-unique indexes to
perform scan and now it will be difficult to predict how many
duplicates (and or dead rows) some index has without more planner
support. Personally, I feel it is better to have a table-level option
for this so that users have some knob to avoid regressions in
particular cases. In general, I agree that it will be a win in more
cases than it will regress.

--
With Regards,
Amit Kapila.



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "houzj.fnst@fujitsu.com"
On Thursday, March 2, 2023 11:23 PM Önder Kalacı <onderkalaci@gmail.com>  wrote:

> Both the patches are numbered 0001. It would be better to number them
> as 0001 and 0002.
> 
> Alright, attached v27_0001_use_index_on_subs_when_pub_rep_ident_full.patch and 
> v27_0002_use_index_on_subs_when_pub_rep_ident_full.patch.
> 
> I also added one more test which Andres asked me on a private chat
> (Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB different data).

Thanks for updating the patch. I think this patch can bring noticeable
performance improvements in some use cases.

And here are few comments after reading the patch.

1.
+    usableIndexContext = AllocSetContextCreate(CurrentMemoryContext,
+                                               "usableIndexContext",
+                                               ALLOCSET_DEFAULT_SIZES);
+    oldctx = MemoryContextSwitchTo(usableIndexContext);
+
+    /* Get index list of the local relation */
+    indexlist = RelationGetIndexList(localrel);
+    Assert(indexlist != NIL);
+
+    foreach(lc, indexlist)

Is it necessary to create a memory context here? I thought the memory would be
freed soon after this apply action anyway.

2.

+            /*
+             * Furthermore, because primary key and unique key indexes can't
+             * include expressions we also sanity check the index is neither
+             * of those kinds.
+             */
+            Assert(!IdxIsRelationIdentityOrPK(rel, idxrel->rd_id));

It seems you mean "replica identity key" instead of "unique key" in the comments.


3.
--- a/src/include/replication/logicalrelation.h
+++ b/src/include/replication/logicalrelation.h
...
+extern bool IsIndexOnlyOnExpression(IndexInfo *indexInfo);

The definition function seems better to be placed in execReplication.c

4.

+extern Oid GetRelationIdentityOrPK(Relation rel);

The function is only used in relation.c, so we can make it a static
function.


5.

+    /*
+     * If index scans are disabled, use a sequential scan.
+     *
+     * Note that we do not use index scans below when enable_indexscan is
+     * false. Allowing primary key or replica identity even when index scan is
+     * disabled is the legacy behaviour. So we hesitate to move the below
+     * enable_indexscan check to be done earlier in this function.
+     */
+    if (!enable_indexscan)
+        return InvalidOid;

Since the document of enable_indexscan says "Enables or disables the query
planner's use of index-scan plan types. The default is on.", and we don't use
the planner here, I am not sure whether we should allow/disallow index scans in
the apply worker based on this GUC.

Best Regards,
Hou zj


Here are some review comments for v27-0001 (not the tests)

======
Commit Message

1.
There is no smart mechanism to pick the index. Instead, we choose
the first index that fulfils the requirements mentioned above.

~

1a.
I think this paragraph should immediately follow the earlier one
("With this patch...") which talked about the index requirements.

~

1b.
Slight rewording

SUGGESTION
If there is more than one index that satisfies these requirements, we
just pick the first one.

======
doc/src/sgml/logical-replication.sgml

2.
A published table must have a “replica identity” configured in order
to be able to replicate UPDATE and DELETE operations, so that
appropriate rows to update or delete can be identified on the
subscriber side. By default, this is the primary key, if there is one.
Another unique index (with certain additional requirements) can also
be set to be the replica identity. When replica identity FULL is
specified, indexes can be used on the subscriber side for searching
the rows. These indexes should be btree, non-partial and have at least
one column reference (e.g., should not consist of only expressions).
These restrictions on the non-unique index properties are in essence
the same restrictions that are enforced for primary keys. Internally,
we follow the same approach for supporting index scans within logical
replication scope. If there are no such suitable indexes, the search
on the subscriber s ide can be very inefficient, therefore replica
identity FULL should only be used as a fallback if no other solution
is possible. If a replica identity other than “full” is set on the
publisher side, a replica identity comprising the same or fewer
columns must also be set on the subscriber side. See REPLICA IDENTITY
for details on how to set the replica identity. If a table without a
replica identity is added to a publication that replicates UPDATE or
DELETE operations then subsequent UPDATE or DELETE operations will
cause an error on the publisher. INSERT operations can proceed
regardless of any replica identity.

~

2a.
IMO the <quote>replica identity</quote> in the first sentence should
be changed to be <firstterm>replica identity</firstterm>

~

2b.
Typo: "subscriber s ide" --> "subscriber side"

~

2c.
There is still one remaining "full" in this text. I think ought to be
changed to <literal>FULL</literal> to match the others.

======
src/backend/executor/execReplication.c

3. IdxIsRelationIdentityOrPK

+/*
+ * Given a relation and OID of an index, returns true if the
+ * index is relation's primary key's index or relation's
+ * replica identity index.
+ *
+ * Returns false otherwise.
+ */
+bool
+IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
+{
+ Assert(OidIsValid(idxoid));
+
+ return RelationGetReplicaIndex(rel) == idxoid ||
+ RelationGetPrimaryKeyIndex(rel) == idxoid;
 }

~

Since the function name mentions RI (1st) and then PK (2nd), and since
the implementation also has the same order, I think the function
comment should use the same consistent order when describing what it
does.

======
src/backend/replication/logical/relation.c

4. FindUsableIndexForReplicaIdentityFull

+/*
+ * Returns an index oid if there is an index that can be used
+ * via the apply worker. The index should be btree, non-partial
+ * and have at least one column reference (e.g., should
+ * not consist of only expressions). The limitations arise from
+ * RelationFindReplTupleByIndex(), which is designed to handle
+ * PK/RI and these limitations are inherent to PK/RI.
+ *
+ * There are no fundamental problems for supporting non-btree
+ * and/or partial indexes. We should mostly relax the limitations
+ * in RelationFindReplTupleByIndex().
+ *
+ * If no suitable index is found, returns InvalidOid.
+ *
+ * Note that this is not a generic function, it expects REPLICA
+ * IDENTITY FULL for the remote relation.
+ */

~

4a.
Minor rewording of 1st sentence.

BEFORE
Returns an index oid if there is an index that can be used via the apply worker.

SUGGESTION
Returns the oid of an index that can be used via the apply worker.

~

4b.
+ * There are no fundamental problems for supporting non-btree
+ * and/or partial indexes. We should mostly relax the limitations
+ * in RelationFindReplTupleByIndex().

I think this paragraph should come later in the comment (just before
the Note) and should also have "XXX" prefix to indicate it is some
implementation note for future versions.

~~~

5. GetRelationIdentityOrPK

+/*
+ * Get replica identity index or if it is not defined a primary key.
+ *
+ * If neither is defined, returns InvalidOid
+ */
+Oid
+GetRelationIdentityOrPK(Relation rel)
+{
+ Oid idxoid;
+
+ idxoid = RelationGetReplicaIndex(rel);
+
+ if (!OidIsValid(idxoid))
+ idxoid = RelationGetPrimaryKeyIndex(rel);
+
+ return idxoid;
+}

This is really very similar code to the other new function called
IdxIsRelationIdentityOrPK. I wondered if such similar functions could
be defined together.

~~~

6. FindLogicalRepUsableIndex

+/*
+ * Returns an index oid if we can use an index for subscriber. If not,
+ * returns InvalidOid.
+ */

SUGGESTION
Returns the oid of an index that can be used by a subscriber.
Otherwise, returns InvalidOid.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



On Thu, Mar 2, 2023 at 8:52 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Alright, attached v27_0001_use_index_on_subs_when_pub_rep_ident_full.patch and
> v27_0002_use_index_on_subs_when_pub_rep_ident_full.patch.
>

Few comments on 0001
====================
1.
+   such suitable indexes, the search on the subscriber s ide can be
very inefficient,

unnecessary space in 'side'

2.
-   identity.  If the table does not have any suitable key, then it can be set
-   to replica identity <quote>full</quote>, which means the entire row becomes
-   the key.  This, however, is very inefficient and should only be used as a
-   fallback if no other solution is possible.  If a replica identity other
+   identity. When replica identity <literal>FULL</literal> is specified,
+   indexes can be used on the subscriber side for searching the rows.

I think it is better to retain the first sentence (If the table does
not ... entire row becomes the key.) as that says what will be part of
the key.

3.
-   comprising the same or fewer columns must also be set on the subscriber
-   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
-   how to set the replica identity.  If a table without a replica identity is
-   added to a publication that replicates <command>UPDATE</command>
+   comprising the same or fewer columns must also be set on the
subscriber side.
+   See <xref linkend="sql-altertable-replica-identity"/> for
+   details on how to set the replica identity.  If a table without a replica
+   identity is added to a publication that replicates <command>UPDATE</command>

I don't see any change in this except line length. If so, I don't
think we should change it as part of this patch.

4.
 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns should be used for the index scan.
+ *
+ * This is not generic routine, it expects the idxrel to be
+ * a btree, non-partial and have at least one column
+ * reference (e.g., should not consist of only expressions).
  *
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * By definition, replication identity of a rel meets all
+ * limitations associated with that. Note that any other
+ * index could also meet these limitations.

The comment changes look quite asymmetric to me. Normally, we break
the line if the line length goes beyond 80 cols. Please check and
change other places in the patch if they have a similar symptom.

5.
+ * There are no fundamental problems for supporting non-btree
+ * and/or partial indexes.

Can we mention partial indexes in the above comment? It seems to me
that because the required tuple may not satisfy the expression (in the
case of partial indexes) it may not be easy to support it.

6.
build_replindex_scan_key()
{
...
+ for (index_attoff = 0; index_attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel);
+ index_attoff++)
...
...
+#ifdef USE_ASSERT_CHECKING
+ IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
+
+ Assert(!IsIndexOnlyOnExpression(indexInfo));
+#endif
...
}

We can avoid building index info multiple times. This can be either
checked at the beginning of the function outside attribute offset loop
or we can probably cache it. I understand this is for assert builds
but seems easy to avoid it doing multiple times and it also looks odd
to do it multiple times for the same index.
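One way to address this would be to hoist the assertion out of the attribute loop, roughly as follows (a pseudocode sketch against the quoted code, not a tested diff):

```c
#ifdef USE_ASSERT_CHECKING
	/* Build the IndexInfo once, before entering the loop. */
	IndexInfo  *indexInfo = BuildIndexInfo(idxrel);

	Assert(!IsIndexOnlyOnExpression(indexInfo));
#endif

	for (index_attoff = 0;
		 index_attoff < IndexRelationGetNumberOfKeyAttributes(idxrel);
		 index_attoff++)
	{
		...
	}
```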

7.
- /* Build scankey for every attribute in the index. */
- for (attoff = 0; attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
+ /* Build scankey for every non-expression attribute in the index. */
+ for (index_attoff = 0; index_attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel);
+ index_attoff++)
  {
  Oid operator;
  Oid opfamily;
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
  RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];

I don't think we need to change variable names here; we can retain
mainattno as it is instead of changing it to table_attno. The current
naming doesn't seem bad for the current usage in the patch.

8.
+ TypeCacheEntry **eq = NULL; /* only used when the index is not RI or PK */

Normally, we don't add such comments as the usage is quite obvious by
looking at the code.


--
With Regards,
Amit Kapila.



On Thu, 2 Mar 2023 at 20:53, Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi,
>
>>
>>  Is it just a matter of testing other
>> index types
>
>
> Yes, there are more to it. build_replindex_scan_key()
> only works for btree indexes, as it does BTEqualStrategyNumber.
>
> I might expect a few more limitations like that. I added comment
> in the code (see FindUsableIndexForReplicaIdentityFull)
>
>> or there is something more to it, if so, we should add
>> comments so that they can be supported in the future if it is feasible
>> to do so.
>
>
> I really don't see any fundamental issues regarding expanding the
> support for more index types, it is just some more coding & testing.
>
> And, I can (and willing to) work on that as a follow-up. I explicitly
> try to keep this patch as small as possible.
>
>>
>> >
>> > Attached are both patches: the main patch, and the patch that
>> > optionally disables the index scans.
>> >
>>
>> Both the patches are numbered 0001. It would be better to number them
>> as 0001 and 0002.
>>
>
> Alright, attached v27_0001_use_index_on_subs_when_pub_rep_ident_full.patch and
> v27_0002_use_index_on_subs_when_pub_rep_ident_full.patch.
>
> I also added one more test which Andres asked me on a private chat
> (Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB different data).

Thanks for the patch. Few comments:
1) We are currently calling RelationGetIndexList twice, once in
FindUsableIndexForReplicaIdentityFull function and in the caller too,
we could avoid one of the calls by passing the indexlist to the
function or removing the check here, index list check can be handled
in FindUsableIndexForReplicaIdentityFull.
+       if (remoterel->replident == REPLICA_IDENTITY_FULL &&
+               RelationGetIndexList(localrel) != NIL)
+       {
+               /*
+                * If we had a primary key or relation identity with a
unique index,
+                * we would have already found and returned that oid.
At this point,
+                * the remote relation has replica identity full and
we have at least
+                * one local index defined.
+                *
+                * We are looking for one more opportunity for using
an index. If
+                * there are any indexes defined on the local
relation, try to pick
+                * a suitable index.
+                *
+                * The index selection safely assumes that all the
columns are going
+                * to be available for the index scan given that
remote relation has
+                * replica identity full.
+                */
+               return FindUsableIndexForReplicaIdentityFull(localrel);
+       }
+

2) Copyright year should be mentioned as 2023
diff --git a/src/test/subscription/t/032_subscribe_use_index.pl
b/src/test/subscription/t/032_subscribe_use_index.pl
new file mode 100644
index 0000000000..db0a7ea2a0
--- /dev/null
+++ b/src/test/subscription/t/032_subscribe_use_index.pl
@@ -0,0 +1,861 @@
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# Test logical replication behavior with subscriber uses available index
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+

3) Many of the tests are using the same tables; we need not
drop/create the publication/subscription for each of them, we could
just drop and create the required indexes and verify the update/delete
statements.
+# ====================================================================
+# Testcase start: SUBSCRIPTION USES INDEX
+#
+# Basic test where the subscriber uses index
+# and only updates 1 row and deletes
+# 1 other row
+#
+
+# create tables pub and sub
+$node_publisher->safe_psql('postgres',
+       "CREATE TABLE test_replica_id_full (x int)");
+$node_publisher->safe_psql('postgres',
+       "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
+$node_subscriber->safe_psql('postgres',
+       "CREATE TABLE test_replica_id_full (x int)");
+$node_subscriber->safe_psql('postgres',
+       "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");

+# ====================================================================
+# Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
+#
+# This test ensures that after CREATE INDEX, the subscriber can automatically
+# use one of the indexes (provided that it fulfils the requirements).
+# Similarly, after DROP index, the subscriber can automatically switch to
+# sequential scan
+
+# create tables pub and sub
+$node_publisher->safe_psql('postgres',
+       "CREATE TABLE test_replica_id_full (x int NOT NULL, y int)");
+$node_publisher->safe_psql('postgres',
+       "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
+$node_subscriber->safe_psql('postgres',
+       "CREATE TABLE test_replica_id_full (x int NOT NULL, y int)");

4) These additional blank lines can be removed to keep it consistent:
4.a)
+# Testcase end: SUBSCRIPTION DOES NOT USE PARTIAL INDEX
+# ====================================================================
+
+
+# ====================================================================
+# Testcase start: SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS

4.b)
+# Testcase end: Unique index that is not primary key or replica identity
+# ====================================================================
+
+
+
+# ====================================================================
+# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN

Regards,
Vignesh



Hi Hou zj, all



1.
+       usableIndexContext = AllocSetContextCreate(CurrentMemoryContext,
+                                                                                          "usableIndexContext",
+                                                                                          ALLOCSET_DEFAULT_SIZES);
+       oldctx = MemoryContextSwitchTo(usableIndexContext);
+
+       /* Get index list of the local relation */
+       indexlist = RelationGetIndexList(localrel);
+       Assert(indexlist != NIL);
+
+       foreach(lc, indexlist)

Is it necessary to create a memory context here? I thought the memory would be
freed soon after this apply action anyway.

Yeah, probably not useful anymore, removed.

In the earlier versions of this patch, this code block was relying on some
planner functions. In that case, it felt safer to use a memory context. Now,
it seems useless.  



2.

+                       /*
+                        * Furthermore, because primary key and unique key indexes can't
+                        * include expressions we also sanity check the index is neither
+                        * of those kinds.
+                        */
+                       Assert(!IdxIsRelationIdentityOrPK(rel, idxrel->rd_id));

It seems you mean "replica identity key" instead of "unique key" in the comments.

Right, I fixed this comment. Though, are you referring to multiple comments? I couldn't
see any others in the patch. Let me know if you do.
 


3.
--- a/src/include/replication/logicalrelation.h
+++ b/src/include/replication/logicalrelation.h
...
+extern bool IsIndexOnlyOnExpression(IndexInfo *indexInfo);

The definition function seems better to be placed in execReplication.c

Hmm, why do you think so? IsIndexOnlyOnExpression() is used in
logical/relation.c, and used for an assertion in execReplication.c.

I think it is better suited to relation.c, but let me know your
perspective as well.
 

4.

+extern Oid GetRelationIdentityOrPK(Relation rel);

The function is only used in relation.c, so we can make it a static
function.


In the recent iteration of the patch (I think v27), we also use this
function in check_relation_updatable() in logical/worker.c.

One could argue that we can move the definition back to worker.c,
but it feels better suited to relation.c, as the parameter of the function
is a Relation, and the function is looking up a property of a relation.

Let me know if you think otherwise, I don't have strong opinions
on this.
 

5.

+       /*
+        * If index scans are disabled, use a sequential scan.
+        *
+        * Note that we do not use index scans below when enable_indexscan is
+        * false. Allowing primary key or replica identity even when index scan is
+        * disabled is the legacy behaviour. So we hesitate to move the below
+        * enable_indexscan check to be done earlier in this function.
+        */
+       if (!enable_indexscan)
+               return InvalidOid;

Since the document of enable_indexscan says "Enables or disables the query
planner's use of index-scan plan types. The default is on.", and we don't use
the planner here, I am not sure whether we should allow/disallow index scans in
the apply worker based on this GUC.


Given Amit's suggestion in [1], I'm planning to drop this check altogether and
rely on table storage parameters.

(I'll incorporate these changes into the patch that I'm going to attach in my reply to Peter's e-mail.)

Thanks,
Onder 

Hi Peter, all

======
Commit Message

1.
There is no smart mechanism to pick the index. Instead, we choose
the first index that fulfils the requirements mentioned above.

~

1a.
I think this paragraph should immediately follow the earlier one
("With this patch...") which talked about the index requirements.


makes sense
 

1b.
Slight rewording

SUGGESTION
If there is more than one index that satisfies these requirements, we
just pick the first one.


applied
 
======
doc/src/sgml/logical-replication.sgml

2.
A published table must have a “replica identity” configured in order
to be able to replicate UPDATE and DELETE operations, so that
appropriate rows to update or delete can be identified on the
subscriber side. By default, this is the primary key, if there is one.
Another unique index (with certain additional requirements) can also
be set to be the replica identity. When replica identity FULL is
specified, indexes can be used on the subscriber side for searching
the rows. These indexes should be btree, non-partial and have at least
one column reference (e.g., should not consist of only expressions).
These restrictions on the non-unique index properties are in essence
the same restrictions that are enforced for primary keys. Internally,
we follow the same approach for supporting index scans within logical
replication scope. If there are no such suitable indexes, the search
on the subscriber s ide can be very inefficient, therefore replica
identity FULL should only be used as a fallback if no other solution
is possible. If a replica identity other than “full” is set on the
publisher side, a replica identity comprising the same or fewer
columns must also be set on the subscriber side. See REPLICA IDENTITY
for details on how to set the replica identity. If a table without a
replica identity is added to a publication that replicates UPDATE or
DELETE operations then subsequent UPDATE or DELETE operations will
cause an error on the publisher. INSERT operations can proceed
regardless of any replica identity.

~

2a.
IMO the <quote>replica identity</quote> in the first sentence should
be changed to be <firstterm>replica identity</firstterm>

 

~

2b.
Typo: "subscriber s ide" --> "subscriber side"

fixed


2c.
There is still one remaining "full" in this text. I think it ought to be
changed to <literal>FULL</literal> to match the others.


changed 
 
======
src/backend/executor/execReplication.c

3. IdxIsRelationIdentityOrPK

+/*
+ * Given a relation and OID of an index, returns true if the
+ * index is relation's primary key's index or relation's
+ * replica identity index.
+ *
+ * Returns false otherwise.
+ */
+bool
+IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
+{
+ Assert(OidIsValid(idxoid));
+
+ return RelationGetReplicaIndex(rel) == idxoid ||
+ RelationGetPrimaryKeyIndex(rel) == idxoid;
 }

~

Since the function name mentions RI (1st) and then PK (2nd), and since
the implementation also has the same order, I think the function
comment should use the same consistent order when describing what it
does.

alright, done
 

======
src/backend/replication/logical/relation.c

4. FindUsableIndexForReplicaIdentityFull

+/*
+ * Returns an index oid if there is an index that can be used
+ * via the apply worker. The index should be btree, non-partial
+ * and have at least one column reference (e.g., should
+ * not consist of only expressions). The limitations arise from
+ * RelationFindReplTupleByIndex(), which is designed to handle
+ * PK/RI and these limitations are inherent to PK/RI.
+ *
+ * There are no fundamental problems for supporting non-btree
+ * and/or partial indexes. We should mostly relax the limitations
+ * in RelationFindReplTupleByIndex().
+ *
+ * If no suitable index is found, returns InvalidOid.
+ *
+ * Note that this is not a generic function, it expects REPLICA
+ * IDENTITY FULL for the remote relation.
+ */

~

4a.
Minor rewording of 1st sentence.

BEFORE
Returns an index oid if there is an index that can be used via the apply worker.

SUGGESTION
Returns the oid of an index that can be used via the apply worker.


looks better, applied 
 

4b.
+ * There are no fundamental problems for supporting non-btree
+ * and/or partial indexes. We should mostly relax the limitations
+ * in RelationFindReplTupleByIndex().

I think this paragraph should come later in the comment (just before
the Note) and should also have "XXX" prefix to indicate it is some
implementation note for future versions.



done
 

5. GetRelationIdentityOrPK

+/*
+ * Get replica identity index or if it is not defined a primary key.
+ *
+ * If neither is defined, returns InvalidOid
+ */
+Oid
+GetRelationIdentityOrPK(Relation rel)
+{
+ Oid idxoid;
+
+ idxoid = RelationGetReplicaIndex(rel);
+
+ if (!OidIsValid(idxoid))
+ idxoid = RelationGetPrimaryKeyIndex(rel);
+
+ return idxoid;
+}

This is really very similar code to the other new function called
IdxIsRelationIdentityOrPK. I wondered if such similar functions could
be defined together.

Makes sense, moved them closer, also changed IdxIsRelationIdentityOrPK to rely on
GetRelationIdentityOrPK()
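For illustration, here is a self-contained sketch of that refactoring. The stub lookups and hard-coded oids below are assumptions standing in for the real relcache calls (RelationGetReplicaIndex() / RelationGetPrimaryKeyIndex()); only the shape of the two functions mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)
#define OidIsValid(oid) ((oid) != InvalidOid)

/* Stubs standing in for the real relcache lookups. */
static Oid replica_index_oid = InvalidOid;
static Oid primary_key_oid = 1234;

static Oid GetReplicaIndex(void)    { return replica_index_oid; }
static Oid GetPrimaryKeyIndex(void) { return primary_key_oid; }

/* Replica identity index, or the primary key if no RI is defined. */
static Oid
GetRelationIdentityOrPK(void)
{
	Oid		idxoid = GetReplicaIndex();

	if (!OidIsValid(idxoid))
		idxoid = GetPrimaryKeyIndex();
	return idxoid;
}

/* Reuses GetRelationIdentityOrPK() instead of duplicating its lookups. */
static bool
IdxIsRelationIdentityOrPK(Oid idxoid)
{
	assert(OidIsValid(idxoid));

	return GetRelationIdentityOrPK() == idxoid;
}
```

One consequence of the combined form: once a replica identity index is defined, the PK oid is no longer consulted, so an oid that matches only the PK now returns false.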
 

~~~

6. FindLogicalRepUsableIndex

+/*
+ * Returns an index oid if we can use an index for subscriber. If not,
+ * returns InvalidOid.
+ */

SUGGESTION
Returns the oid of an index that can be used by a subscriber.
Otherwise, returns InvalidOid.


applied.

Now attaching v28 of the patch, which includes the reviews from this mail
and [1].

Thanks,
Onder


RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "Hayato Kuroda (Fujitsu)"
Dear Önder,

Thanks for updating the patch!
I played with your patch and I confirmed that parallel apply worker and tablesync worker
could pick the index in typical case.

Followings are comments for v27-0001. Please ignore if it is fixed in newer one.

execReplication.c

```
+       /* Build scankey for every non-expression attribute in the index. */
```

I think that single-line comments should not be terminated by ".".

```
+       /* There should always be at least one attribute for the index scan. */
```

Same as above.


```
+#ifdef USE_ASSERT_CHECKING
+                       IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
+
+                       Assert(!IsIndexOnlyOnExpression(indexInfo));
+#endif
```

I may misunderstand, but the condition of a usable index has already been checked
when the oid was set, but anyway you confirmed the condition again before
really using it, right?
So is it OK not to check the other assumptions, that the index is b-tree, non-partial,
and has at least one column reference?

IIUC we can do that by adding a new function like IsIndexUsableForReplicaIdentityFull()
that checks these conditions, and then calling it at RelationFindReplTupleByIndex() if
idxIsRelationIdentityOrPK is false.
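For what it's worth, a self-contained sketch of such a helper could look like the following. The IndexInfo struct and IsIndexOnlyOnExpression() here are simplified stubs for illustration, not the real PostgreSQL definitions; only the three-condition check itself follows the patch (BTREE_AM_OID is the btree access method oid, 403).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define BTREE_AM_OID 403		/* pg_am oid of the btree AM */

/* Stub of just the IndexInfo fields this check needs. */
typedef struct IndexInfo
{
	unsigned int ii_Am;				/* access method oid */
	void	   *ii_Predicate;		/* non-NULL for partial indexes */
	int			ii_NumKeyCols;		/* number of key columns */
	bool		ii_KeyIsExpr[8];	/* true if the column is an expression */
} IndexInfo;

/* Stub: true if every key column is an expression. */
static bool
IsIndexOnlyOnExpression(const IndexInfo *indexInfo)
{
	for (int i = 0; i < indexInfo->ii_NumKeyCols; i++)
	{
		if (!indexInfo->ii_KeyIsExpr[i])
			return false;
	}
	return true;
}

/*
 * The index must be btree, non-partial, and have at least one plain
 * column reference (i.e., not consist of only expressions).
 */
static bool
IsIndexUsableForReplicaIdentityFull(const IndexInfo *indexInfo)
{
	bool		is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
	bool		is_partial = (indexInfo->ii_Predicate != NULL);
	bool		is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);

	return is_btree && !is_partial && !is_only_on_expression;
}
```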

032_subscribe_use_index.pl

```
+# Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
...
+# Testcase end: SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX
```

There is still an inconsistent case.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Hi Amit, all
 
Few comments on 0001
====================
1.
+   such suitable indexes, the search on the subscriber s ide can be
very inefficient,

unnecessary space in 'side'

Fixed in v28
 

2.
-   identity.  If the table does not have any suitable key, then it can be set
-   to replica identity <quote>full</quote>, which means the entire row becomes
-   the key.  This, however, is very inefficient and should only be used as a
-   fallback if no other solution is possible.  If a replica identity other
+   identity. When replica identity <literal>FULL</literal> is specified,
+   indexes can be used on the subscriber side for searching the rows.

I think it is better to retain the first sentence (If the table does
not ... entire row becomes the key.) as that says what will be part of
the key.


right, that sentence looks useful, added back.
 
3.
-   comprising the same or fewer columns must also be set on the subscriber
-   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
-   how to set the replica identity.  If a table without a replica identity is
-   added to a publication that replicates <command>UPDATE</command>
+   comprising the same or fewer columns must also be set on the
subscriber side.
+   See <xref linkend="sql-altertable-replica-identity"/> for
+   details on how to set the replica identity.  If a table without a replica
+   identity is added to a publication that replicates <command>UPDATE</command>

I don't see any change in this except line length. If so, I don't
think we should change it as part of this patch.


Yes, fixed. But the first line (starting with "See <xref ..") still shows as if it were
changed, probably because its line wrapping has changed. I couldn't make that line
show up as unchanged.
 
4.
 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns should be used for the index scan.
+ *
+ * This is not generic routine, it expects the idxrel to be
+ * a btree, non-partial and have at least one column
+ * reference (e.g., should not consist of only expressions).
  *
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * By definition, replication identity of a rel meets all
+ * limitations associated with that. Note that any other
+ * index could also meet these limitations.

The comment changes look quite asymmetric to me. Normally, we break
the line if the line length goes beyond 80 cols. Please check and
change other places in the patch if they have a similar symptom.

Went over the patch, and expanded each line to ~80 chars.

I'm guessing 80 is not a hard limit; in some cases I went over, to 81-82.
 

5.
+ * There are no fundamental problems for supporting non-btree
+ * and/or partial indexes.

Can we mention partial indexes in the above comment? It seems to me
that because the required tuple may not satisfy the expression (in the
case of partial indexes) it may not be easy to support it.

Expanded the comment and explained the differences a little further.
 

6.
build_replindex_scan_key()
{
...
+ for (index_attoff = 0; index_attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel);
+ index_attoff++)
...
...
+#ifdef USE_ASSERT_CHECKING
+ IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
+
+ Assert(!IsIndexOnlyOnExpression(indexInfo));
+#endif
...
}

We can avoid building the index info multiple times. This can either be
checked at the beginning of the function, outside the attribute offset loop,
or we can probably cache it. I understand this is for assert builds, but it
seems easy to avoid doing it multiple times, and it also looks odd to do it
multiple times for the same index.

Applied your suggestions. Although I do not have strong opinions, I think
it was easier to follow when building the indexInfo in each iteration.
 

7.
- /* Build scankey for every attribute in the index. */
- for (attoff = 0; attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
+ /* Build scankey for every non-expression attribute in the index. */
+ for (index_attoff = 0; index_attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel);
+ index_attoff++)
  {
  Oid operator;
  Oid opfamily;
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
  RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];

I don't think here we need to change variable names if we retain
mainattno as it is instead of changing it to table_attno. The current
naming doesn't seem bad for the current usage of the patch.

Hmm, I'm actually not convinced that the variable naming on HEAD is good for
the current patch. The main difference is that now we allow indexes like:
    CREATE INDEX idx ON table(foo(col), col_2)

(See # Testcase start: SUBSCRIPTION CAN USE INDEXES WITH
EXPRESSIONS AND COLUMNS)

In such indexes, we could skip the attributes of the index. So, skey_attoff is not
equal to  index_attoff anymore. So, calling out this explicitly via the variable names
seems more robust to me. Plus, mainattno sounded vague to me when I first read
this function.

So, unless you have strong objections, I'm leaning towards having more explicit
variable names. I'm also open to suggestions if you think the names
I picked are not clear enough.
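To make the distinction concrete, here is a self-contained sketch of the skipping logic. A plain int array stands in for the indkey int2vector, and build_scan_keys() is a hypothetical name for illustration; the skipping rule itself (pg_index stores attnum 0 for an expression key column) is real PostgreSQL behaviour.

```c
#include <assert.h>

/*
 * In pg_index, indkey stores 0 for expression key columns. For an index
 * like CREATE INDEX idx ON table(foo(col), col_2) the key attnums are
 * {0, attnum_of_col_2}, so scan keys are built only for the plain column
 * references and skey_attoff can lag behind index_attoff.
 */
static int
build_scan_keys(const int *indkey, int nkeyatts, int *skey_table_attnos)
{
	int			skey_attoff = 0;

	for (int index_attoff = 0; index_attoff < nkeyatts; index_attoff++)
	{
		int			table_attno = indkey[index_attoff];

		if (table_attno == 0)
			continue;			/* expression column: no scan key */

		skey_table_attnos[skey_attoff++] = table_attno;
	}

	return skey_attoff;			/* how many scan keys were built */
}
```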
 

8.
+ TypeCacheEntry **eq = NULL; /* only used when the index is not RI or PK */

Normally, we don't add such comments as the usage is quite obvious by
looking at the code.


Sure, I also don't see much value for it, removed.

Attached v29 for this review. Note that I'll be working on the disable index scan changes after
I reply to some of the other pending reviews.

Thanks,
Onder 
On Fri, Mar 3, 2023 at 3:02 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>>
>> 7.
>> - /* Build scankey for every attribute in the index. */
>> - for (attoff = 0; attoff <
>> IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
>> + /* Build scankey for every non-expression attribute in the index. */
>> + for (index_attoff = 0; index_attoff <
>> IndexRelationGetNumberOfKeyAttributes(idxrel);
>> + index_attoff++)
>>   {
>>   Oid operator;
>>   Oid opfamily;
>> + Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
>>   RegProcedure regop;
>> - int pkattno = attoff + 1;
>> - int mainattno = indkey->values[attoff];
>> - Oid optype = get_opclass_input_type(opclass->values[attoff]);
>> + int table_attno = indkey->values[index_attoff];
>>
>> I don't think here we need to change variable names if we retain
>> mainattno as it is instead of changing it to table_attno. The current
>> naming doesn't seem bad for the current usage of the patch.
>
>
> Hmm, I'm actually not convinced that the variable naming on HEAD is good for
> the current patch. The main difference is that now we allow indexes like:
>     CREATE INDEX idx ON table(foo(col), col_2)
>
> (See # Testcase start: SUBSCRIPTION CAN USE INDEXES WITH
> EXPRESSIONS AND COLUMNS)
>
> In such indexes, we could skip the attributes of the index. So, skey_attoff is not
> equal to  index_attoff anymore. So, calling out this explicitly via the variable names
> seems more robust to me. Plus, mainattno sounded vague to me when I first read
> this function.
>

Yeah, I understand this part. By looking at the diff, it appeared to
me that this was an unnecessary change. Anyway, I see your point, so
if you want to keep the naming as you proposed at least don't change
the ordering for get_opclass_input_type() call because that looks odd
to me.
>
> Attached v29 for this review. Note that I'll be working on the disable index scan changes after
>

Okay, thanks!

--
With Regards,
Amit Kapila.



Hi Vignesh,

Thanks for the review


1) We are currently calling RelationGetIndexList twice, once in
FindUsableIndexForReplicaIdentityFull function and in the caller too,
we could avoid one of the calls by passing the indexlist to the
function or removing the check here, index list check can be handled
in FindUsableIndexForReplicaIdentityFull.
+       if (remoterel->replident == REPLICA_IDENTITY_FULL &&
+               RelationGetIndexList(localrel) != NIL)
+       {
+               /*
+                * If we had a primary key or relation identity with a
unique index,
+                * we would have already found and returned that oid.
At this point,
+                * the remote relation has replica identity full and
we have at least
+                * one local index defined.
+                *
+                * We are looking for one more opportunity for using
an index. If
+                * there are any indexes defined on the local
relation, try to pick
+                * a suitable index.
+                *
+                * The index selection safely assumes that all the
columns are going
+                * to be available for the index scan given that
remote relation has
+                * replica identity full.
+                */
+               return FindUsableIndexForReplicaIdentityFull(localrel);
+       }
+

makes sense, done
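A self-contained sketch of the resulting control flow: all names except GetRelationIdentityOrPK() / FindUsableIndexForReplicaIdentityFull() are stubs that return canned oids for illustration, not the real relcache or catalog calls.

```c
#include <assert.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)
#define OidIsValid(oid) ((oid) != InvalidOid)
#define REPLICA_IDENTITY_FULL 'f'	/* pg_class.relreplident value */

/* Stubs: canned answers standing in for relcache lookups. */
static Oid	identity_or_pk = InvalidOid;		/* unique RI/PK index, if any */
static Oid	first_usable_index = InvalidOid;	/* result of scanning the index list */

static Oid GetRelationIdentityOrPK(void) { return identity_or_pk; }

/*
 * The index-list check now lives inside this helper, so the caller no
 * longer calls RelationGetIndexList() itself.
 */
static Oid FindUsableIndexForReplicaIdentityFull(void) { return first_usable_index; }

static Oid
FindLogicalRepUsableIndex(char remote_replident)
{
	Oid			idxoid = GetRelationIdentityOrPK();

	if (OidIsValid(idxoid))
		return idxoid;			/* unique RI/PK index: legacy fast path */

	if (remote_replident == REPLICA_IDENTITY_FULL)
		return FindUsableIndexForReplicaIdentityFull();

	return InvalidOid;
}
```

With this shape the caller stays a simple three-way decision, and the duplicate index-list lookup disappears.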
 

2) Copyright year should be mentioned as 2023
diff --git a/src/test/subscription/t/032_subscribe_use_index.pl
b/src/test/subscription/t/032_subscribe_use_index.pl
new file mode 100644
index 0000000000..db0a7ea2a0
--- /dev/null
+++ b/src/test/subscription/t/032_subscribe_use_index.pl
@@ -0,0 +1,861 @@
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# Test logical replication behavior with subscriber uses available index
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+

I changed it to #Copyright (c) 2022-2023, but I'm not sure if it should be only 2023 or 
like this.
 

3) Many of the tests are using the same tables, we need not
drop/create publication/subscription for each of the team, we could
just drop and create required indexes and verify the update/delete
statements.
+# ====================================================================
+# Testcase start: SUBSCRIPTION USES INDEX
+#
+# Basic test where the subscriber uses index
+# and only updates 1 row and deletes
+# 1 other row
+#
+
+# create tables pub and sub
+$node_publisher->safe_psql('postgres',
+       "CREATE TABLE test_replica_id_full (x int)");
+$node_publisher->safe_psql('postgres',
+       "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
+$node_subscriber->safe_psql('postgres',
+       "CREATE TABLE test_replica_id_full (x int)");
+$node_subscriber->safe_psql('postgres',
+       "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");

+# ====================================================================
+# Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
+#
+# This test ensures that after CREATE INDEX, the subscriber can automatically
+# use one of the indexes (provided that it fulfils the requirements).
+# Similarly, after DROP index, the subscriber can automatically switch to
+# sequential scan
+
+# create tables pub and sub
+$node_publisher->safe_psql('postgres',
+       "CREATE TABLE test_replica_id_full (x int NOT NULL, y int)");
+$node_publisher->safe_psql('postgres',
+       "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
+$node_subscriber->safe_psql('postgres',
+       "CREATE TABLE test_replica_id_full (x int NOT NULL, y int)");

Well, not all the tables are exactly the same; there are 4-5 different
tables. Mostly the table names are the same.

Plus, the overhead does not seem large enough to justify complicating
the test. Many of the src/test/subscription/t files follow this pattern.

Do you have strong opinions on changing this?


4) These additional blank lines can be removed to keep it consistent:
4.a)
+# Testcase end: SUBSCRIPTION DOES NOT USE PARTIAL INDEX
+# ====================================================================
+
+
+# ====================================================================
+# Testcase start: SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS

4.b)
+# Testcase end: Unique index that is not primary key or replica identity
+# ====================================================================
+
+
+
+# ====================================================================
+# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
 
Thanks, fixed.

Attached v30

Hi Hayato, Amit, all




```
+       /* Build scankey for every non-expression attribute in the index. */
```

I think that single-line comments should not be terminated by ".".

Hmm, looking into execReplication.c, many of the single-line comments are
terminated by ".". Also, on HEAD, the same comment is a single-line
comment terminated by ".". So, I'd rather stick to that?

 

```
+       /* There should always be at least one attribute for the index scan. */
```

Same as above.

Same as above :) 
 


```
+#ifdef USE_ASSERT_CHECKING
+                       IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
+
+                       Assert(!IsIndexOnlyOnExpression(indexInfo));
+#endif
```

I may misunderstand, but the condition of a usable index has already been checked
when the oid was set, but anyway you confirmed the condition again before
really using it, right?
So is it OK not to check the other assumptions, that the index is b-tree, non-partial,
and has at least one column reference?
IIUC we can do that by adding a new function like IsIndexUsableForReplicaIdentityFull()
that checks these conditions, and then calling it at RelationFindReplTupleByIndex() if
idxIsRelationIdentityOrPK is false.

I think adding a function like IsIndexUsableForReplicaIdentityFull is useful. I can use it inside
FindUsableIndexForReplicaIdentityFull() and assert here. Also good for readability.

So, I mainly moved this assert, as a more generic check, to a more generic
place: RelationFindReplTupleByIndex().
 

032_subscribe_use_index.pl

```
+# Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
...
+# Testcase end: SUBSCRIPTION RE-CALCULATES INDEX AFTER CREATE/DROP INDEX
```

There is still an inconsistent case.


Fixed, thanks

Anyway, I see your point, so
if you want to keep the naming as you proposed at least don't change
the ordering for get_opclass_input_type() call because that looks odd
to me.

(A small comment from Amit's previous e-mail)

Sure, applied now.

Attaching v31.

Thanks for the reviews!
Onder KALACI
 
On Fri, Mar 3, 2023 at 1:09 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>> 5.
>>
>> +       /*
>> +        * If index scans are disabled, use a sequential scan.
>> +        *
>> +        * Note that we do not use index scans below when enable_indexscan is
>> +        * false. Allowing primary key or replica identity even when index scan is
>> +        * disabled is the legacy behaviour. So we hesitate to move the below
>> +        * enable_indexscan check to be done earlier in this function.
>> +        */
>> +       if (!enable_indexscan)
>> +               return InvalidOid;
>>
>> Since the documentation of enable_indexscan says "Enables or disables the query
>> planner's use of index-scan plan types. The default is on.", and we don't use
>> the planner here, I am not sure whether we should allow/disallow index scans in
>> the apply worker based on this GUC.
>>
>
> Given Amit's suggestion on [1], I'm planning to drop this check altogether, and
> rely on table storage parameters.
>

This still seems to be present in the latest version. I think we can
just remove this and then add the additional check as suggested by you
as part of the second patch.

Few other comments on latest version:
==============================
1.
+/*
+ * Returns true if the index is usable for replica identity full. For details,
+ * see FindUsableIndexForReplicaIdentityFull.
+ */
+bool
+IsIndexUsableForReplicaIdentityFull(IndexInfo *indexInfo)
+{
+ bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ bool is_partial = (indexInfo->ii_Predicate != NIL);
+ bool is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
+
+ if (is_btree && !is_partial && !is_only_on_expression)
+ {
+ return true;
...
...
+/*
+ * Returns the oid of an index that can be used via the apply worker. The index
+ * should be btree, non-partial and have at least one column reference (e.g.,
+ * should not consist of only expressions). The limitations arise from
+ * RelationFindReplTupleByIndex(), which is designed to handle PK/RI and these
+ * limitations are inherent to PK/RI.

By these two, the patch infers that it picks an index that adheres to
the limitations of PK/RI. Apart from unique, the other properties of
RI are "not partial, not deferrable, and include only columns marked
NOT NULL". See ATExecReplicaIdentity() for corresponding checks. We
don't try to ensure the last two from the list. It is fine to do so if
we document the reasons for the same in comments or we can even try to
enforce the remaining restrictions as well. For example, it should be
okay to allow NULL column values because we anyway compare the entire
tuple after getting the value from the index.

2.
+ {
+ /*
+ * This attribute is an expression, and
+ * FindUsableIndexForReplicaIdentityFull() was called earlier
+ * when the index for subscriber was selected. There, the indexes
+ * comprising *only* expressions have already been eliminated.
+ *
+ * Also, because PK/RI can't include expressions we
+ * sanity check the index is neither of those kinds.
+ */
+ Assert(!IdxIsRelationIdentityOrPK(rel, idxrel->rd_id));

This comment doesn't make much sense after you have moved the
corresponding Assert in RelationFindReplTupleByIndex(). Either we
should move or remove this Assert as well or at least update the
comments to reflect the latest code.

3. When FindLogicalRepUsableIndex() is invoked from
logicalrep_partition_open(), the current memory context would be
LogicalRepPartMapContext which would be a long-lived context and we
allocate memory for indexes in FindLogicalRepUsableIndex() which can
accumulate over a period of time. So, I think it would be better to
switch to the old context in logicalrep_partition_open() before
invoking FindLogicalRepUsableIndex() provided that is not a long-lived
context.

--
With Regards,
Amit Kapila.



Here are some review comments for v28-0001.

======
doc/src/sgml/logical-replication.sgml

1.
A published table must have a replica identity configured in order to
be able to replicate UPDATE and DELETE operations, so that appropriate
rows to update or delete can be identified on the subscriber side. By
default, this is the primary key, if there is one. Another unique
index (with certain additional requirements) can also be set to be the
replica identity. When replica identity FULL is specified, indexes can
be used on the subscriber side for searching the rows. These indexes
should be btree, non-partial and have at least one column reference
(e.g., should not consist of only expressions). These restrictions on
the non-unique index properties are in essence the same restrictions
that are enforced for primary keys. Internally, we follow the same
approach for supporting index scans within logical replication scope.
If there are no such suitable indexes, the search on the subscriber
side can be very inefficient, therefore replica identity FULL should
only be used as a fallback if no other solution is possible. If a
replica identity other than full is set on the publisher side, a
replica identity comprising the same or fewer columns must also be set
on the subscriber side. See REPLICA IDENTITY for details on how to set
the replica identity. If a table without a replica identity is added
to a publication that replicates UPDATE or DELETE operations then
subsequent UPDATE or DELETE operations will cause an error on the
publisher. INSERT operations can proceed regardless of any replica
identity.

~

1a.
Changes include:
"should" --> "must"
"e.g." --> "i.e."

BEFORE
These indexes should be btree, non-partial and have at least one
column reference (e.g., should not consist of only expressions).

SUGGESTION
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e., cannot consist of only expressions).

~

1b.
The fix for my v27 review comment #2b (changing "full" to FULL) was
not made correctly. It should be uppercase FULL, not full:
"other than full" --> "other than FULL"

======
src/backend/executor/execReplication.c

2.
 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns should be used for the index scan.
+ *
+ * This is not generic routine, it expects the idxrel to be
+ * a btree, non-partial and have at least one column
+ * reference (e.g., should not consist of only expressions).
  *
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * By definition, replication identity of a rel meets all
+ * limitations associated with that. Note that any other
+ * index could also meet these limitations.
  */
-static bool
+static int
 build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
  TupleTableSlot *searchslot)

~

"(e.g., should not consist of only expressions)" --> "(i.e., cannot
consist of only expressions)"

======
src/backend/replication/logical/relation.c

3. FindUsableIndexForReplicaIdentityFull

+/*
+ * Returns the oid of an index that can be used via the apply
+ * worker. The index should be btree, non-partial and have at
+ * least one column reference (e.g., should not consist of
+ * only expressions). The limitations arise from
+ * RelationFindReplTupleByIndex(), which is designed to handle
+ * PK/RI and these limitations are inherent to PK/RI.

The 2nd sentence of this comment should match the same changes in the
Commit message --- "must not" instead of "should not", "i.e." instead
of "e.g.", etc. See the review comment #1a above.

~~~

4. IdxIsRelationIdentityOrPK

+/*
+ * Given a relation and OID of an index, returns true if the
+ * index is relation's replica identity index or relation's
+ * primary key's index.
+ *
+ * Returns false otherwise.
+ */
+bool
+IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
+{
+ Assert(OidIsValid(idxoid));
+
+ return GetRelationIdentityOrPK(rel) == idxoid;
+}

I think you've "simplified" this function in v28 but AFAICT now it has
a different logic to v27.

PREVIOUSLY it was coded like
+ return RelationGetReplicaIndex(rel) == idxoid ||
+ RelationGetPrimaryKeyIndex(rel) == idxoid;

You can see if 'idxoid' did NOT match RI but if it DID match PK
previously it would return true. But now in that scenario, it won't
even check the PK if there was a valid RI. So it might return false
when previously it returned true. Is it deliberate?

======
.../subscription/t/032_subscribe_use_index.pl

5.
+# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB different data
+#
+# The subscriber has duplicate tuples that publisher does not have.
+# When publsher updates/deletes 1 row, subscriber uses indexes and
+# exactly updates/deletes 1 row.

"and exactly updates/deletes 1 row." --> "and updates/deletes exactly 1 row."


------
Kind Regards,
Peter Smith.
Fujitsu Australia



On Mon, Mar 6, 2023 at 10:12 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> 4. IdxIsRelationIdentityOrPK
>
> +/*
> + * Given a relation and OID of an index, returns true if the
> + * index is relation's replica identity index or relation's
> + * primary key's index.
> + *
> + * Returns false otherwise.
> + */
> +bool
> +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
> +{
> + Assert(OidIsValid(idxoid));
> +
> + return GetRelationIdentityOrPK(rel) == idxoid;
> +}
>
> I think you've "simplified" this function in v28 but AFAICT now it has
> a different logic to v27.
>
> PREVIOUSLY it was coded like
> + return RelationGetReplicaIndex(rel) == idxoid ||
> + RelationGetPrimaryKeyIndex(rel) == idxoid;
>
> You can see if 'idxoid' did NOT match RI but if it DID match PK
> previously it would return true. But now in that scenario, it won't
> even check the PK if there was a valid RI. So it might return false
> when previously it returned true. Is it deliberate?
>

I don't see any problem with this because by default PK will be a
replica identity. So only if the user specifies the replica identity
full or changes the replica identity to some other index, we will try
to get the PK, which seems valid for this case. Am I missing something
which makes this code do something bad?

Few other comments on latest code:
============================
1.
   <para>
-   A published table must have a <quote>replica identity</quote> configured in
+   A published table must have a <firstterm>replica
identity</firstterm> configured in

How is the above change related to this patch?

2.
    certain additional requirements) can also be set to be the replica
-   identity.  If the table does not have any suitable key, then it can be set
+   identity. If the table does not have any suitable key, then it can be set

I think we should not change the spacing of the existing docs (two spaces
after a full stop to one space), and the patch does so inconsistently anyway.
I suggest adding new changes with the same spacing as the existing doc. If you
are adding an entirely new section then we can consider it differently.

3.
    to replica identity <quote>full</quote>, which means the entire row becomes
-   the key.  This, however, is very inefficient and should only be used as a
-   fallback if no other solution is possible.  If a replica identity other
-   than <quote>full</quote> is set on the publisher side, a replica identity
-   comprising the same or fewer columns must also be set on the subscriber
-   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
+   the key. When replica identity <literal>FULL</literal> is specified,
+   indexes can be used on the subscriber side for searching the rows. These

Shouldn't specifying <literal>FULL</literal> be consistent with existing docs?

--
With Regards,
Amit Kapila.



On Mon, Mar 6, 2023 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 6, 2023 at 10:12 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > 4. IdxIsRelationIdentityOrPK
> >
> > +/*
> > + * Given a relation and OID of an index, returns true if the
> > + * index is relation's replica identity index or relation's
> > + * primary key's index.
> > + *
> > + * Returns false otherwise.
> > + */
> > +bool
> > +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
> > +{
> > + Assert(OidIsValid(idxoid));
> > +
> > + return GetRelationIdentityOrPK(rel) == idxoid;
> > +}
> >
> > I think you've "simplified" this function in v28 but AFAICT now it has
> > a different logic to v27.
> >
> > PREVIOUSLY it was coded like
> > + return RelationGetReplicaIndex(rel) == idxoid ||
> > + RelationGetPrimaryKeyIndex(rel) == idxoid;
> >
> > You can see if 'idxoid' did NOT match RI but if it DID match PK
> > previously it would return true. But now in that scenario, it won't
> > even check the PK if there was a valid RI. So it might return false
> > when previously it returned true. Is it deliberate?
> >
>
> I don't see any problem with this because by default PK will be a
> replica identity. So only if the user specifies the replica identity
> full or changes the replica identity to some other index, we will try
> to get PK which seems valid for this case. Am, I missing something
> which makes this code do something bad?

I don't know if there is anything bad; the point was that the function
now seems to require a deeper understanding of the interrelationship
of RelationGetReplicaIndex and RelationGetPrimaryKeyIndex, which is
something the previous implementation did not require.
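
To make that behavioral difference concrete, here is a toy, standalone sketch
(the `ToyRel` struct, `toy_get_identity_or_pk()`, and the OID values are
stand-ins for the real PostgreSQL types and functions, not actual code from
the patch). The divergent case is a replica identity set via USING INDEX to a
non-PK index while a PK also exists:

```c
#include <stdbool.h>

/* Stand-ins: Oid, InvalidOid, and a relation reduced to its two index OIDs. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

typedef struct
{
	Oid		replident_idxoid;	/* replica identity index, or InvalidOid */
	Oid		pk_idxoid;			/* primary key index, or InvalidOid */
} ToyRel;

/* Mirrors GetRelationIdentityOrPK(): the RI index if one is set, else the PK. */
static Oid
toy_get_identity_or_pk(const ToyRel *rel)
{
	return rel->replident_idxoid != InvalidOid ?
		rel->replident_idxoid : rel->pk_idxoid;
}

/* v27-style check: idxoid may match either index independently. */
static bool
toy_is_ri_or_pk_v27(const ToyRel *rel, Oid idxoid)
{
	return rel->replident_idxoid == idxoid || rel->pk_idxoid == idxoid;
}

/* v28-style check: idxoid must match whichever index the RI-then-PK lookup prefers. */
static bool
toy_is_ri_or_pk_v28(const ToyRel *rel, Oid idxoid)
{
	return toy_get_identity_or_pk(rel) == idxoid;
}
```

With RI set to index 1000 and the PK index being 2000, the v27-style check
accepts 2000 but the v28-style check rejects it; the two agree everywhere else.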

>
> Few other comments on latest code:
> ============================
> 1.
>    <para>
> -   A published table must have a <quote>replica identity</quote> configured in
> +   A published table must have a <firstterm>replica
> identity</firstterm> configured in
>
> How the above change is related to this patch?

That comes from a previous suggestion of mine. Strictly speaking, it
is unrelated to this patch. But since we are modifying this paragraph
in a major way anyhow it seemed harmless to just fix this in passing
too. OTOH I could make another patch for this but it seemed like
unnecessary extra work.

>
> 2.
>     certain additional requirements) can also be set to be the replica
> -   identity.  If the table does not have any suitable key, then it can be set
> +   identity. If the table does not have any suitable key, then it can be set
>
> I think we should change the spacing of existing docs (two spaces
> after fullstop to one space) and that too inconsistently. I suggest to
> add new changes with same spacing as existing doc. If you are adding
> entirely new section then we can consider differently.
>
> 3.
>     to replica identity <quote>full</quote>, which means the entire row becomes
> -   the key.  This, however, is very inefficient and should only be used as a
> -   fallback if no other solution is possible.  If a replica identity other
> -   than <quote>full</quote> is set on the publisher side, a replica identity
> -   comprising the same or fewer columns must also be set on the subscriber
> -   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
> +   the key. When replica identity <literal>FULL</literal> is specified,
> +   indexes can be used on the subscriber side for searching the rows. These
>
> Shouldn't specifying <literal>FULL</literal> be consistent wih existing docs?
>

That comes from a previous suggestion of mine too. The RI is specified
as FULL, not "full".
See https://www.postgresql.org/docs/current/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY

Sure, there was an existing <quote>full</quote> in this paragraph, so
strictly speaking the RI patch could follow that style. But IMO that
style was wrong, so all we are doing is compounding the mistake instead of
just fixing everything in passing. OTOH, I could make a separate patch
to fix "full" to FULL, but again that seemed like unnecessary extra
work.

~

Anyhow, if you feel those firstterm and FULL changes ought to be kept
separate from this RI patch, please let me know and I will propose
those changes in a new thread.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



On Mon, Mar 6, 2023 at 1:40 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Mon, Mar 6, 2023 at 5:44 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Mon, Mar 6, 2023 at 10:12 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > 4. IdxIsRelationIdentityOrPK
> > >
> > > +/*
> > > + * Given a relation and OID of an index, returns true if the
> > > + * index is relation's replica identity index or relation's
> > > + * primary key's index.
> > > + *
> > > + * Returns false otherwise.
> > > + */
> > > +bool
> > > +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
> > > +{
> > > + Assert(OidIsValid(idxoid));
> > > +
> > > + return GetRelationIdentityOrPK(rel) == idxoid;
> > > +}
> > >
> > > I think you've "simplified" this function in v28 but AFAICT now it has
> > > a different logic to v27.
> > >
> > > PREVIOUSLY it was coded like
> > > + return RelationGetReplicaIndex(rel) == idxoid ||
> > > + RelationGetPrimaryKeyIndex(rel) == idxoid;
> > >
> > > You can see if 'idxoid' did NOT match RI but if it DID match PK
> > > previously it would return true. But now in that scenario, it won't
> > > even check the PK if there was a valid RI. So it might return false
> > > when previously it returned true. Is it deliberate?
> > >
> >
> > I don't see any problem with this because by default PK will be a
> > replica identity. So only if the user specifies the replica identity
> > full or changes the replica identity to some other index, we will try
> > to get PK which seems valid for this case. Am, I missing something
> > which makes this code do something bad?
>
> I don't know if there is anything bad; the point was that the function
> now seems to require a deeper understanding of the interrelationship
> of RelationGetReplicaIndex and RelationGetPrimaryKeyIndex, which is
> something the previous implementation did not require.
>

But the same understanding is required for the existing function
GetRelationIdentityOrPK(), so I feel it is better to be consistent
unless we see some problem here.

>
> Anyhow, if you feel those firstterm and FULL changes ought to be kept
> separate from this RI patch, please let me know and I will propose
> those changes in a new thread,
>

Personally, I would prefer to keep those separate. So, feel free to
propose them in a new thread.

--
With Regards,
Amit Kapila.



Hi Amit, all


>
> Given Amit's suggestion on [1], I'm planning to drop this check altogether, and
> rely on table storage parameters.
>

This still seems to be present in the latest version. I think we can
just remove this and then add the additional check as suggested by you
as part of the second patch.


Now attaching the second patch as well, which implements a new storage parameter as
you suggested earlier.

I'm open to naming suggestions; I wanted to make the name explicit, so it is a little long.

I'm also not very familiar with the sgml format. I mostly followed the existing docs and
built the docs for inspection, but it would be good to look into that part
a little further in case I missed something important.

 
Few other comments on latest version:
==============================
1.
+/*
+ * Returns true if the index is usable for replica identity full. For details,
+ * see FindUsableIndexForReplicaIdentityFull.
+ */
+bool
+IsIndexUsableForReplicaIdentityFull(IndexInfo *indexInfo)
+{
+ bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ bool is_partial = (indexInfo->ii_Predicate != NIL);
+ bool is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
+
+ if (is_btree && !is_partial && !is_only_on_expression)
+ {
+ return true;
...
...
+/*
+ * Returns the oid of an index that can be used via the apply worker. The index
+ * should be btree, non-partial and have at least one column reference (e.g.,
+ * should not consist of only expressions). The limitations arise from
+ * RelationFindReplTupleByIndex(), which is designed to handle PK/RI and these
+ * limitations are inherent to PK/RI.

By these two, the patch infers that it picks an index that adheres to
the limitations of PK/RI. Apart from unique, the other properties of
RI are "not partial, not deferrable, and include only columns marked
NOT NULL". See ATExecReplicaIdentity() for corresponding checks. We
don't try to ensure the last two from the list. It is fine to do so if
we document the reasons for the same in comments or we can even try to
enforce the remaining restrictions as well. For example, it should be
okay to allow NULL column values because we anyway compare the entire
tuple after getting the value from the index.

I think this is a documentation issue of this patch. I improved the wording a bit
more. Does that look better? 

I also went over the code / docs to see if we have
any other such documentation issues, I also updated logical-replication.sgml.

I'd prefer to support NULL values as there is no harm in doing that and it is
a pretty useful feature (we also have tests covering it).

To my knowledge, there are no problems with deferrable constraints, as we are
only interested in the indexes, not in the constraints. Still, if you see any,
I can add a check for that. (Here, the user could still have a unique index
that is associated with a constraint, but even then I don't see any edge cases
regarding deferrable constraints.)
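
To illustrate why NULLs and duplicate key values are harmless here, a toy
standalone model of the lookup (the `ToyTup` struct and `toy_find_repl_tuple()`
are stand-ins for the real tuple format and RelationFindReplTupleByIndex(),
not code from the patch): the index only narrows the candidates, and the
full-tuple recheck decides.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy two-column row; a hypothetical non-unique index covers column 'a' only. */
typedef struct
{
	int		a;
	int		b;			/* not indexed; may repeat, or be "NULL" (-1 here) */
} ToyTup;

/* Stands in for tuples_equal(): act only on an exact full-row match. */
static bool
toy_tuples_equal(const ToyTup *x, const ToyTup *y)
{
	return x->a == y->a && x->b == y->b;
}

/*
 * Stands in for RelationFindReplTupleByIndex() with a non-unique index:
 * the index narrows candidates to rows with a matching 'a', and the
 * full-tuple recheck confirms the match, so duplicates or NULLs outside
 * the index key change nothing.  Returns the row position, or -1.
 */
static int
toy_find_repl_tuple(const ToyTup *table, size_t nrows, const ToyTup *search)
{
	for (size_t i = 0; i < nrows; i++)
	{
		if (table[i].a != search->a)	/* what the index scan would skip */
			continue;
		if (toy_tuples_equal(&table[i], search))
			return (int) i;
	}
	return -1;
}
```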



2.
+ {
+ /*
+ * This attribute is an expression, and
+ * FindUsableIndexForReplicaIdentityFull() was called earlier
+ * when the index for subscriber was selected. There, the indexes
+ * comprising *only* expressions have already been eliminated.
+ *
+ * Also, because PK/RI can't include expressions we
+ * sanity check the index is neither of those kinds.
+ */
+ Assert(!IdxIsRelationIdentityOrPK(rel, idxrel->rd_id));

This comment doesn't make much sense after you have moved the
corresponding Assert in RelationFindReplTupleByIndex(). Either we
should move or remove this Assert as well or at least update the
comments to reflect the latest code.

I think removing that Assert is fine after having a more generic
Assert in RelationFindReplTupleByIndex().

I mostly left that comment so that the meaning of
AttributeNumberIsValid() is easier for readers to follow. But now
I'm also leaning towards removing the comment and the Assert.
 

3. When FindLogicalRepUsableIndex() is invoked from
logicalrep_partition_open(), the current memory context would be
LogicalRepPartMapContext which would be a long-lived context and we
allocate memory for indexes in FindLogicalRepUsableIndex() which can
accumulate over a period of time. So, I think it would be better to
switch to the old context in logicalrep_partition_open() before
invoking FindLogicalRepUsableIndex() provided that is not a long-lived
context.



Hmm, makes sense, that'd avoid any potential leaks that this patch
might bring. Applied your suggestion. Also, looking at the same function
call in logicalrep_rel_open(), that already seems safe regarding leaks. Do
you see any problems with that?



Attached v32. I'll continue replying to the e-mails on this thread with separate
patches. I'm assuming it is easier for you to review if we have a separate
patch for each review. If not, please let me know and I can reply to all mails
at once.

Thanks,
Onder KALACI

 
On Fri, Mar 3, 2023 at 6:40 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi Vignesh,
>
> Thanks for the review
>
>>
>> 1) We are currently calling RelationGetIndexList twice, once in
>> FindUsableIndexForReplicaIdentityFull function and in the caller too,
>> we could avoid one of the calls by passing the indexlist to the
>> function or removing the check here, index list check can be handled
>> in FindUsableIndexForReplicaIdentityFull.
>> +       if (remoterel->replident == REPLICA_IDENTITY_FULL &&
>> +               RelationGetIndexList(localrel) != NIL)
>> +       {
>> +               /*
>> +                * If we had a primary key or relation identity with a
>> unique index,
>> +                * we would have already found and returned that oid.
>> At this point,
>> +                * the remote relation has replica identity full and
>> we have at least
>> +                * one local index defined.
>> +                *
>> +                * We are looking for one more opportunity for using
>> an index. If
>> +                * there are any indexes defined on the local
>> relation, try to pick
>> +                * a suitable index.
>> +                *
>> +                * The index selection safely assumes that all the
>> columns are going
>> +                * to be available for the index scan given that
>> remote relation has
>> +                * replica identity full.
>> +                */
>> +               return FindUsableIndexForReplicaIdentityFull(localrel);
>> +       }
>> +
>
> makes sense, done
>

Today, I was looking at this comment and the fix for it. It seems to
me that it would be better to not add the check (indexlist != NIL)
here and rather get the indexlist in
FindUsableIndexForReplicaIdentityFull(). It will anyway return
InvalidOid if there is no index, and that way the code will look a bit
cleaner.

--
With Regards,
Amit Kapila.



On Mon, Mar 6, 2023 at 2:38 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
I was going through the thread and patch, and I noticed that in the
initial version we were depending upon the planner to decide whether
an index scan is cheaper or not and which index to pick. But in the
latest patch, if a usable index exists then we choose it without
comparing whether it is cheaper than a sequential scan. Is my
understanding correct? What is the reason for the same? One reason I
could see while looking into the thread is that we cannot just decide
once whether the index scan is cheaper, because that decision could
change in the future; but isn't that better than never checking
whether the index scan is cheaper or not? Because in some cases where
column selectivity is high, like 80-90%, the index can be very costly
due to random page fetches. So I think we could easily produce a
regression in some cases; have we tested those cases?

Let me know if I am missing something.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



On Mon, Mar 6, 2023 at 4:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Mar 6, 2023 at 2:38 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> I was going through the thread and patch,  I noticed that in the
> initial version, we were depending upon the planner to let it decide
> whether index scan is cheaper or not and which index to pick.  But in
> the latest patch if a useful index exists then we chose that without
> comparing the cost of whether it is cheaper than sequential scan or
> not.  Is my understanding correct?  What is the reason for the same,
>

Yes, your understanding is correct. The main reason is that we don't
have an agreement on using the internal planner APIs for apply. That
would be a long-term maintenance burden. See the discussion around
email [1]. So, we decided to use the current infrastructure to achieve
index scans during apply when the publisher has replica identity full.
This will still be a win in many cases, and we are planning to provide
a knob to disable this feature.

[1] - https://www.postgresql.org/message-id/3466340.1673117404%40sss.pgh.pa.us

--
With Regards,
Amit Kapila.



Hi Amit, all



>
> I think you've "simplified" this function in v28 but AFAICT now it has
> a different logic to v27.
>
> PREVIOUSLY it was coded like
> + return RelationGetReplicaIndex(rel) == idxoid ||
> + RelationGetPrimaryKeyIndex(rel) == idxoid;
>
> You can see if 'idxoid' did NOT match RI but if it DID match PK
> previously it would return true. But now in that scenario, it won't
> even check the PK if there was a valid RI. So it might return false
> when previously it returned true. Is it deliberate?
>

I don't see any problem with this because, by default, the PK will be
the replica identity. So only if the user specifies replica identity
full or changes the replica identity to some other index will we try
to get the PK, which seems valid for this case. Am I missing something
that makes this code do something bad?

I also re-investigated the code, and I don't see any issues with that either.

See my comment to Peter's original review on this.
 

Few other comments on latest code:
============================
1.
   <para>
-   A published table must have a <quote>replica identity</quote> configured in
+   A published table must have a <firstterm>replica
identity</firstterm> configured in

How the above change is related to this patch?

As you suggest, I'm undoing this change.
 

2.
    certain additional requirements) can also be set to be the replica
-   identity.  If the table does not have any suitable key, then it can be set
+   identity. If the table does not have any suitable key, then it can be set

I think we should not change the spacing of the existing docs (two
spaces after a full stop to one space), and that too inconsistently. I
suggest adding new changes with the same spacing as the existing doc.
If you are adding an entirely new section then we can consider it
differently.

Alright, so I changed all of this section to use two spaces after a full stop.
 


3.
    to replica identity <quote>full</quote>, which means the entire row becomes
-   the key.  This, however, is very inefficient and should only be used as a
-   fallback if no other solution is possible.  If a replica identity other
-   than <quote>full</quote> is set on the publisher side, a replica identity
-   comprising the same or fewer columns must also be set on the subscriber
-   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
+   the key. When replica identity <literal>FULL</literal> is specified,
+   indexes can be used on the subscriber side for searching the rows. These

Shouldn't specifying <literal>FULL</literal> be consistent with existing docs?


 
Considering the discussion below, I'm switching all back to <quote>full</quote>. Let's
be consistent with the existing code. Peter already suggested improving that with a
follow-up patch. If that lands, I can reflect the changes in this patch as well.

Given the changes are small, I'll incorporate the changes with v33 in my next e-mail.

Thanks,
Onder 
Hi Peter, all
 

1.
A published table must have a replica identity configured in order to
be able to replicate UPDATE and DELETE operations, so that appropriate
rows to update or delete can be identified on the subscriber side. By
default, this is the primary key, if there is one. Another unique
index (with certain additional requirements) can also be set to be the
replica identity. When replica identity FULL is specified, indexes can
be used on the subscriber side for searching the rows. These indexes
should be btree, non-partial and have at least one column reference
(e.g., should not consist of only expressions). These restrictions on
the non-unique index properties are in essence the same restrictions
that are enforced for primary keys. Internally, we follow the same
approach for supporting index scans within logical replication scope.
If there are no such suitable indexes, the search on the subscriber
side can be very inefficient, therefore replica identity FULL should
only be used as a fallback if no other solution is possible. If a
replica identity other than full is set on the publisher side, a
replica identity comprising the same or fewer columns must also be set
on the subscriber side. See REPLICA IDENTITY for details on how to set
the replica identity. If a table without a replica identity is added
to a publication that replicates UPDATE or DELETE operations then
subsequent UPDATE or DELETE operations will cause an error on the
publisher. INSERT operations can proceed regardless of any replica
identity.

~

1a.
Changes include:
"should" --> "must"
"e.g." --> "i.e."


makes sense
 
BEFORE
These indexes should be btree, non-partial and have at least one
column reference (e.g., should not consist of only expressions).

SUGGESTION
Candidate indexes must be btree, non-partial, and have at least one
column reference (i.e., cannot consist of only expressions).

~

1b.
The fix for my v27 review comment #2b (changing "full" to FULL) was
not made correctly. It should be uppercase FULL, not full:
"other than full" --> "other than FULL"

Right, changed that.
 

======
src/backend/executor/execReplication.c

2.
 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns should be used for the index scan.
+ *
+ * This is not generic routine, it expects the idxrel to be
+ * a btree, non-partial and have at least one column
+ * reference (e.g., should not consist of only expressions).
  *
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * By definition, replication identity of a rel meets all
+ * limitations associated with that. Note that any other
+ * index could also meet these limitations.
  */
-static bool
+static int
 build_replindex_scan_key(ScanKey skey, Relation rel, Relation idxrel,
  TupleTableSlot *searchslot)

~

"(e.g., should not consist of only expressions)" --> "(i.e., cannot
consist of only expressions)"


fixed
 
======
src/backend/replication/logical/relation.c

3. FindUsableIndexForReplicaIdentityFull

+/*
+ * Returns the oid of an index that can be used via the apply
+ * worker. The index should be btree, non-partial and have at
+ * least one column reference (e.g., should not consist of
+ * only expressions). The limitations arise from
+ * RelationFindReplTupleByIndex(), which is designed to handle
+ * PK/RI and these limitations are inherent to PK/RI.

The 2nd sentence of this comment should match the same changes in the
Commit message --- "must not" instead of "should not", "i.e." instead
of "e.g.", etc. See the review comment #1a above.


Isn't "cannot" better than "must not"? You also seem to suggest "cannot"
just above.

I changed it to "cannot" in all places.

 
~~~

4. IdxIsRelationIdentityOrPK

+/*
+ * Given a relation and OID of an index, returns true if the
+ * index is relation's replica identity index or relation's
+ * primary key's index.
+ *
+ * Returns false otherwise.
+ */
+bool
+IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
+{
+ Assert(OidIsValid(idxoid));
+
+ return GetRelationIdentityOrPK(rel) == idxoid;
+}

I think you've "simplified" this function in v28 but AFAICT now it has
a different logic to v27.

PREVIOUSLY it was coded like
+ return RelationGetReplicaIndex(rel) == idxoid ||
+ RelationGetPrimaryKeyIndex(rel) == idxoid;

But now in that scenario, it won't
even check the PK if there was a valid RI. So it might return false
when previously it returned true. Is it deliberate?


Thanks for the detailed review/investigation of this. I agree that
there is no difference in terms of correctness, and it is probably
better to be consistent with the existing code. So, making
IdxIsRelationIdentityOrPK() rely on GetRelationIdentityOrPK() still
sounds better to me.

You can see if 'idxoid' did NOT match RI but if it DID match PK
previously it would return true. 

Still, given how RI/PK works, I cannot see how this check would yield
a different result -- as Amit also noted in the next e-mail.

Do you see any cases where this check would produce a different result?
I cannot, but wanted to double check with you.

 
======
.../subscription/t/032_subscribe_use_index.pl

5.
+# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB different data
+#
+# The subscriber has duplicate tuples that publisher does not have.
+# When publisher updates/deletes 1 row, subscriber uses indexes and
+# exactly updates/deletes 1 row.

"and exactly updates/deletes 1 row." --> "and updates/deletes exactly 1 row."


Fixed.

Given the changes are small, I'll incorporate the changes with v33 in my next e-mail.
  
Thanks,
Onder KALACI
Hi Amit, all


Amit Kapila <amit.kapila16@gmail.com>, 6 Mar 2023 Pzt, 12:40 tarihinde şunu yazdı:
On Fri, Mar 3, 2023 at 6:40 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi Vignesh,
>
> Thanks for the review
>
>>
>> 1) We are currently calling RelationGetIndexList twice, once in
>> FindUsableIndexForReplicaIdentityFull function and in the caller too,
>> we could avoid one of the calls by passing the indexlist to the
>> function or removing the check here, index list check can be handled
>> in FindUsableIndexForReplicaIdentityFull.
>> +       if (remoterel->replident == REPLICA_IDENTITY_FULL &&
>> +               RelationGetIndexList(localrel) != NIL)
>> +       {
>> +               /*
>> +                * If we had a primary key or relation identity with a
>> unique index,
>> +                * we would have already found and returned that oid.
>> At this point,
>> +                * the remote relation has replica identity full and
>> we have at least
>> +                * one local index defined.
>> +                *
>> +                * We are looking for one more opportunity for using
>> an index. If
>> +                * there are any indexes defined on the local
>> relation, try to pick
>> +                * a suitable index.
>> +                *
>> +                * The index selection safely assumes that all the
>> columns are going
>> +                * to be available for the index scan given that
>> remote relation has
>> +                * replica identity full.
>> +                */
>> +               return FindUsableIndexForReplicaIdentityFull(localrel);
>> +       }
>> +
>
> makes sense, done
>

Today, I was looking at this comment and the fix for it. It seems to
me that it would be better to not add the check (indexlist != NIL)
here and rather get the indexlist in
FindUsableIndexForReplicaIdentityFull(). It will anyway return
InvalidOid if there is no index, and that way the code will look a bit
cleaner.


Yeah, that seems easier to follow to me as well, and I reflected it in the comment.


Thanks,
Onder
 
On Thu, Mar 2, 2023 at 2:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 2, 2023 at 1:37 PM shiy.fnst@fujitsu.com
> <shiy.fnst@fujitsu.com> wrote:
> > - results of `gprof`
> > case1:
> > master
> >   %   cumulative   self              self     total
> >  time   seconds   seconds    calls  ms/call  ms/call  name
> >   1.37      0.66     0.01   654312     0.00     0.00  LWLockAttemptLock
> >   0.00      0.73     0.00   573358     0.00     0.00  LockBuffer
> >   0.00      0.73     0.00    10014     0.00     0.06  heap_getnextslot
> >
> > patched
> >   %   cumulative   self              self     total
> >  time   seconds   seconds    calls  ms/call  ms/call  name
> >   9.70      1.27     0.36 50531459     0.00     0.00  LWLockAttemptLock
> >   3.23      2.42     0.12 100259200     0.00     0.00  LockBuffer
> >   6.20      1.50     0.23 50015101     0.00     0.00  heapam_index_fetch_tuple
> >   4.04      2.02     0.15 50015101     0.00     0.00  index_fetch_heap
> >   1.35      3.21     0.05    10119     0.00     0.00  index_getnext_slot
> >
>
> In the above profile number of calls to index_fetch_heap(),
> heapam_index_fetch_tuple() explains the reason for the regression you
> are seeing with the index scan. Because the update will generate dead
> tuples in the same transaction and those dead tuples won't be removed,
> we get those from the index and then need to perform
> index_fetch_heap() to find out whether the tuple is dead or not. Now,
> for sequence scan also we need to scan those dead tuples but there we
> don't need to do back-and-forth between index and heap. I think we can
> once check with more number of tuples (say with 20000, 50000, etc.)
> for case-1.
>

Andres, do you have any thoughts on this? We seem to have figured out
the cause of regression in the case Shi-San has reported and others
also agree with it. We can't think of doing anything better than what
the patch currently is doing, so thought of going with an option to
allow users to disable index scans. The current understanding is that
the patch will be a win in much more cases than the cases where one
can see regression but still having a knob could be useful in those
few cases.

--
With Regards,
Amit Kapila.



On Mon, Mar 6, 2023 at 4:45 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 6, 2023 at 4:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Mar 6, 2023 at 2:38 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> > >
> > I was going through the thread and patch,  I noticed that in the
> > initial version, we were depending upon the planner to let it decide
> > whether index scan is cheaper or not and which index to pick.  But in
> > the latest patch if a useful index exists then we chose that without
> > comparing the cost of whether it is cheaper than sequential scan or
> > not.  Is my understanding correct?  What is the reason for the same,
> >
>
> Yes, your understanding is correct. The main reason is that we don't
> have an agreement on using the internal planner APIs for apply. That
> will be a long-term maintenance burden. See discussion around email
> [1]. So, we decided to use the current infrastructure to achieve index
> scans during apply when publisher has replica identity full. This will
> still be win in many cases and we are planning to provide a knob to
> disable this feature.
>
> [1] - https://www.postgresql.org/message-id/3466340.1673117404%40sss.pgh.pa.us

Okay, this makes sense. So basically, in the "replica identity full"
case, instead of doing the default sequential scan, we will provide a
knob to choose either an index scan or a sequential scan, and that
seems reasonable to me.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Hi,

On 2023-03-01 14:10:07 +0530, Amit Kapila wrote:
> On Wed, Mar 1, 2023 at 12:09 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > > I see this as a way to provide this feature for users but I would
> > > prefer to proceed with this if we can get some more buy-in from senior
> > > community members (at least one more committer) and some user(s) if
> > > possible. So, I once again request others to chime in and share their
> > > opinion.
> >
> > I'd prefer not having an option, because we figure out the cause of the
> > performance regression (reducing it to be small enough to not care). After
> > that an option defaulting to using indexes.
> >
> 
> Sure, if we can reduce regression to be small enough then we don't
> need to keep the default as false, otherwise, also, we can consider it
> to keep an option defaulting to using indexes depending on the
> investigation for regression. Anyway, the main concern was whether it
> is okay to have an option for this which I think we have an agreement
> on, now I will continue my review.

I think even as-is it's reasonable to just use it. The sequential scan
approach is O(N^2), which, uh, is not good. And having an index over thousands
of non-differing values will generally perform badly, not just in this
context.
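
To put rough numbers on that asymptotic gap, a toy count (my assumptions: n
single-row changes applied to an n-row table, one full scan per change for the
sequential approach versus one ~log2(n)-deep btree descent per change; these
helpers are illustrative, not measurements):

```c
/* Toy operation counts for applying n single-row changes to an n-row table. */
static unsigned long long
toy_ilog2(unsigned long long n)
{
	unsigned long long	r = 0;

	while (n >>= 1)
		r++;
	return r;
}

/* One fresh sequential scan per change: ~n rows touched each time. */
static unsigned long long
toy_seqscan_total(unsigned long long n)
{
	return n * n;
}

/* One btree descent per change: ~log2(n) comparisons each time. */
static unsigned long long
toy_btree_total(unsigned long long n)
{
	return n * toy_ilog2(n);
}
```

At n = 100,000 that is on the order of 10^10 row visits for the sequential
approach versus roughly 1.6 million comparisons with an index, which is why
the O(N^2) behavior dominates so quickly.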

Greetings,

Andres Freund



Hi,

On 2023-03-06 17:02:38 +0530, Amit Kapila wrote:
> Andres, do you have any thoughts on this? We seem to have figured out
> the cause of regression in the case Shi-San has reported and others
> also agree with it. We can't think of doing anything better than what
> the patch currently is doing, so thought of going with an option to
> allow users to disable index scans. The current understanding is that
> the patch will be a win in much more cases than the cases where one
> can see regression but still having a knob could be useful in those
> few cases.

I think the case in which the patch regresses performance is irrelevant in
practice. It's good to not regress unnecessarily for easily avoidable
reasons, even in such cases, but it's not worth holding anything up over it.

Greetings,

Andres Freund



On Tue, Mar 7, 2023 at 1:34 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2023-03-01 14:10:07 +0530, Amit Kapila wrote:
> > On Wed, Mar 1, 2023 at 12:09 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > > I see this as a way to provide this feature for users but I would
> > > > prefer to proceed with this if we can get some more buy-in from senior
> > > > community members (at least one more committer) and some user(s) if
> > > > possible. So, I once again request others to chime in and share their
> > > > opinion.
> > >
> > > I'd prefer not having an option until we figure out the cause of the
> > > performance regression (reducing it to be small enough not to care). After
> > > that, an option defaulting to using indexes.
> > >
> >
> > Sure, if we can reduce the regression to be small enough then we don't
> > need to keep the default as false; otherwise, we can still consider
> > keeping an option that defaults to using indexes, depending on the
> > investigation of the regression. Anyway, the main concern was whether it
> > is okay to have an option for this, which I think we have agreement
> > on; now I will continue my review.
>
> I think even as-is it's reasonable to just use it. The sequential scan
> approach is O(N^2), which, uh, is not good. And having an index over thousands
> of non-differing values will generally perform badly, not just in this
> context.
>

Yes, it is true that generally an index scan with a lot of duplicates may
not perform well, but we do costing to account for such cases and may
prefer another index or a sequential scan. Then we have the
"enable_indexscan" GUC that the user can use if required. So, I feel it is
better to have a knob to disallow the usage of such indexes, with the
default being to use an index, if available.

--
With Regards,
Amit Kapila.



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From
"shiy.fnst@fujitsu.com"
On Monday, Mar 6, 2023 7:19 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> 
> Yeah, seems easier to follow to me as well. Reflected it in the comment as well.  
> 

Thanks for updating the patch. Here are some comments on v33-0001 patch.

1.
+    if (RelationReplicaIdentityFullIndexScanEnabled(localrel) &&
+        remoterel->replident == REPLICA_IDENTITY_FULL)

RelationReplicaIdentityFullIndexScanEnabled() is introduced in 0002 patch, so
the call to it should be moved to 0002 patch I think.

2.
+#include "optimizer/cost.h"

Do we need this in the latest patch? I tried, and it looks like it could be removed
from src/backend/replication/logical/relation.c.

3.
+# now, create a unique index and set the replica
+$node_publisher->safe_psql('postgres',
+    "CREATE UNIQUE INDEX test_replica_id_full_unique ON test_replica_id_full(x);");
+$node_subscriber->safe_psql('postgres',
+    "CREATE UNIQUE INDEX test_replica_id_full_unique ON test_replica_id_full(x);");
+

Should the comment be "now, create a unique index and set the replica identity"?

4.
+$node_publisher->safe_psql('postgres',
+    "ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX test_replica_id_full_unique;");
+$node_subscriber->safe_psql('postgres',
+    "ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX test_replica_id_full_unique;");
+
+# wait for the synchronization to finish
+$node_subscriber->wait_for_subscription_sync;

There are no new tables that need to be synchronized here; should we remove
the call to wait_for_subscription_sync?

Regards,
Shi Yu

On Tue, Mar 7, 2023 at 9:16 AM shiy.fnst@fujitsu.com
<shiy.fnst@fujitsu.com> wrote:
>
> On Monday, Mar 6, 2023 7:19 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> > Yeah, seems easier to follow to me as well. Reflected it in the comment as well.
> >

Few comments:
=============
1.
+get_usable_indexoid(ApplyExecutionData *edata, ResultRelInfo *relinfo)
{
...
+ if (targetrelkind == RELKIND_PARTITIONED_TABLE)
+ {
+ /* Target is a partitioned table, so find relmapentry of the partition */
+ TupleConversionMap *map = ExecGetRootToChildMap(relinfo, edata->estate);
+ AttrMap    *attrmap = map ? map->attrMap : NULL;
+
+ relmapentry =
+ logicalrep_partition_open(relmapentry, relinfo->ri_RelationDesc,
+   attrmap);


When will we hit this part of the code? As per my understanding, for
partitioned tables, we always follow apply_handle_tuple_routing()
which should call logicalrep_partition_open(), and do the necessary
work for us.

2. In logicalrep_partition_open(), it would be better to set
localrelvalid after finding the required index. The entry should be
marked valid after initializing/updating all the required members. I
have changed this in the attached.

3.
@@ -31,6 +32,7 @@ typedef struct LogicalRepRelMapEntry
  Relation localrel; /* relcache entry (NULL when closed) */
  AttrMap    *attrmap; /* map of local attributes to remote ones */
  bool updatable; /* Can apply updates/deletes? */
+ Oid usableIndexOid; /* which index to use, or InvalidOid if none */

Would it be better to name this new variable as localindexoid to match
it with the existing variable localreloid? Also, the camel case for
this variable appears odd.

4. If you agree with the above, then we should probably change the
name of functions get_usable_indexoid() and
FindLogicalRepUsableIndex() accordingly.

5.
+ {
+ /*
+ * If we had a primary key or relation identity with a unique index,
+ * we would have already found and returned that oid. At this point,
+ * the remote relation has replica identity full.

These comments are not required as this just states what the code just
above is doing.

Apart from the above, I have made some modifications in the other
comments. If you are convinced with those, then kindly include them in
the next version.

--
With Regards,
Amit Kapila.

Attachment
On Mon, Mar 6, 2023 at 10:18 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>> 4. IdxIsRelationIdentityOrPK
>>
>> +/*
>> + * Given a relation and OID of an index, returns true if the
>> + * index is relation's replica identity index or relation's
>> + * primary key's index.
>> + *
>> + * Returns false otherwise.
>> + */
>> +bool
>> +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
>> +{
>> + Assert(OidIsValid(idxoid));
>> +
>> + return GetRelationIdentityOrPK(rel) == idxoid;
>> +}
>>
>> I think you've "simplified" this function in v28 but AFAICT now it has
>> a different logic to v27.
>>
>> PREVIOUSLY it was coded like
>> + return RelationGetReplicaIndex(rel) == idxoid ||
>> + RelationGetPrimaryKeyIndex(rel) == idxoid;
>>
>> But now in that scenario, it won't
>> even check the PK if there was a valid RI. So it might return false
>> when previously it returned true. Is it deliberate?
>>
>
> Thanks for the detailed review/investigation on this. But, I also agree that
> there is no difference in terms of correctness. Also, it is probably better
> to be consistent with the existing code. So, making IdxIsRelationIdentityOrPK()
> rely on GetRelationIdentityOrPK() still sounds better to me.
>
>> You can see if 'idxoid' did NOT match RI but if it DID match PK
>> previously it would return true.
>
>
> Still, I cannot see how this check would yield a different result with how
> RI/PK works -- as Amit also noted in the next e-mail.
>
> Do you see any cases where this check would produce a different result?
> I cannot, but wanted to double check with you.
>
>

Let me give an example to demonstrate why I thought something is fishy here:

Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111.
Imagine the same rel has a PRIMARY KEY with Oid=2222.

---

+/*
+ * Get replica identity index or if it is not defined a primary key.
+ *
+ * If neither is defined, returns InvalidOid
+ */
+Oid
+GetRelationIdentityOrPK(Relation rel)
+{
+ Oid idxoid;
+
+ idxoid = RelationGetReplicaIndex(rel);
+
+ if (!OidIsValid(idxoid))
+ idxoid = RelationGetPrimaryKeyIndex(rel);
+
+ return idxoid;
+}
+
+/*
+ * Given a relation and OID of an index, returns true if the
+ * index is relation's replica identity index or relation's
+ * primary key's index.
+ *
+ * Returns false otherwise.
+ */
+bool
+IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
+{
+ Assert(OidIsValid(idxoid));
+
+ return GetRelationIdentityOrPK(rel) == idxoid;
+}
+


So, according to the above function comment/name, I will expect
calling IdxIsRelationIdentityOrPK passing Oid=1111 or Oid=2222 will
both return true, right?

But AFAICT

IdxIsRelationIdentityOrPK(rel, 1111) --> GetRelationIdentityOrPK(rel)
returns 1111 (the valid oid of the RI) --> 1111 == 1111 --> true;

IdxIsRelationIdentityOrPK(rel, 2222) --> GetRelationIdentityOrPK(rel)
returns 1111 (the valid oid of the RI) --> 1111 == 2222 --> false;

~

Now two people are telling me this is OK, but I've been staring at it
for too long and I just don't see how it can be. (??)

------
Kind Regards,
Peter Smith.
Fujitsu Australia



On Tue, Mar 7, 2023 at 1:19 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Let me give an example to demonstrate why I thought something is fishy here:
>
> Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111.
> Imagine the same rel has a PRIMARY KEY with Oid=2222.
>
> ---
>
> +/*
> + * Get replica identity index or if it is not defined a primary key.
> + *
> + * If neither is defined, returns InvalidOid
> + */
> +Oid
> +GetRelationIdentityOrPK(Relation rel)
> +{
> + Oid idxoid;
> +
> + idxoid = RelationGetReplicaIndex(rel);
> +
> + if (!OidIsValid(idxoid))
> + idxoid = RelationGetPrimaryKeyIndex(rel);
> +
> + return idxoid;
> +}
> +
> +/*
> + * Given a relation and OID of an index, returns true if the
> + * index is relation's replica identity index or relation's
> + * primary key's index.
> + *
> + * Returns false otherwise.
> + */
> +bool
> +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
> +{
> + Assert(OidIsValid(idxoid));
> +
> + return GetRelationIdentityOrPK(rel) == idxoid;
> +}
> +
>
>
> So, according to the above function comment/name, I will expect
> calling IdxIsRelationIdentityOrPK passing Oid=1111 or Oid=2222 will
> both return true, right?
>
> But AFAICT
>
> IdxIsRelationIdentityOrPK(rel, 1111) --> GetRelationIdentityOrPK(rel)
> returns 1111 (the valid oid of the RI) --> 1111 == 1111 --> true;
>
> IdxIsRelationIdentityOrPK(rel, 2222) --> GetRelationIdentityOrPK(rel)
> returns 1111 (the valid oid of the RI) --> 1111 == 2222 --> false;
>
> ~
>
> Now two people are telling me this is OK, but I've been staring at it
> for too long and I just don't see how it can be. (??)
>

The difference is that you are misunderstanding the intent of this
function. GetRelationIdentityOrPK() returns a "replica identity index
oid" if the same is defined, else return PK, if that is defined,
otherwise, return invalidOid. This is what is expected by its callers.
Now, one can argue to have a different function name and that may be a
valid argument but as far as I can see the function does what is
expected from it.

--
With Regards,
Amit Kapila.



On Tue, Mar 7, 2023 at 8:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Mar 7, 2023 at 1:19 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Let me give an example to demonstrate why I thought something is fishy here:
> >
> > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111.
> > Imagine the same rel has a PRIMARY KEY with Oid=2222.
> >
> > ---
> >
> > +/*
> > + * Get replica identity index or if it is not defined a primary key.
> > + *
> > + * If neither is defined, returns InvalidOid
> > + */
> > +Oid
> > +GetRelationIdentityOrPK(Relation rel)
> > +{
> > + Oid idxoid;
> > +
> > + idxoid = RelationGetReplicaIndex(rel);
> > +
> > + if (!OidIsValid(idxoid))
> > + idxoid = RelationGetPrimaryKeyIndex(rel);
> > +
> > + return idxoid;
> > +}
> > +
> > +/*
> > + * Given a relation and OID of an index, returns true if the
> > + * index is relation's replica identity index or relation's
> > + * primary key's index.
> > + *
> > + * Returns false otherwise.
> > + */
> > +bool
> > +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
> > +{
> > + Assert(OidIsValid(idxoid));
> > +
> > + return GetRelationIdentityOrPK(rel) == idxoid;
> > +}
> > +
> >
> >
> > So, according to the above function comment/name, I will expect
> > calling IdxIsRelationIdentityOrPK passing Oid=1111 or Oid=2222 will
> > both return true, right?
> >
> > But AFAICT
> >
> > IdxIsRelationIdentityOrPK(rel, 1111) --> GetRelationIdentityOrPK(rel)
> > returns 1111 (the valid oid of the RI) --> 1111 == 1111 --> true;
> >
> > IdxIsRelationIdentityOrPK(rel, 2222) --> GetRelationIdentityOrPK(rel)
> > returns 1111 (the valid oid of the RI) --> 1111 == 2222 --> false;
> >
> > ~
> >
> > Now two people are telling me this is OK, but I've been staring at it
> > for too long and I just don't see how it can be. (??)
> >
>
> The difference is that you are misunderstanding the intent of this
> function. GetRelationIdentityOrPK() returns a "replica identity index
> oid" if the same is defined, else return PK, if that is defined,
> otherwise, return invalidOid. This is what is expected by its callers.
> Now, one can argue to have a different function name and that may be a
> valid argument but as far as I can see the function does what is
> expected from it.
>

Sure, but I am questioning the function IdxIsRelationIdentityOrPK, not
GetRelationIdentityOrPK.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Hi Andres, Amit, all 

I think the case in which the patch regresses performance is irrelevant in
practice.

This is similar to what I think in this context.

I appreciate the effort from Shi Yu, which gives us a clear understanding of the overhead.
But the tests in [1] where we observe the regression are largely synthetic test cases
that aim to spot the overhead.

And having an index over thousands
of non-differing values will generally perform badly, not just in this
context.

Similarly, maybe there are some eccentric usage patterns that fit this. But even if
there are such patterns, I suspect they could hardly be performance sensitive.


Thanks,
Onder KALACI

 
Hi Shi Yu, all


Thanks for updating the patch. Here are some comments on v33-0001 patch.

1.
+       if (RelationReplicaIdentityFullIndexScanEnabled(localrel) &&
+               remoterel->replident == REPLICA_IDENTITY_FULL)

RelationReplicaIdentityFullIndexScanEnabled() is introduced in 0002 patch, so
the call to it should be moved to 0002 patch I think.

Ah, sure, a rebase issue, fixed in v34
 

2.
+#include "optimizer/cost.h"

Do we need this in the latest patch? I tried, and it looks like it could be removed
from src/backend/replication/logical/relation.c.

Hmm, probably an artifact of the initial versions of the patch where we needed some
costing functionality.
 

3.
+# now, create a unique index and set the replica
+$node_publisher->safe_psql('postgres',
+       "CREATE UNIQUE INDEX test_replica_id_full_unique ON test_replica_id_full(x);");
+$node_subscriber->safe_psql('postgres',
+       "CREATE UNIQUE INDEX test_replica_id_full_unique ON test_replica_id_full(x);");
+

Should the comment be "now, create a unique index and set the replica identity"?

yes, fixed
 

4.
+$node_publisher->safe_psql('postgres',
+       "ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX test_replica_id_full_unique;");
+$node_subscriber->safe_psql('postgres',
+       "ALTER TABLE test_replica_id_full REPLICA IDENTITY USING INDEX test_replica_id_full_unique;");
+
+# wait for the synchronization to finish
+$node_subscriber->wait_for_subscription_sync;

There are no new tables that need to be synchronized here; should we remove
the call to wait_for_subscription_sync?

right, probably a copy & paste typo, thanks for spotting.


I'll attach v34 with the next e-mail given the comments here only touch small parts
of the patch.



Thanks,
Onder KALACI  
 


Hi Amit, all

Few comments:
=============
1.
+get_usable_indexoid(ApplyExecutionData *edata, ResultRelInfo *relinfo)
{
...
+ if (targetrelkind == RELKIND_PARTITIONED_TABLE)
+ {
+ /* Target is a partitioned table, so find relmapentry of the partition */
+ TupleConversionMap *map = ExecGetRootToChildMap(relinfo, edata->estate);
+ AttrMap    *attrmap = map ? map->attrMap : NULL;
+
+ relmapentry =
+ logicalrep_partition_open(relmapentry, relinfo->ri_RelationDesc,
+   attrmap);


When will we hit this part of the code? As per my understanding, for
partitioned tables, we always follow apply_handle_tuple_routing()
which should call logicalrep_partition_open(), and do the necessary
work for us.


Looking closer, there is really no need for that. I changed the
code such that we pass usableLocalIndexOid. It looks simpler
to me. Do you agree?

 
2. In logicalrep_partition_open(), it would be better to set
localrelvalid after finding the required index. The entry should be
marked valid after initializing/updating all the required members. I
have changed this in the attached.


makes sense
 
3.
@@ -31,6 +32,7 @@ typedef struct LogicalRepRelMapEntry
  Relation localrel; /* relcache entry (NULL when closed) */
  AttrMap    *attrmap; /* map of local attributes to remote ones */
  bool updatable; /* Can apply updates/deletes? */
+ Oid usableIndexOid; /* which index to use, or InvalidOid if none */

Would it be better to name this new variable as localindexoid to match
it with the existing variable localreloid? Also, the camel case for
this variable appears odd.

yes, both make sense
 

4. If you agree with the above, then we should probably change the
name of functions get_usable_indexoid() and
FindLogicalRepUsableIndex() accordingly.

I dropped get_usable_indexoid() as noted in (1).

Changed FindLogicalRepUsableIndex->FindLogicalRepLocalIndex



5.
+ {
+ /*
+ * If we had a primary key or relation identity with a unique index,
+ * we would have already found and returned that oid. At this point,
+ * the remote relation has replica identity full.

These comments are not required as this just states what the code just
above is doing.

I don't have any strong opinion on adding this comment, so I applied your
suggestion.
 

Apart from the above, I have made some modifications in the other
comments. If you are convinced with those, then kindly include them in
the next version.


Sure, they all look good. I think I have lost (and caused the reviewers to lose)
quite a bit of time on the comment reviews. Next time, I'll try to be more prepared
for the comments.

Attached v34

Thanks,
Onder KALACI
Attachment
On Tue, Mar 7, 2023 at 3:00 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Tue, Mar 7, 2023 at 8:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Mar 7, 2023 at 1:19 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > Let me give an example to demonstrate why I thought something is fishy here:
> > >
> > > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111.
> > > Imagine the same rel has a PRIMARY KEY with Oid=2222.
> > >
> > > ---
> > >
> > > +/*
> > > + * Get replica identity index or if it is not defined a primary key.
> > > + *
> > > + * If neither is defined, returns InvalidOid
> > > + */
> > > +Oid
> > > +GetRelationIdentityOrPK(Relation rel)
> > > +{
> > > + Oid idxoid;
> > > +
> > > + idxoid = RelationGetReplicaIndex(rel);
> > > +
> > > + if (!OidIsValid(idxoid))
> > > + idxoid = RelationGetPrimaryKeyIndex(rel);
> > > +
> > > + return idxoid;
> > > +}
> > > +
> > > +/*
> > > + * Given a relation and OID of an index, returns true if the
> > > + * index is relation's replica identity index or relation's
> > > + * primary key's index.
> > > + *
> > > + * Returns false otherwise.
> > > + */
> > > +bool
> > > +IdxIsRelationIdentityOrPK(Relation rel, Oid idxoid)
> > > +{
> > > + Assert(OidIsValid(idxoid));
> > > +
> > > + return GetRelationIdentityOrPK(rel) == idxoid;
> > > +}
> > > +
> > >
> > >
> > > So, according to the above function comment/name, I will expect
> > > calling IdxIsRelationIdentityOrPK passing Oid=1111 or Oid=2222 will
> > > both return true, right?
> > >
> > > But AFAICT
> > >
> > > IdxIsRelationIdentityOrPK(rel, 1111) --> GetRelationIdentityOrPK(rel)
> > > returns 1111 (the valid oid of the RI) --> 1111 == 1111 --> true;
> > >
> > > IdxIsRelationIdentityOrPK(rel, 2222) --> GetRelationIdentityOrPK(rel)
> > > returns 1111 (the valid oid of the RI) --> 1111 == 2222 --> false;
> > >
> > > ~
> > >
> > > Now two people are telling me this is OK, but I've been staring at it
> > > for too long and I just don't see how it can be. (??)
> > >
> >
> > The difference is that you are misunderstanding the intent of this
> > function. GetRelationIdentityOrPK() returns a "replica identity index
> > oid" if the same is defined, else return PK, if that is defined,
> > otherwise, return invalidOid. This is what is expected by its callers.
> > Now, one can argue to have a different function name and that may be a
> > valid argument but as far as I can see the function does what is
> > expected from it.
> >
>
> Sure, but I am questioning the function IdxIsRelationIdentityOrPK, not
> GetRelationIdentityOrPK.
>

The intent of IdxIsRelationIdentityOrPK() is as follows: it returns true
if (a) the given index OID is the same as the replica identity (when one
is defined); otherwise, (b) when no replica identity is defined, if the
given OID is the same as the PK. It returns false otherwise. Feel free to
propose a better function name, or comment changes that would make it
easier to understand.
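The masking behavior can be spelled out as a small standalone model (illustrative C, not the patch's code; the lowercase helper names are invented here, and the index oids are passed in explicitly instead of coming from relcache lookups):

```c
#include <stdbool.h>

typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)
#define OidIsValid(oid) ((oid) != InvalidOid)

/*
 * Model of GetRelationIdentityOrPK(): the replica identity index wins
 * whenever one is defined; the PK is considered only as a fallback.
 */
static Oid
get_relation_identity_or_pk(Oid replident_idxoid, Oid pk_idxoid)
{
	return OidIsValid(replident_idxoid) ? replident_idxoid : pk_idxoid;
}

/* Model of IdxIsRelationIdentityOrPK() */
static bool
idx_is_relation_identity_or_pk(Oid replident_idxoid, Oid pk_idxoid,
							   Oid idxoid)
{
	return get_relation_identity_or_pk(replident_idxoid, pk_idxoid) == idxoid;
}
```

With Peter's oids, (1111, 2222, 1111) yields true but (1111, 2222, 2222) yields false, because the valid replica identity masks the PK; with no replica identity defined, (InvalidOid, 2222, 2222) yields true, which is the fallback the callers rely on.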

--
With Regards,
Amit Kapila.



Hi Amit, Peter


> > Let me give an example to demonstrate why I thought something is fishy here:
> >
> > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111.
> > Imagine the same rel has a PRIMARY KEY with Oid=2222.
> >

Hmm, alright, this is syntactically possible, but not sure if any user 
would do this. Still thanks for catching this.

And, you are right, if a user has created such a schema, IdxIsRelationIdentityOrPK() 
would return the wrong result and we'd use sequential scan instead of index scan. 
This would be a regression. I think we should change the function. 


Here is the example:
DROP TABLE tab1;
CREATE TABLE tab1 (a int NOT NULL);
CREATE UNIQUE INDEX replica_unique ON tab1(a);
ALTER TABLE tab1 REPLICA IDENTITY USING INDEX replica_unique;
ALTER TABLE tab1 ADD CONSTRAINT pkey PRIMARY KEY (a); 


I'm attaching v35. 

Does that make sense to you Amit?

Attachment
Hi,

On 2023-03-07 08:22:45 +0530, Amit Kapila wrote:
> On Tue, Mar 7, 2023 at 1:34 AM Andres Freund <andres@anarazel.de> wrote:
> > I think even as-is it's reasonable to just use it. The sequential scan
> > approach is O(N^2), which, uh, is not good. And having an index over thousands
> > of non-differing values will generally perform badly, not just in this
> > context.
> >
> Yes, it is true that generally also index scan with a lot of
> duplicates may not perform well but during the scan, we do costing to
> ensure such cases and may prefer other index or sequence scan. Then we
> have "enable_indexscan" GUC that the user can use if required. So, I
> feel it is better to have a knob to disallow usage of such indexes and
> the default would be to use an index, if available.

It just feels like we're optimizing for an irrelevant case here. If we add
GUCs for irrelevant things like this we'll explode the number of GUCs even
faster than we already are.

Greetings,

Andres Freund



On Mon, Mar 6, 2023 at 7:40 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Mar 6, 2023 at 1:40 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
...
> >
> > Anyhow, if you feel those firstterm and FULL changes ought to be kept
> > separate from this RI patch, please let me know and I will propose
> > those changes in a new thread,
> >
>
> Personally, I would prefer to keep those separate. So, feel free to
> propose them in a new thread.
>

Done. Those suggested pg docs changes now have their own new thread [1].

------
[1] RI quotes -
https://www.postgresql.org/message-id/CAHut%2BPst11ac2hcmePt1%3DoTmBwTT%3DDAssRR1nsdoy4BT%2B68%3DMg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



On Tue, Mar 7, 2023 9:47 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> 
> I'm attaching v35. 
> 

I noticed that if the index column only exists on the subscriber side, this index
can also be chosen. This seems a bit odd because the index column isn't sent
from the publisher.

e.g.
-- pub
CREATE TABLE test_replica_id_full (x int, y int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
-- sub
CREATE TABLE test_replica_id_full (x int, y int, z int);
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(z);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

I didn't see the behavior change in any case after applying the patch, which
looks good. Besides, I tested the performance for such a case.

Steps:
1. create tables, index, publication, and subscription
-- pub
create table tbl (a int);
alter table tbl replica identity full;
create publication pub for table tbl;
-- sub
create table tbl (a int, b int);
create index idx_b on tbl(b);
create subscription sub connection 'dbname=postgres port=5432' publication pub;
2. setup synchronous replication
3. execute SQL:
truncate tbl;
insert into tbl select i from generate_series(0,10000)i;
update tbl set a=a+1;

The time of UPDATE (take the average of 10 runs):
master: 1356.06 ms
patched: 3968.14 ms

For the case where all values of the extra columns on the subscriber are NULL, an
index scan can't do better than a sequential scan. This is not a real scenario, and
I think it only degrades when there are many NULL values in the index column, so
this is probably not a case to worry about. I just share this case so that we
can discuss whether we should pick an index which only contains the extra columns
on the subscriber.

Regards,
Shi Yu

On Tue, Mar 7, 2023 at 7:17 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>> > > Let me give an example to demonstrate why I thought something is fishy here:
>> > >
>> > > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111.
>> > > Imagine the same rel has a PRIMARY KEY with Oid=2222.
>> > >
>
>
> Hmm, alright, this is syntactically possible, but not sure if any user
> would do this. Still thanks for catching this.
>
> And, you are right, if a user has created such a schema, IdxIsRelationIdentityOrPK()
> would return the wrong result and we'd use sequential scan instead of index scan.
> This would be a regression. I think we should change the function.
>
>
> Here is the example:
> DROP TABLE tab1;
> CREATE TABLE tab1 (a int NOT NULL);
> CREATE UNIQUE INDEX replica_unique ON tab1(a);
> ALTER TABLE tab1 REPLICA IDENTITY USING INDEX replica_unique;
> ALTER TABLE tab1 ADD CONSTRAINT pkey PRIMARY KEY (a);
>

You have not given complete steps to reproduce the problem where
instead of the index scan, a sequential scan would be picked. I have
tried to reproduce it by extending your steps but didn't see the problem.
Let me know if I am missing something.

Publisher
----------------
postgres=# CREATE TABLE tab1 (a int NOT NULL);
CREATE TABLE
postgres=# Alter Table tab1 replica identity full;
ALTER TABLE
postgres=# create publication pub2 for table tab1;
CREATE PUBLICATION
postgres=# insert into tab1 values(1);
INSERT 0 1
postgres=# update tab1 set a=2;

Subscriber
-----------------
postgres=# CREATE TABLE tab1 (a int NOT NULL);
CREATE TABLE
postgres=# CREATE UNIQUE INDEX replica_unique ON tab1(a);
CREATE INDEX
postgres=# ALTER TABLE tab1 REPLICA IDENTITY USING INDEX replica_unique;
ALTER TABLE
postgres=# ALTER TABLE tab1 ADD CONSTRAINT pkey PRIMARY KEY (a);
ALTER TABLE
postgres=# create subscription sub2 connection 'dbname=postgres'
publication pub2;
NOTICE:  created replication slot "sub2" on publisher
CREATE SUBSCRIPTION
postgres=# select * from tab1;
 a
---
 2
(1 row)

I have debugged the above example, and it uses an index scan during
apply without your latest change, which is what I expected. AFAICS, the
use of IdxIsRelationIdentityOrPK() is to decide whether we will do
tuples_equal() or not during the index scan and I see it gives the
correct results with the example you provided.

--
With Regards,
Amit Kapila.



Here are some review comments for v35-0001

======
General

1.
Saying the index "should" or "should not" do this or that sounds like
it is still OK but just not recommended. To remove this ambiguity, IMO
most of the "should"s ought to be changed to "must" because IIUC this
patch will simply not consider indexes which do not obey all your
rules.

This comment applies to a few places (search for "should")

e.g.1. - Commit Message
e.g.2. - /* There should always be at least one attribute for the index scan. */
e.g.3. - The function comment for
FindUsableIndexForReplicaIdentityFull(Relation localrel)

======
doc/src/sgml/logical-replication.sgml

2.
A published table must have a “replica identity” configured in order
to be able to replicate UPDATE and DELETE operations, so that
appropriate rows to update or delete can be identified on the
subscriber side. By default, this is the primary key, if there is one.
Another unique index (with certain additional requirements) can also
be set to be the replica identity. If the table does not have any
suitable key, then it can be set to replica identity “full”, which
means the entire row becomes the key. When replica identity “full” is
specified, indexes can be used on the subscriber side for searching
the rows. Candidate indexes must be btree, non-partial, and have at
least one column reference (i.e. cannot consist of only expressions).
These restrictions on the non-unique index properties adheres some of
the restrictions that are enforced for primary keys. Internally, we
follow a similar approach for supporting index scans within logical
replication scope. If there are no such suitable indexes, the search
on the subscriber side can be very inefficient, therefore replica
identity “full” should only be used as a fallback if no other solution
is possible. If a replica identity other than “full” is set on the
publisher side, a replica identity comprising the same or fewer
columns must also be set on the subscriber side. See REPLICA IDENTITY
for details on how to set the replica identity. If a table without a
replica identity is added to a publication that replicates UPDATE or
DELETE operations then subsequent UPDATE or DELETE operations will
cause an error on the publisher. INSERT operations can proceed
regardless of any replica identity.

~

"adheres some of" --> "adhere to some of" ?

======
src/backend/executor/execReplication.c

3. build_replindex_scan_key

  {
  Oid operator;
  Oid opfamily;
  RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);

These variable declarations might look tidier if you kept all the Oids together.

======
src/backend/replication/logical/relation.c

4. logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It occurs when an
+ * operation is first performed on the relation, or after invalidation of
+ * the relation cache entry (such as ANALYZE or CREATE/DROP index on the
+ * relation).
+ *
+ * We also prefer to run this code on the oldctx such that we do not
+ * leak anything in the LogicalRepPartMapContext (hence
+ * CacheMemoryContext).
+ */
+ entry->localindexoid = FindLogicalRepLocalIndex(partrel, remoterel)

"such that" --> "so that" ?

~~~

5. IsIndexUsableForReplicaIdentityFull

+bool
+IsIndexUsableForReplicaIdentityFull(IndexInfo *indexInfo)
+{
+ bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ bool is_partial = (indexInfo->ii_Predicate != NIL);
+ bool is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
+
+ if (is_btree && !is_partial && !is_only_on_expression)
+ {
+ return true;
+ }
+
+ return false;
+}

SUGGESTION (no need for 2 returns here)
return is_btree && !is_partial && !is_only_on_expression;

======
src/backend/replication/logical/worker.c

6. FindReplTupleInLocalRel

static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,
LogicalRepRelation *remoterel,
Oid localidxoid,
TupleTableSlot *remoteslot,
TupleTableSlot **localslot)
{
bool found;

/*
* Regardless of the top-level operation, we're performing a read here, so
* check for SELECT privileges.
*/
TargetPrivilegesCheck(localrel, ACL_SELECT);

*localslot = table_slot_create(localrel, &estate->es_tupleTable);

Assert(OidIsValid(localidxoid) ||
   (remoterel->replident == REPLICA_IDENTITY_FULL));

if (OidIsValid(localidxoid))
found = RelationFindReplTupleByIndex(localrel, localidxoid,
LockTupleExclusive,
remoteslot, *localslot);
else
found = RelationFindReplTupleSeq(localrel, LockTupleExclusive,
remoteslot, *localslot);

return found;
}

~

Since that 'found' variable is not used, you might as well remove the
if/else and simplify the code.

SUGGESTION
static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,
LogicalRepRelation *remoterel,
Oid localidxoid,
TupleTableSlot *remoteslot,
TupleTableSlot **localslot)
{
/*
* Regardless of the top-level operation, we're performing a read here, so
* check for SELECT privileges.
*/
TargetPrivilegesCheck(localrel, ACL_SELECT);

*localslot = table_slot_create(localrel, &estate->es_tupleTable);

Assert(OidIsValid(localidxoid) ||
   (remoterel->replident == REPLICA_IDENTITY_FULL));

if (OidIsValid(localidxoid))
return RelationFindReplTupleByIndex(localrel, localidxoid,
LockTupleExclusive,
remoteslot, *localslot);

return RelationFindReplTupleSeq(localrel, LockTupleExclusive,
remoteslot, *localslot);

~~~

7. apply_handle_tuple_routing

@@ -2890,6 +2877,7 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
  TupleConversionMap *map;
  MemoryContext oldctx;
  LogicalRepRelMapEntry *part_entry = NULL;
+
  AttrMap    *attrmap = NULL;

  /* ModifyTableState is needed for ExecFindPartition(). */
The added whitespace seems unrelated to this patch.


======
src/include/replication/logicalrelation.h

8.
@@ -31,6 +32,7 @@ typedef struct LogicalRepRelMapEntry
  Relation localrel; /* relcache entry (NULL when closed) */
  AttrMap    *attrmap; /* map of local attributes to remote ones */
  bool updatable; /* Can apply updates/deletes? */
+ Oid localindexoid; /* which index to use, or InvalidOid if none */

Indentation is not correct for that new member comment.


------
Kind Regards,
Peter Smith.
Fujitsu Australia



On Wed, Mar 8, 2023 at 9:09 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
>
> ======
> src/backend/executor/execReplication.c
>
> 3. build_replindex_scan_key
>
>   {
>   Oid operator;
>   Oid opfamily;
>   RegProcedure regop;
> - int pkattno = attoff + 1;
> - int mainattno = indkey->values[attoff];
> - Oid optype = get_opclass_input_type(opclass->values[attoff]);
> + int table_attno = indkey->values[index_attoff];
> + Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
>
> These variable declarations might look tidier if you kept all the Oids together.
>

I am not sure how much of an improvement that would be over the current
code, but it will lead to additional churn in the patch.

> ======
> src/backend/replication/logical/worker.c
>
> 6. FindReplTupleInLocalRel
>
> static bool
> FindReplTupleInLocalRel(EState *estate, Relation localrel,
> LogicalRepRelation *remoterel,
> Oid localidxoid,
> TupleTableSlot *remoteslot,
> TupleTableSlot **localslot)
> {
> bool found;
>
> /*
> * Regardless of the top-level operation, we're performing a read here, so
> * check for SELECT privileges.
> */
> TargetPrivilegesCheck(localrel, ACL_SELECT);
>
> *localslot = table_slot_create(localrel, &estate->es_tupleTable);
>
> Assert(OidIsValid(localidxoid) ||
>    (remoterel->replident == REPLICA_IDENTITY_FULL));
>
> if (OidIsValid(localidxoid))
> found = RelationFindReplTupleByIndex(localrel, localidxoid,
> LockTupleExclusive,
> remoteslot, *localslot);
> else
> found = RelationFindReplTupleSeq(localrel, LockTupleExclusive,
> remoteslot, *localslot);
>
> return found;
> }
>
> ~
>
> Since that 'found' variable is not used, you might as well remove the
> if/else and simplify the code.
>

Hmm, but that is existing style/code, and this patch has done
nothing which requires that change. Personally, I find the current
style better for readability.

--
With Regards,
Amit Kapila.



On Wed, Mar 8, 2023 at 3:03 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 8, 2023 at 9:09 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> >
> > ======
> > src/backend/executor/execReplication.c
> >
> > 3. build_replindex_scan_key
> >
> >   {
> >   Oid operator;
> >   Oid opfamily;
> >   RegProcedure regop;
> > - int pkattno = attoff + 1;
> > - int mainattno = indkey->values[attoff];
> > - Oid optype = get_opclass_input_type(opclass->values[attoff]);
> > + int table_attno = indkey->values[index_attoff];
> > + Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
> >
> > These variable declarations might look tidier if you kept all the Oids together.
> >
>
> I am not sure how much that would be an improvement over the current
> but that will lead to an additional churn in the patch.

That suggestion was because IMO the 'optype' and 'opfamily' belong
together. TBH, I really think the assignment of 'optype' should happen
later, with the 'opfamily' assignment, because then it will be *after*
the (!AttributeNumberIsValid(table_attno)) check.

>
> > ======
> > src/backend/replication/logical/worker.c
> >
> > 6. FindReplTupleInLocalRel
> >
> > static bool
> > FindReplTupleInLocalRel(EState *estate, Relation localrel,
> > LogicalRepRelation *remoterel,
> > Oid localidxoid,
> > TupleTableSlot *remoteslot,
> > TupleTableSlot **localslot)
> > {
> > bool found;
> >
> > /*
> > * Regardless of the top-level operation, we're performing a read here, so
> > * check for SELECT privileges.
> > */
> > TargetPrivilegesCheck(localrel, ACL_SELECT);
> >
> > *localslot = table_slot_create(localrel, &estate->es_tupleTable);
> >
> > Assert(OidIsValid(localidxoid) ||
> >    (remoterel->replident == REPLICA_IDENTITY_FULL));
> >
> > if (OidIsValid(localidxoid))
> > found = RelationFindReplTupleByIndex(localrel, localidxoid,
> > LockTupleExclusive,
> > remoteslot, *localslot);
> > else
> > found = RelationFindReplTupleSeq(localrel, LockTupleExclusive,
> > remoteslot, *localslot);
> >
> > return found;
> > }
> >
> > ~
> >
> > Since that 'found' variable is not used, you might as well remove the
> > if/else and simplify the code.
> >
>
> Hmm, but that is an existing style/code, and this patch has done
> nothing which requires that change. Personally, I find the current
> style better for the readability purpose.
>

OK. I failed to notice that it was the same as the existing code.

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
From: "houzj.fnst@fujitsu.com"

On Tuesday, March 7, 2023 9:47 PM Önder Kalacı <onderkalaci@gmail.com> wrote:

Hi,

> > > Let me give an example to demonstrate why I thought something is fishy here:
> > >
> > > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111.
> > > Imagine the same rel has a PRIMARY KEY with Oid=2222.
> > >
> 
> Hmm, alright, this is syntactically possible, but not sure if any user 
> would do this. Still thanks for catching this.
> 
> And, you are right, if a user has created such a schema, IdxIsRelationIdentityOrPK() 
> would return the wrong result and we'd use sequential scan instead of index scan. 
> This would be a regression. I think we should change the function. 

I am looking at the latest patch and have a question about the following code.

     /* Try to find the tuple */
-    if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+    while (index_getnext_slot(scan, ForwardScanDirection, outslot))
     {
-        found = true;
+        /*
+         * Avoid expensive equality check if the index is primary key or
+         * replica identity index.
+         */
+        if (!idxIsRelationIdentityOrPK)
+        {
+            if (eq == NULL)
+            {
+#ifdef USE_ASSERT_CHECKING
+                /* apply assertions only once for the input idxoid */
+                IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
+                Assert(IsIndexUsableForReplicaIdentityFull(indexInfo));
+#endif
+
+                /*
+                 * We only need to allocate once. This is allocated within per
+                 * tuple context -- ApplyMessageContext -- hence no need to
+                 * explicitly pfree().
+                 */
+                eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);
+            }
+
+            if (!tuples_equal(outslot, searchslot, eq))
+                continue;
+        }

IIRC, it invokes tuples_equal() in all cases unless we are using a replica
identity key or primary key to scan. But there seem to be some other cases
where the tuples_equal() call looks unnecessary.

For example, if the table on the subscriber doesn't have a PK or RI key but has a
not-null, non-deferrable, unique key, and the apply worker chooses this index
to do the scan, it seems we can skip tuples_equal() as well.

--Example
pub:
CREATE TABLE test_replica_id_full (a int, b int not null);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

sub:
CREATE TABLE test_replica_id_full (a int, b int not null);
CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(b);
--

I am not 100% sure if it's worth optimizing this by complicating the check in
idxIsRelationIdentityOrPK. What do you think?

Best Regards,
Hou zj

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
From: "houzj.fnst@fujitsu.com"

On Wednesday, March 8, 2023 2:51 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> 
> On Tuesday, March 7, 2023 9:47 PM Önder Kalacı <onderkalaci@gmail.com>
> wrote:
> 
> Hi,
> 
> > > > Let me give an example to demonstrate why I thought something is fishy
> here:
> > > >
> > > > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111.
> > > > Imagine the same rel has a PRIMARY KEY with Oid=2222.
> > > >
> >
> > Hmm, alright, this is syntactically possible, but not sure if any user
> > would do this. Still thanks for catching this.
> >
> > And, you are right, if a user has created such a schema,
> > IdxIsRelationIdentityOrPK() would return the wrong result and we'd use
> sequential scan instead of index scan.
> > This would be a regression. I think we should change the function.
> 
> I am looking at the latest patch and have a question about the following code.
> 
>      /* Try to find the tuple */
> -    if (index_getnext_slot(scan, ForwardScanDirection, outslot))
> +    while (index_getnext_slot(scan, ForwardScanDirection, outslot))
>      {
> -        found = true;
> +        /*
> +         * Avoid expensive equality check if the index is primary key or
> +         * replica identity index.
> +         */
> +        if (!idxIsRelationIdentityOrPK)
> +        {
> +            if (eq == NULL)
> +            {
> +#ifdef USE_ASSERT_CHECKING
> +                /* apply assertions only once for the input
> idxoid */
> +                IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
> +
>     Assert(IsIndexUsableForReplicaIdentityFull(indexInfo));
> +#endif
> +
> +                /*
> +                 * We only need to allocate once. This is
> allocated within per
> +                 * tuple context -- ApplyMessageContext --
> hence no need to
> +                 * explicitly pfree().
> +                 */
> +                eq = palloc0(sizeof(*eq) *
> outslot->tts_tupleDescriptor->natts);
> +            }
> +
> +            if (!tuples_equal(outslot, searchslot, eq))
> +                continue;
> +        }
> 
> IIRC, it invokes tuples_equal for all cases unless we are using replica identity key
> or primary key to scan. But there seem some other cases where the
> tuples_equal looks unnecessary.
> 
> For example, if the table on subscriber don't have a PK or RI key but have a
> not-null, non-deferrable, unique key. And if the apply worker choose this index
> to do the scan, it seems we can skip the tuples_equal as well.
> 
> --Example
> pub:
> CREATE TABLE test_replica_id_full (a int, b int not null); ALTER TABLE
> test_replica_id_full REPLICA IDENTITY FULL; CREATE PUBLICATION
> tap_pub_rep_full FOR TABLE test_replica_id_full;
> 
> sub:
> CREATE TABLE test_replica_id_full (a int, b int not null); CREATE UNIQUE INDEX
> test_replica_id_full_idx ON test_replica_id_full(b);

Thinking again, this example is incorrect, sorry. I mean the case when
all the columns of the tuple to be compared are in the unique index on
the subscriber side, like:

--Example
pub:
CREATE TABLE test_replica_id_full (a int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

sub:
CREATE TABLE test_replica_id_full (a int not null);
CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(a);
--

Best Regards,
Hou zj

Hi Amit, all


You have not given complete steps to reproduce the problem where
instead of the index scan, a sequential scan would be picked. I have
tried to reproduce by extending your steps but didn't see the problem.
Let me know if I am missing something.

I think the steps you shared are what I had in mind. 
 

I have debugged the above example and it uses an index scan during
apply without your latest change which is what I expected. AFAICS, the
use of IdxIsRelationIdentityOrPK() is to decide whether we will do
tuples_equal() or not during the index scan and I see it gives the
correct results with the example you provided.


Right, I got confused. IdxIsRelationIdentityOrPK() is only called within
RelationFindReplTupleByIndex(). And, yes, it only impacts tuples_equal().

But, still, it feels safer to keep it as in the current patch if we don't change
the name of the function.

I really don't have a strong opinion either way, only a slight preference
to keep it as in v35 so that future callers don't get confused as we did here.

Let me know how you prefer this.


Thanks,
Onder
 
Hi Peter, all



1.
Saying the index "should" or "should not" do this or that sounds like
it is still OK but just not recommended. To remove this ambiguity, IMO
most of the "should" ought to be changed to "must", because IIUC this
patch will simply not consider indexes which do not obey all your
rules.

This comment applies to a few places (search for "should")

e.g.1. - Commit Message
e.g.2. - /* There should always be at least one attribute for the index scan. */
e.g.3. - The function comment for
FindUsableIndexForReplicaIdentityFull(Relation localrel)

======
doc/src/sgml/logical-replication.sgml

I'm definitely not an expert on this subject (or a native speaker). So, I'm following your
suggestion.
 

2.
A published table must have a “replica identity” configured in order
to be able to replicate UPDATE and DELETE operations, so that
appropriate rows to update or delete can be identified on the
subscriber side. By default, this is the primary key, if there is one.
Another unique index (with certain additional requirements) can also
be set to be the replica identity. If the table does not have any
suitable key, then it can be set to replica identity “full”, which
means the entire row becomes the key. When replica identity “full” is
specified, indexes can be used on the subscriber side for searching
the rows. Candidate indexes must be btree, non-partial, and have at
least one column reference (i.e. cannot consist of only expressions).
These restrictions on the non-unique index properties adheres some of
the restrictions that are enforced for primary keys. Internally, we
follow a similar approach for supporting index scans within logical
replication scope. If there are no such suitable indexes, the search
on the subscriber side can be very inefficient, therefore replica
identity “full” should only be used as a fallback if no other solution
is possible. If a replica identity other than “full” is set on the
publisher side, a replica identity comprising the same or fewer
columns must also be set on the subscriber side. See REPLICA IDENTITY
for details on how to set the replica identity. If a table without a
replica identity is added to a publication that replicates UPDATE or
DELETE operations then subsequent UPDATE or DELETE operations will
cause an error on the publisher. INSERT operations can proceed
regardless of any replica identity.

~

"adheres some of" --> "adhere to some of" ?

sounds right, changed
 

======
src/backend/executor/execReplication.c

3. build_replindex_scan_key

  {
  Oid operator;
  Oid opfamily;
  RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);

These variable declarations might look tidier if you kept all the Oids together.

======
src/backend/replication/logical/relation.c

Based on the discussions below, I kept this as-is. I really don't want to make unrelated
changes in this patch, as I have also gotten feedback several times against doing that.

 

4. logicalrep_partition_open

+ /*
+ * Finding a usable index is an infrequent task. It occurs when an
+ * operation is first performed on the relation, or after invalidation of
+ * the relation cache entry (such as ANALYZE or CREATE/DROP index on the
+ * relation).
+ *
+ * We also prefer to run this code on the oldctx such that we do not
+ * leak anything in the LogicalRepPartMapContext (hence
+ * CacheMemoryContext).
+ */
+ entry->localindexoid = FindLogicalRepLocalIndex(partrel, remoterel)

"such that" --> "so that" ?


fixed
 
~~~

5. IsIndexUsableForReplicaIdentityFull

+bool
+IsIndexUsableForReplicaIdentityFull(IndexInfo *indexInfo)
+{
+ bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
+ bool is_partial = (indexInfo->ii_Predicate != NIL);
+ bool is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
+
+ if (is_btree && !is_partial && !is_only_on_expression)
+ {
+ return true;
+ }
+
+ return false;
+}

SUGGESTION (no need for 2 returns here)
return is_btree && !is_partial && !is_only_on_expression;

true, done
 

======
src/backend/replication/logical/worker.c

6. FindReplTupleInLocalRel

static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,
LogicalRepRelation *remoterel,
Oid localidxoid,
TupleTableSlot *remoteslot,
TupleTableSlot **localslot)
{
bool found;

/*
* Regardless of the top-level operation, we're performing a read here, so
* check for SELECT privileges.
*/
TargetPrivilegesCheck(localrel, ACL_SELECT);

*localslot = table_slot_create(localrel, &estate->es_tupleTable);

Assert(OidIsValid(localidxoid) ||
   (remoterel->replident == REPLICA_IDENTITY_FULL));

if (OidIsValid(localidxoid))
found = RelationFindReplTupleByIndex(localrel, localidxoid,
LockTupleExclusive,
remoteslot, *localslot);
else
found = RelationFindReplTupleSeq(localrel, LockTupleExclusive,
remoteslot, *localslot);

return found;
}

~

Since that 'found' variable is not used, you might as well remove the
if/else and simplify the code.

SUGGESTION
static bool
FindReplTupleInLocalRel(EState *estate, Relation localrel,
LogicalRepRelation *remoterel,
Oid localidxoid,
TupleTableSlot *remoteslot,
TupleTableSlot **localslot)
{
/*
* Regardless of the top-level operation, we're performing a read here, so
* check for SELECT privileges.
*/
TargetPrivilegesCheck(localrel, ACL_SELECT);

*localslot = table_slot_create(localrel, &estate->es_tupleTable);

Assert(OidIsValid(localidxoid) ||
   (remoterel->replident == REPLICA_IDENTITY_FULL));

if (OidIsValid(localidxoid))
return RelationFindReplTupleByIndex(localrel, localidxoid,
LockTupleExclusive,
remoteslot, *localslot);

return RelationFindReplTupleSeq(localrel, LockTupleExclusive,
remoteslot, *localslot);


Maybe you are right that we don't need the variable. But I don't want to get into
further discussions just because I changed code unrelated to the patch.

So, I think I prefer to skip this change unless you have strong objections.
 
~~~

7. apply_handle_tuple_routing

@@ -2890,6 +2877,7 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
  TupleConversionMap *map;
  MemoryContext oldctx;
  LogicalRepRelMapEntry *part_entry = NULL;
+
  AttrMap    *attrmap = NULL;

  /* ModifyTableState is needed for ExecFindPartition(). */
The added whitespace seems unrelated to this patch.

Thanks, fixed
 


======
src/include/replication/logicalrelation.h

8.
@@ -31,6 +32,7 @@ typedef struct LogicalRepRelMapEntry
  Relation localrel; /* relcache entry (NULL when closed) */
  AttrMap    *attrmap; /* map of local attributes to remote ones */
  bool updatable; /* Can apply updates/deletes? */
+ Oid localindexoid; /* which index to use, or InvalidOid if none */

Indentation is not correct for that new member comment.

fixed, thanks

 
I'm attaching v36.


Thanks,
Onder
Hi Hou zj, all

IIRC, it invokes tuples_equal for all cases unless we are using replica
identity key or primary key to scan. But there seem some other cases where the
tuples_equal looks unnecessary.

For example, if the table on subscriber don't have a PK or RI key but have a
not-null, non-deferrable, unique key. And if the apply worker choose this index
to do the scan, it seems we can skip the tuples_equal as well.


Yeah, that's right. I also spotted this earlier; see "# Testcase start: Unique index
that is not primary key or replica identity".

I'm thinking that we should not add any code complexity for this case, at least with
this patch. I have a few small follow-up ideas, like this one, or allowing non-btree indexes
etc., but I'd rather not add those optional improvements to this patch, if that makes sense.



Thanks,
Onder KALACI


Hi Shi Yu, all



e.g.
-- pub
CREATE TABLE test_replica_id_full (x int, y int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
-- sub
CREATE TABLE test_replica_id_full (x int, y int, z int);
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(z);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

I didn't see in any cases the behavior changed after applying the patch, which
looks good. Besides, I tested the performance for such case.

Thanks for testing this edge case. I thought we had a test for this, but it seems to be missing.
 

For the cases that all values of extra columns on the subscriber are NULL, index
scan can't do better than sequential scan. This is not a real scenario and I
think it only degrades when there are many NULL values in the index column, so
this is probably not a case to worry about.

I debugged this case as well, and don't see any problems with it either. But I think this is a valid
test case, given that at some point we might forget about this case and somehow break it.

So, I'll add a new test (PUBLICATION LACKS THE COLUMN ON THE SUBS INDEX) in v36.

 
I just share this case and then we
can discuss should we pick the index which only contain the extra columns on the
subscriber.


I think its performance implications come down to the discussion in [1]. Overall, I prefer
to avoid adding any additional complexity in the code for some edge cases. The code
can handle this sub-optimal user pattern, with sub-optimal performance.

Still, happy to hear other thoughts on this.

Thanks,
Onder KALACI



On Wed, Mar 8, 2023 at 4:51 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>
>>
>> I just share this case and then we
>> can discuss should we pick the index which only contain the extra columns on the
>> subscriber.
>>
>
> I think its performance implications come down to the discussion on [1]. Overall, I prefer
> avoiding adding any additional complexity in the code for some edge cases. The code
> can handle this sub-optimal user pattern, with a sub-optimal performance.
>

It is fine to leave this and Hou-San's case if they make the patch
complex. However, it may be better to give it a try and see if this or
other regression/optimization can be avoided without adding much
complexity to the patch. You can prepare a top-up patch and then we
can discuss it.

--
With Regards,
Amit Kapila.



On Fri, 3 Mar 2023 at 18:40, Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi Vignesh,
>
> Thanks for the review
>
>>
>> 1) We are currently calling RelationGetIndexList twice, once in
>> FindUsableIndexForReplicaIdentityFull function and in the caller too,
>> we could avoid one of the calls by passing the indexlist to the
>> function or removing the check here, index list check can be handled
>> in FindUsableIndexForReplicaIdentityFull.
>> +       if (remoterel->replident == REPLICA_IDENTITY_FULL &&
>> +               RelationGetIndexList(localrel) != NIL)
>> +       {
>> +               /*
>> +                * If we had a primary key or relation identity with a
>> unique index,
>> +                * we would have already found and returned that oid.
>> At this point,
>> +                * the remote relation has replica identity full and
>> we have at least
>> +                * one local index defined.
>> +                *
>> +                * We are looking for one more opportunity for using
>> an index. If
>> +                * there are any indexes defined on the local
>> relation, try to pick
>> +                * a suitable index.
>> +                *
>> +                * The index selection safely assumes that all the
>> columns are going
>> +                * to be available for the index scan given that
>> remote relation has
>> +                * replica identity full.
>> +                */
>> +               return FindUsableIndexForReplicaIdentityFull(localrel);
>> +       }
>> +
>
>
> makes sense, done
>
>>
>>
>> 2) Copyright year should be mentioned as 2023
>> diff --git a/src/test/subscription/t/032_subscribe_use_index.pl
>> b/src/test/subscription/t/032_subscribe_use_index.pl
>> new file mode 100644
>> index 0000000000..db0a7ea2a0
>> --- /dev/null
>> +++ b/src/test/subscription/t/032_subscribe_use_index.pl
>> @@ -0,0 +1,861 @@
>> +# Copyright (c) 2021-2022, PostgreSQL Global Development Group
>> +
>> +# Test logical replication behavior with subscriber uses available index
>> +use strict;
>> +use warnings;
>> +use PostgreSQL::Test::Cluster;
>> +use PostgreSQL::Test::Utils;
>> +use Test::More;
>> +
>
>
> I changed it to #Copyright (c) 2022-2023, but I'm not sure if it should be only 2023 or
> like this.
>
>>
>>
>> 3) Many of the tests are using the same tables, we need not
>> drop/create publication/subscription for each of the team, we could
>> just drop and create required indexes and verify the update/delete
>> statements.
>> +# ====================================================================
>> +# Testcase start: SUBSCRIPTION USES INDEX
>> +#
>> +# Basic test where the subscriber uses index
>> +# and only updates 1 row and deletes
>> +# 1 other row
>> +#
>> +
>> +# create tables pub and sub
>> +$node_publisher->safe_psql('postgres',
>> +       "CREATE TABLE test_replica_id_full (x int)");
>> +$node_publisher->safe_psql('postgres',
>> +       "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
>> +$node_subscriber->safe_psql('postgres',
>> +       "CREATE TABLE test_replica_id_full (x int)");
>> +$node_subscriber->safe_psql('postgres',
>> +       "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");
>>
>> +# ====================================================================
>> +# Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
>> +#
>> +# This test ensures that after CREATE INDEX, the subscriber can automatically
>> +# use one of the indexes (provided that it fulfils the requirements).
>> +# Similarly, after DROP index, the subscriber can automatically switch to
>> +# sequential scan
>> +
>> +# create tables pub and sub
>> +$node_publisher->safe_psql('postgres',
>> +       "CREATE TABLE test_replica_id_full (x int NOT NULL, y int)");
>> +$node_publisher->safe_psql('postgres',
>> +       "ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;");
>> +$node_subscriber->safe_psql('postgres',
>> +       "CREATE TABLE test_replica_id_full (x int NOT NULL, y int)");
>
>
> Well, not all the tables are exactly the same, there are 4-5 different
> tables. Mostly the table names are the same.
>
> Plus, the overhead does not seem to be large enough to complicate
> the test. Many of the src/test/subscription/t files follow this pattern.
>
> Do you have strong opinions on changing this?

I felt that once you remove the create publication/subscription/wait
for sync steps, the test execution might become faster and save some
time in local execution, cfbot, and the various buildfarm machines.
If the execution time does not reduce, then there is no need to
change.

Regards,
Vignesh



On Tue, 7 Mar 2023 at 19:17, Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi Amit, Peter
>
>>
>> > > Let me give an example to demonstrate why I thought something is fishy here:
>> > >
>> > > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111.
>> > > Imagine the same rel has a PRIMARY KEY with Oid=2222.
>> > >
>
>
> Hmm, alright, this is syntactically possible, but not sure if any user
> would do this. Still thanks for catching this.
>
> And, you are right, if a user has created such a schema, IdxIsRelationIdentityOrPK()
> would return the wrong result and we'd use sequential scan instead of index scan.
> This would be a regression. I think we should change the function.
>
>
> Here is the example:
> DROP TABLE tab1;
> CREATE TABLE tab1 (a int NOT NULL);
> CREATE UNIQUE INDEX replica_unique ON tab1(a);
> ALTER TABLE tab1 REPLICA IDENTITY USING INDEX replica_unique;
> ALTER TABLE tab1 ADD CONSTRAINT pkey PRIMARY KEY (a);
>
>
> I'm attaching v35.

Few comments
1) Maybe this change is not required:
    fallback if no other solution is possible.  If a replica identity other
    than <quote>full</quote> is set on the publisher side, a replica identity
-   comprising the same or fewer columns must also be set on the subscriber
-   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
+   comprising the same or fewer columns must also be set on the
subscriber side.
+   See <xref linkend="sql-altertable-replica-identity"/> for details on

2) The variable declaration and the assignment can be split so that
readability is better:
+
+               bool            isUsableIndex =
+                       IsIndexUsableForReplicaIdentityFull(indexInfo);
+
+               index_close(indexRelation, AccessShareLock);
+

3) Since there is only one statement within the if condition, the
braces can be removed
+       if (is_btree && !is_partial && !is_only_on_expression)
+       {
+               return true;
+       }

4) There is a minor indentation issue in this; we could run pgindent to fix it:
+static Oid     FindLogicalRepLocalIndex(Relation localrel,
+
   LogicalRepRelation *remoterel);
+

Regards,
Vignesh



Hi,


I felt that once you remove the create publication/subscription/wait
for sync steps, the test execution might become faster and save some
time in the local execution, cfbot and the various machines in
buildfarm. If the execution time does not reduce, then there is no need to
change.


So, as I noted earlier, there are different schemas. As far as I can count, there are at least
7 different table definitions. I think giving all the tables the same name might be confusing.

Even if I try to group the same table definitions and avoid the create publication/subscription/wait
for sync steps, the total execution time of the test drops only ~5%. As far as I can tell, that does not
seem to be the bottleneck for the tests.

Well, I'm not sure it is really worth doing that. I think having each test independent of the
others makes them much easier to follow.

Though, while looking into the execution times, I realized that in some tests I used quite a lot
of unnecessary tuples, such as:

-       "INSERT INTO test_replica_id_full SELECT i, i FROM generate_series(0,2100)i;");
+       "INSERT INTO test_replica_id_full SELECT i, i FROM generate_series(0,21)i;");
 
In the next iteration of the patch, I'm going to decrease the number of tuples. That seems to
save 5-10% of the execution time on my local machine.


Thanks,
Onder KALACI
 
Hi Vignesh C,



Few comments
1) Maybe this change is not required:
    fallback if no other solution is possible.  If a replica identity other
    than <quote>full</quote> is set on the publisher side, a replica identity
-   comprising the same or fewer columns must also be set on the subscriber
-   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
+   comprising the same or fewer columns must also be set on the
subscriber side.
+   See <xref linkend="sql-altertable-replica-identity"/> for details on

Yes, fixed. 

2) The variable declaration and the assignment can be split so that the
readability is better:
+
+               bool            isUsableIndex =
+                       IsIndexUsableForReplicaIdentityFull(indexInfo);
+
+               index_close(indexRelation, AccessShareLock);
+
 
Hmm, can you please elaborate more on this? The declaration
and assignment are already on different lines.  

ps: pgindent changed this line a bit. Does that look better?
 

3) Since there is only one statement within the if condition, the
braces can be removed
+       if (is_btree && !is_partial && !is_only_on_expression)
+       {
+               return true;
+       }


Fixed in a newer version of the patch. Now it is only:
return is_btree && !is_partial && !is_only_on_expression;
 
4) There is a minor indentation issue in this; we could run pgindent to fix it:
+static Oid     FindLogicalRepLocalIndex(Relation localrel,
+
   LogicalRepRelation *remoterel);
+


Yes, pgindent fixed it, thanks.


Attached v37

Thanks,
Onder KALACI
Here are my review comments for v37-0001.

======
General - should/must.

1.
In my previous review [1] (comment #1) I wrote that only some of the
"should" were misleading and gave examples where to change. But I
didn't say that *every* usage of that word was wrong, so your global
replace of "should" to "must" has modified a couple of places in
unexpected ways.

Details are in subsequent review comments below -- see #2b, #3, #5.

======
doc/src/sgml/logical-replication.sgml

2.
A published table must have a “replica identity” configured in order
to be able to replicate UPDATE and DELETE operations, so that
appropriate rows to update or delete can be identified on the
subscriber side. By default, this is the primary key, if there is one.
Another unique index (with certain additional requirements) can also
be set to be the replica identity. If the table does not have any
suitable key, then it can be set to replica identity “full”, which
means the entire row becomes the key. When replica identity “full” is
specified, indexes can be used on the subscriber side for searching
the rows. Candidate indexes must be btree, non-partial, and have at
least one column reference (i.e. cannot consist of only expressions).
These restrictions on the non-unique index properties adheres to some
of the restrictions that are enforced for primary keys. Internally, we
follow a similar approach for supporting index scans within logical
replication scope. If there are no such suitable indexes, the search
on the subscriber side can be very inefficient, therefore replica
identity “full” must only be used as a fallback if no other solution
is possible. If a replica identity other than “full” is set on the
publisher side, a replica identity comprising the same or fewer
columns must also be set on the subscriber side. See REPLICA IDENTITY
for details on how to set the replica identity. If a table without a
replica identity is added to a publication that replicates UPDATE or
DELETE operations then subsequent UPDATE or DELETE operations will
cause an error on the publisher. INSERT operations can proceed
regardless of any replica identity.

~~

2a.
My previous review [1] (comment #2)  was not fixed quite as suggested.

Please change:
"adheres to" --> "adhere to"

~~

2b. should/must

This should/must change was OK as it was before, because here it is only advice.

Please change this back how it was:
"must only be used as a fallback" --> "should only be used as a fallback"

======
src/backend/executor/execReplication.c

3. build_replindex_scan_key

 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns must be used for the index scan.
+ *

~

This should/must change does not seem quite right.

SUGGESTION (reworded)
Returns how many columns to use for the index scan.

~~~

4. build_replindex_scan_key

>
> Based on the discussions below, I kept as-is. I really don't want to do unrelated
> changes in this patch, as I also got several feedback for not doing it,
>

Hmm, although this code pre-existed, I don't consider this one as
"unrelated changes" because the patch introduced the new "if
(!AttributeNumberIsValid(table_attno))", which changed things.  As I
wrote to Amit yesterday [2], IMO it would be better to do the 'opttype'
assignment *after* the potential 'continue'; otherwise there is every
chance that the assignment is just redundant. And if you move the
assignment where it belongs, then you might as well declare the
variable in the more appropriate place at the same time – i.e. with
the 'opfamily' declaration. Anyway, I've given my reason a couple of
times now, so if you don't want to change it I won't debate it anymore.

======
src/backend/replication/logical/relation.c

5. FindUsableIndexForReplicaIdentityFull

+ * XXX: There are no fundamental problems for supporting non-btree indexes.
+ * We mostly need to relax the limitations in RelationFindReplTupleByIndex().
+ * For partial indexes, the required changes are likely to be larger. If
+ * none of the tuples satisfy the expression for the index scan, we must
+ * fall-back to sequential execution, which might not be a good idea in some
+ * cases.

The should/must change (the one in the XXX comment) does not seem quite right.

SUGGESTION
"we must fall-back to sequential execution" --> "we fallback to
sequential execution"

======
.../subscription/t/032_subscribe_use_index.pl

FYI, I get a TAP test error (Note - this is when only patch 0001 is applied)

t/031_column_list.pl ............... ok
t/032_subscribe_use_index.pl ....... 19/?
#   Failed test 'ensure subscriber has not used index with
enable_indexscan=false'
#   at t/032_subscribe_use_index.pl line 806.
#          got: '1'
#     expected: '0'
t/032_subscribe_use_index.pl ....... 21/? # Looks like you failed 1 test of 22.
t/032_subscribe_use_index.pl ....... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/22 subtests
t/100_bugs.pl ...................... ok

AFAICT there is a test case that is testing the patch 0002
functionality even when patch 0002 is not applied yet.

------
[1] Replies to my review v35 -
https://www.postgresql.org/message-id/CACawEhXnTQyGNCXeQGhN3_%2BGWujhS3MyY27C4sSqRvZ%2B_B7FLg%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPvLvDGFzk4fSaevGY5h2PpAeSZjJjje_7vBdb7ag%3DzswA%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia



On Thu, Mar 9, 2023 at 6:34 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> 4. build_replindex_scan_key
>
> >
> > Based on the discussions below, I kept as-is. I really don't want to do unrelated
> > changes in this patch, as I also got several feedback for not doing it,
> >
>
> Hmm, although this code pre-existed I don’t consider this one as
> "unrelated changes" because the patch introduced the new "if
> (!AttributeNumberIsValid(table_attno))" which changed things.  As I
> wrote to Amit yesterday [2] IMO it would be better to do the 'opttype'
> assignment *after* the potential 'continue' otherwise there is every
> chance that the assignment is just redundant. And if you move the
> assignment where it belongs, then you might as well declare the
> variable in the more appropriate place at the same time – i.e. with
> 'opfamily' declaration. Anyway, I've given my reason a couple of times
> now, so if you don't want to change it I won't about it debate
> anymore.
>

I agree with this reasoning.

--
With Regards,
Amit Kapila.



On Wed, 8 Mar 2023 at 21:46, Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi Vignesh C,
>
>>
>>
>> Few comments
>> 1) Maybe this change is not required:
>>     fallback if no other solution is possible.  If a replica identity other
>>     than <quote>full</quote> is set on the publisher side, a replica identity
>> -   comprising the same or fewer columns must also be set on the subscriber
>> -   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
>> +   comprising the same or fewer columns must also be set on the
>> subscriber side.
>> +   See <xref linkend="sql-altertable-replica-identity"/> for details on
>
>
> Yes, fixed.
>>
>>
>> 2) Variable declaration and the assignment can be split so that the
>> readability is better:
>> +
>> +               bool            isUsableIndex =
>> +                       IsIndexUsableForReplicaIdentityFull(indexInfo);
>> +
>> +               index_close(indexRelation, AccessShareLock);
>> +
>
>
> Hmm, can you please elaborate more on this? The declaration
> and assignment are already on different lines.
>
> ps: pgindent changed this line a bit. Does that look better?

I thought of changing it to something like below:
bool isUsableIndex;
Oid idxoid = lfirst_oid(lc);
Relation indexRelation = index_open(idxoid, AccessShareLock);
IndexInfo  *indexInfo = BuildIndexInfo(indexRelation);

isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);

Regards,
Vignesh



On Wed, Mar 8, 2023 at 8:44 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>> I felt that once you remove the create publication/subscription/wait
>> for sync steps, the test execution might become faster and save some
>> time in the local execution, cfbot and the various machines in
>> buildfarm. If the execution time will not reduce, then no need to
>> change.
>>
>
> So, as I noted earlier, there are different schemas. As far as I count, there are at least
> 7 different table definitions. I think all tables having the same name are maybe confusing?
>
> Even if I try to group the same table definitions, and avoid create publication/subscription/wait
> for sync steps, the total execution time of the test drops only ~5%. As far as I test, that does not
> seem to be the bottleneck for the tests.
>
> Well, I'm really not sure if it is really worth doing that. I think having each test independent of each
> other is really much easier to follow.
>

This new test takes ~9s on my machine whereas most other tests in
subscription/t take roughly 2-5s. I feel we should try to reduce the
test timing without sacrificing much of the functionality or code
coverage.  I think if possible we should try to reduce setup/teardown
cost for each separate test by combining them where possible. I have a
few comments on tests which also might help to optimize these tests.

1.
+# Testcase start: SUBSCRIPTION USES INDEX
+#
+# Basic test where the subscriber uses index
+# and only updates 1 row and deletes
+# 1 other row
...
...
+# Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
+#
+# Basic test where the subscriber uses index
+# and updates 50 rows

...
+# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
+#
+# Basic test where the subscriber uses index
+# and deletes 200 rows

I think to a good extent these tests overlap each other. I think we
can have just one test for the index with multiple columns that
updates multiple rows and have both updates and deletes.

2.
+# Testcase start: SUBSCRIPTION DOES NOT USE PARTIAL INDEX
...
...
+# Testcase start: SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS

Instead of having separate tests where we do all setups again, I think
it would be better if we create both the indexes in one test and show
that none of them is used.

3.
+# now, the update could either use the test_replica_id_full_idy or
test_replica_id_full_idy index
+# it is not possible for user to control which index to use

The name of the second index is wrong in the above comment.

4.
+# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN

As we have removed enable_indexscan check, you should remove this test.

5. In general, the line length seems to vary a lot across the different
multi-line comments. Though we are not strict about that, it will look
better if there is consistency (let's have ~80-column line
length for each comment line).

--
With Regards,
Amit Kapila.



On Thu, Mar 9, 2023 at 3:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 8, 2023 at 8:44 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> >>
> >> I felt that once you remove the create publication/subscription/wait
> >> for sync steps, the test execution might become faster and save some
> >> time in the local execution, cfbot and the various machines in
> >> buildfarm. If the execution time will not reduce, then no need to
> >> change.
> >>
> >
> > So, as I noted earlier, there are different schemas. As far as I count, there are at least
> > 7 different table definitions. I think all tables having the same name are maybe confusing?
> >
> > Even if I try to group the same table definitions, and avoid create publication/subscription/wait
> > for sync steps, the total execution time of the test drops only ~5%. As far as I test, that does not
> > seem to be the bottleneck for the tests.
> >
> > Well, I'm really not sure if it is really worth doing that. I think having each test independent of each
> > other is really much easier to follow.
> >
>
> This new test takes ~9s on my machine whereas most other tests in
> subscription/t take roughly 2-5s. I feel we should try to reduce the
> test timing without sacrificing much of the functionality or code
> coverage.  I think if possible we should try to reduce setup/teardown
> cost for each separate test by combining them where possible. I have a
> few comments on tests which also might help to optimize these tests.
>

To avoid culling useful tests just because they take too long to run, I
have often thought we should separate some of the useful (but costly)
subscription tests from the other mainstream tests. Then they won't
cost any extra time for the buildfarm, but at least we can still run
them on-demand using the PG_TEST_EXTRA [1] approach if we really want to.
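As a hedged illustration of that idea: PG_TEST_EXTRA already gates a few
opt-in suites at build time, so a costly suite could in principle be run
the same way. The suite name below is made up for illustration; wiring a
subscription test into PG_TEST_EXTRA would need corresponding buildsystem
support, which does not exist today:

```shell
# Hypothetical: opt in to an extra (costly) subscription test suite.
# "subscription_index" is an invented name, not a real PG_TEST_EXTRA value.
export PG_TEST_EXTRA="subscription_index"
make -C src/test/subscription check
```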

------
[1] https://www.postgresql.org/docs/devel/regress-run.html

Kind Regards,
Peter Smith.
Fujitsu Australia



On Thu, Mar 9, 2023 at 10:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Mar 9, 2023 at 3:49 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Mar 8, 2023 at 8:44 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> > >
> > >>
> > >> I felt that once you remove the create publication/subscription/wait
> > >> for sync steps, the test execution might become faster and save some
> > >> time in the local execution, cfbot and the various machines in
> > >> buildfarm. If the execution time will not reduce, then no need to
> > >> change.
> > >>
> > >
> > > So, as I noted earlier, there are different schemas. As far as I count, there are at least
> > > 7 different table definitions. I think all tables having the same name are maybe confusing?
> > >
> > > Even if I try to group the same table definitions, and avoid create publication/subscription/wait
> > > for sync steps, the total execution time of the test drops only ~5%. As far as I test, that does not
> > > seem to be the bottleneck for the tests.
> > >
> > > Well, I'm really not sure if it is really worth doing that. I think having each test independent of each
> > > other is really much easier to follow.
> > >
> >
> > This new test takes ~9s on my machine whereas most other tests in
> > subscription/t take roughly 2-5s. I feel we should try to reduce the
> > test timing without sacrificing much of the functionality or code
> > coverage.  I think if possible we should try to reduce setup/teardown
> > cost for each separate test by combining them where possible. I have a
> > few comments on tests which also might help to optimize these tests.
> >
>
> To avoid culling useful tests just because they take too long to run I
> have often thought we should separate some of the useful (but costly)
> subscription tests from the mainstream other tests. Then they won't
> cost any extra time for the build-farm, but at least we can still run
> them on-demand using PG_TEST_EXTRA [1] approach if we really want to.
>

I don't think that is relevant here. It is mostly about removing
duplicate work we are doing in tests. I don't see anything in the
tests that should require a long time to complete.

--
With Regards,
Amit Kapila.



Hi Peter,



1.
In my previous review [1] (comment #1) I wrote that only some of the
"should" were misleading and gave examples where to change. But I
didn't say that *every* usage of that word was wrong, so your global
replace of "should" to "must" has modified a couple of places in
unexpected ways.

Details are in subsequent review comments below -- see #2b, #3, #5.

Ah, that was my mistake. Thanks for the thorough review on this.
 

======
doc/src/sgml/logical-replication.sgml

2.
A published table must have a “replica identity” configured in order
to be able to replicate UPDATE and DELETE operations, so that
appropriate rows to update or delete can be identified on the
subscriber side. By default, this is the primary key, if there is one.
Another unique index (with certain additional requirements) can also
be set to be the replica identity. If the table does not have any
suitable key, then it can be set to replica identity “full”, which
means the entire row becomes the key. When replica identity “full” is
specified, indexes can be used on the subscriber side for searching
the rows. Candidate indexes must be btree, non-partial, and have at
least one column reference (i.e. cannot consist of only expressions).
These restrictions on the non-unique index properties adheres to some
of the restrictions that are enforced for primary keys. Internally, we
follow a similar approach for supporting index scans within logical
replication scope. If there are no such suitable indexes, the search
on the subscriber side can be very inefficient, therefore replica
identity “full” must only be used as a fallback if no other solution
is possible. If a replica identity other than “full” is set on the
publisher side, a replica identity comprising the same or fewer
columns must also be set on the subscriber side. See REPLICA IDENTITY
for details on how to set the replica identity. If a table without a
replica identity is added to a publication that replicates UPDATE or
DELETE operations then subsequent UPDATE or DELETE operations will
cause an error on the publisher. INSERT operations can proceed
regardless of any replica identity.

~~

2a.
My previous review [1] (comment #2)  was not fixed quite as suggested.

Please change:
"adheres to" --> "adhere to"


Oops, it seems I only got the "to" part of your suggestion and missed the "s".

Done now.
 
~~

2b. should/must

This should/must change was OK as it was before, because here it is only advice.

Please change this back how it was:
"must only be used as a fallback" --> "should only be used as a fallback"

Thanks, changed.
 

======
src/backend/executor/execReplication.c

3. build_replindex_scan_key

 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns must be used for the index scan.
+ *

~

This should/must change does not seem quite right.

SUGGESTION (reworded)
Returns how many columns to use for the index scan.

Fixed. 

(I wish we had a simpler process to incorporate such
comments.)

 

~~~

4. build_replindex_scan_key

>
> Based on the discussions below, I kept as-is. I really don't want to do unrelated
> changes in this patch, as I also got several feedback for not doing it,
>

Hmm, although this code pre-existed, I don't consider this one as
"unrelated changes" because the patch introduced the new "if
(!AttributeNumberIsValid(table_attno))", which changed things.  As I
wrote to Amit yesterday [2], IMO it would be better to do the 'opttype'
assignment *after* the potential 'continue'; otherwise there is every
chance that the assignment is just redundant. And if you move the
assignment where it belongs, then you might as well declare the
variable in the more appropriate place at the same time – i.e. with
the 'opfamily' declaration. Anyway, I've given my reason a couple of
times now, so if you don't want to change it I won't debate it anymore.

Alright, given that both you and Amit [1] agree on this, I'll follow that.

 

======
src/backend/replication/logical/relation.c

5. FindUsableIndexForReplicaIdentityFull

+ * XXX: There are no fundamental problems for supporting non-btree indexes.
+ * We mostly need to relax the limitations in RelationFindReplTupleByIndex().
+ * For partial indexes, the required changes are likely to be larger. If
+ * none of the tuples satisfy the expression for the index scan, we must
+ * fall-back to sequential execution, which might not be a good idea in some
+ * cases.

The should/must change (the one in the XXX comment) does not seem quite right.

SUGGESTION
"we must fall-back to sequential execution" --> "we fallback to
sequential execution"

fixed, thanks.
 

======
.../subscription/t/032_subscribe_use_index.pl

FYI, I get a TAP test error (Note - this is when only patch 0001 is applied)

t/031_column_list.pl ............... ok
t/032_subscribe_use_index.pl ....... 19/?
#   Failed test 'ensure subscriber has not used index with
enable_indexscan=false'
#   at t/032_subscribe_use_index.pl line 806.
#          got: '1'
#     expected: '0'
t/032_subscribe_use_index.pl ....... 21/? # Looks like you failed 1 test of 22.
t/032_subscribe_use_index.pl ....... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/22 subtests
t/100_bugs.pl ...................... ok

AFAICT there is a test case that is testing the patch 0002
functionality even when patch 0002 is not applied yet.


Oops, I somehow managed to make the same rebase mistake. I fixed this,
and for next time I'll make sure that each commit passes the CI separately.
Sorry for the noise.

I'll attach the changes on v38 in the next e-mail.

Thanks,
Onder KALACI

Hi Vignesh C,


> Hmm, can you please elaborate more on this? The declaration
> and assignment are already on different lines.
>
> ps: pgindent changed this line a bit. Does that look better?

I thought of changing it to something like below:
bool isUsableIndex;
Oid idxoid = lfirst_oid(lc);
Relation indexRelation = index_open(idxoid, AccessShareLock);
IndexInfo  *indexInfo = BuildIndexInfo(indexRelation);

isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);


Alright, this looks slightly better. I made a small change to your suggestion, basically keeping lfirst_oid
as the first statement in the loop.

I'll attach the changes on v38 in the next e-mail.


Thanks,
Onder KALACI
Hi Amit, all


>

This new test takes ~9s on my machine whereas most other tests in
subscription/t take roughly 2-5s. I feel we should try to reduce the
test timing without sacrificing much of the functionality or code
coverage.
  
Alright, that is reasonable.
 
I think if possible we should try to reduce setup/teardown
cost for each separate test by combining them where possible. I have a
few comments on tests which also might help to optimize these tests.

1.
+# Testcase start: SUBSCRIPTION USES INDEX
+#
+# Basic test where the subscriber uses index
+# and only updates 1 row and deletes
+# 1 other row
...
...
+# Testcase start: SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS
+#
+# Basic test where the subscriber uses index
+# and updates 50 rows

...
+# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
+#
+# Basic test where the subscriber uses index
+# and deletes 200 rows

I think to a good extent these tests overlap each other. I think we
can have just one test for the index with multiple columns that
updates multiple rows and have both updates and deletes.

Alright, dropped SUBSCRIPTION USES INDEX, expanded 
SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS with an UPDATE
that touches multiple rows


2.
+# Testcase start: SUBSCRIPTION DOES NOT USE PARTIAL INDEX
...
...
+# Testcase start: SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS

Instead of having separate tests where we do all setups again, I think
it would be better if we create both the indexes in one test and show
that none of them is used.

Makes sense
 

3.
+# now, the update could either use the test_replica_id_full_idy or
test_replica_id_full_idy index
+# it is not possible for user to control which index to use

The name of the second index is wrong in the above comment.

thanks, fixed
 

4.
+# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN

As we have removed enable_indexscan check, you should remove this test.

Hmm, I think my rebase problems are causing confusion here, which v38 fixes.

In the first commit, we have ENABLE_INDEXSCAN checks. In the second commit,
I changed the same test to use enable_replica_identity_full_index_scan.

If we are going to consider only the first patch for the master branch,
I can probably drop the test. In that case, I'm not sure what our perspective is
on the ENABLE_INDEXSCAN GUC. Do we want to keep that guard in the code
(hence the test)?
 

5. In general, the line length seems to vary a lot across the different
multi-line comments. Though we are not strict about that, it will look
better if there is consistency (let's have ~80-column line
length for each comment line).


Went over the tests and wrapped the comments at ~80 columns. There is one exception: in the first
commit, the test for enable_indexscan is still shorter, but I couldn't format that one
properly. I'll try to fix that as well, but I didn't want to block progress due to
that.

Also, though you have not noted it, I think SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
already covers SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS.

So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS
and dropped the second one. Let me know if that does not make sense to you. If I try, there are a few more
opportunities to squeeze some more tests together, but those would start to hurt readability.

Attached v38.

Thanks,
Onder KALACI  
On Thu, Mar 9, 2023 at 3:26 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>> 4.
>> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
>>
>> As we have removed enable_indexscan check, you should remove this test.
>
>
> Hmm, I think my rebase problems are causing confusion here, which v38 fixes.
>

I think it is still not fixed in v38 as the test is still present in 0001.

> In the first commit, we have ENABLE_INDEXSCAN checks. In the second commit,
> I changed the same test to use enable_replica_identity_full_index_scan.
>
> If we are going to only consider the first patch to get into the master branch,
> I can probably drop the test. In that case, I'm not sure what is our perspective
> on ENABLE_INDEXSCAN GUC. Do we want to keep that guard in the code
> (hence the test)?
>

I am not sure what we are going to do about this because I feel we need
some storage option as you have in the 0002 patch, but you and Andres
think that is not required. So, we can discuss that a bit more after
0001 is committed, but if there is no agreement then we probably need to
drop it. Even if we drop it, I don't think using enable_indexscan
makes sense. I think for now you don't need to send the 0002 patch; let's
first try to get the 0001 patch in and then we can discuss 0002.

>>
>>
>> 5. In general, the line length seems to vary a lot for different
>> multi-line comments. Though we are not strict in that it will look
>> better if there is consistency in that (let's have ~80 cols line
>> length for each comment in a single line).
>>
>
> Went over the tests, and made ~80 cols. There is one exception, in the first
> commit, the test for enable_indexcan is still shorter, but I failed to make that
> properly. I'll try to fix that as well, but I didn't want to block the progress due to
> that.
>
> Also, you have not noted, but I think SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
> already covers  SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS.
>
> So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS
> and dropped the second one. Let me know if it does not make sense to you. If I try, there are few more
> opportunities to squeeze in some more tests together, but those would start to complicate readability.
>

I still want to reduce the test time and will think about it. Which of
the other tests do you think can be combined?

BTW, did you consider updating the patch based on my yesterday's email [1]?

One minor comment:
+# now, create index and see that the index is used
+$node_subscriber->safe_psql('postgres',
+ "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");
+
+# wait until the index is created
+$node_subscriber->poll_query_until(
+ 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for creating index test_replica_id_full_idx";
+
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
+$node_publisher->wait_for_catchup($appname);
+
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select (idx_scan = 1) from pg_stat_all_indexes where
indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates one row via index";
+
+
+# now, create index on column y as well
+$node_subscriber->safe_psql('postgres',
+ "CREATE INDEX test_replica_id_full_idy ON test_replica_id_full(y)");
+
+# wait until the index is created
+$node_subscriber->poll_query_until(
+ 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
indexrelname = 'test_replica_id_full_idy';}
+) or die "Timed out while waiting for creating index test_replica_id_full_idy";

It appears you are using inconsistent spacing. It may be better to use
a single empty line wherever required.

[1] - https://www.postgresql.org/message-id/CAA4eK1%2BoM_v-b_WDHZmqCyVHU2oD4j3vF9YcH9xVHj%3DzAfy4og%40mail.gmail.com

--
With Regards,
Amit Kapila.



On Thu, Mar 9, 2023 at 4:50 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Mar 9, 2023 at 3:26 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> >
> > So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS
> > and dropped the second one. Let me know if it does not make sense to you. If I try, there are few more
> > opportunities to squeeze in some more tests together, but those would start to complicate readability.
> >
>
> I still want to reduce the test time and will think about it. Which of
> the other tests do you think can be combined?
>

Some of the ideas I can think of are as follows:

1. Combine "SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS"
and "SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS" such that after
verifying updates and deletes of the first test, we can drop some of
the columns on both publisher and subscriber, then use alter
subscription ... refresh publication command and then do the steps of
the second test. Note that we don't add tables after the initial setup,
we only change the schema.

2. We can also combine "Some NULL values" and "PUBLICATION LACKS THE
COLUMN ON THE SUBS INDEX" as both use the same schema. After the first
test, we need to drop the existing index and create a new index on the
subscriber node.

3. General comment
+# updates 200 rows
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);");

I think here you are updating 20 rows not 200. So, the comment seems
wrong to me.

Please think more and see if we can combine some other tests like
"Unique index that is not primary key or replica identity" and the
test we will have after comment#2 above.

--
With Regards,
Amit Kapila.



Hi,

Amit Kapila <amit.kapila16@gmail.com>, 8 Mar 2023 Çar, 14:42 tarihinde şunu yazdı:
On Wed, Mar 8, 2023 at 4:51 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>
>>
>> I just share this case and then we
>> can discuss should we pick the index which only contain the extra columns on the
>> subscriber.
>>
>
> I think its performance implications come down to the discussion on [1]. Overall, I prefer
> avoiding adding any additional complexity in the code for some edge cases. The code
> can handle this sub-optimal user pattern, with a sub-optimal performance.
>

It is fine to leave this and Hou-San's case if they make the patch
complex. However, it may be better to give it a try and see if this or
other regression/optimization can be avoided without adding much
complexity to the patch. You can prepare a top-up patch and then we
can discuss it.



Alright, I did some basic prototypes for the problems mentioned, just to show
that these problems can be solved without too much hassle. But, the patches
are not complete: some tests fail, no comments/tests exist, some values should
be cached, etc. Mostly sharing as a heads-up on the progress, given I have not
responded to this specific mail. I'll update these when I have some extra time
after replying to the 0001 patch.

 Thanks,
Onder
Hi Amit,


>
>>
>> 4.
>> +# Testcase start: SUBSCRIPTION BEHAVIOR WITH ENABLE_INDEXSCAN
>>
>> As we have removed enable_indexscan check, you should remove this test.
>
>
> Hmm, I think my rebase problems are causing confusion here, which v38 fixes.
>

I think it is still not fixed in v38 as the test is still present in 0001.

Ah, yes, sorry again for the noise. v39 will drop that.
 

> In the first commit, we have ENABLE_INDEXSCAN checks. In the second commit,
> I changed the same test to use enable_replica_identity_full_index_scan.
>
> If we are going to only consider the first patch to get into the master branch,
> I can probably drop the test. In that case, I'm not sure what is our perspective
> on ENABLE_INDEXSCAN GUC. Do we want to keep that guard in the code
> (hence the test)?
>

I am not sure what we are going to do on this because I feel we need
some storage option as you have in the 0002 patch, but you and Andres
think that is not required. So, we can discuss that a bit more after
0001 is committed, but if there is no agreement then we will probably
need to drop it. Even if we drop it, I don't think using
enable_indexscan makes sense. I think for now you don't need to send
the 0002 patch; let's first try to get the 0001 patch in and then we
can discuss 0002.

Sounds good; I'll rebase 0002 when needed.
 
>
> Also, you have not noted, but I think SUBSCRIPTION USES INDEX WITH MULTIPLE COLUMNS
> already covers  SUBSCRIPTION USES INDEX UPDATEs MULTIPLE ROWS.
>
> So, I changed the first one to SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS
> and dropped the second one. Let me know if it does not make sense to you. If I try, there are few more
> opportunities to squeeze in some more tests together, but those would start to complicate readability.
>

I still want to reduce the test time and will think about it. Which of
the other tests do you think can be combined?


I'll follow your suggestion in the next e-mail [2], and focus on further improvements.

BTW, did you consider updating the patch based on my email from yesterday [1]?


Yes, replied to that one just now with some wip commits [1]
 
It appears you are using inconsistent spacing. It may be better to use
a single empty line wherever required.


Sure, let me fix those.

attached v39. 

Here are some review comments for patch v39-0001 (mostly the test code).

======
src/backend/replication/logical/relation.c

1. FindUsableIndexForReplicaIdentityFull

+static Oid
+FindUsableIndexForReplicaIdentityFull(Relation localrel)
+{
+ List    *indexlist = RelationGetIndexList(localrel);
+ Oid usableIndex = InvalidOid;
+ ListCell   *lc;
+
+ foreach(lc, indexlist)
+ {
+ Oid idxoid = lfirst_oid(lc);
+ bool isUsableIndex;
+ Relation indexRelation = index_open(idxoid, AccessShareLock);
+ IndexInfo  *indexInfo = BuildIndexInfo(indexRelation);
+
+ isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);
+
+ index_close(indexRelation, AccessShareLock);
+
+ if (isUsableIndex)
+ {
+ /* we found one eligible index, don't need to continue */
+ usableIndex = idxoid;
+ break;
+ }
+ }
+
+ return usableIndex;
+}

This comment is not functional -- if you prefer the code as-is, then
ignore this comment.

But, personally I would:
- Move some of that code from the declarations. I feel it would be
better if the index_open/index_close were both in the code-body
instead of half in declarations and half not.
- Remove the 'usableIndex' variable, and just return directly.
- Shorten all the long names (and use consistent 'idx' instead of
sometimes 'idx' and sometimes 'index')

SUGGESTION (YMMV)

static Oid
FindUsableIndexForReplicaIdentityFull(Relation localrel)
{
List    *idxlist = RelationGetIndexList(localrel);
ListCell   *lc;

foreach(lc, idxlist)
{
Oid idxoid = lfirst_oid(lc);
bool isUsableIdx;
Relation idxRel;
IndexInfo  *idxInfo;

idxRel = index_open(idxoid, AccessShareLock);
idxInfo = BuildIndexInfo(idxRel);
isUsableIdx = IsIndexUsableForReplicaIdentityFull(idxInfo);
index_close(idxRel, AccessShareLock);

/* Return the first eligible index found */
if (isUsableIdx)
return idxoid;
}

return InvalidOid;
}

======
.../subscription/t/032_subscribe_use_index.pl

2. SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES

2a.
# Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
#
# This test ensures that after CREATE INDEX, the subscriber can automatically
# use one of the indexes (provided that it fulfils the requirements).
# Similarly, after DROP index, the subscriber can automatically switch to
# sequential scan


The last sentence is missing full-stop.

~

2b.
# now, create index and see that the index is used
$node_subscriber->safe_psql('postgres',
    "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");

Don't say "and see that the index is used". Yes, that is what this
whole test is doing, but it is not what the psql following this
comment is doing.

~

2c.
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
$node_publisher->wait_for_catchup($appname);


# wait until the index is used on the subscriber

The double blank lines here should be single.

~

2d.
# now, the update could either use the test_replica_id_full_idx or
# test_replica_id_full_idy index it is not possible for user to control
# which index to use

This sentence should break at "it".

Also "user" --> "the user"

SUGGESTION
# now, the update could either use the test_replica_id_full_idx or
# test_replica_id_full_idy index; it is not possible for the user to control
# which index to use

~

2e.
# let's also test dropping test_replica_id_full_idy and
# hence use test_replica_id_full_idx


I think you ought to have dropped the other (first) index because we
already know that the first index had been used (from earlier), but we
are not 100% sure if the 'y' index has been chosen yet.

~~~~

3. SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS

3a.
# deletes 20 rows
$node_publisher->safe_psql('postgres',
    "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");

# updates 20 rows
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = 100, y = '200' WHERE x IN (1, 2);");


"deletes" --> "delete"

"updates" --> "update"

~~~

4. SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS

# updates 200 rows
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);");


"updates" --> "update"

"200 rows" ??? is that right --  20 maybe ???

~~~

5. SUBSCRIPTION USES INDEX ON PARTITIONED TABLES

5a.
# updates rows and moves between partitions
$node_publisher->safe_psql('postgres',
    "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");

"updates rows and moves between partitions" --> "update rows, moving
them to other partitions"

~

5b.
# deletes rows from different partitions


"deletes" --> "delete"

~~~

6. SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS OR PARTIAL INDEX

6a.
# update 2 rows
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_1';");
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname =
'first_name_2' AND lastname = 'last_name_2';");

IMO 'Nan' seemed a curious name to assign as a test value, because it
seems like it has a special meaning but in reality, I don't think it
does. Even 'xxx' would be better.

~

6b.
# make sure none of the indexes is not used on the subscriber
$result = $node_subscriber->safe_psql('postgres',
    "select sum(idx_scan) from pg_stat_all_indexes where indexrelname
IN ('people_names_expr_only', 'people_names_partial')");
is($result, qq(0), 'ensure subscriber tap_sub_rep_full updates two
rows via seq. scan with index on expressions');

~

Looks like a typo in this comment: "none of the indexes is not used" ???

~~~

7. SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS

7a.
# update 2 rows
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_1';");
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname =
'first_name_3' AND lastname = 'last_name_3';");

Same as #6a, 'Nan' seems like a strange test value to assign to a name.

~~~

8. Some NULL values

$node_publisher->safe_psql('postgres',
    "INSERT INTO test_replica_id_full VALUES (1), (2), (3);");
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");

~

8a.
For some reason, this insert/update psql was not commented.

~

8b.
Maybe this test data could be more obvious by explicitly inserting the NULLs?

~~~

9. Unique index that is not primary key or replica identity

9a.
Why are other "Testcase start" comments all uppercase but not this one?

~~~

10. SUBSCRIPTION USES INDEX WITH PUB/SUB different data

Why is there a mixed case in this "Test case:" comment?

~~~

11. PUBLICATION LACKS THE COLUMN ON THE SUBS INDEX

11a.
# The subsriber has an additional column compared to publisher,
# and the index is on that column. We still pick the index scan
# on the subscriber even though it is practically similar to
# sequential scan

Typo "subsriber"

Missing full-stop on last sentence.

~

11b.
# make sure that the subscriber has the correct data
# we only deleted 1 row
$result = $node_subscriber->safe_psql('postgres',
    "SELECT sum(x) FROM test_replica_id_full");
is($result, qq(232), 'ensure subscriber has the correct data at the
end of the test');


Why does that say "deleted 1 row", when the previous operation was not a DELETE?


------
Kind Regards,
Peter Smith.
Fujitsu Australia



On Thu, Mar 9, 2023 at 5:47 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com>, 8 Mar 2023 Çar, 14:42 tarihinde şunu yazdı:
>>
>> On Wed, Mar 8, 2023 at 4:51 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>> >
>> >
>> >>
>> >> I just share this case and then we
>> >> can discuss should we pick the index which only contain the extra columns on the
>> >> subscriber.
>> >>
>> >
>> > I think its performance implications come down to the discussion on [1]. Overall, I prefer
>> > avoiding adding any additional complexity in the code for some edge cases. The code
>> > can handle this sub-optimal user pattern, with a sub-optimal performance.
>> >
>>
>> It is fine to leave this and Hou-San's case if they make the patch
>> complex. However, it may be better to give it a try and see if this or
>> other regression/optimization can be avoided without adding much
>> complexity to the patch. You can prepare a top-up patch and then we
>> can discuss it.
>>
>>
>
> Alright, I did some basic prototypes for the problems mentioned, just to show
> that these problems can be solved without too much hassle. But, the patchees
> are not complete, some tests fail, no comments / tests exist, some values should be
> cached etc.  Mostly sharing as a heads up and sharing the progress given I have not
> responded to this specific mail. I'll update these when I have some extra time after
> replying to the 0001 patch.
>

wip_for_optimize_index_column_match
+static bool
+IndexContainsAnyRemoteColumn(IndexInfo  *indexInfo,
+ LogicalRepRelation  *remoterel)
+{
+ for (int i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {

Wouldn't it be better to just check if the first column is not part of
the remote column then we can skip that index?

In wip_optimize_for_non_pkey_non_ri_unique_index patch, irrespective
of whether we want to retain this set of changes, the function name
IsIdxSafeToSkipDuplicates() sounds better than
IdxIsRelationIdentityOrPK() and we can even change the check to
GetRelationIdentityOrPK() instead of separate checks for the replica
identity index and the PK. So, it would be better to include this part
of the change ((a) change the function name to
IsIdxSafeToSkipDuplicates() and (b) change the check to use
GetRelationIdentityOrPK()) in the main patch.

--
With Regards,
Amit Kapila.



Hi Amit, all



Some of the ideas I can think of are as follows:

1. Combine "SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS"
and "SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS" such that after
verifying updates and deletes of the first test, we can drop some of
the columns on both publisher and subscriber, then use alter
subscription ... refresh publication command and then do the steps of
the second test. Note that we don't add tables after the initial setup,
we only change the schema.

Done with an important caveat. I think this reorganization of the test helped
us to find one edge case regarding dropped columns.

I realized that the dropped columns also get into the tuples_equal() function. The
remote sends NULL for the dropped columns (i.e., in remoteslot), but
index_getnext_slot() (or table_scan_getnextslot()) indeed fills the dropped
columns in the outslot. So, the dropped columns are not NULL in the outslot.

This causes tuples_equal() to fail. To fix that, I improved tuples_equal()
so that it skips the dropped columns.
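
For what it's worth, here is a minimal, self-contained sketch of that change. The struct and names below are illustrative stand-ins for PostgreSQL's pg_attribute/slot fields, not the actual patch code; the point is only that attributes flagged as dropped are skipped, so the NULL the publisher sends for a dropped column is never compared against whatever the local scan left in the slot:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for one attribute of a tuple; attisdropped mirrors the
 * pg_attribute flag the real tuples_equal() fix consults. */
typedef struct DemoAttr
{
	bool		attisdropped;	/* column has been dropped? */
	bool		isnull;
	int			value;			/* stand-in for the column datum */
} DemoAttr;

/* Sketch of the fixed comparison: dropped columns are ignored entirely. */
static bool
demo_tuples_equal(const DemoAttr *a, const DemoAttr *b, int natts)
{
	for (int i = 0; i < natts; i++)
	{
		if (a[i].attisdropped)
			continue;			/* skip dropped columns, as in the fix */

		if (a[i].isnull != b[i].isnull)
			return false;

		if (!a[i].isnull && a[i].value != b[i].value)
			return false;
	}

	return true;
}
```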

I also spent quite a bit of time understanding how/why this impacts
HEAD. See the steps below on HEAD, where REPLICA IDENTITY FULL
fails to replicate the data properly:


-- pub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 timestamptz);
ALTER TABLE test REPLICA IDENTITY FULL;
INSERT INTO test SELECT NULL, i, i, (i)::text, now() FROM generate_series(0,1)i;
CREATE PUBLICATION pub FOR ALL TABLES;

-- sub
CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 timestamptz);
CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION pub;

-- show that before dropping the columns, the data in the source and
-- target are deleted properly
DELETE FROM test WHERE x = 0;

-- both on the source and target
SELECT count(*) FROM test WHERE x = 0;
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

-- drop columns on the source
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- drop columns on the target
ALTER TABLE test DROP COLUMN drop_1;
ALTER TABLE test DROP COLUMN drop_2;
ALTER TABLE test DROP COLUMN drop_3;

-- on the target
ALTER SUBSCRIPTION sub REFRESH PUBLICATION;


-- after dropping the columns
DELETE FROM test WHERE x = 1;

-- source
SELECT count(*) FROM test WHERE x = 1;
┌───────┐
│ count │
├───────┤
│     0 │
└───────┘
(1 row)

-- target, OOPS wrong result!!!!
SELECT count(*) FROM test WHERE x = 1;

┌───────┐
│ count │
├───────┤
│     1 │
└───────┘
(1 row)


Should we have a separate patch for the tuples_equal() change so that we
can backport it? Attached as v40_0001 in the patch set.

Note that I need to have that commit as 0001 so that the 0002 patch
passes the tests.


2. We can also combine "Some NULL values" and "PUBLICATION LACKS THE
COLUMN ON THE SUBS INDEX" as both use the same schema. After the first
test, we need to drop the existing index and create a new index on the
subscriber node.

done

 

3. General comment
+# updates 200 rows
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);");

I think here you are updating 20 rows not 200. So, the comment seems
wrong to me.

I think I have fixed that in an earlier version because I cannot see this 
comment anymore.
 

Please think more and see if we can combine some other tests like
"Unique index that is not primary key or replica identity" and the
test we will have after comment#2 above.


I'll look for more opportunities and reply to the thread. I wanted to send
this mail so that you can have a look at (1) earlier.


Thanks,
Onder  
Hi Peter, all


src/backend/replication/logical/relation.c

1. FindUsableIndexForReplicaIdentityFull

+static Oid
+FindUsableIndexForReplicaIdentityFull(Relation localrel)
+{
+ List    *indexlist = RelationGetIndexList(localrel);
+ Oid usableIndex = InvalidOid;
+ ListCell   *lc;
+
+ foreach(lc, indexlist)
+ {
+ Oid idxoid = lfirst_oid(lc);
+ bool isUsableIndex;
+ Relation indexRelation = index_open(idxoid, AccessShareLock);
+ IndexInfo  *indexInfo = BuildIndexInfo(indexRelation);
+
+ isUsableIndex = IsIndexUsableForReplicaIdentityFull(indexInfo);
+
+ index_close(indexRelation, AccessShareLock);
+
+ if (isUsableIndex)
+ {
+ /* we found one eligible index, don't need to continue */
+ usableIndex = idxoid;
+ break;
+ }
+ }
+
+ return usableIndex;
+}

This comment is not functional -- if you prefer the code as-is, then
ignore this comment.

But, personally I would:
- Move some of that code from the declarations. I feel it would be
better if the index_open/index_close were both in the code-body
instead of half in declarations and half not.
- Remove the 'usableIndex' variable, and just return directly.
- Shorten all the long names (and use consistent 'idx' instead of
sometimes 'idx' and sometimes 'index')

SUGGESTION (YMMV)

static Oid
FindUsableIndexForReplicaIdentityFull(Relation localrel)
{
List    *idxlist = RelationGetIndexList(localrel);
ListCell   *lc;

foreach(lc, idxlist)
{
Oid idxoid = lfirst_oid(lc);
bool isUsableIdx;
Relation idxRel;
IndexInfo  *idxInfo;

idxRel = index_open(idxoid, AccessShareLock);
idxInfo = BuildIndexInfo(idxRel);
isUsableIdx = IsIndexUsableForReplicaIdentityFull(idxInfo);
index_close(idxRel, AccessShareLock);

/* Return the first eligible index found */
if (isUsableIdx)
return idxoid;
}

return InvalidOid;
}

applied your suggestion. I think it made it slightly easier to follow.
 

======
.../subscription/t/032_subscribe_use_index.pl

2. SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES

2a.
# Testcase start: SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
#
# This test ensures that after CREATE INDEX, the subscriber can automatically
# use one of the indexes (provided that it fulfils the requirements).
# Similarly, after DROP index, the subscriber can automatically switch to
# sequential scan


The last sentence is missing full-stop.


fixed
 
~

2b.
# now, create index and see that the index is used
$node_subscriber->safe_psql('postgres',
    "CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(x)");

Don't say "and see that the index is used". Yes, that is what this
whole test is doing, but it is not what the psql following this
comment is doing.

 true


~

2c.
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 15;");
$node_publisher->wait_for_catchup($appname);


# wait until the index is used on the subscriber

The double blank lines here should be single.

~

fixed.
 

2d.
# now, the update could either use the test_replica_id_full_idx or
# test_replica_id_full_idy index it is not possible for user to control
# which index to use

This sentence should break at "it".

Aso "user" --> "the user"
SUGGESTION
# now, the update could either use the test_replica_id_full_idx or
# test_replica_id_full_idy index; it is not possible for the user to control
# which index to use


looks good, thanks
 
~

2e.
# let's also test dropping test_replica_id_full_idy and
# hence use test_replica_id_full_idx


I think you ought to have dropped the other (first) index because we
already know that the first index had been used (from earlier), but we
are not 100% sure if the 'y' index has been chosen yet.

Makes sense. Though in general it is hard to check pg_stat_all_indexes
for any of the indexes in this test, as we don't know the exact number
of tuples for each. Just wanted to note that explicitly.



~~~~

3. SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS

3a.
# deletes 20 rows
$node_publisher->safe_psql('postgres',
    "DELETE FROM test_replica_id_full WHERE x IN (5, 6);");

# updates 20 rows
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = 100, y = '200' WHERE x IN (1, 2);");


"deletes" --> "delete"

"updates" --> "update"

Well, I thought the command deletes, but I guess "delete" reads better.
 

~~~

4. SUBSCRIPTION USES INDEX WITH DROPPED COLUMNS

# updates 200 rows
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x IN (5, 6);");


"updates" --> "update"

"200 rows" ??? is that right --  20 maybe ???


I guess this is from an earlier version of the patch; I fixed these
kinds of errors.
 
~~~

5. SUBSCRIPTION USES INDEX ON PARTITIONED TABLES

5a.
# updates rows and moves between partitions
$node_publisher->safe_psql('postgres',
    "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");

"updates rows and moves between partitions" --> "update rows, moving
them to other partitions"


fixed, thanks
 
~

5b.
# deletes rows from different partitions


"deletes" --> "delete"


fixed, and I searched for similar errors but couldn't find any more

~~~
 

6. SUBSCRIPTION DOES NOT USE INDEXES WITH ONLY EXPRESSIONS OR PARTIAL INDEX

6a.
# update 2 rows
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_1';");
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname =
'first_name_2' AND lastname = 'last_name_2';");

IMO 'Nan' seemed a curious name to assign as a test value, because it
seems like it has a special meaning but in reality, I don't think it
does. Even 'xxx' would be better.

changed to "no-name", as "xxx" does not look good either
 

~

6b.
# make sure none of the indexes is not used on the subscriber
$result = $node_subscriber->safe_psql('postgres',
    "select sum(idx_scan) from pg_stat_all_indexes where indexrelname
IN ('people_names_expr_only', 'people_names_partial')");
is($result, qq(0), 'ensure subscriber tap_sub_rep_full updates two
rows via seq. scan with index on expressions');

~

Looks like a typo in this comment: "none of the indexes is not used" ???

dropped "not"

 
~~~

7. SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS

7a.
# update 2 rows
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname = 'first_name_1';");
$node_publisher->safe_psql('postgres',
    "UPDATE people SET firstname = 'Nan' WHERE firstname =
'first_name_3' AND lastname = 'last_name_3';");

Same as #6a, 'Nan' seems like a strange test value to assign to a name.


similarly, changed to no-name
 
~~~

8. Some NULL values

$node_publisher->safe_psql('postgres',
    "INSERT INTO test_replica_id_full VALUES (1), (2), (3);");
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
$node_publisher->safe_psql('postgres',
    "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");

~

8a.
For some reason, this insert/update psql was not commented.


added some
 
~

8b.
Maybe this test data could be more obvious by explicitly inserting the NULLs?


Well, that's a bit hard. We merged a few tests for perf reasons, and I
merged this test with the "missing column" test. Now, the NULL values are
triggered by the missing column on the source.
 
~~~

9. Unique index that is not primary key or replica identity

9a.
Why are other "Testcase start" comments all uppercase but not this one?


fixed, there was one more
 
~~~

10. SUBSCRIPTION USES INDEX WITH PUB/SUB different data

Why is there a mixed case in this "Test case:" comment?

no specific reason, fixed
 

~~~

11. PUBLICATION LACKS THE COLUMN ON THE SUBS INDEX

11a.
# The subsriber has an additional column compared to publisher,
# and the index is on that column. We still pick the index scan
# on the subscriber even though it is practically similar to
# sequential scan

Typo "subsriber"

I guess I fixed this in a recent iteration; I cannot find it now.
 

Missing full-stop on last sentence.

Similarly, probably merged into another test.

Still, I went over all such explanations in the test and ensured
we have the full stop.
 

~

11b.
# make sure that the subscriber has the correct data
# we only deleted 1 row
$result = $node_subscriber->safe_psql('postgres',
    "SELECT sum(x) FROM test_replica_id_full");
is($result, qq(232), 'ensure subscriber has the correct data at the
end of the test');


Why does that say "deleted 1 row", when the previous operation was not a DELETE?

 
Probably due to merging multiple tests into one. Fixed now.


Again, thanks for the thorough review. Attached v41.

See the reason for 0001 patch in [1].


Onder KALACI. 


 
Hi Amit, all


wip_for_optimize_index_column_match
+static bool
+IndexContainsAnyRemoteColumn(IndexInfo  *indexInfo,
+ LogicalRepRelation  *remoterel)
+{
+ for (int i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
+ {

Wouldn't it be better to just check if the first column is not part of
the remote column then we can skip that index?

Reading [1], I think I can follow what you suggest. So, basically,
if the leftmost column is not filtered, we have the following:

 but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.

So, in our case, we could follow a similar approach. If the leftmost column of the index
is not sent over the wire from the pub, we can prefer the sequential scan.

Is my understanding of your suggestion accurate? 
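
If that reading is right, the check could be reduced to something like the following self-contained sketch. The arrays are stand-ins for IndexInfo->ii_IndexAttrNumbers and the set of attributes sent by the publisher, not the actual patch code:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the leftmost-column shortcut: if the first attribute of an
 * index is not among the columns sent by the publisher, the whole index
 * would have to be scanned anyway, so the index can be skipped in favor
 * of a sequential scan.
 */
static bool
leftmost_index_column_is_remote(const int *idx_attnums, int nidxatts,
								const int *remote_attnums, int nremote)
{
	if (nidxatts == 0)
		return false;

	for (int i = 0; i < nremote; i++)
	{
		if (remote_attnums[i] == idx_attnums[0])
			return true;		/* index is worth considering */
	}

	return false;				/* skip this index */
}
```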
 

In wip_optimize_for_non_pkey_non_ri_unique_index patch, irrespective
of whether we want to retain this set of changes, the function name
IsIdxSafeToSkipDuplicates() sounds better than
IdxIsRelationIdentityOrPK() and we can even change the check to
GetRelationIdentityOrPK() instead of separate checks for the replica
identity index and the PK. So, it would be better to include this part
of the change ((a) change the function name to
IsIdxSafeToSkipDuplicates() and (b) change the check to use
GetRelationIdentityOrPK()) in the main patch.



I agree that it is a good change. Added to v42.

Thanks,
Onder KALACI
 


On Fri, Mar 10, 2023 at 5:16 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>> wip_for_optimize_index_column_match
>> +static bool
>> +IndexContainsAnyRemoteColumn(IndexInfo  *indexInfo,
>> + LogicalRepRelation  *remoterel)
>> +{
>> + for (int i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
>> + {
>>
>> Wouldn't it be better to just check if the first column is not part of
>> the remote column then we can skip that index?
>
>
> Reading [1], I think I can follow what you suggest. So, basically,
> if the leftmost column is not filtered, we have the following:
>
>>  but the entire index would have to be scanned, so in most cases the planner would prefer a sequential table scan over using the index.
>
>
> So, in our case, we could follow a similar approach. If the leftmost column of the index
> is not sent over the wire from the pub, we can prefer the sequential scan.
>
> Is my understanding of your suggestion accurate?
>

Yes. Let me request an opinion from Shi-San, who reported the problem.

--
With Regards,
Amit Kapila.



Hi Amit, all


I'll look for more opportunities and reply to the thread. I wanted to send
this mail so that you can have a look at (1) earlier.


I merged SUBSCRIPTION CREATE/DROP INDEX WORKS WITHOUT ISSUES
into SUBSCRIPTION CAN USE INDEXES WITH EXPRESSIONS AND COLUMNS.

Also, merged SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA and
 A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY

So, we have 6 test cases left. I am starting to feel that trying to merge
further will hurt readability. Do you have any further easy test-case merge suggestions?

I think one option could be to drop some cases altogether, but I'm not sure we'd want that.

As a semi-related question: are you aware of any setting that would make
pg_stat_all_indexes reflect the changes sooner? It is hard to tell what the
bottleneck in the tests is, but I suspect the several poll_query_until()
calls on pg_stat_all_indexes might be the reason.

Attaching v43. 
On Fri, Mar 10, 2023 at 3:25 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>
> Done with an important caveat. I think this reorganization of the test helped
> us to find one edge case regarding dropped columns.
>
> I realized that the dropped columns also get into the tuples_equal() function. And,
> the remote sends NULL to for the dropped columns(i.e., remoteslot), but
> index_getnext_slot() (or table_scan_getnextslot) indeed fills the dropped
> columns on the outslot. So, the dropped columns are not NULL in the outslot.
>
> This triggers tuples_equal to fail. To fix that, I improved the tuples_equal
> such that it skips the dropped columns.
>

Good catch. By any chance, have you tried with generated columns? See
logicalrep_write_tuple()/logicalrep_write_attrs() where we neither
send anything for dropped columns nor for generated columns.
Similarly, on receiving side, in logicalrep_rel_open() and
slot_store_data(), we seem to be using NULL for such columns.
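To make the comparison issue concrete, here is a minimal, hypothetical Python sketch of the skipping logic being discussed. The attribute flag names mirror pg_attribute's attisdropped/attgenerated, but this is an illustration only, not the actual C implementation in execReplication.c:

```python
def tuples_equal(local_tuple, remote_tuple, attrs):
    """Compare two tuples attribute by attribute, skipping attributes
    that are dropped or generated: the publisher sends NULL for those,
    while a local index/seq scan still fills them in, so comparing them
    would make logically identical tuples look different."""
    for att, local_val, remote_val in zip(attrs, local_tuple, remote_tuple):
        if att.get("attisdropped") or att.get("attgenerated"):
            continue  # never compare columns the publisher does not send
        if local_val != remote_val:
            return False
    return True

# A dropped and a generated column differ (NULL vs. leftover values),
# but the tuples should still be considered equal.
attrs = [{"attisdropped": True}, {}, {"attgenerated": True}]
local = ["leftover", 42, "computed"]  # the local scan fills every column
remote = [None, 42, None]             # publisher sent NULL for cols 1 and 3
assert tuples_equal(local, remote, attrs)
```

Without the `continue`, the first and third columns (NULL vs. leftover values) would make the comparison fail, which is exactly the wrong-result reproduction shown below.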

> I also spent quite a bit of time understanding how/why this impacts
> HEAD. See the steps below on HEAD, where REPLICA IDENTITY FULL
> fails to replicate the data properly:
>
>
> -- pub
> CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 timestamptz);
> ALTER TABLE test REPLICA IDENTITY FULL;
> INSERT INTO test SELECT NULL, i, i, (i)::text, now() FROM generate_series(0,1)i;
> CREATE PUBLICATION pub FOR ALL TABLES;
>
> -- sub
> CREATE TABLE test (drop_1 jsonb, x int, drop_2 numeric, y text, drop_3 timestamptz);
> CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION pub;
>
> -- show that before dropping the columns, the data in the source and
> -- target are deleted properly
> DELETE FROM test WHERE x = 0;
>
> -- both on the source and target
> SELECT count(*) FROM test WHERE x = 0;
> ┌───────┐
> │ count │
> ├───────┤
> │     0 │
> └───────┘
> (1 row)
>
> -- drop the columns on the source
> ALTER TABLE test DROP COLUMN drop_1;
> ALTER TABLE test DROP COLUMN drop_2;
> ALTER TABLE test DROP COLUMN drop_3;
>
> -- drop the columns on the target
> ALTER TABLE test DROP COLUMN drop_1;
> ALTER TABLE test DROP COLUMN drop_2;
> ALTER TABLE test DROP COLUMN drop_3;
>
> -- on the target
> ALTER SUBSCRIPTION sub REFRESH PUBLICATION;
>
>
> -- after dropping the columns
> DELETE FROM test WHERE x = 1;
>
> -- source
> SELECT count(*) FROM test WHERE x = 1;
> ┌───────┐
> │ count │
> ├───────┤
> │     0 │
> └───────┘
> (1 row)
>
> -- target, OOPS wrong result!!!!
> SELECT count(*) FROM test WHERE x = 1;
>
> ┌───────┐
> │ count │
> ├───────┤
> │     1 │
> └───────┘
> (1 row)
>
>
> Should we have a separate patch for the tuples_equal change so that we
> might want to backport?
>

Yes, it would be better to report and discuss this in a separate thread.

> Attached as v40_0001 on the patch.
>
> Note that I need to have that commit as 0001 so that 0002 patch
> passes the tests.
>

I think we can add such a test (which relies on the existing buggy
behavior) later, after fixing the existing bug. For now, it would be
better to remove that test and add it back after we fix the dropped-columns
issue in HEAD.

--
With Regards,
Amit Kapila.



On Fri, Mar 10, 2023 at 7:58 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>
> I think one option could be to drop some cases altogether, but not sure we'd want that.
>
> As a semi-related question: Are you aware of any setting that'd make pg_stat_all_indexes
> reflect the changes sooner? It is hard to debug what is the bottleneck in the tests, but
> I have a suspicion that there might be several poll_query_until() calls on
> pg_stat_all_indexes, which might be the reason?
>

Yeah, I also think the poll_query_until() calls on pg_stat_all_indexes are
the main reason for these tests taking more time. When I commented out
those polls, the test time dropped drastically. Looking at
pgstat_report_stat(), it seems we don't report stats sooner than 1s,
and as most of this patch's tests rely on stats, that leads to the tests taking
more time. I don't have a better idea for verifying this patch than
checking whether the index scan is really used by referring to
pg_stat_all_indexes. I think trying to reduce the poll calls may help
in reducing the test timings further. Some ideas along those lines are as
follows:
1.
+# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA VIA
+# A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY

No need for a separate Delete test here.

2.
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");
+
+# check if the index is used even when the index has NULL values
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates test_replica_id_full table";

Here, I think only one update is sufficient.

3.
+$node_subscriber->safe_psql('postgres',
+ "CREATE INDEX people_last_names ON people(lastname)");
+
+# wait until the index is created
+$node_subscriber->poll_query_until(
+ 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
indexrelname = 'people_last_names';}
+) or die "Timed out while waiting for creating index people_last_names";

I don't think we need this poll.

4.
+# update 2 rows
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'no-name' WHERE firstname = 'first_name_1';");
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'no-name' WHERE firstname =
'first_name_3' AND lastname = 'last_name_3';");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'people_names';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates two rows via index scan with index on
expressions and columns";
+
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM people WHERE firstname = 'no-name';");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=4 from pg_stat_all_indexes where
indexrelname = 'people_names';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full deletes two rows via index scan with index on
expressions and columns";

I think having one update or delete should be sufficient.

5.
+# update rows, moving them to other partitions
+$node_publisher->safe_psql('postgres',
+ "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select sum(idx_scan)=1 from pg_stat_all_indexes where
indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for updates on partitioned table with index";
+
+# delete rows from different partitions
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select sum(idx_scan)=3 from pg_stat_all_indexes where
indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates partitioned table";
+

Can we combine these two polls?

6.
+# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS, ALSO
+# DROPS COLUMN

In this test, let's try to update/delete 2-3 rows instead of 20. And
after drop columns, let's keep just one of the update or delete.

7. Apart from the above, I think it is better to use
wait_for_catchup() consistently before trying to verify the data on
the subscriber. We always use it in other tests. I guess here you are
relying on the poll for index scans to ensure that data is replicated
but I feel it may still be better to use wait_for_catchup().
Similarly, wait_for_subscription_sync uses the publisher name and
appname in other tests, so it is better to be consistent. It can avoid
random failures by ensuring data is synced.

--
With Regards,
Amit Kapila.



Hi Amit, all


> This triggers tuples_equal to fail. To fix that, I improved the tuples_equal
> such that it skips the dropped columns.
>

 By any chance, have you tried with generated columns?

Yes, it shows the same behavior.
 
See
logicalrep_write_tuple()/logicalrep_write_attrs() where we neither
send anything for dropped columns nor for generated columns.
Similarly, on receiving side, in logicalrep_rel_open() and
slot_store_data(), we seem to be using NULL for such columns.


Thanks for the explanation, it helps a lot.
 

Yes, it would be better to report and discuss this in a separate thread,

Done via [1]  

> Attached as v40_0001 on the patch.
>
> Note that I need to have that commit as 0001 so that 0002 patch
> passes the tests.
>

I think we can add such a test (which relies on existing buggy
behavior) later after fixing the existing bug. For now, it would be
better to remove that test and add it after we fix dropped columns
issue in HEAD.

Alright, when I push the next version (hopefully tomorrow), I'll follow this suggestion.

Thanks,
Onder KALACI  

On Sun, Mar 12, 2023 at 1:34 AM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>> I think we can add such a test (which relies on existing buggy
>> behavior) later after fixing the existing bug. For now, it would be
>> better to remove that test and add it after we fix dropped columns
>> issue in HEAD.
>
>
> Alright, when I push the next version (hopefully tomorrow), I'll follow this suggestion.
>

Okay, thanks. See, if you can also include your changes in the patch
wip_for_optimize_index_column_match (after discussed modification).
Few other minor comments:

1.
+   are enforced for primary keys.  Internally, we follow a similar approach for
+   supporting index scans within logical replication scope.  If there are no

I think we can remove the above line: "Internally, we follow a similar
approach for supporting index scans within logical replication scope."
This didn't seem useful for users.

2.
diff --git a/src/backend/executor/execReplication.c
b/src/backend/executor/execReplication.c
index bc6409f695..646e608eb7 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -83,11 +83,8 @@ build_replindex_scan_key(ScanKey skey, Relation
rel, Relation idxrel,
                if (!AttributeNumberIsValid(table_attno))
                {
                        /*
-                        * XXX: For a non-primary/unique index with an
additional
-                        * expression, we do not have to continue at
this point. However,
-                        * the below code assumes the index scan is
only done for simple
-                        * column references. If we can relax the
assumption in the below
-                        * code-block, we can also remove the continue.
+                        * XXX: Currently, we don't support
expressions in the scan key,
+                        * see code below.
                         */


I have tried to simplify the above comment. See, if that makes sense to you.

3.
/*
+ * We only need to allocate once. This is allocated within per
+ * tuple context -- ApplyMessageContext -- hence no need to
+ * explicitly pfree().
+ */

We normally don't write why we don't need to explicitly pfree(). It is
useful during review, but I'm not sure it is a good idea to keep it in
the final code.

4. I have modified the proposed commit message as follows, see if that
makes sense to you, and let me know if I missed anything especially
the review/author credit.

Allow the use of indexes other than PK and REPLICA IDENTITY on the subscriber.

Using REPLICA IDENTITY FULL on the publisher can lead to a full table
scan per tuple change on the subscription when REPLICA IDENTITY or PK
index is not available. This makes REPLICA IDENTITY FULL impractical
to use apart from some small number of use cases.

This patch allows using indexes other than PRIMARY KEY or REPLICA
IDENTITY on the subscriber during apply of update/delete. The index
that can be used must be a btree index, not a partial index, and it
must have at least one column reference (i.e. cannot consist of only
expressions). We can uplift these restrictions in the future. There is
no smart mechanism to pick the index. If there is more than one index
that
satisfies these requirements, we just pick the first one. We discussed
using some of the optimizer's low-level APIs for this but ruled it out
as that can be a maintenance burden in the long run.

This patch improves the performance in the vast majority of cases and
the improvement is proportional to the amount of data in the table.
However, there could be some regression in a small number of cases
where the indexes have a lot of duplicate and dead rows. It was
discussed that those are mostly impractical cases but we can provide a
table or subscription level option to disable this feature if
required.

Author: Onder Kalaci
Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda
Hayato, Amit Kapila
Discussion: https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com

--
With Regards,
Amit Kapila.



Hi Amit, all

1.
+# Testcase start: SUBSCRIPTION USES INDEX WITH PUB/SUB DIFFERENT DATA VIA
+# A UNIQUE INDEX THAT IS NOT PRIMARY KEY OR REPLICA IDENTITY

No need for a separate Delete test here.

Yeah, there is really no difference between update and delete for this patch,
so it makes sense. I initially added it for completeness of coverage,
but since it adds perf overhead to the tests, I agree that we could
drop some of those.
 

2.
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 1;");
+$node_publisher->safe_psql('postgres',
+ "UPDATE test_replica_id_full SET x = x + 1 WHERE x = 3;");
+
+# check if the index is used even when the index has NULL values
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'test_replica_id_full_idx';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates test_replica_id_full table";

Here, I think only one update is sufficient.

Done. I guess you requested this change so that we wait
for idx_scan=1 rather than idx_scan=2, which could help.


3.
+$node_subscriber->safe_psql('postgres',
+ "CREATE INDEX people_last_names ON people(lastname)");
+
+# wait until the index is created
+$node_subscriber->poll_query_until(
+ 'postgres', q{select count(*)=1 from pg_stat_all_indexes where
indexrelname = 'people_last_names';}
+) or die "Timed out while waiting for creating index people_last_names";

I don't think we need this poll.

True, not sure why I had this; none of the other tests does this anyway.


4.
+# update 2 rows
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'no-name' WHERE firstname = 'first_name_1';");
+$node_publisher->safe_psql('postgres',
+ "UPDATE people SET firstname = 'no-name' WHERE firstname =
'first_name_3' AND lastname = 'last_name_3';");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=2 from pg_stat_all_indexes where
indexrelname = 'people_names';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates two rows via index scan with index on
expressions and columns";
+
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM people WHERE firstname = 'no-name';");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select idx_scan=4 from pg_stat_all_indexes where
indexrelname = 'people_names';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full deletes two rows via index scan with index on
expressions and columns";

I think having one update or delete should be sufficient.

So, I dropped the 2nd update, but kept 1 update and 1 delete.
The latter deletes the tuple updated by the former. Seems like
an interesting test to keep.

Still, I dropped one of the extra poll_query_until, which is probably
good enough for this one? Let me know if you think otherwise.
 

5.
+# update rows, moving them to other partitions
+$node_publisher->safe_psql('postgres',
+ "UPDATE users_table_part SET value_1 = 0 WHERE user_id = 4;");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select sum(idx_scan)=1 from pg_stat_all_indexes where
indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for updates on partitioned table with index";
+
+# delete rows from different partitions
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 1 and value_1 = 1;");
+$node_publisher->safe_psql('postgres',
+ "DELETE FROM users_table_part WHERE user_id = 12 and value_1 = 12;");
+
+# wait until the index is used on the subscriber
+$node_subscriber->poll_query_until(
+ 'postgres', q{select sum(idx_scan)=3 from pg_stat_all_indexes where
indexrelname ilike 'users_table_part_%';}
+) or die "Timed out while waiting for check subscriber
tap_sub_rep_full updates partitioned table";
+

Can we combine these two polls?

Looking at it closely, the first one seems like an unnecessary poll anyway.
We can simply check idx_scan at the end of the test; I don't see
value in checking earlier.
 

6.
+# Testcase start: SUBSCRIPTION USES INDEX WITH MULTIPLE ROWS AND COLUMNS, ALSO
+# DROPS COLUMN

In this test, let's try to update/delete 2-3 rows instead of 20. And
after drop columns, let's keep just one of the update or delete.

Changed to 3 rows.
 

7. Apart from the above, I think it is better to use
wait_for_catchup() consistently before trying to verify the data on
the subscriber. We always use it in other tests. I guess here you are
relying on the poll for index scans to ensure that data is replicated
but I feel it may still be better to use wait_for_catchup().

Yes, that was my understanding and expectation. I'm not convinced that
wait_for_catchup() is strictly needed, since without catching up, how could
pg_stat_all_indexes be updated? Still, it is good to be consistent
with the test suite, so I applied your suggestion.

Similarly, wait_for_subscription_sync uses the publisher name and
appname in other tests, so it is better to be consistent. It can avoid
random failures by ensuring data is synced.
 
makes sense.

I'll attach a new patch in the next e-mail, along with fixes for your
other comments.


Thanks,
Onder KALACI
 
 
Hi Amit, all


>> I think we can add such a test (which relies on existing buggy
>> behavior) later after fixing the existing bug. For now, it would be
>> better to remove that test and add it after we fix dropped columns
>> issue in HEAD.
>
>
> Alright, when I push the next version (hopefully tomorrow), I'll follow this suggestion.
>

Okay, thanks. See, if you can also include your changes in the patch
wip_for_optimize_index_column_match (after discussed modification).
Few other minor comments:

Sure, done. Please check the RemoteRelContainsLeftMostColumnOnIdx() function.

Note that we already have a test for that in SOME NULL VALUES AND MISSING COLUMN.
Previously we would check whether test_replica_id_full_idy is used. Now we don't, because it is not
used anymore :) I initially used poll_query_until with idx_scan=0, but that also seems
confusing to read in the test, and it looks prone to race conditions, as
poll_query_until with idx_scan=0 does not guarantee anything.
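The race is generic and worth spelling out: polling for a counter to still be zero succeeds immediately, whether or not the event that would bump it has simply not happened yet. A hypothetical helper, loosely modeled on poll_query_until, illustrates this:

```python
import time

def poll_until(predicate, timeout=1.0, interval=0.05):
    """Poll until predicate() returns True or the timeout expires;
    return the final predicate result."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()

# idx_scan starts at 0 and would only be bumped once the apply worker
# processes the change. Polling for "still zero" passes instantly, so it
# proves nothing about which scan method was (or will eventually be) used.
idx_scan = 0
assert poll_until(lambda: idx_scan == 0)  # succeeds on the very first check
```

This is why asserting a positive count (idx_scan=1) after the change is known to have been applied is the only reliable form of this check.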
 

1.
+   are enforced for primary keys.  Internally, we follow a similar approach for
+   supporting index scans within logical replication scope.  If there are no

I think we can remove the above line: "Internally, we follow a similar
approach for supporting index scans within logical replication scope."
This didn't seem useful for users.


removed
 
2.
diff --git a/src/backend/executor/execReplication.c
b/src/backend/executor/execReplication.c
index bc6409f695..646e608eb7 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -83,11 +83,8 @@ build_replindex_scan_key(ScanKey skey, Relation
rel, Relation idxrel,
                if (!AttributeNumberIsValid(table_attno))
                {
                        /*
-                        * XXX: For a non-primary/unique index with an
additional
-                        * expression, we do not have to continue at
this point. However,
-                        * the below code assumes the index scan is
only done for simple
-                        * column references. If we can relax the
assumption in the below
-                        * code-block, we can also remove the continue.
+                        * XXX: Currently, we don't support
expressions in the scan key,
+                        * see code below.
                         */


I have tried to simplify the above comment. See, if that makes sense to you.

Makes sense
 

3.
/*
+ * We only need to allocate once. This is allocated within per
+ * tuple context -- ApplyMessageContext -- hence no need to
+ * explicitly pfree().
+ */

We normally don't write why we don't need to explicitly pfree. It is
good during the review but not sure if it is a good idea to keep it in
the final code.

 
Sounds good, applied 

4. I have modified the proposed commit message as follows, see if that
makes sense to you, and let me know if I missed anything especially
the review/author credit.

Allow the use of indexes other than PK and REPLICA IDENTITY on the subscriber.

Using REPLICA IDENTITY FULL on the publisher can lead to a full table
scan per tuple change on the subscription when REPLICA IDENTITY or PK
index is not available. This makes REPLICA IDENTITY FULL impractical
to use apart from some small number of use cases.

This patch allows using indexes other than PRIMARY KEY or REPLICA
IDENTITY on the subscriber during apply of update/delete. The index
that can be used must be a btree index, not a partial index, and it
must have at least one column reference (i.e. cannot consist of only
expressions). We can uplift these restrictions in the future. There is
no smart mechanism to pick the index. If there is more than one index
that
satisfies these requirements, we just pick the first one. We discussed
using some of the optimizer's low-level APIs for this but ruled it out
as that can be a maintenance burden in the long run.

This patch improves the performance in the vast majority of cases and
the improvement is proportional to the amount of data in the table.
However, there could be some regression in a small number of cases
where the indexes have a lot of duplicate and dead rows. It was
discussed that those are mostly impractical cases but we can provide a
table or subscription level option to disable this feature if
required.

Author: Onder Kalaci
Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda
Hayato, Amit Kapila
Discussion: https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com


I also see two mails/reviews from Wang wei, but I'm not sure what qualifies as "reviewer" for this group. Should we
add that name as well? I think you can guide us on this.

Apart from that, I only fixed one extra newline between 'that' and 'satisfies'. Otherwise, it looks pretty good!
 
Thanks,
Onder

On Sat, Mar 11, 2023 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Mar 10, 2023 at 7:58 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> >
> > I think one option could be to drop some cases altogether, but not sure we'd want that.
> >
> > As a semi-related question: Are you aware of any setting that'd make pg_stat_all_indexes
> > reflect the changes sooner? It is hard to debug what is the bottleneck in the tests, but
> > I have a suspicion that there might be several poll_query_until() calls on
> > pg_stat_all_indexes, which might be the reason?
> >
>
> Yeah, I also think poll_query_until() calls on pg_stat_all_indexes is
> the main reason for these tests taking more time. When I commented
> those polls, it drastically reduces the test time. On looking at
> pgstat_report_stat(), it seems we don't report stats sooner than 1s
> and as most of this patch's test relies on stats, it leads to taking
> more time. I don't have a better idea to verify this patch without
> checking whether the index scan is really used by referring to
> pg_stat_all_indexes. I think trying to reduce the poll calls may help
> in reducing the test timings further. Some ideas on those lines are as
> follows:

If the reason for the stats polling was only to know if some index is
chosen or not, I was wondering if you can just convey the same
information to the TAP test via some conveniently placed (DEBUG?)
logging.

This way the TAP test can do a 'wait_for_log' instead of the
'poll_query_until'. It will probably generate lots of extra logging,
but it still might be lots faster than the current code because it won't
incur the 1s overhead of the stats.
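The 1s floor discussed here comes from the stats system batching its reports. A toy Python model of that rate limiting follows; the constant echoes pgstat.c's PGSTAT_MIN_INTERVAL, but the class and method names are purely illustrative, not the real API:

```python
MIN_INTERVAL = 1.0  # seconds; echoes pgstat.c's PGSTAT_MIN_INTERVAL (1000 ms)

class StatsReporter:
    """Toy model: a backend refuses to flush its pending counters more
    often than once per MIN_INTERVAL, so readers of the stats views can
    lag the actual activity by up to that long."""

    def __init__(self, now=0.0):
        self.last_flush = now
        self.flushed = 0
        self.pending = 0

    def count_index_scan(self):
        self.pending += 1          # activity accumulates locally first

    def report(self, now, force=False):
        if not force and now - self.last_flush < MIN_INTERVAL:
            return False           # too soon: counters stay pending
        self.flushed += self.pending
        self.pending = 0
        self.last_flush = now
        return True

r = StatsReporter()
r.count_index_scan()
assert r.report(now=0.5) is False  # within the 1s window: skipped
assert r.report(now=1.2) is True   # window elapsed: flushed
assert r.flushed == 1
```

So each poll_query_until() on pg_stat_all_indexes pays up to one such window, which is why log-based waiting can be faster than stats-based waiting.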

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "shiy.fnst@fujitsu.com"
On Fri, Mar 10, 2023 8:17 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Fri, Mar 10, 2023 at 5:16 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> >>
> >> wip_for_optimize_index_column_match
> >> +static bool
> >> +IndexContainsAnyRemoteColumn(IndexInfo  *indexInfo,
> >> + LogicalRepRelation  *remoterel)
> >> +{
> >> + for (int i = 0; i < indexInfo->ii_NumIndexAttrs; i++)
> >> + {
> >>
> >> Wouldn't it be better to just check if the first column is not part of
> >> the remote column then we can skip that index?
> >
> >
> > Reading [1], I think I can follow what you suggest. So, basically,
> > if the leftmost column is not filtered, we have the following:
> >
> >>  but the entire index would have to be scanned, so in most cases the planner
> would prefer a sequential table scan over using the index.
> >
> >
> > So, in our case, we could follow a similar approach. If the leftmost column of
> the index
> > is not sent over the wire from the pub, we can prefer the sequential scan.
> >
> > Is my understanding of your suggestion accurate?
> >
> 
> Yes. I request an opinion from Shi-San who has reported the problem.
> 

I also agree with this.
And I think we can mention this in the comments if we do so.

Regards,
Shi Yu

Hi Shi Yu,


> >
> >
> > Reading [1], I think I can follow what you suggest. So, basically,
> > if the leftmost column is not filtered, we have the following:
> >
> >>  but the entire index would have to be scanned, so in most cases the planner
> would prefer a sequential table scan over using the index.
> >
> >
> > So, in our case, we could follow a similar approach. If the leftmost column of
> the index
> > is not sent over the wire from the pub, we can prefer the sequential scan.
> >
> > Is my understanding of your suggestion accurate?
> >
>
> Yes. I request an opinion from Shi-San who has reported the problem.
>

I also agree with this.
And I think we can mention this in the comments if we do so.


Already added a comment on FindUsableIndexForReplicaIdentityFull() in v44.


Thanks,
Onder KALACI
 

RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: "houzj.fnst@fujitsu.com"
On Monday, March 13, 2023 2:23 PM Önder Kalacı <onderkalaci@gmail.com>  wrote:
Hi,

> > >
> > >
> > > Reading [1], I think I can follow what you suggest. So, basically,
> > > if the leftmost column is not filtered, we have the following:
> > >
> > >>  but the entire index would have to be scanned, so in most cases the planner
> > would prefer a sequential table scan over using the index.
> > >
> > >
> > > So, in our case, we could follow a similar approach. If the leftmost column of
> > the index
> > > is not sent over the wire from the pub, we can prefer the sequential scan.
> > >
> > > Is my understanding of your suggestion accurate?
> > >
> > 
> > Yes. I request an opinion from Shi-San who has reported the problem.
> > 
> 
> I also agree with this.
> And I think we can mention this in the comments if we do so.
> 
> Already commented on FindUsableIndexForReplicaIdentityFull() on v44.

Thanks for updating the patch.

I noticed one problem:

+static bool
+RemoteRelContainsLeftMostColumnOnIdx(IndexInfo  *indexInfo,
+                                     LogicalRepRelation  *remoterel)
+{
+    AttrNumber            keycol;
+
+    if (indexInfo->ii_NumIndexAttrs < 1)
+        return false;
+
+    keycol = indexInfo->ii_IndexAttrNumbers[0];
+    if (!AttributeNumberIsValid(keycol))
+        return false;
+
+    return bms_is_member(keycol-1, remoterel->attkeys);
+}

In this function, it uses the local column number (keycol) to match the remote
column number (attkeys); I think it will cause a problem if the column order
between pub/sub doesn't match. For example:

-------
- pub
CREATE TABLE test_replica_id_full (x int, y int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
- sub
CREATE TABLE test_replica_id_full (z int, y int, x int);
CREATE unique INDEX idx ON test_replica_id_full(z);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;
-------

I think we need to use attrmap->attnums to convert the column number before
comparing. Just for reference, attached is a diff (0001) that I noted down when trying to
fix the problem.
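The attrmap fix can be sketched in a few lines of Python. The 0-based numbering and the function name are illustrative only; the real code works with AttrNumbers and a Bitmapset of remote attkeys:

```python
def leftmost_index_col_is_sent(index_cols, attrmap, remote_attkeys):
    """Map the index's leading *local* column through attrmap to the
    corresponding *remote* column before probing remote_attkeys.
    Comparing the raw local number is wrong whenever the pub/sub
    column orders differ."""
    if not index_cols:
        return False
    remote_attno = attrmap[index_cols[0]]  # local -> remote column number
    if remote_attno < 0:
        return False                       # no remote counterpart (e.g. extra sub column)
    return remote_attno in remote_attkeys

# pub: test_replica_id_full(x, y)          -> remote columns {0, 1}
# sub: test_replica_id_full(z, y, x); unique index on z (local column 0)
attrmap = [-1, 1, 0]   # z has no remote counterpart; y -> 1; x -> 0
assert leftmost_index_col_is_sent([0], attrmap, {0, 1}) is False  # index on z
assert leftmost_index_col_is_sent([2], attrmap, {0, 1}) is True   # index on x
# The buggy version would have checked "0 in {0, 1}" for the index on z
# and wrongly concluded that the index is usable.
```

The reproduction above maps exactly to the assertions: the unique index on z must be rejected because z is never sent by the publisher.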

Besides, I also looked at the "WIP: Optimize for non-pkey / non-RI unique
indexes" patch; I think it also has a similar problem with the column
matching. Another thing I think can be improved in this WIP patch is that
we can cache the result of IsIdxSafeToSkipDuplicates() instead of computing it for
each UPDATE, because the cost of this function becomes bigger after applying
this patch. For reference, I tried to improve the WIP along those lines, and
here is a slightly modified version of this WIP (0002). Feel free to modify or merge
it if needed.
Thanks to Shi-san for helping to finish these fixes.


Best Regards,
Hou zj

Hi Hou zj,  Shi-san, all


In this function, it uses the local column number (keycol) to match the remote
column number (attkeys); I think it will cause a problem if the column order
between pub/sub doesn't match. For example:

-------
- pub
CREATE TABLE test_replica_id_full (x int, y int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
- sub
CREATE TABLE test_replica_id_full (z int, y int, x int);
CREATE unique INDEX idx ON test_replica_id_full(z);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;
-------

I think we need to use attrmap->attnums to convert the column number before
comparing. Just for reference, attached is a diff (0001) that I noted down when trying to
fix the problem.

I'm always afraid of these kinds of last-minute additions to the patch, and here we have
this issue in one of the latest additions :(

Thanks for reporting the problem and also providing guidance on the fix. After reading
the attrMap code and debugging this case further, I think your suggestion makes sense.

I only made some small changes, and included them in the patch.


Besides, I also looked at the "WIP: Optimize for non-pkey / non-RI unique
indexes" patch; I think it also has a similar problem with the column
matching.

Right, I'll incorporate this fix to that one as well.
 
And another thing I think can be improved in this WIP patch is that
we can cache the result of IsIdxSafeToSkipDuplicates() instead of computing it for
each UPDATE, because the cost of this function becomes bigger after applying
this patch.

Yes, it makes sense.
 

Thanks to Shi-san for helping to finish these fixes.

Thank you both!


Onder Kalaci 
On Mon, Mar 13, 2023 at 2:44 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Sat, Mar 11, 2023 at 6:05 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Fri, Mar 10, 2023 at 7:58 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> > >
> > >
> > > I think one option could be to drop some cases altogether, but not sure we'd want that.
> > >
> > > As a semi-related question: Are you aware of any setting that'd make pg_stat_all_indexes
> > > reflect the changes sooner? It is hard to debug what is the bottleneck in the tests, but
> > > I have a suspicion that there might be several poll_query_until() calls on
> > > pg_stat_all_indexes, which might be the reason?
> > >
> >
> > Yeah, I also think poll_query_until() calls on pg_stat_all_indexes is
> > the main reason for these tests taking more time. When I commented
> > those polls, it drastically reduces the test time. On looking at
> > pgstat_report_stat(), it seems we don't report stats sooner than 1s
> > and as most of this patch's test relies on stats, it leads to taking
> > more time. I don't have a better idea to verify this patch without
> > checking whether the index scan is really used by referring to
> > pg_stat_all_indexes. I think trying to reduce the poll calls may help
> > in reducing the test timings further. Some ideas on those lines are as
> > follows:
>
> If the reason for the stats polling was only to know if some index is
> chosen or not, I was wondering if you can just convey the same
> information to the TAP test via some conveniently placed (DEBUG?)
> logging.
>

I had thought about it but didn't convince myself that it would be a
better approach because it would LOG a lot of messages for bulk
updates/deletes. Note that for each row update on the publisher, a new
index/sequential scan will happen. So, instead, I tried to further
change the test cases to remove unnecessary parts. I have changed the
below tests:

1.
+# subscriber gets the missing table information
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub_rep_full REFRESH PUBLICATION");

This and the follow-on test were not required after we removed the
dropped-columns test.

2. Reduce the number of updates/deletes in the first test to two rows.

3. Removed the cases for dropping the index. This ensures that after
dropping the index on the table we switch to either an index scan (if
a new index is created) or to a sequence scan. It doesn't seem like a
very interesting case to me.

Apart from the above, I have removed the explicit setting of
'wal_retrieve_retry_interval = 1ms' as the same is not done for any
other subscription tests. I know setting wal_retrieve_retry_interval
avoids the launcher sometimes taking more time to launch apply worker
but it is better to be consistent. See the changes in
changes_amit_1.patch, if you agree with the same then please include
them in the next version.

After doing the above, the test time on my machine is closer to what
other tests take which is ~5s.


--
With Regards,
Amit Kapila.

Hi Amit, Peter, all


> If the reason for the stats polling was only to know if some index is
> chosen or not, I was wondering if you can just convey the same
> information to the TAP test via some conveniently placed (DEBUG?)
> logging.
>

I had thought about it but didn't convince myself that it would be a
better approach because it would LOG a lot of messages for bulk
updates/deletes.

I'm also hesitant to add any log messages for testing purposes, especially
something like this one, where a single UPDATE on the source leads to an
unbounded number of log messages.
 

1.
+# subscriber gets the missing table information
+$node_subscriber->safe_psql('postgres',
+ "ALTER SUBSCRIPTION tap_sub_rep_full REFRESH PUBLICATION");

This and the follow-on test were not required after we removed the
dropped-columns test.


Right, I kept it with the idea that we might get the dropped-column changes in
earlier, so that I could rebase and add the dropped-column tests.

But, sure, we can add that later to other tests.
 
2. Reduce the number of updates/deletes in the first test to two rows.

We don't have any particular reason to have more tuples. Given the
time constraints, I don't have any objections to changing this.
 

3. Removed the cases for dropping the index. This ensures that after
dropping the index on the table we switch to either an index scan (if
a new index is created) or to a sequence scan. It doesn't seem like a
very interesting case to me.

For that test, my goal was to ensure/show that the invalidation callback
is triggered after `DROP / CREATE INDEX` commands.

Can we always assume that this would never change? Because if this
behavior ever changes, the users would be stuck with the wrong/old
index until a VACUUM happens.
 

Apart from the above, I have removed the explicit setting of
'wal_retrieve_retry_interval = 1ms' as the same is not done for any
other subscription tests. I know setting wal_retrieve_retry_interval
avoids the launcher sometimes taking more time to launch apply worker
but it is better to be consistent

Hmm, I cannot remember why I added that. It was probably to make
poll_query_until/wait_for_catchup happen faster.

But, running the test with/without this setting, I cannot observe any noticeable
difference. So, it is probably fine to remove.
 
. See the changes in
changes_amit_1.patch, if you agree with the same then please include
them in the next version.

I included all of them, but I'm not so sure about removing (3). If you think we
have coverage for that in other cases, I'm fine with it.
 

After doing the above, the test time on my machine is closer to what
other tests take which is ~5s.

Yes, same for me.

Thanks, attaching v46 
On Mon, Mar 13, 2023 at 6:14 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>>
>>
>> 3. Removed the cases for dropping the index. This ensures that after
>> dropping the index on the table we switch to either an index scan (if
>> a new index is created) or to a sequence scan. It doesn't seem like a
>> very interesting case to me.
>
>
> For that test, my goal was to ensure/show that the invalidation callback
> is triggered after `DROP / CREATE INDEX` commands.
>

Fair point. I suggest in that case just keep one of the tests for Drop
Index such that after that it will pick up a sequence scan. However,
just do the poll for the number of index scans stat once. I think that
will cover the case you are worried about without having a noticeable
impact on test timing.

--
With Regards,
Amit Kapila.



Hi Amit, all
>
> For that test, my goal was to ensure/show that the invalidation callback
> is triggered after `DROP / CREATE INDEX` commands.
>

Fair point. I suggest in that case just keep one of the tests for Drop
Index such that after that it will pick up a sequence scan. However,
just do the poll for the number of index scans stat once. I think that
will cover the case you are worried about without having a noticeable
impact on test timing.


So, after dropping the index, it is not possible to poll for the index scan count.

But, I think, after the DROP INDEX, it is enough to check whether the modification
is applied properly on the target (wait_for_catchup + safe_psql).
If it had cached the indexOid, the update/delete would fail anyway.

Attaching v47.


Thanks,
Onder KALACI
 

From: "shiy.fnst@fujitsu.com"
On Mon, Mar 13, 2023 10:16 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> 
> Attaching v47.
> 

Thanks for updating the patch. Here are some comments.

1.
in RemoteRelContainsLeftMostColumnOnIdx():

+    if (indexInfo->ii_NumIndexAttrs < 1)
+        return false;

Did you see any cases where the condition is true? I think there is at least one
column in the index. If so, we can use an Assert().

+    if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
+        return false;

Similarly, I think `attrmap->maplen` is the number of columns and it is always
greater than keycol. If you agree, we can check it with an Assert(). Besides, it
seems we don't need AttrNumberGetAttrOffset().

2.
+# make sure that the subscriber has the correct data after the update UPDATE

"update UPDATE" seems to be a typo.

3.
+# now, drop the index with the expression, and re-create index on column lastname

The comment says "re-create index on column lastname" but it seems we didn't do
that; should it be modified to something like:
# now, drop the index with the expression, we will use sequential scan

Besides these, the patch LGTM.

Regards,
Shi Yu

On Mon, Mar 13, 2023 at 7:46 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Attaching v47.
>

I have made the following changes in the attached patch (a) removed
the function IsIdxSafeToSkipDuplicates() and used the check directly
in the caller; (b) changed a few comments in the patch; (c) the test
file was inconsistently using ';' while executing statements with
safe_psql, changed it to remove ';'.

--
With Regards,
Amit Kapila.

Hi Shi Yu,


in RemoteRelContainsLeftMostColumnOnIdx():

+       if (indexInfo->ii_NumIndexAttrs < 1)
+               return false;

Did you see any cases that the condition is true? I think there is at least one
column in the index. If so, we can use an Assert().

Actually, it was mostly to guard against any edge cases. I thought that, similar to tables,
we could have zero-column indexes, but it turns out that is not possible. Also,
index_create seems to check that, so changing it to an Assert() makes sense:

    /*
     * check parameters
     */
    if (indexInfo->ii_NumIndexAttrs < 1)
        elog(ERROR, "must index at least one column");


 

+       if (attrmap->maplen <= AttrNumberGetAttrOffset(keycol))
+               return false;

Similarly, I think `attrmap->maplen` is the number of columns and it is always
greater than keycol. If you agree, we can check it with an Assert().

At this point, I'm really hesitant to make any assumptions. Logical replication
is pretty flexible, and who knows, maybe dropped columns will be treated
differently at some point and this assumption changes.

I feel more comfortable keeping this as-is. We call this function very infrequently
anyway.
 
Besides, It
seems we don't need AttrNumberGetAttrOffset().

 
Why not? We are accessing the AttrNumberGetAttrOffset(keycol) element
of the attnums array.
 
2.
+# make sure that the subscriber has the correct data after the update UPDATE

"update UPDATE" seems to be a typo.


thanks, fixed
 
3.
+# now, drop the index with the expression, and re-create index on column lastname

The comment says "re-create index on column lastname" but it seems we didn't do
that, should it be modified to something like:
# now, drop the index with the expression, we will use sequential scan



Thanks, fixed

I'll add the changes to v49 in the next e-mail.

Thanks,
Onder KALACI 
Hi Amit, all


Amit Kapila <amit.kapila16@gmail.com>, 14 Mar 2023 Sal, 09:50 tarihinde şunu yazdı:
On Mon, Mar 13, 2023 at 7:46 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Attaching v47.
>

I have made the following changes in the attached patch (a) removed
the function IsIdxSafeToSkipDuplicates() and used the check directly
in the caller

Should be fine; we can re-introduce this function when I work on the
non-pkey/RI unique index improvement as a follow-up to this.
 
; (b) changed a few comments in the patch;

Thanks, looks good.
 
(c) the test
file was inconsistently using ';' while executing statements with
safe_psql, changed it to remove ';'.


Alright, thanks. 

And as a self-review: when I write regression tests next time, I'll spend a lot
more time on style/consistency/comments, etc. During this review,
the reviewers had to spend many cycles on that area, which is something
I should have done better.

Attaching v49 with some minor changes Shi Yu noted earlier.

Thanks,
Onder KALACI

 
On Tue, Mar 14, 2023 at 12:48 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>>
>> 2.
>> +# make sure that the subscriber has the correct data after the update UPDATE
>>
>> "update UPDATE" seems to be a typo.
>>
>
> thanks, fixed
>
>>
>> 3.
>> +# now, drop the index with the expression, and re-create index on column lastname
>>
>> The comment says "re-create index on column lastname" but it seems we didn't do
>> that, should it be modified to something like:
>> # now, drop the index with the expression, we will use sequential scan
>>
>>
>
> Thanks, fixed
>
> I'll add the changes to v49 in the next e-mail.
>

It seems you forgot to address these last two comments in the latest version.


--
With Regards,
Amit Kapila.




Amit Kapila <amit.kapila16@gmail.com>, 14 Mar 2023 Sal, 11:59 tarihinde şunu yazdı:
On Tue, Mar 14, 2023 at 12:48 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>>
>> 2.
>> +# make sure that the subscriber has the correct data after the update UPDATE
>>
>> "update UPDATE" seems to be a typo.
>>
>
> thanks, fixed
>
>>
>> 3.
>> +# now, drop the index with the expression, and re-create index on column lastname
>>
>> The comment says "re-create index on column lastname" but it seems we didn't do
>> that, should it be modified to something like:
>> # now, drop the index with the expression, we will use sequential scan
>>
>>
>
> Thanks, fixed
>
> I'll add the changes to v49 in the next e-mail.
>

It seems you forgot to address these last two comments in the latest version.


Oops, sorry. I think when I got your test changes, I somehow overrode these changes
in my local copy.
On Tue, 14 Mar 2023 at 14:36, Önder Kalacı <onderkalaci@gmail.com> wrote:
>
>
> Amit Kapila <amit.kapila16@gmail.com>, 14 Mar 2023 Sal, 11:59 tarihinde şunu yazdı:
>>
>> On Tue, Mar 14, 2023 at 12:48 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>> >>
>> >> 2.
>> >> +# make sure that the subscriber has the correct data after the update UPDATE
>> >>
>> >> "update UPDATE" seems to be a typo.
>> >>
>> >
>> > thanks, fixed
>> >
>> >>
>> >> 3.
>> >> +# now, drop the index with the expression, and re-create index on column lastname
>> >>
>> >> The comment says "re-create index on column lastname" but it seems we didn't do
>> >> that, should it be modified to something like:
>> >> # now, drop the index with the expression, we will use sequential scan
>> >>
>> >>
>> >
>> > Thanks, fixed
>> >
>> > I'll add the changes to v49 in the next e-mail.
>> >
>>
>> It seems you forgot to address these last two comments in the latest version.
>>
>
> Oops, sorry. I think when I get your test changes, I somehow overridden these changes
> on my local.

Thanks for the updated patch.
Few minor comments:
1) The extra line break after IsIndexOnlyOnExpression function can be removed:
+ }
+
+ return true;
+}
+
+
+/*
+ * Returns true if the attrmap (which belongs to remoterel) contains the
+ * leftmost column of the index.
+ *
+ * Otherwise returns false.
+ */

2) Generally we don't terminate with "." for single line comments
+
+ /*
+ * Simple case, we already have a primary key or a replica identity index.
+ */
+ idxoid = GetRelationIdentityOrPK(localrel);
+ if (OidIsValid(idxoid))
+ return idxoid;

Regards,
Vignesh





Thanks for the updated patch.
Few minor comments:
1) The extra line break after IsIndexOnlyOnExpression function can be removed:

removed 
 


2) Generally we don't terminate with "." for single line comments
+
+ /*
+ * Simple case, we already have a primary key or a replica identity index.
+ */
+ idxoid = GetRelationIdentityOrPK(localrel);
+ if (OidIsValid(idxoid))
+ return idxoid;

Well, there are several "." terminations for single-line comments even in the same file, such as:

/* 0 means it's a dropped attribute.  See comments atop AttrMap. */

I really don't have any preference on this, but I suspect that if I change it, I'll get
another review suggesting conforming to the existing style in the same file.
So, I'm skipping this suggestion for now, unless you have objections.

 
On Tue, Mar 14, 2023 at 3:18 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>>

Pushed this patch but forgot to add a new testfile. Will do that soon.


--
With Regards,
Amit Kapila.



On Wed, Mar 15, 2023 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Mar 14, 2023 at 3:18 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >>
>
> Pushed this patch but forgot to add a new testfile. Will do that soon.
>

The main patch is committed now. I think the pending item in this
thread is to conclude whether we need a storage or subscription option to
disable/enable this feature. Both Andres and Onder don't seem to be in
favor but I am of opinion that it could be helpful in scenarios where
the index scan (due to duplicates or dead tuples) is slower. However,
if we don't have a consensus on the same, we can anyway add it later.
If there are no more votes in favor of adding such an option, we can
probably close the CF entry.

--
With Regards,
Amit Kapila.



On Thu, Mar 16, 2023 at 2:15 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Mar 15, 2023 at 9:12 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Tue, Mar 14, 2023 at 3:18 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> > >>
> >
> > Pushed this patch but forgot to add a new testfile. Will do that soon.
> >
>
> The main patch is committed now. I think the pending item in this
> thread is to conclude whether we need a storage or subscription to
> disable/enable this feature. Both Andres and Onder don't seem to be in
> favor but I am of opinion that it could be helpful in scenarios where
> the index scan (due to duplicates or dead tuples) is slower. However,
> if we don't have a consensus on the same, we can anyway add it later.
> If there are no more votes in favor of adding such an option, we can
> probably close the CF entry.
>

I have closed this CF entry for now. However, if there is any interest
in pursuing the storage or subscription option for this feature, we
can still discuss it.

--
With Regards,
Amit Kapila.