Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CAA4eK1+q_6dJksasaA1SOibQk2huUFUEvZRa3Mi0J9uU235Fvw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List pgsql-hackers
On Fri, Jul 22, 2022 at 9:45 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>>
>>
>> BTW, do we want to consider partial indexes for the scan in this
>> context? I mean it may not have data of all rows so how that would be
>> usable?
>>
>
> As far as I can see, check_index_predicates() never picks a partial index for the baserestrictinfos we create in
CreateReplicaIdentityFullPaths().The reason is that we have roughly the following call stack: 
>
> -check_index_predicates
>  --predicate_implied_by
> ---predicate_implied_by_recurse
> ----predicate_implied_by_simple_clause
> -----operator_predicate_proof
>
> And, inside operator_predicate_proof(), there is never going to be an equality. Because, we push `Param`s to the
baserestrictinfoswhereas the index predicates are always `Const`. 
>

I agree that the way currently baserestrictinfos are formed by patch,
it won't select the partial path, and chances are that that will be
true in future as well but I think it is better to be explicit in this
case to avoid creating a dependency between two code paths.

Few other comments:
==================
1. Why is it a good idea to choose the index selected even for the
bitmap path (T_BitmapHeapScan or T_BitmapIndexScan)? We use index scan
during update/delete, so not sure how we can conclude to use index for
bitmap paths.

2. The index info is built even on insert, so workload, where there
are no updates/deletes or those are not published then this index
selection work will go waste. Will it be better to do it at first
update/delete? One can say that it is not worth the hassle as anyway
it will be built the first time we perform an operation on the
relation or after the relation gets invalidated. If we think so, then
probably adding a comment could be useful.

3.
+my $synced_query =
+  "SELECT count(1) = 0 FROM pg_subscription_rel WHERE srsubstate NOT
IN ('r', 's');";
...
...
+# wait for initial table synchronization to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)
+  or die "Timed out while waiting for subscriber to synchronize data";

You can avoid such instances in the test by using the new
infrastructure added in commit 0c20dd33db.

4.
  LogicalRepRelation *remoterel = &root->remoterel;
+
  Oid partOid = RelationGetRelid(partrel);

Spurious line addition.

--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: "shiy.fnst@fujitsu.com"
Date:
Subject: RE: Perform streaming logical transactions by background workers and parallel apply
Next
From: Bharath Rupireddy
Date:
Subject: Use pg_pwritev_with_retry() instead of write() in dir_open_for_write() to avoid partial writes?