Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CACawEhWfqUcS8Hofss73P-20Hp4uMpRk1o3TCXU+wjzjwPhx7w@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi Amit, all


>> I think we can add such a test (which relies on existing buggy
>> behavior) later after fixing the existing bug. For now, it would be
>> better to remove that test and add it after we fix dropped columns
>> issue in HEAD.
>
>
> Alright, when I push the next version (hopefully tomorrow), I'll follow this suggestion.
>

Okay, thanks. See, if you can also include your changes in the patch
wip_for_optimize_index_column_match (after discussed modification).
Few other minor comments:

Sure, done. Please check RemoteRelContainsLeftMostColumnOnIdx() function.

Note that we already have a test for that on SOME NULL VALUES AND MISSING COLUMN.
Previously we'd check if test_replica_id_full_idy is used. Now, we don't because it is not
used anymore :) I initially used poll_query_until with idx_scan=0, but that also seems
confusing to read in the test. It looks like it could be prone to race conditions as 
poll_query_until with idxcan=0 does not guarantee anything.
 

1.
+   are enforced for primary keys.  Internally, we follow a similar approach for
+   supporting index scans within logical replication scope.  If there are no

I think we can remove the above line: "Internally, we follow a similar
approach for supporting index scans within logical replication scope."
This didn't seem useful for users.


removed
 
2.
diff --git a/src/backend/executor/execReplication.c
b/src/backend/executor/execReplication.c
index bc6409f695..646e608eb7 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -83,11 +83,8 @@ build_replindex_scan_key(ScanKey skey, Relation
rel, Relation idxrel,
                if (!AttributeNumberIsValid(table_attno))
                {
                        /*
-                        * XXX: For a non-primary/unique index with an
additional
-                        * expression, we do not have to continue at
this point. However,
-                        * the below code assumes the index scan is
only done for simple
-                        * column references. If we can relax the
assumption in the below
-                        * code-block, we can also remove the continue.
+                        * XXX: Currently, we don't support
expressions in the scan key,
+                        * see code below.
                         */


I have tried to simplify the above comment. See, if that makes sense to you.

Makes sense
 

3.
/*
+ * We only need to allocate once. This is allocated within per
+ * tuple context -- ApplyMessageContext -- hence no need to
+ * explicitly pfree().
+ */

We normally don't write why we don't need to explicitly pfree. It is
good during the review but not sure if it is a good idea to keep it in
the final code.

 
Sounds good, applied 

4. I have modified the proposed commit message as follows, see if that
makes sense to you, and let me know if I missed anything especially
the review/author credit.

Allow the use of indexes other than PK and REPLICA IDENTITY on the subscriber.

Using REPLICA IDENTITY FULL on the publisher can lead to a full table
scan per tuple change on the subscription when REPLICA IDENTITY or PK
index is not available. This makes REPLICA IDENTITY FULL impractical
to use apart from some small number of use cases.

This patch allows using indexes other than PRIMARY KEY or REPLICA
IDENTITY on the subscriber during apply of update/delete. The index
that can be used must be a btree index, not a partial index, and it
must have at least one column reference (i.e. cannot consist of only
expressions). We can uplift these restrictions in the future. There is
no smart mechanism to pick the index. If there is more than one index
that
satisfies these requirements, we just pick the first one. We discussed
using some of the optimizer's low-level APIs for this but ruled it out
as that can be a maintenance burden in the long run.

This patch improves the performance in the vast majority of cases and
the improvement is proportional to the amount of data in the table.
However, there could be some regression in a small number of cases
where the indexes have a lot of duplicate and dead rows. It was
discussed that those are mostly impractical cases but we can provide a
table or subscription level option to disable this feature if
required.

Author: Onder Kalaci
Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda
Hayato, Amit Kapila
Discussion: https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com


I also see 2 mails/reviews from Wang wei, but I'm not sure what qualifies as "reviewer" for this group. Should we 
add that name as well? I think you can guide us on this.

Other than that, I only fixed one extra new line between 'that' and "satisfies'. Other than that, it looks pretty good!
 
Thanks,
Onder

Attachment

pgsql-hackers by date:

Previous
From: Önder Kalacı
Date:
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Next
From: Zheng Li
Date:
Subject: Re: Support logical replication of DDLs