Home > mailing lists

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From	Önder Kalacı
Subject	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date	March 12, 2023 18:07:21
Msg-id	CACawEhWfqUcS8Hofss73P-20Hp4uMpRk1o3TCXU+wjzjwPhx7w@mail.gmail.com Whole thread Raw
In response to	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Amit Kapila <amit.kapila16@gmail.com>)
List	pgsql-hackers

Tree view

Hi Amit, all

>> I think we can add such a test (which relies on existing buggy
>> behavior) later after fixing the existing bug. For now, it would be
>> better to remove that test and add it after we fix dropped columns
>> issue in HEAD.
>
>
> Alright, when I push the next version (hopefully tomorrow), I'll follow this suggestion.
>

Okay, thanks. See, if you can also include your changes in the patch
wip_for_optimize_index_column_match (after discussed modification).
Few other minor comments:

Sure, done. Please check RemoteRelContainsLeftMostColumnOnIdx() function.

Note that we already have a test for that on SOME NULL VALUES AND MISSING COLUMN.

Previously we'd check if test_replica_id_full_idy is used. Now, we don't because it is not

used anymore :) I initially used poll_query_until with idx_scan=0, but that also seems

confusing to read in the test. It looks like it could be prone to race conditions as

poll_query_until with idxcan=0 does not guarantee anything.

1.
+ are enforced for primary keys. Internally, we follow a similar approach for
+ supporting index scans within logical replication scope. If there are no

I think we can remove the above line: "Internally, we follow a similar
approach for supporting index scans within logical replication scope."
This didn't seem useful for users.

removed

2.
diff --git a/src/backend/executor/execReplication.c
b/src/backend/executor/execReplication.c
index bc6409f695..646e608eb7 100644
--- a/src/backend/executor/execReplication.c
+++ b/src/backend/executor/execReplication.c
@@ -83,11 +83,8 @@ build_replindex_scan_key(ScanKey skey, Relation
rel, Relation idxrel,
if (!AttributeNumberIsValid(table_attno))
{
/*
- * XXX: For a non-primary/unique index with an
additional
- * expression, we do not have to continue at
this point. However,
- * the below code assumes the index scan is
only done for simple
- * column references. If we can relax the
assumption in the below
- * code-block, we can also remove the continue.
+ * XXX: Currently, we don't support
expressions in the scan key,
+ * see code below.
*/

I have tried to simplify the above comment. See, if that makes sense to you.

Makes sense

3.
/*
+ * We only need to allocate once. This is allocated within per
+ * tuple context -- ApplyMessageContext -- hence no need to
+ * explicitly pfree().
+ */

We normally don't write why we don't need to explicitly pfree. It is
good during the review but not sure if it is a good idea to keep it in
the final code.

Sounds good, applied

4. I have modified the proposed commit message as follows, see if that
makes sense to you, and let me know if I missed anything especially
the review/author credit.

Allow the use of indexes other than PK and REPLICA IDENTITY on the subscriber.

Using REPLICA IDENTITY FULL on the publisher can lead to a full table
scan per tuple change on the subscription when REPLICA IDENTITY or PK
index is not available. This makes REPLICA IDENTITY FULL impractical
to use apart from some small number of use cases.

This patch allows using indexes other than PRIMARY KEY or REPLICA
IDENTITY on the subscriber during apply of update/delete. The index
that can be used must be a btree index, not a partial index, and it
must have at least one column reference (i.e. cannot consist of only
expressions). We can uplift these restrictions in the future. There is
no smart mechanism to pick the index. If there is more than one index
that
satisfies these requirements, we just pick the first one. We discussed
using some of the optimizer's low-level APIs for this but ruled it out
as that can be a maintenance burden in the long run.

This patch improves the performance in the vast majority of cases and
the improvement is proportional to the amount of data in the table.
However, there could be some regression in a small number of cases
where the indexes have a lot of duplicate and dead rows. It was
discussed that those are mostly impractical cases but we can provide a
table or subscription level option to disable this feature if
required.

Author: Onder Kalaci
Reviewed-by: Peter Smith, Shi yu, Hou Zhijie, Vignesh C, Kuroda
Hayato, Amit Kapila
Discussion: https://postgr.es/m/CACawEhVLqmAAyPXdHEPv1ssU2c=dqOniiGz7G73HfyS7+nGV4w@mail.gmail.com

I also see 2 mails/reviews from Wang wei, but I'm not sure what qualifies as "reviewer" for this group. Should we

add that name as well? I think you can guide us on this.

Other than that, I only fixed one extra new line between 'that' and "satisfies'. Other than that, it looks pretty good!

Thanks,

Onder

Attachment

v44_0001_use_index_on_subs_when_pub_rep_ident_full.patch

pgsql-hackers by date:

From: Önder Kalacı
Date: 12 March 2023, 18:07:17
Subject: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

From: Zheng Li
Date: 12 March 2023, 18:24:13
Subject: Re: Support logical replication of DDLs

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

Attachment

Previous

Next