Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Önder Kalacı
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CACawEhV66S9W4EygPWcx-PXqXOzvh6j11hoRzM4mRJOAzryzrQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Peter Smith <smithpb2250@gmail.com>)
List pgsql-hackers
Hi Peter,



1.
In my previous review [1] (comment #1) I wrote that only some of the
"should" were misleading and gave examples where to change. But I
didn't say that *every* usage of that word was wrong, so your global
replace of "should" to "must" has modified a couple of places in
unexpected ways.

Details are in subsequent review comments below -- see #2b, #3, #5.

Ah, that was my mistake. Thanks for thorough review on this.
 

======
doc/src/sgml/logical-replication.sgml

2.
A published table must have a “replica identity” configured in order
to be able to replicate UPDATE and DELETE operations, so that
appropriate rows to update or delete can be identified on the
subscriber side. By default, this is the primary key, if there is one.
Another unique index (with certain additional requirements) can also
be set to be the replica identity. If the table does not have any
suitable key, then it can be set to replica identity “full”, which
means the entire row becomes the key. When replica identity “full” is
specified, indexes can be used on the subscriber side for searching
the rows. Candidate indexes must be btree, non-partial, and have at
least one column reference (i.e. cannot consist of only expressions).
These restrictions on the non-unique index properties adheres to some
of the restrictions that are enforced for primary keys. Internally, we
follow a similar approach for supporting index scans within logical
replication scope. If there are no such suitable indexes, the search
on the subscriber side can be very inefficient, therefore replica
identity “full” must only be used as a fallback if no other solution
is possible. If a replica identity other than “full” is set on the
publisher side, a replica identity comprising the same or fewer
columns must also be set on the subscriber side. See REPLICA IDENTITY
for details on how to set the replica identity. If a table without a
replica identity is added to a publication that replicates UPDATE or
DELETE operations then subsequent UPDATE or DELETE operations will
cause an error on the publisher. INSERT operations can proceed
regardless of any replica identity.

~~

2a.
My previous review [1] (comment #2)  was not fixed quite as suggested.

Please change:
"adheres to" --> "adhere to"


Oops, it seems I only got the "to" part of your suggestion and missed "s".

Done now.
 
~~

2b. should/must

This should/must change was OK as it was before, because here it is only advice.

Please change this back how it was:
"must only be used as a fallback" --> "should only be used as a fallback"

Thanks, changed.
 

======
src/backend/executor/execReplication.c

3. build_replindex_scan_key

 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns must be used for the index scan.
+ *

~

This should/must change does not seem quite right.

SUGGESTION (reworded)
Returns how many columns to use for the index scan.

Fixed. 

(I wish we had a simpler process to incorporate such
comments.)

 

~~~

4. build_replindex_scan_key

>
> Based on the discussions below, I kept as-is. I really don't want to do unrelated
> changes in this patch, as I also got several feedback for not doing it,
>

Hmm, although this code pre-existed I don’t consider this one as
"unrelated changes" because the patch introduced the new "if
(!AttributeNumberIsValid(table_attno))" which changed things.  As I
wrote to Amit yesterday [2] IMO it would be better to do the 'opttype'
assignment *after* the potential 'continue' otherwise there is every
chance that the assignment is just redundant. And if you move the
assignment where it belongs, then you might as well declare the
variable in the more appropriate place at the same time – i.e. with
'opfamily' declaration. Anyway, I've given my reason a couple of times
now, so if you don't want to change it I won't about it debate
anymore.

Alright, given both you and Amit [1] agree on this, I'll follow that.

 

======
src/backend/replication/logical/relation.c

5. FindUsableIndexForReplicaIdentityFull

+ * XXX: There are no fundamental problems for supporting non-btree indexes.
+ * We mostly need to relax the limitations in RelationFindReplTupleByIndex().
+ * For partial indexes, the required changes are likely to be larger. If
+ * none of the tuples satisfy the expression for the index scan, we must
+ * fall-back to sequential execution, which might not be a good idea in some
+ * cases.

The should/must change (the one in the XXX comment) does not seem quite right.

SUGGESTION
"we must fall-back to sequential execution" --> "we fallback to
sequential execution"

fixed, thanks.
 

======
.../subscription/t/032_subscribe_use_index.pl

FYI, I get TAP test in error (Note - this is when only patch 0001 is appied)

t/031_column_list.pl ............... ok
t/032_subscribe_use_index.pl ....... 19/?
#   Failed test 'ensure subscriber has not used index with
enable_indexscan=false'
#   at t/032_subscribe_use_index.pl line 806.
#          got: '1'
#     expected: '0'
t/032_subscribe_use_index.pl ....... 21/? # Looks like you failed 1 test of 22.
t/032_subscribe_use_index.pl ....... Dubious, test returned 1 (wstat 256, 0x100)
Failed 1/22 subtests
t/100_bugs.pl ...................... ok

AFAICT there is a test case that is testing the patch 0002
functionality even when patch 0002 is not applied yet.


Oops, I somehow managed to make the same rebase mistake. I fixed this,
and for next time I'll make sure that each commit passes the CI separately.
Sorry for the noise.

I'll attach the changes on v38 in the next e-mail.

Thanks,
Onder KALACI

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Time delayed LR (WAS Re: logical replication restrictions)
Next
From: Peter Eisentraut
Date:
Subject: Re: improving user.c error messages