Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id CAA4eK1LVB7LaEzQh-+MO1reS-NU7AX5kxuvYJRCwpuqmqQbKxA@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
List pgsql-hackers
On Thu, Mar 2, 2023 at 8:52 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Alright, attached v27_0001_use_index_on_subs_when_pub_rep_ident_full.patch and
> v27_0002_use_index_on_subs_when_pub_rep_ident_full.patch.
>

Few comments on 0001
====================
1.
+   such suitable indexes, the search on the subscriber s ide can be
very inefficient,

unnecessary space in 'side'

2.
-   identity.  If the table does not have any suitable key, then it can be set
-   to replica identity <quote>full</quote>, which means the entire row becomes
-   the key.  This, however, is very inefficient and should only be used as a
-   fallback if no other solution is possible.  If a replica identity other
+   identity. When replica identity <literal>FULL</literal> is specified,
+   indexes can be used on the subscriber side for searching the rows.

I think it is better to retain the first sentence (If the table does
not ... entire row becomes the key.) as that says what will be part of
the key.

3.
-   comprising the same or fewer columns must also be set on the subscriber
-   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
-   how to set the replica identity.  If a table without a replica identity is
-   added to a publication that replicates <command>UPDATE</command>
+   comprising the same or fewer columns must also be set on the
subscriber side.
+   See <xref linkend="sql-altertable-replica-identity"/> for
+   details on how to set the replica identity.  If a table without a replica
+   identity is added to a publication that replicates <command>UPDATE</command>

I don't see any change in this except line length. If so, I don't
think we should change it as part of this patch.

4.
 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns should be used for the index scan.
+ *
+ * This is not generic routine, it expects the idxrel to be
+ * a btree, non-partial and have at least one column
+ * reference (e.g., should not consist of only expressions).
  *
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * By definition, replication identity of a rel meets all
+ * limitations associated with that. Note that any other
+ * index could also meet these limitations.

The comment changes look quite asymmetric to me. Normally, we break
the line if the line length goes beyond 80 cols. Please check and
change other places in the patch if they have a similar symptom.

5.
+ * There are no fundamental problems for supporting non-btree
+ * and/or partial indexes.

Can we mention partial indexes in the above comment? It seems to me
that because the required tuple may not satisfy the expression (in the
case of partial indexes) it may not be easy to support it.

6.
build_replindex_scan_key()
{
...
+ for (index_attoff = 0; index_attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel);
+ index_attoff++)
...
...
+#ifdef USE_ASSERT_CHECKING
+ IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
+
+ Assert(!IsIndexOnlyOnExpression(indexInfo));
+#endif
...
}

We can avoid building index info multiple times. This can be either
checked at the beginning of the function outside attribute offset loop
or we can probably cache it. I understand this is for assert builds
but seems easy to avoid it doing multiple times and it also looks odd
to do it multiple times for the same index.

7.
- /* Build scankey for every attribute in the index. */
- for (attoff = 0; attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
+ /* Build scankey for every non-expression attribute in the index. */
+ for (index_attoff = 0; index_attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel);
+ index_attoff++)
  {
  Oid operator;
  Oid opfamily;
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
  RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];

I don't think here we need to change variable names if we retain
mainattno as it is instead of changing it to table_attno. The current
naming doesn't seem bad for the current usage of the patch.

8.
+ TypeCacheEntry **eq = NULL; /* only used when the index is not RI or PK */

Normally, we don't add such comments as the usage is quite obvious by
looking at the code.


--
With Regards,
Amit Kapila.



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Real config values for bytes needs quotes?
Next
From: Corey Huinker
Date:
Subject: Re: Add SHELL_EXIT_CODE to psql