Home > mailing lists

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date	March 3, 2023 06:19:21
Msg-id	CAA4eK1LVB7LaEzQh-+MO1reS-NU7AX5kxuvYJRCwpuqmqQbKxA@mail.gmail.com Whole thread
In response to	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Önder Kalacı <onderkalaci@gmail.com>)
Responses	Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List	pgsql-hackers

Tree view

On Thu, Mar 2, 2023 at 8:52 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Alright, attached v27_0001_use_index_on_subs_when_pub_rep_ident_full.patch and
> v27_0002_use_index_on_subs_when_pub_rep_ident_full.patch.
>

Few comments on 0001
====================
1.
+   such suitable indexes, the search on the subscriber s ide can be
very inefficient,

unnecessary space in 'side'

2.
-   identity.  If the table does not have any suitable key, then it can be set
-   to replica identity <quote>full</quote>, which means the entire row becomes
-   the key.  This, however, is very inefficient and should only be used as a
-   fallback if no other solution is possible.  If a replica identity other
+   identity. When replica identity <literal>FULL</literal> is specified,
+   indexes can be used on the subscriber side for searching the rows.

I think it is better to retain the first sentence (If the table does
not ... entire row becomes the key.) as that says what will be part of
the key.

3.
-   comprising the same or fewer columns must also be set on the subscriber
-   side.  See <xref linkend="sql-altertable-replica-identity"/> for details on
-   how to set the replica identity.  If a table without a replica identity is
-   added to a publication that replicates <command>UPDATE</command>
+   comprising the same or fewer columns must also be set on the
subscriber side.
+   See <xref linkend="sql-altertable-replica-identity"/> for
+   details on how to set the replica identity.  If a table without a replica
+   identity is added to a publication that replicates <command>UPDATE</command>

I don't see any change in this except line length. If so, I don't
think we should change it as part of this patch.

4.
 /*
  * Setup a ScanKey for a search in the relation 'rel' for a tuple 'key' that
  * is setup to match 'rel' (*NOT* idxrel!).
  *
- * Returns whether any column contains NULLs.
+ * Returns how many columns should be used for the index scan.
+ *
+ * This is not generic routine, it expects the idxrel to be
+ * a btree, non-partial and have at least one column
+ * reference (e.g., should not consist of only expressions).
  *
- * This is not generic routine, it expects the idxrel to be replication
- * identity of a rel and meet all limitations associated with that.
+ * By definition, replication identity of a rel meets all
+ * limitations associated with that. Note that any other
+ * index could also meet these limitations.

The comment changes look quite asymmetric to me. Normally, we break
the line if the line length goes beyond 80 cols. Please check and
change other places in the patch if they have a similar symptom.

5.
+ * There are no fundamental problems for supporting non-btree
+ * and/or partial indexes.

Can we mention partial indexes in the above comment? It seems to me
that because the required tuple may not satisfy the expression (in the
case of partial indexes) it may not be easy to support it.

6.
build_replindex_scan_key()
{
...
+ for (index_attoff = 0; index_attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel);
+ index_attoff++)
...
...
+#ifdef USE_ASSERT_CHECKING
+ IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
+
+ Assert(!IsIndexOnlyOnExpression(indexInfo));
+#endif
...
}

We can avoid building index info multiple times. This can be either
checked at the beginning of the function outside attribute offset loop
or we can probably cache it. I understand this is for assert builds
but seems easy to avoid it doing multiple times and it also looks odd
to do it multiple times for the same index.

7.
- /* Build scankey for every attribute in the index. */
- for (attoff = 0; attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel); attoff++)
+ /* Build scankey for every non-expression attribute in the index. */
+ for (index_attoff = 0; index_attoff <
IndexRelationGetNumberOfKeyAttributes(idxrel);
+ index_attoff++)
  {
  Oid operator;
  Oid opfamily;
+ Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
  RegProcedure regop;
- int pkattno = attoff + 1;
- int mainattno = indkey->values[attoff];
- Oid optype = get_opclass_input_type(opclass->values[attoff]);
+ int table_attno = indkey->values[index_attoff];

I don't think here we need to change variable names if we retain
mainattno as it is instead of changing it to table_attno. The current
naming doesn't seem bad for the current usage of the patch.

8.
+ TypeCacheEntry **eq = NULL; /* only used when the index is not RI or PK */

Normally, we don't add such comments as the usage is quite obvious by
looking at the code.


--
With Regards,
Amit Kapila.

pgsql-hackers by date:

From: Kyotaro Horiguchi
Date: 03 March 2023, 05:38:03
Subject: Re: Real config values for bytes needs quotes?

From: Corey Huinker
Date: 03 March 2023, 06:23:04
Subject: Re: Add SHELL_EXIT_CODE to psql

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

Previous

Next