RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id OS0PR01MB5716D94BDE679A012AB4524694B49@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Responses RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
List pgsql-hackers
On Tuesday, March 7, 2023 9:47 PM Önder Kalacı <onderkalaci@gmail.com>  wrote:

Hi,

> > > Let me give an example to demonstrate why I thought something is fishy here:
> > >
> > > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111.
> > > Imagine the same rel has a PRIMARY KEY with Oid=2222.
> > >
> 
> Hmm, alright, this is syntactically possible, but not sure if any user 
> would do this. Still thanks for catching this.
> 
> And, you are right, if a user has created such a schema, IdxIsRelationIdentityOrPK() 
> would return the wrong result and we'd use sequential scan instead of index scan. 
> This would be a regression. I think we should change the function. 

I am looking at the latest patch and have a question about the following code.

     /* Try to find the tuple */
-    if (index_getnext_slot(scan, ForwardScanDirection, outslot))
+    while (index_getnext_slot(scan, ForwardScanDirection, outslot))
     {
-        found = true;
+        /*
+         * Avoid expensive equality check if the index is primary key or
+         * replica identity index.
+         */
+        if (!idxIsRelationIdentityOrPK)
+        {
+            if (eq == NULL)
+            {
+#ifdef USE_ASSERT_CHECKING
+                /* apply assertions only once for the input idxoid */
+                IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
+                Assert(IsIndexUsableForReplicaIdentityFull(indexInfo));
+#endif
+
+                /*
+                 * We only need to allocate once. This is allocated within per
+                 * tuple context -- ApplyMessageContext -- hence no need to
+                 * explicitly pfree().
+                 */
+                eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts);
+            }
+
+            if (!tuples_equal(outslot, searchslot, eq))
+                continue;
+        }

IIRC, it invokes tuples_equal for all cases unless we are using replica
identity key or primary key to scan. But there seem some other cases where the
tuples_equal looks unnecessary.

For example, if the table on subscriber don't have a PK or RI key but have a
not-null, non-deferrable, unique key. And if the apply worker choose this index
to do the scan, it seems we can skip the tuples_equal as well.

--Example
pub:
CREATE TABLE test_replica_id_full (a int, b int not null);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

sub:
CREATE TABLE test_replica_id_full (a int, b int not null);
CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(b);
--

I am not 100% sure if it's worth optimizing this by complicating the check in
idxIsRelationIdentityOrPK. What do you think ?

Best Regards,
Hou zj

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Add standard collation UNICODE
Next
From: Julien Rouhaud
Date:
Subject: Re: pg_upgrade and logical replication