RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From houzj.fnst@fujitsu.com
Subject RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id OS0PR01MB5716B9EC1FCE3FC04C50DB8394B49@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>)
List pgsql-hackers
On Wednesday, March 8, 2023 2:51 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote:
> 
> On Tuesday, March 7, 2023 9:47 PM Önder Kalacı <onderkalaci@gmail.com>
> wrote:
> 
> Hi,
> 
> > > > Let me give an example to demonstrate why I thought something is fishy
> here:
> > > >
> > > > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111.
> > > > Imagine the same rel has a PRIMARY KEY with Oid=2222.
> > > >
> >
> > Hmm, alright, this is syntactically possible, but not sure if any user
> > would do this. Still thanks for catching this.
> >
> > And, you are right, if a user has created such a schema,
> > IdxIsRelationIdentityOrPK() would return the wrong result and we'd use
> sequential scan instead of index scan.
> > This would be a regression. I think we should change the function.
> 
> I am looking at the latest patch and have a question about the following code.
> 
>      /* Try to find the tuple */
> -    if (index_getnext_slot(scan, ForwardScanDirection, outslot))
> +    while (index_getnext_slot(scan, ForwardScanDirection, outslot))
>      {
> -        found = true;
> +        /*
> +         * Avoid expensive equality check if the index is primary key or
> +         * replica identity index.
> +         */
> +        if (!idxIsRelationIdentityOrPK)
> +        {
> +            if (eq == NULL)
> +            {
> +#ifdef USE_ASSERT_CHECKING
> +                /* apply assertions only once for the input
> idxoid */
> +                IndexInfo  *indexInfo = BuildIndexInfo(idxrel);
> +
>     Assert(IsIndexUsableForReplicaIdentityFull(indexInfo));
> +#endif
> +
> +                /*
> +                 * We only need to allocate once. This is
> allocated within per
> +                 * tuple context -- ApplyMessageContext --
> hence no need to
> +                 * explicitly pfree().
> +                 */
> +                eq = palloc0(sizeof(*eq) *
> outslot->tts_tupleDescriptor->natts);
> +            }
> +
> +            if (!tuples_equal(outslot, searchslot, eq))
> +                continue;
> +        }
> 
> IIRC, it invokes tuples_equal for all cases unless we are using replica identity key
> or primary key to scan. But there seem some other cases where the
> tuples_equal looks unnecessary.
> 
> For example, if the table on subscriber don't have a PK or RI key but have a
> not-null, non-deferrable, unique key. And if the apply worker choose this index
> to do the scan, it seems we can skip the tuples_equal as well.
> 
> --Example
> pub:
> CREATE TABLE test_replica_id_full (a int, b int not null); ALTER TABLE
> test_replica_id_full REPLICA IDENTITY FULL; CREATE PUBLICATION
> tap_pub_rep_full FOR TABLE test_replica_id_full;
> 
> sub:
> CREATE TABLE test_replica_id_full (a int, b int not null); CREATE UNIQUE INDEX
> test_replica_id_full_idx ON test_replica_id_full(b);

Thinking again. This example is incorrect, sorry. I mean the case when
all the columns of the tuple to be compared are in the unique index on
subscriber side, like:

--Example
pub:
CREATE TABLE test_replica_id_full (a int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;

sub:
CREATE TABLE test_replica_id_full (a int not null);
CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(a);
--

Best Regards,
Hou zj

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: Add standard collation UNICODE
Next
From: Michael Paquier
Date:
Subject: Re: Add pg_walinspect function with block info columns