RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers
From | houzj.fnst@fujitsu.com |
---|---|
Subject | RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher |
Date | |
Msg-id | OS0PR01MB5716B9EC1FCE3FC04C50DB8394B49@OS0PR01MB5716.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher ("houzj.fnst@fujitsu.com" <houzj.fnst@fujitsu.com>) |
List | pgsql-hackers |
On Wednesday, March 8, 2023 2:51 PM houzj.fnst@fujitsu.com <houzj.fnst@fujitsu.com> wrote: > > On Tuesday, March 7, 2023 9:47 PM Önder Kalacı <onderkalaci@gmail.com> > wrote: > > Hi, > > > > > Let me give an example to demonstrate why I thought something is fishy > here: > > > > > > > > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111. > > > > Imagine the same rel has a PRIMARY KEY with Oid=2222. > > > > > > > > Hmm, alright, this is syntactically possible, but not sure if any user > > would do this. Still thanks for catching this. > > > > And, you are right, if a user has created such a schema, > > IdxIsRelationIdentityOrPK() would return the wrong result and we'd use > sequential scan instead of index scan. > > This would be a regression. I think we should change the function. > > I am looking at the latest patch and have a question about the following code. > > /* Try to find the tuple */ > - if (index_getnext_slot(scan, ForwardScanDirection, outslot)) > + while (index_getnext_slot(scan, ForwardScanDirection, outslot)) > { > - found = true; > + /* > + * Avoid expensive equality check if the index is primary key or > + * replica identity index. > + */ > + if (!idxIsRelationIdentityOrPK) > + { > + if (eq == NULL) > + { > +#ifdef USE_ASSERT_CHECKING > + /* apply assertions only once for the input > idxoid */ > + IndexInfo *indexInfo = BuildIndexInfo(idxrel); > + > Assert(IsIndexUsableForReplicaIdentityFull(indexInfo)); > +#endif > + > + /* > + * We only need to allocate once. This is > allocated within per > + * tuple context -- ApplyMessageContext -- > hence no need to > + * explicitly pfree(). > + */ > + eq = palloc0(sizeof(*eq) * > outslot->tts_tupleDescriptor->natts); > + } > + > + if (!tuples_equal(outslot, searchslot, eq)) > + continue; > + } > > IIRC, it invokes tuples_equal for all cases unless we are using replica identity key > or primary key to scan. But there seem some other cases where the > tuples_equal looks unnecessary. > > For example, if the table on subscriber don't have a PK or RI key but have a > not-null, non-deferrable, unique key. And if the apply worker choose this index > to do the scan, it seems we can skip the tuples_equal as well. > > --Example > pub: > CREATE TABLE test_replica_id_full (a int, b int not null); ALTER TABLE > test_replica_id_full REPLICA IDENTITY FULL; CREATE PUBLICATION > tap_pub_rep_full FOR TABLE test_replica_id_full; > > sub: > CREATE TABLE test_replica_id_full (a int, b int not null); CREATE UNIQUE INDEX > test_replica_id_full_idx ON test_replica_id_full(b); Thinking again. This example is incorrect, sorry. I mean the case when all the columns of the tuple to be compared are in the unique index on subscriber side, like: --Example pub: CREATE TABLE test_replica_id_full (a int); ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL; CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full; sub: CREATE TABLE test_replica_id_full (a int not null); CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(a); -- Best Regards, Hou zj
pgsql-hackers by date: