RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From shiy.fnst@fujitsu.com
Subject RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id OSZPR01MB63105E246E96C5ECF8131C09FDB49@OSZPR01MB6310.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Önder Kalacı <onderkalaci@gmail.com>)
List pgsql-hackers
On Tue, Mar 7, 2023 9:47 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> 
> I'm attaching v35. 
> 

I noticed that if the index column only exists on the subscriber side, this index
can also be chosen. This seems a bit odd because the index column isn't sent
from publisher.

e.g.
-- pub
CREATE TABLE test_replica_id_full (x int, y int);
ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL;
CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full;
-- sub
CREATE TABLE test_replica_id_full (x int, y int, z int);
CREATE INDEX test_replica_id_full_idx ON test_replica_id_full(z);
CREATE SUBSCRIPTION tap_sub_rep_full_0 CONNECTION 'dbname=postgres port=5432' PUBLICATION tap_pub_rep_full;

I didn't see in any cases the behavior changed after applying the patch, which
looks good. Besides, I tested the performance for such case.

Steps:
1. create tables, index, publication, and subscription
-- pub
create table tbl (a int);
alter table tbl replica identity full;
create publication pub for table tbl;
-- sub
create table tbl (a int, b int);
create index idx_b on tbl(b);
create subscription sub connection 'dbname=postgres port=5432' publication pub;
2. setup synchronous replication
3. execute SQL:
truncate tbl;
insert into tbl select i from generate_series(0,10000)i;
update tbl set a=a+1;

The time of UPDATE (take the average of 10 runs):
master: 1356.06 ms
patched: 3968.14 ms

For the cases that all values of extra columns on the subscriber are NULL, index
scan can't do better than sequential scan. This is not a real scenario and I
think it only degrades when there are many NULL values in the index column, so
this is probably not a case to worry about. I just share this case and then we
can discuss should we pick the index which only contain the extra columns on the
subscriber.

Regards,
Shi Yu

pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: psql: Add role's membership options to the \du+ command
Next
From: "wangw.fnst@fujitsu.com"
Date:
Subject: RE: Rework LogicalOutputPluginWriterUpdateProgress