RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers

From shiy.fnst@fujitsu.com
Subject RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date
Msg-id OSZPR01MB63103A4AFBBA56BAF8AE7FAAFDA39@OSZPR01MB6310.jpnprd01.prod.outlook.com
Whole thread Raw
In response to Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List pgsql-hackers
On Sat, Feb 4, 2023 7:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Thu, Feb 2, 2023 at 2:03 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> >
> >>
> >> and if there's more
> >> than one candidate index pick any one. Additionally, we can allow
> >> disabling the use of an index scan for this particular case. If we are
> >> too worried about API change for allowing users to specify the index
> >> then we can do that later or as a separate patch.
> >>
> >
> > On v23, I dropped the planner support for picking the index. Instead, it simply
> > iterates over the indexes and picks the first one that is suitable.
> >
> > I'm currently thinking on how to enable users to override this decision.
> > One option I'm leaning towards is to add a syntax like the following:
> >
> > ALTER SUBSCRIPTION .. ALTER TABLE ... SET INDEX ...
> >
> > Though, that should probably be a seperate patch. I'm going to work
> > on that, but still wanted to share v23 given picking the index sounds
> > complementary, not strictly required at this point.
> >
> 
> I agree that it could be a separate patch. However, do you think we
> need some way to disable picking the index scan? This is to avoid
> cases where sequence scan could be better or do we think there won't
> exist such a case?
> 

I think such a case exists. I tried the following cases based on v23 patch.

# Step 1.
Create publication, subscription and tables.
-- on publisher
create table tbl (a int);
alter table tbl replica identity full;
create publication pub for table tbl;

-- on subscriber
create table tbl (a int);
create index idx_a on tbl(a);
create subscription sub connection 'dbname=postgres port=5432' publication pub;

# Step 2.
Setup synchronous replication.

# Step 3.
Execute SQL query on publisher.

-- case 1 (All values are duplicated)
truncate tbl;
insert into tbl select 1 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 2
truncate tbl;
insert into tbl select i%3 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 3
truncate tbl;
insert into tbl select i%5 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 4
truncate tbl;
insert into tbl select i%10 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 5
truncate tbl;
insert into tbl select i%100 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 6
truncate tbl;
insert into tbl select i%1000 from generate_series(0,10000)i;
update tbl set a=a+1;

-- case 7 (No duplicate value)
truncate tbl;
insert into tbl select i from generate_series(0,10000)i;
update tbl set a=a+1;

# Result
The time executing update (the average of 3 runs is taken, the unit is
milliseconds):

+--------+---------+---------+
|        | patched |  master |
+--------+---------+---------+
| case 1 | 3933.68 | 1298.32 |
| case 2 | 1803.46 | 1294.42 |
| case 3 | 1380.82 | 1299.90 |
| case 4 | 1042.60 | 1300.20 |
| case 5 |  691.69 | 1297.51 |
| case 6 |  578.50 | 1300.69 |
| case 7 |  566.45 | 1302.17 |
+--------+---------+---------+

In case 1~3, there's an overhead after applying the patch. In other cases, the
patch improved the performance. As more duplicate values, the greater the
overhead after applying the patch.

Regards,
Shi Yu


pgsql-hackers by date:

Previous
From: Richard Guo
Date:
Subject: Wrong query results caused by loss of join quals
Next
From: Peter Smith
Date:
Subject: Re: Support logical replication of DDLs