Re: Make the qual cost on index Filter slightly higher than qual coston index Cond. - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Make the qual cost on index Filter slightly higher than qual coston index Cond.
Date
Msg-id CAKU4AWrS8d-Hpr4_wV1F+Oudn_NGdUYWnQpGPKYChVCNU3BY2A@mail.gmail.com
Whole thread Raw
In response to Re: Make the qual cost on index Filter slightly higher than qual coston index Cond.  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Make the qual cost on index Filter slightly higher than qual coston index Cond.  (Andy Fan <zhihui.fan1213@gmail.com>)
List pgsql-hackers


On Tue, May 26, 2020 at 9:59 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Tue, May 26, 2020 at 1:52 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
>
> Consider the below example:
>
> create table j1(i int, im5 int,  im100 int, im1000 int);
> insert into j1 select i, i%5, i%100, i%1000 from generate_series(1, 10000000)i;
> create index j1_i_im5 on j1(i, im5);
> create index j1_i_im100 on j1(i, im100);
> analyze j1;
> explain select * from j1 where i = 100 and im5 = 5;
>
> We may get the plan like this:
>
> demo=# explain select  * from  j1 where i = 100 and im5 = 1;
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Index Scan using j1_i_im100 on j1  (cost=0.43..8.46 rows=1 width=16)
>    Index Cond: (i = 100)
>    Filter: (im5 = 1)
> (3 rows)
>
> At this case, optimizer can estimate there are only 1 row to return, so both
> indexes have same cost, which one will be choose is un-controlable. This is
> fine for above query based on the estimation is accurate. However estimation
> can't be always accurate in real life. Some inaccurate estimation can cause an
> wrong index choose. As an experience, j1_i_im5 index should always be choose
> for above query.

I think we need a better example where choosing an index makes a difference.

An index can be chosen just because it's path was created before some
other more appropriate index but the cost difference was within fuzzy
limit. Purely based on the order in which index paths are created.

Here is an further example with the above case:

demo=# insert into j1 select 1, 1, 1, 1 from generate_series(1, 100000)i;
INSERT 0 100000
 
With the current implementation, it is 

demo=# explain analyze select * from j1 where i = 1 and im5 = 2;
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Index Scan using j1_i_im100 on j1  (cost=0.43..8.44 rows=1 width=16) (actual time=63.431..63.431 rows=0 loops=1)
   Index Cond: (i = 1)
   Filter: (im5 = 2)
   Rows Removed by Filter: 100001
 Planning Time: 0.183 ms
 Execution Time: 63.484 ms
(6 rows)

With the patch above, it can always choose a correct index even the statistics is inaccurate:

demo=# explain analyze select * from j1 where i = 1 and im5 = 2;
                                                  QUERY PLAN
--------------------------------------------------------------------------------------------------------------
 Index Scan using j1_i_im5 on j1  (cost=0.43..8.46 rows=1 width=16) (actual time=0.030..0.030 rows=0 loops=1)
   Index Cond: ((i = 1) AND (im5 = 2))
 Planning Time: 1.087 ms
 Execution Time: 0.077 ms
(4 rows)

-- 
Best Regards
Andy Fan

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: hash join error improvement (old)
Next
From: Andy Fan
Date:
Subject: Re: Make the qual cost on index Filter slightly higher than qual coston index Cond.