Re: BUG #7495: chosen wrong index - Mailing list pgsql-bugs

From Kevin Grittner
Subject Re: BUG #7495: chosen wrong index
Date
Msg-id 502B75150200002500049751@gw.wicourts.gov
Whole thread Raw
In response to BUG #7495: chosen wrong index  (psql@elbrief.de)
Responses Re: BUG #7495: chosen wrong index  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re-2: BUG #7495: chosen wrong index  (psql@elbrief.de)
Re-2: BUG #7495: chosen wrong index  (psql@elbrief.de)
Re-2: BUG #7495: chosen wrong index  (psql@elbrief.de)
Re-2: BUG #7495: chosen wrong index  (psql@elbrief.de)
Re-2: BUG #7495: chosen wrong index  (psql@elbrief.de)
List pgsql-bugs
<psql@elbrief.de> wrote:

> insert into bla ( a , b )
>   select a , a
>     from generate_series( 1 , 1000000 ) as a ( a ) ;

> explain analyze select * from bla
>   where b > 990000 order by a limit 10 ;
> [uses index on b and has a long run time]

The problem is that PostgreSQL doesn't have any sense of the
correlation between columns a and b (i.e., they are always equal)
and assumes that it will find enough matching rows soon enough on
the scan of the index on b to make it cheaper than sorting the
results of finding all rows that match the predicate.  Try your test
suite again with the only change being the insert statement:

insert into bla ( a , b )
  select a , floor(random() * 1000000) + 1
  from generate_series( 1 , 1000000 ) as a ( a ) ;

On my machine, with that data, all of the queries run fast.

We've been looking at ways to develop statistics on multiple
columns, so that correlations like that don't confuse the optimizer,
or trying to evaluate the "risk" of a query taking a long time based
on unexpected correlations.

Not really a bug; more like a recognized opportunity to improve the
optimizer.

-Kevin

pgsql-bugs by date:

Previous
From: psql@elbrief.de
Date:
Subject: BUG #7495: chosen wrong index
Next
From: Angel Zúñiga
Date:
Subject: ProblemWithCharsOSX