Thread: Why won't the query planner use my index?

Why won't the query planner use my index?

From
"Robert Wille"
Date:
I am performing a scalability test of PostgreSQL and am having a problem getting the planner to use an index. I have created the following table:
 
create table a (
  id int default nextval('seq'),
  parent int,
  name varchar(32),
  state bigint default 0,
  scope varchar(80),
  primary key (id)
);
I then populate it with 1M rows, and then create the following index:
 
create unique index parentindex on a (parent, name);
and then vacuum analyze.
 
The output from the following explain statements seem unusual:
 
------------------------
test=# explain select * from a where parent=5;
NOTICE:  QUERY PLAN:
Seq Scan on a  (cost=0.00..23000.20 rows=46108 width=40)
------------------------
 
I have tried to no avail to get it to use parentindex on select statements involving only parent.
 
------------------------
test=# explain select * from a where parent=5 and name between '0' and '1';
NOTICE:  QUERY PLAN:
Index Scan using parentindex on a  (cost=0.00..4.82 rows=1 width=40)
 
test=# explain select * from a where parent=5 and name between '0' and '4';
NOTICE:  QUERY PLAN:
Seq Scan on a  (cost=0.00..28243.08 rows=15323 width=40)
------------------------
 
The first explain yields what I would expect, but the second one does not. How can a sequential scan of the entire table be faster than using the index, or at least using the index for the parent column and then doing a sequential scan of the name column?
 
------------------------
test=# explain select max(id) from a;
NOTICE:  QUERY PLAN:
Aggregate  (cost=23000.20..23000.20 rows=1 width=4)
  ->  Seq Scan on a  (cost=0.00..20378.76 rows=1048576 width=4)
------------------------
 
This one is quite baffling. All the DB needs to do is look at the end of the primary key index.
 
------------------------
test=# explain select * from a where id < 10000;
NOTICE:  QUERY PLAN:
Seq Scan on a  (cost=0.00..23000.20 rows=9998 width=40)
------------------------
 
This select statement would select only 1% of the rows, yet the planner thinks that a sequential scan is faster. If I select few enough rows so that the index is used, this one executes much faster.
 
Can someone tell me why the query planner makes seemingly poor choices? Is it possible to force an execution plan (or give hints)? I am using PostgreSQL 7.1.3 for Windows (soon to be for Linux).
 
You can duplicate my tests by running the attached scripts in this order: create, populate, index.
 
Attachment

Re: Why won't the query planner use my index?

From
Neil Conway
Date:
On Wed, 2002-03-27 at 23:46, Robert Wille wrote:
> create table a (
>   id int default nextval('seq'),
>   parent int,
>   name varchar(32),
>   state bigint default 0,
>   scope varchar(80),
>   primary key (id)
> );
>
> I then populate it with 1M rows, and then create the following index:
>
> create unique index parentindex on a (parent, name);
>
> and then vacuum analyze.
>
> The output from the following explain statements seem unusual:
>
> ------------------------
> test=# explain select * from a where parent=5;
> NOTICE:  QUERY PLAN:
> Seq Scan on a  (cost=0.00..23000.20 rows=46108 width=40)

PostgreSQL is guessing that this qualification will return ~46,000 rows,
so that using a sequential scan would be faster than using the index.
You can test if this is correct by using "SET enable_seqscan = off" to
force an index scan, and then timing to see how long that query plan
takes to execute (in 7.2, you could use EXPLAIN ANALYZE). Also, 7.2 is
smarter in this regard.

> ------------------------
> test=# explain select max(id) from a;
> NOTICE:  QUERY PLAN:
> Aggregate  (cost=23000.20..23000.20 rows=1 width=4)
>   ->  Seq Scan on a  (cost=0.00..20378.76 rows=1048576 width=4)
> ------------------------
>
> This one is quite baffling. All the DB needs to do is look at the end of the primary key index.

Postgres will never use an index for this kind of query. If you need
this kind of functionality, perhaps you can manipulate the sequence
directly, using nextval() or currval().

> ------------------------
> test=# explain select * from a where id < 10000;
> NOTICE:  QUERY PLAN:
> Seq Scan on a  (cost=0.00..23000.20 rows=9998 width=40)
> ------------------------
>
> This select statement would select only 1% of the rows, yet the planner thinks that a sequential scan is faster.

Use PostgreSQL 7.2, it should be smart enough to figure that out.

Cheers,

Neil

--
Neil Conway <neilconway@rogers.com>
PGP Key ID: DB3C29FC


Re: Why won't the query planner use my index?

From
Tom Lane
Date:
Neil Conway <nconway@klamath.dyndns.org> writes:
> On Wed, 2002-03-27 at 23:46, Robert Wille wrote:
>> test=# explain select max(id) from a;
>> This one is quite baffling. All the DB needs to do is look at the end of the primary key index.

> Postgres will never use an index for this kind of query. If you need
> this kind of functionality, perhaps you can manipulate the sequence
> directly, using nextval() or currval().

The traditional answer is

    select id from a order by id desc limit 1;

which will give an indexscan-based plan in recent releases.  Since this
is more functional than a max() query (because you can get at all the
columns of the row containing the maximum ID value, not only the max
itself), there's not been a huge amount of interest in teaching the
planner that there might be some relationship between btree indexes and
max/min aggregates.  We do regularly get razzed by people who think that
such a relationship is "obvious" ... but I like Postgres' black-box
approach to aggregates, and am not eager to break it for little or no
gain in functionality.

Wille's test case does seem to expose some problems in current sources:
I notice that a plain "ANALYZE A" produces a ridiculously low reltuples
estimate.  I think this might be because the update sequence in his
script ends up with the first pages of the table completely empty ---
that seems to be causing ANALYZE to do the wrong thing.  Too tired to
look at it more tonight, though.

            regards, tom lane