Thread: Optimizer not using index when it should

Optimizer not using index when it should

From
Rob Messer
Date:
Like some other recent messages, this one is about getting postgresql
to use an index that it seems it should clearly use.  (But this one has
nothing to do with count(*)).

Here is my table:

    Column    |            Type             | Modifiers
--------------+-----------------------------+-----------
 dsid         | character varying(20)       | not null
 recid        | numeric(8,0)                | not null
 trans_id     | character varying(16)       | not null
 status       | character varying(1)        |
 init_ts      | timestamp without time zone |
 last_ts      | timestamp without time zone |
 last_form_id | character varying(8)        |
 secval       | character varying(20)       |
Indexes: ds_rec1 unique btree (recid),
         ds_rec2 btree (dsid)

Here is my version info:
 PostgreSQL 7.3.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2
20020903 (Red Hat Linux 8.0 3.2-7)

Currently the table ds_record has about 250,000 records in it.  Of
those, about 3000 have dsid = 'starz'. When I need to look up all the
recids with this dsid in ds_record, I have the following simple query:

select recid from ds_record where dsid = 'startz';

But it doesn't use the index ds_rec2 on dsid.  Here is the explain
analyze output:

intellis2=> explain analyze select recid from ds_record where dsid =
'starz';
                                                  QUERY PLAN

--------------------------------------------------------------------------------------------------------------
 Seq Scan on ds_record  (cost=0.00..6186.21 rows=3484 width=12) (actual
time=10.60..408.12 rows=3484 loops=1)
   Filter: (dsid = 'starz'::character varying)
 Total runtime: 410.14 msec
(3 rows)

but if I turn off seqscan I get this:

intellis2=> set enable_seqscan=off;
SET
intellis2=> explain analyze select recid from ds_record where dsid =
'starz';
                                                         QUERY PLAN


----------------------------------------------------------------------------------------------------------------------------
 Index Scan using ds_rec2 on ds_record  (cost=0.00..7185.47 rows=3484
width=12)
(actual time=0.17..12.94 rows=3484 loops=1)
   Index Cond: (dsid = 'starz'::character varying)
 Total runtime: 14.97 msec
(3 rows)

so it is faster by more than a factor of 25 to use the index.  The
problem gets worse when I add a join to the table.

I have tried the following:

alter table ds_record alter dsid set statistics 1000;
vacuum analyze ds_record;
drop index ds_rec2;
CREATE INDEX ds_rec2 ON ds_record USING btree (dsid);

But to no avail, I get the same results.

Interestingly, for queries that return fewer rows it does use the
correct index.  For example, dsid="mapbuy2" appears about 500 times in
ds_record.  Here is the explain out there (with enable_seqscan back
on):

intellis2=> explain analyze select recid from ds_record where dsid =
'mapbuy2';
                                                       QUERY PLAN


-------------------------------------------------------------------------------------------------------------------------
 Index Scan using ds_rec2 on ds_record  (cost=0.00..1351.17 rows=522
width=12) (actual time=0.18..4.31 rows=522 loops=1)
   Index Cond: (dsid = 'mapbuy2'::character varying)
 Total runtime: 4.68 msec

To me it seems that the threshold for doing a table scan is wrong --
when the rows retrieved are about 1.25% of the table it does a scan.

What can I do to fix this -- is there something I am missing about
setting statistics or some configuration variable I can change?  Any
insights would be greatly appreciated.  Thank you,

Rob Messer



__________________________________
Do you Yahoo!?
The New Yahoo! Search - Faster. Easier. Bingo.
http://search.yahoo.com


Re: Optimizer not using index when it should

From
Rod Taylor
Date:
> What can I do to fix this -- is there something I am missing about
> setting statistics or some configuration variable I can change?  Any
> insights would be greatly appreciated.  Thank you,

If you look at the estimates for cost, the index scan is more expensive
by ~1/8th. But as you've shown, it's not.

You might try adjusting the random_page_cost down to something more
appropriate for your hardware and situation.

Do testing... this may cause other queries to use an index scan when
they should have been doing a sequential scan.  Mistakenly using an
index can be a much more costly error (hence the high default
random_page_cost).

--
Rod Taylor <rbt@rbt.ca>

PGP Key: http://www.rbt.ca/rbtpub.asc

Attachment

Re: Optimizer not using index when it should

From
Tom Lane
Date:
Rod Taylor <rbt@rbt.ca> writes:
>> What can I do to fix this -- is there something I am missing about
>> setting statistics or some configuration variable I can change?  Any
>> insights would be greatly appreciated.  Thank you,

> You might try adjusting the random_page_cost down to something more
> appropriate for your hardware and situation.

Also, is the table physically ordered by dsid?  If so, is that condition
likely to persist?  You may be looking at a test-condition artifact
here --- a poor estimate for an ordered table may not mean much when
you get to realistic database states.

I assume you've done an ANALYZE of course --- what does the pg_stats row
for column dsid contain?

            regards, tom lane