Use of index in 7.0 vs 6.5 - Mailing list pgsql-sql

From Ryan Bradetich
Subject Use of index in 7.0 vs 6.5
Date
Msg-id 392C6056.D5BF7FE8@hp.com
Whole thread Raw
Responses Re: Use of index in 7.0 vs 6.5  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Use of index in 7.0 vs 6.5  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
Tom (Or anyone else who is good with PostgreSQL statistics),

I am in the process of transitioning from postgreSQL 6.5.3 to
postgreSQL 7.0.  I ran into an issue where a sequential scan
is being choosen on postgreSQL 7.0 where an index scan was
choosen on postgreSQL 6.5.3.

Note: All tables have been freshly vacuum'd and analyzed.

procman=# select version();                             version
-------------------------------------------------------------------PostgreSQL 7.0.0 on hppa2.0w-hp-hpux11.00, compiled
bygcc 2.95.2
 
(1 row)

procman=# explain select count(catagory) from medusa where host_id = 404
and catagory like 'A%';
NOTICE:  QUERY PLAN:
Aggregate  (cost=189546.19..189546.19 rows=1 width=12) ->  Seq Scan on medusa  (cost=0.00..189529.43 rows=6704
width=12)
EXPLAIN

Note: The above query produces an index scan on postgreSQL 6.5.3.

procman=# set enable_seqscan = off;
SET VARIABLE

procman=# explain select count(catagory) from medusa where host_id = 404
and catagory like 'A%';
NOTICE:  QUERY PLAN:
Aggregate  (cost=207347.36..207347.36 rows=1 width=12) ->  Index Scan using medusa_host_id_key on medusa
(cost=0.00..207330.60 rows=6704 width=12)
EXPLAIN

Here are the statistics:

procman=# select attname,attdisbursion,s.*
procman-# from pg_statistic s, pg_attribute a, pg_class c
procman-# where starelid = c.oid and attrelid = c.oid and staattnum =
attnum
procman-# and relname = 'medusa'; attname  | attdisbursion | starelid | staattnum | staop | stanullfrac
| stacommonfrac |
stacommonval                                 |staloval                             |                stahival

-----------+---------------+----------+-----------+-------+-------------+---------------+-----------------------------------------------------------------------------+----------------------------

--------------------------------------+-----------------------------------------
host_id   |    0.00621312 | 30874288 |         1 |    97 |           0
|     0.0279425 |
446
| 0                                     | 11011
(1 row)


Here is my analysis of the stastics (based on the examples in the
archive).

The most common value host_id in the table is 446 with row fraction of
~ 2.8%.
The estimated number of rows in the index is 6704.  This table has
4,630,229
entries in the table.

Hopefully this analysis is correct, if not .. please correct me :)

I do not understand why the planner would choose a seqscan over the
index scan because
6704/4,630,229 is ~ 0.15%.

Thanks for your time,

Ryan



- Ryan




pgsql-sql by date:

Previous
From: Julie Hunt
Date:
Subject: Re: possible bug with group by?
Next
From: "Stephan Szabo"
Date:
Subject: Re: possible bug with group by?