Optimizer not using index on 120M row table - Mailing list pgsql-general

From Jim C. Nasby
Subject Optimizer not using index on 120M row table
Date
Msg-id 20030407222903.M31861@flake.decibel.org
Whole thread Raw
Responses Re: Optimizer not using index on 120M row table  (Neil Conway <neilc@samurai.com>)
Re: Optimizer not using index on 120M row table  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Optimizer not using index on 120M row table  ("scott.marlowe" <scott.marlowe@ihs.com>)
List pgsql-general
As background: this is the main table for http://stats.distributed.net

      Table "public.email_contrib"
   Column   |     Type      | Modifiers
------------+---------------+-----------
 id         | integer       | not null
 team_id    | integer       | not null
 date       | date          | not null
 project_id | smallint      | not null
 work_units | numeric(20,0) | not null
Indexes: email_contrib_pkey primary key btree (project_id, id, date)

id is the id of a participant, team_id is the team they were on for that
day, date is the day the work was done, project_id is the project, and
work_units is the amount of work done.

 explain select * from email_contrib where project_id=8 and id=39622 and
date='3/1/03';
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on email_contrib  (cost=0.00..2942185.40 rows=1 width=25)
   Filter: ((project_id = 8) AND (id = 39622) AND (date =
'2003-03-01'::date))
(2 rows)

Is there any reason why this shouldn't be using the index? The
selectivity on project_id is very low (only 5 values for all 120M rows).

select attname, avg_width, n_distinct, correlation from pg_stats where
tablename='email_contrib';
  attname   | avg_width | n_distinct | correlation
------------+-----------+------------+-------------
 id         |         4 |      95184 |    0.496598
 team_id    |         4 |       1361 |    0.219478
 date       |         4 |       1769 |    0.329469
 project_id |         2 |          5 |           1
 work_units |        11 |       2100 |   0.0900541

It seems that not only should the query I explained be using the index,
but it seems that something like select sum(work_units) .. where
project_id=8 and id=39622 should also use the index.

I've run vacuum analyze to no effect.

On a related note, will pgsql do 'index covering', reading only the
index if it contains all the information a query needs? IE: in Sybase,
this query will only hit the index on Email_Contrib:

select date from email_contrib where project_id=8 and id=39622;

because email_contrib_pkey contains all required values.
--
Jim C. Nasby (aka Decibel!)                    jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"


pgsql-general by date:

Previous
From: Lamar Owen
Date:
Subject: Re: Failed dependencies: perl(Pg) is needed by postgresql-contrib
Next
From: Neil Conway
Date:
Subject: Re: Optimizer not using index on 120M row table