Query planner question - Mailing list pgsql-general

From Ernest E Vogelsinger
Subject Query planner question
Date
Msg-id 5.1.1.6.2.20030612224331.03afb828@mail.vogelsinger.at
Whole thread Raw
Responses Re: Query planner question  (Dmitry Tkach <dmitry@openratings.com>)
Re: Query planner question  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
List pgsql-general
Maybe someone can shed a light on this:

I have a (test) table, populated with 2M rows:
    rid       int4,       -- primary key
    dcid      varchar,
    dsid      varchar,
    drid      int4,
    owid      int4,
    nxid      int4,
    usg       int4,
    --- something more that's not important

Indexes in question:
    CREATE INDEX id_owid ON table (owid,nxid,usg) USING BTREE
    CREATE INDEX id_dowid ON table (owid, dcid, dsid, drid, nxid, usg)
USING BTREE
    CREATE INDEX id_dictid ON table (dcid, dsid, drid, nxid) USING BTREE

Test Run

VACUUM ANALYZE FULL;
VACUUM
EXPLAIN ANALYZE SELECT DISTINCT dcid,dsid,drid FROM table WHERE owid = 1;
NOTICE:  QUERY PLAN:
  Unique  (cost=402.07..402.90 rows=11 width=20) (actual time=10.13..11.49
rows=512 loops=1)
    ->  Sort  (cost=402.07..402.07 rows=111 width=20) (actual
time=10.13..10.46 rows=512 loops=1)
          ->  Index Scan using id_owid on table (cost=0.00..398.30 rows=111
width=20) (actual time=0.05..4.44 rows=512 loops=1)
Total runtime: 11.95 msec

EXPLAIN ANALYZE SELECT DISTINCT dcid,dsid,drid FROM table WHERE owid is null;
NOTICE:  QUERY PLAN:
  Unique  (cost=126611.73..128034.59 rows=18971 width=20) (actual
time=27515.63..28359.88 rows=513 loops=1)
    ->  Sort  (cost=126611.73..126611.73 rows=189714 width=20) (actual
time=27515.62..27792.29 rows=199131 loops=1)
          ->  Seq Scan on table  (cost=0.00..106885.04 rows=189714
width=20) (actual time=18.76..16467.28 rows=199131 loops=1)
Total runtime: 28633.68 msec

SET enable_seqscan TO off;
SET VARIABLE
EXPLAIN SELECT DISTINCT dcid,dsid,drid FROM table WHERE owid is null;
-- this process uses 34M RSS!!!
NOTICE:  QUERY PLAN:
  Unique  (cost=0.00..7887659.31 rows=18971 width=20) (actual
time=2.57..711940.78 rows=513 loops=1)
    ->  Index Scan using id_dictid on table  (cost=0.00..7886236.46
rows=189714 width=20) (actual time=2.57..710482.07 rows=199131 loops=1)
Total runtime: 711942.76 msec

A couple of questions arise:

1) Why chooses the planner to use id_owid (and not id_dowid as it would
seem logical) for the first query?
2) Why is NO index used for the second query, the only difference being in
the constraint value (owid is set vs. owid is null)?
3) Why does it use id_dictid for the second query when forced to, and not
id_owid or id_dowid?
4) What could I do to get the planner to use the index access method (apart
from setting enable_seqscan to off)?

This is PostgreSQL 7.2.1 on a dual 1GHz / 1GB Dell server
SHMALL = 128MB
SHMMAX = 128MB
shared_buffers = 4000
sort_mem = 4000

Thanks for any pointers/insights/whatever makes sense,


--
   >O     Ernest E. Vogelsinger
   (\)    ICQ #13394035
    ^     http://www.vogelsinger.at/



pgsql-general by date:

Previous
From: "Dmitri Bichko"
Date:
Subject: Re: Best pg_dump practices
Next
From: "Jim C. Nasby"
Date:
Subject: Re: Postgres performance comments from a MySQL user