Re: Query planner question - Mailing list pgsql-general

From Dmitry Tkach
Subject Re: Query planner question
Date
Msg-id 3EE8F732.3010800@openratings.com
Whole thread Raw
In response to Query planner question  (Ernest E Vogelsinger <ernest@vogelsinger.at>)
Responses Re: Query planner question  (Ernest E Vogelsinger <ernest@vogelsinger.at>)
List pgsql-general
Ernest E Vogelsinger wrote:

>Maybe someone can shed a light on this:
>
>I have a (test) table, populated with 2M rows:
>    rid       int4,       -- primary key
>    dcid      varchar,
>    dsid      varchar,
>    drid      int4,
>    owid      int4,
>    nxid      int4,
>    usg       int4,
>    --- something more that's not important
>
>Indexes in question:
>    CREATE INDEX id_owid ON table (owid,nxid,usg) USING BTREE
>    CREATE INDEX id_dowid ON table (owid, dcid, dsid, drid, nxid, usg)
>USING BTREE
>    CREATE INDEX id_dictid ON table (dcid, dsid, drid, nxid) USING BTREE
>
>Test Run
>
>VACUUM ANALYZE FULL;
>VACUUM
>EXPLAIN ANALYZE SELECT DISTINCT dcid,dsid,drid FROM table WHERE owid = 1;
>NOTICE:  QUERY PLAN:
>  Unique  (cost=402.07..402.90 rows=11 width=20) (actual time=10.13..11.49
>rows=512 loops=1)
>    ->  Sort  (cost=402.07..402.07 rows=111 width=20) (actual
>time=10.13..10.46 rows=512 loops=1)
>          ->  Index Scan using id_owid on table (cost=0.00..398.30 rows=111
>width=20) (actual time=0.05..4.44 rows=512 loops=1)
>Total runtime: 11.95 msec
>
>EXPLAIN ANALYZE SELECT DISTINCT dcid,dsid,drid FROM table WHERE owid is null;
>NOTICE:  QUERY PLAN:
>  Unique  (cost=126611.73..128034.59 rows=18971 width=20) (actual
>time=27515.63..28359.88 rows=513 loops=1)
>    ->  Sort  (cost=126611.73..126611.73 rows=189714 width=20) (actual
>time=27515.62..27792.29 rows=199131 loops=1)
>          ->  Seq Scan on table  (cost=0.00..106885.04 rows=189714
>width=20) (actual time=18.76..16467.28 rows=199131 loops=1)
>Total runtime: 28633.68 msec
>
>SET enable_seqscan TO off;
>SET VARIABLE
>EXPLAIN SELECT DISTINCT dcid,dsid,drid FROM table WHERE owid is null;
>-- this process uses 34M RSS!!!
>NOTICE:  QUERY PLAN:
>  Unique  (cost=0.00..7887659.31 rows=18971 width=20) (actual
>time=2.57..711940.78 rows=513 loops=1)
>    ->  Index Scan using id_dictid on table  (cost=0.00..7886236.46
>rows=189714 width=20) (actual time=2.57..710482.07 rows=199131 loops=1)
>Total runtime: 711942.76 msec
>
>A couple of questions arise:
>
>1) Why chooses the planner to use id_owid (and not id_dowid as it would
>seem logical) for the first query?
>
It is not smart enough to realize that owid doesn't matter for
sorting/distinct purposes...
I think, something like this should make it do what you expect:

SELECT DISTINCT ON (owid,dcid,dsid,drid) dcid,dsid,drid FROM table WHERE owid = 1;


>2) Why is NO index used for the second query, the only difference being in
>the constraint value (owid is set vs. owid is null)?
>
Because the 'is null' operator is not indexable...
There was a long discussion here a while ago that, as far as I remember,
ended up with a resolution, that it is possible to make it indexable,
and it would be nice to do ... when somebody gets a chance to actually
implement it.

As a wrokaround, you may consider using partial indexes - like:

create id_dictid_noid on table (dcid, dsid, drid, nxid) where owid is null;

>3) Why does it use id_dictid for the second query when forced to, and not
>id_owid or id_dowid?
>
Because having owid in the beginning doesn't help at all, and using
id_dictid at least eliminates the need to sort.

>4) What could I do to get the planner to use the index access method (apart
>from setting enable_seqscan to off)?
>
>
Nothing really :-)
ANALYZE or SET STATISTICS to a higher value may help sometimes  (when
its refusal to use the index is due to misestimating the number of rows) ...
In your case though, it does seem to pick the best available plan, so
you, probably, don't want to force it use the index to begin with...


I hope, it helps...

Dima

>This is PostgreSQL 7.2.1 on a dual 1GHz / 1GB Dell server
>SHMALL = 128MB
>SHMMAX = 128MB
>shared_buffers = 4000
>sort_mem = 4000
>
>Thanks for any pointers/insights/whatever makes sense,
>
>
>
>



pgsql-general by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Re: Postgres performance comments from a MySQL user
Next
From: Stephan Szabo
Date:
Subject: Re: Query planner question