sequential scan on select distinct - Mailing list pgsql-performance

From Ole Langbehn
Subject sequential scan on select distinct
Date
Msg-id 200410061130.58625.ole@freiheit.com
Whole thread Raw
Responses Re: sequential scan on select distinct
List pgsql-performance
Hi,

I'm using Postgres 7.4.5. Tables are analyzed & vacuumed.

I am wondering why postgresql never uses an index on queries of the type
'select distinct ...' while e.g. mysql uses the index on the same query.
See the following explains:


postgresql:

explain analyze select distinct "land" from "customer_dim";

---------------------------------------------------------------------------------------------------------------------------------------+
                                                              QUERY PLAN
              | 

---------------------------------------------------------------------------------------------------------------------------------------+
 Unique  (cost=417261.85..430263.66 rows=18 width=15) (actual time=45875.235..67204.694 rows=103 loops=1)
              | 
   ->  Sort  (cost=417261.85..423762.75 rows=2600362 width=15) (actual time=45875.226..54114.473 rows=2600362 loops=1)
              | 
         Sort Key: land
              | 
         ->  Seq Scan on customer_dim  (cost=0.00..84699.62 rows=2600362 width=15) (actual time=0.048..10733.227
rows=2600362loops=1) | 
 Total runtime: 67246.465 ms
              | 

---------------------------------------------------------------------------------------------------------------------------------------+


mysql:

explain select DISTINCT `customer_dim`.`land` from `customer_dim`;
--------------+-------+---------------+---------------+---------+--------+---------+-------------+
    table     | type  | possible_keys |      key      | key_len |  ref   |  rows   |    Extra    |
--------------+-------+---------------+---------------+---------+--------+---------+-------------+
 customer_dim | index | [NULL]        | IDX_cstd_land | 81      | [NULL] | 2600362 | Using index |
--------------+-------+---------------+---------------+---------+--------+---------+-------------+
1 row in result (first row: 8 msec; total: 9 msec)



The result set contains 103 rows (but i get this behavior with every query of
this kind). My tables consist of at least a million rows.

The indexes on the column 'land' are standard indexes, so in case of
postgresql, it's a btree-index. I've tried to change the index type, but to no
avail.

So, why doesn't postgresql use the index, and (how) could i persuade postgresql
to use an index for this type of query?

TiA

--
Ole Langbehn

freiheit.com technologies gmbh
Theodorstr. 42-90 / 22761 Hamburg, Germany
fon       +49 (0)40 / 890584-0
fax       +49 (0)40 / 890584-20

Freie Software durch Bücherkauf fördern | http://bookzilla.de/

pgsql-performance by date:

Previous
From: Greg Stark
Date:
Subject: Re: Excessive context switching on SMP Xeons
Next
From: "Alban Médici (NetCentrex)"
Date:
Subject: stats on cursor and query execution troubleshooting