Thread: select ... distinct performance

select ... distinct performance

From

Don Bowman

Date:

31 January 2004, 20:48:04

I have a table with a large number of rows (10K in the example below,
but >1M in some databases). I would like to find the distinct
values for one of the columns. The column is indexed.

I would have expected that this would be a very fast operation,
simply walking down the index. In the example below, there is
only 1 unique value, but it takes 2 seconds. I would have
expected more like ~50ms.

explain analyze select distinct element from elem_trafficstats ;
NOTICE:  QUERY PLAN:

Unique  (cost=0.00..4117.18 rows=9350 width=44) (actual time=0.59..1710.34
rows=1 loops=1)
  ->  Index Scan using elem_trafficstats_element_idx on elem_trafficstats
(cost=0.00..3883.44 rows=93495 width=44) (actual time=0.58..1184.17
rows=93495 loops=1)
Total runtime: 1710.88 msec

is there an alternate way to construct a 'distinct' query
that will use the index properly?

--don

Re: select ... distinct performance

From

Martijn van Oosterhout

Date:

31 January 2004, 22:08:42

On Wed, Jan 28, 2004 at 11:20:30PM -0500, Don Bowman wrote:
> I have a table with a large number of rows (10K in the example below,
> but >1M in some databases). I would like to find the distinct
> values for one of the columns. The column is indexed.
>
> I would have expected that this would be a very fast operation,
> simply walking down the index. In the example below, there is
> only 1 unique value, but it takes 2 seconds. I would have
> expected more like ~50ms.

The problem is that the index doesn't contain info about which rows are
visibile in your current transaction, so it has to load the entire table to
check. Looks like it used the index to avoid a sort step. I don't think
there is a way to write this that doesn't need the whole table.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> (... have gone from d-i being barely usable even by its developers
> anywhere, to being about 20% done. Sweet. And the last 80% usually takes
> 20% of the time, too, right?) -- Anthony Towns, debian-devel-announce

Attachment

msg-32091-75066.dat

Re: select ... distinct performance

From

Don Bowman

Date:

31 January 2004, 22:43:06

From: Martijn van Oosterhout [mailto:kleptog@svana.org]
>
> On Wed, Jan 28, 2004 at 11:20:30PM -0500, Don Bowman wrote:
> > I have a table with a large number of rows (10K in the
> example below,
> > but >1M in some databases). I would like to find the distinct
> > values for one of the columns. The column is indexed.
> >
> > I would have expected that this would be a very fast operation,
> > simply walking down the index. In the example below, there is
> > only 1 unique value, but it takes 2 seconds. I would have
> > expected more like ~50ms.
>
> The problem is that the index doesn't contain info about
> which rows are
> visibile in your current transaction, so it has to load the
> entire table to
> check. Looks like it used the index to avoid a sort step. I
> don't think
> there is a way to write this that doesn't need the whole table.
>
> Hope this helps,
> --

It would appear that postgresql does not support index-only fetches
(e.g. DB2).

or, perhaps a materialized view. I see there is some work on going
for this.

I haven't tried a stored procedure like this...

select first name into prev_name from Table
while FETCH_OK:
  return prev_name
  fetch first name into prev_name from Table where name > prev_name
end

to see if it can walk the index.

--don