Re: SELECT DISTINCT very slow - Mailing list pgsql-general

From Tom Lane
Subject Re: SELECT DISTINCT very slow
Date
Msg-id 10703.1247187040@sss.pgh.pa.us
Whole thread Raw
In response to Re: SELECT DISTINCT very slow  (Greg Stark <gsstark@mit.edu>)
Responses Re: SELECT DISTINCT very slow
List pgsql-general
Greg Stark <gsstark@mit.edu> writes:
> Not really. The OP doesn't say how wide the record rows are but unless
> they're very wide it wouldn't pay to use an index for this even if you
> didn't have to access the heap also. It's going to be faster to scan
> the whole heap and either sort or use a hash. Currently there aren't
> many cases where a btree with 6,000 copies of 111 distinct keys is
> going to be useful.

It was 600,000 not 6,000 ... so a skip-scan might be worth the trouble,
but as you say we haven't done it.

In any case I think the real issue is that the OP is probably using a
pre-8.4 release which will always do SELECT DISTINCT via sort-and-unique.
Hash aggregation would be a whole lot faster for these numbers, even
if not exactly instantaneous.  He could update to 8.4, or go over to
using GROUP BY as was recommended upthread.

            regards, tom lane

pgsql-general by date:

Previous
From: Greg Stark
Date:
Subject: Re: SELECT DISTINCT very slow
Next
From: "serafin segador"
Date:
Subject: Re: pg_dump PostgreSQL 8.4