Greg Stark <gsstark@mit.edu> writes:
> Not really. The OP doesn't say how wide the record rows are, but unless
> they're very wide it wouldn't pay to use an index for this even if you
> didn't also have to access the heap. It's going to be faster to scan
> the whole heap and either sort or use a hash. Currently there aren't
> many cases where a btree with 6,000 copies of 111 distinct keys is
> going to be useful.
It was 600,000 not 6,000 ... so a skip-scan might be worth the trouble,
but as you say we haven't done it.
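For the record, once on 8.4 you could emulate a skip scan by hand with a
recursive query, given a btree index on the key column.  Table and column
names here are invented for illustration:

    -- hypothetical table "records" (600,000 rows, 111 distinct rec_type
    -- values), with a btree index on rec_type
    WITH RECURSIVE t AS (
        (SELECT rec_type FROM records ORDER BY rec_type LIMIT 1)
        UNION ALL
        SELECT (SELECT rec_type FROM records
                WHERE rec_type > t.rec_type
                ORDER BY rec_type LIMIT 1)
        FROM t WHERE t.rec_type IS NOT NULL
    )
    SELECT rec_type FROM t WHERE rec_type IS NOT NULL;

Each iteration makes one index probe for the next distinct value, so that
reads on the order of 111 index entries rather than 600,000 heap rows.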
In any case, I think the real issue is that the OP is probably using a
pre-8.4 release, which will always do SELECT DISTINCT via sort-and-unique.
Hash aggregation would be a whole lot faster for these numbers, even
if not exactly instantaneous. He could upgrade to 8.4, or go over to
using GROUP BY as was recommended upthread.
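The GROUP BY form is a mechanical rewrite; sketching it with the same
hypothetical names as above:

    -- pre-8.4, this is always implemented as sort-then-unique:
    SELECT DISTINCT rec_type FROM records;

    -- same result, but the planner can choose a HashAggregate
    -- for this even before 8.4:
    SELECT rec_type FROM records GROUP BY rec_type;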
regards, tom lane