Home > mailing lists

Re: select count(distinct ...) is slower than select distinct in about 5x - Mailing list pgsql-performance

From	Jeff Janes
Subject	Re: select count(distinct ...) is slower than select distinct in about 5x
Date	December 10, 2013 18:19:52
Msg-id	CAMkU=1wFdvb76NWVw=zLKF3H6-tn=+z9o_Nv2oioWfy6Z8Xo0A@mail.gmail.com Whole thread
In response to	select count(distinct ...) is slower than select distinct in about 5x (jacket41142 <jacket41142@gmail.com>)
List	pgsql-performance

Tree view

On Tue, Dec 10, 2013 at 9:28 AM, jacket41142 <jacket41142@gmail.com> wrote:

test=> select distinct col_int from t1 group by col_int;
Time: 1177.936 ms

So the performance difference is not very large.
But when I do that:

test=> select count(distinct col_int) from t1;
count
-------
1025
(1 row)

Time: 7367.476 ms

count(distinct ...) always sorts, rather than using a hash, to do its work. I don't think that there is any fundamental reason that it could not be changed to allow it to use hashing, it just hasn't been done yet. It is complicated by the fact that you can have multiple count() expressions in the same query which demand sorting/grouping on different columns.

Cheers,

Jeff

pgsql-performance by date:

From: Kevin Grittner
Date: 10 December 2013, 18:16:46
Subject: Re: select count(distinct ...) is slower than select distinct in about 5x

From: Merlin Moncure
Date: 10 December 2013, 20:03:16
Subject: Re: Parallel Select query performance and shared buffers

Re: select count(distinct ...) is slower than select distinct in about 5x - Mailing list pgsql-performance

Previous

Next