Re: select count(distinct ...) is slower than select distinct in about 5x - Mailing list pgsql-performance

From Jeff Janes
Subject Re: select count(distinct ...) is slower than select distinct in about 5x
Date
Msg-id CAMkU=1wFdvb76NWVw=zLKF3H6-tn=+z9o_Nv2oioWfy6Z8Xo0A@mail.gmail.com
Whole thread Raw
In response to select count(distinct ...) is slower than select distinct in about 5x  (jacket41142 <jacket41142@gmail.com>)
List pgsql-performance
On Tue, Dec 10, 2013 at 9:28 AM, jacket41142 <jacket41142@gmail.com> wrote:
 

test=> select distinct col_int from t1 group by col_int;
Time: 1177.936 ms

So the performance difference is not very large.
But when I do that:

test=> select count(distinct col_int) from t1;
 count
-------
  1025
(1 row)

Time: 7367.476 ms


count(distinct ...) always sorts, rather than using a hash, to do its work.  I don't think that there is any fundamental reason that it could not be changed to allow it to use hashing, it just hasn't been done yet.  It is complicated by the fact that you can have multiple count() expressions in the same query which demand sorting/grouping on different columns.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: select count(distinct ...) is slower than select distinct in about 5x
Next
From: Merlin Moncure
Date:
Subject: Re: Parallel Select query performance and shared buffers