Thread: BUG #14011: select count(distinct column) does not utilizes indices to improve performance

BUG #14011: select count(distinct column) does not utilizes indices to improve performance

From

gtakahashi@palantir.com

Date:

10 March 2016, 04:39:00

The following bug has been logged on the website:

Bug reference:      14011
Logged by:          Glen Takahashi
Email address:      gtakahashi@palantir.com
PostgreSQL version: 9.3.6
Operating system:   Red Hat Enterprise Linux 6.7
Description:

Doing a select count(distinct column) does not utilize indices and takes a
very long time to compute.
For example

Create table test_table (number integer);
Create index on test_table (number);
Insert into test_table (number) select generate_series(1,100000); (I did
this ~500 times to replicate the scale of data we have)
Select count(distinct number) from test_table;
Count 100000 Time: 63419.600ms
Select count(1) from (select distinct number from test_table) a;
Count 100000 Time: 10743.186 ms

Re: BUG #14011: select count(distinct column) does not utilizes indices to improve performance

From

Tom Lane

Date:

10 March 2016, 04:50:16

gtakahashi@palantir.com writes:
> Doing a select count(distinct column) does not utilize indices

Nope, it does not.  Don't hold your breath waiting for that to change;
it might sometime, but nobody is working on it that I know of.

You'd have better luck with a "SELECT count(*) GROUP BY column"
type of query, I expect.  (Most likely that won't use an index either,
BTW, but it'll be smarter than count(distinct).)

            regards, tom lane