Thread: Fwd: Slow Count-Distinct Query

Fwd: Slow Count-Distinct Query

From
Varadharajan Mukundan
Date:
Sorry, I just joined the list and have to break the thread to reply to Tom Lane's response at http://www.postgresql.org/message-id/13741.1396275339@sss.pgh.pa.us


Note that the indexscan is actually *slower* than the seqscan so far as
the table access is concerned; if the table were big enough to not fit
in RAM, this would get very much worse.  So I'm not impressed with trying
to force the optimizer's hand as you've done here --- it might be a nice
plan now, but it's brittle.  See if a bigger work_mem improves matters
enough with the regular plan.

I agree that hand-tuning the optimiser is brittle and should not be done. But the reason for doing it here was to force an index-only scan (not an index scan). I believe the index-only scan speeds things up because Postgres is row-oriented and we are running a count-distinct on a single column of a table with quite a few columns (say 6-7). I think that is what contributed to the performance gain.

I did a similar test with around 2 million tuples and work_mem = 250 MB and got the query to respond with a 2x speed-up. But the speed-up from the index-only scan was much larger: the response came back in sub-second time, whereas with the bigger work_mem it took a couple of seconds.


I doubt you need the "where email=email" hack, in any case.  That isn't
forcing the optimizer's decision in any meaningful fashion.

This is the WHERE clause that forced the index-only scan on the partial index. AFAIK, whenever the number of tuples the query is estimated to process is high, Postgres does a seq scan. That was happening even in the following query, although an index was present on the email column and an index-only scan would have been much faster. I think the planner misestimates the cost of the query when an index-only scan is involved. So the little trick we used was to create a partial index on the email field of the participants table and repeat the partial index's condition in the query, nudging the optimiser into an index-only scan.

Query : select count(distinct email) from participants;
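Spelled out as SQL, the trick is roughly the following (the index name matches the transcript that appears later in the thread):

```sql
-- Partial index whose predicate is a tautology (true for every non-NULL email).
create index email_idx on participants(email) where email=email;

-- Repeating the predicate verbatim lets the planner match the partial index,
-- so it can consider an index-only scan for the aggregate:
select count(distinct email) from participants where email=email;
```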

Please let me know if I'm missing something.


--
Thanks,
M. Varadharajan

------------------------------------------------

"Experience is what you get when you didn't get what you wanted"
               -By Prof. Randy Pausch in "The Last Lecture"

My Journal :- www.thinkasgeek.wordpress.com

Re: Fwd: Slow Count-Distinct Query

From
Jeff Janes
Date:
On Fri, Apr 4, 2014 at 2:31 AM, Varadharajan Mukundan <srinathsmn@gmail.com> wrote:
Sorry, I just joined the list and have to break the thread to reply to Tom Lane's response at http://www.postgresql.org/message-id/13741.1396275339@sss.pgh.pa.us


Note that the indexscan is actually *slower* than the seqscan so far as
the table access is concerned; if the table were big enough to not fit
in RAM, this would get very much worse.  So I'm not impressed with trying
to force the optimizer's hand as you've done here --- it might be a nice
plan now, but it's brittle.  See if a bigger work_mem improves matters
enough with the regular plan.

I agree that hand-tuning the optimiser is brittle and should not be done. But the reason for doing it here was to force an index-only scan (not an index scan). I believe the index-only scan speeds things up because Postgres is row-oriented and we are running a count-distinct on a single column of a table with quite a few columns (say 6-7). I think that is what contributed to the performance gain.


It looks like the original emailer wrote a query that the planner is not smart enough to plan properly (a known limitation of that kind of query).  He then made a bunch of changes, none of which worked.  He then re-wrote the query into a form on which the planner does a better job.  What we do not know is: what would happen if he undid all of those other changes and *just* used the new form of the query?

 

I did a similar test with around 2 million tuples and work_mem = 250 MB and got the query to respond with a 2x speed-up. But the speed-up from the index-only scan was much larger: the response came back in sub-second time, whereas with the bigger work_mem it took a couple of seconds.

This change is almost certainly due to the change from a sort to a hash aggregate, and nothing to do with the index-only scan at all.


Re: Fwd: Slow Count-Distinct Query

From
Varadharajan Mukundan
Date:
Hi Jeff,

It looks like the original emailer wrote a query that the planner is not smart enough to plan properly (a known limitation of that kind of query).  He then made a bunch of changes, none of which worked.  He then re-wrote the query into a form on which the planner does a better job.  What we do not know is: what would happen if he undid all of those other changes and *just* used the new form of the query?


I was pairing with Chris (the original emailer) on this issue, and to reproduce it I created a two-column table with the following schema and 1.9 million dummy rows:

=================================

a=# \d participants
                                 Table "public.participants"
 Column |          Type          |                         Modifiers
--------+------------------------+-----------------------------------------------------------
 id     | integer                | not null default nextval('participants_id_seq'::regclass)
 email  | character varying(255) |

I've tried out various scenarios on this table and recorded them as a transcript below (please read it from top to bottom, the way you would read a shell script, to get the whole picture):

; Create table and Insert 1.9 Million rows

a=# create table participants(id serial, email varchar(255));
NOTICE:  CREATE TABLE will create implicit sequence "participants_id_seq" for serial column "participants.id"
CREATE TABLE
a=# \d participants
                                 Table "public.participants"
 Column |          Type          |                         Modifiers
--------+------------------------+-----------------------------------------------------------
 id     | integer                | not null default nextval('participants_id_seq'::regclass)
 email  | character varying(255) |

a=# copy participants from '/tmp/a.csv' with csv;
COPY 1999935


; Queries without any index

a=# EXPLAIN (ANALYZE)  select count(1) from (select email from participants group by email) x;
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=263449.51..263449.52 rows=1 width=0) (actual time=3898.329..3898.329 rows=1 loops=1)
   ->  Group  (cost=241033.44..251033.12 rows=993311 width=16) (actual time=2766.805..3807.693 rows=1000000 loops=1)
         ->  Sort  (cost=241033.44..246033.28 rows=1999935 width=16) (actual time=2766.803..3453.922 rows=1999935 loops=1)
               Sort Key: participants.email
               Sort Method: external merge  Disk: 52552kB
               ->  Seq Scan on participants  (cost=0.00..21910.20 rows=1999935 width=16) (actual time=0.013..362.511 rows=1999935 loops=1)
 Total runtime: 3902.460 ms
(7 rows)


a=#  EXPLAIN (ANALYZE)  select count(distinct email) from participants;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37738.19..37738.20 rows=1 width=16) (actual time=3272.854..3272.855 rows=1 loops=1)
   ->  Seq Scan on participants  (cost=0.00..32738.35 rows=1999935 width=16) (actual time=0.049..236.518 rows=1999935 loops=1)
 Total runtime: 3272.905 ms
(3 rows)

a=# EXPLAIN (ANALYZE) select count(1) from (select email from participants where email=email group by email) x;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37874.94..37874.96 rows=1 width=0) (actual time=1549.258..1549.258 rows=1 loops=1)
   ->  HashAggregate  (cost=37763.19..37812.86 rows=4967 width=16) (actual time=1168.114..1461.672 rows=1000000 loops=1)
         ->  Seq Scan on participants  (cost=0.00..37738.19 rows=10000 width=16) (actual time=0.045..411.267 rows=1999935 loops=1)
               Filter: ((email)::text = (email)::text)
 Total runtime: 1567.586 ms
(5 rows)


; Creation of idx on email field

a=# create index email_idx on participants(email);
CREATE INDEX

a=#  EXPLAIN (ANALYZE)  select count(distinct email) from participants;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37738.19..37738.20 rows=1 width=16) (actual time=3305.203..3305.203 rows=1 loops=1)
   ->  Seq Scan on participants  (cost=0.00..32738.35 rows=1999935 width=16) (actual time=0.052..237.409 rows=1999935 loops=1)
 Total runtime: 3305.253 ms
(3 rows)

a=#  EXPLAIN (ANALYZE)  select count(1) from (select email from participants group by email) x;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48622.59..48622.60 rows=1 width=0) (actual time=1242.718..1242.718 rows=1 loops=1)
   ->  HashAggregate  (cost=26273.09..36206.20 rows=993311 width=16) (actual time=855.215..1150.781 rows=1000000 loops=1)
         ->  Seq Scan on participants  (cost=0.00..21273.25 rows=1999935 width=16) (actual time=0.058..217.105 rows=1999935 loops=1)
 Total runtime: 1264.234 ms
(4 rows)

a=# drop index email_idx;
DROP INDEX

; Creation of partial index on email 

a=# create index email_idx on participants(email) where email=email;
CREATE INDEX

a=#  EXPLAIN (ANALYZE)  select count(distinct email) from participants where email=email;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12949.26..12949.27 rows=1 width=16) (actual time=3465.472..3465.473 rows=1 loops=1)
   ->  Bitmap Heap Scan on participants  (cost=249.14..12924.26 rows=10000 width=16) (actual time=161.776..421.223 rows=1999935 loops=1)
         Recheck Cond: ((email)::text = (email)::text)
         ->  Bitmap Index Scan on email_idx  (cost=0.00..246.64 rows=10000 width=0) (actual time=159.446..159.446 rows=1999935 loops=1)
 Total runtime: 3466.867 ms
(5 rows)

a=# set enable_bitmapscan = false;
SET
a=# set seq_page_cost = 0.1;
SET
a=# set random_page_cost = 0.2;
SET

a=# explain analyze select count(distinct email) from participants where email=email;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1517.16..1517.17 rows=1 width=16) (actual time=3504.310..3504.310 rows=1 loops=1)
   ->  Index Only Scan using email_idx on participants  (cost=0.00..1492.16 rows=10000 width=16) (actual time=0.101..795.595 rows=1999935 loops=1)
         Heap Fetches: 1999935
 Total runtime: 3504.358 ms
(4 rows)

a=# explain analyze select count(1) from (select email from participants where email=email group by email) x;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1579.25..1579.26 rows=1 width=0) (actual time=1193.912..1193.912 rows=1 loops=1)
   ->  Group  (cost=0.00..1517.16 rows=4967 width=16) (actual time=0.096..1101.828 rows=1000000 loops=1)
         ->  Index Only Scan using email_idx on participants  (cost=0.00..1492.16 rows=10000 width=16) (actual time=0.091..719.281 rows=1999935 loops=1)
               Heap Fetches: 1999935
 Total runtime: 1193.963 ms
(5 rows)

; Oh yes, cluster the rows by email

a=# create index email_idx_2 on participants(email)
a-# ;
CREATE INDEX

a=# cluster participants using email_idx_2;
CLUSTER


a=# EXPLAIN (ANALYZE) select count(1) from (select email from participants where email=email group by email) x;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3376.63..3376.64 rows=1 width=0) (actual time=942.118..942.119 rows=1 loops=1)
   ->  Group  (cost=0.00..3314.54 rows=4967 width=16) (actual time=0.080..851.315 rows=1000000 loops=1)
         ->  Index Only Scan using email_idx on participants  (cost=0.00..3289.54 rows=10000 width=16) (actual time=0.076..472.101 rows=1999935 loops=1)
               Heap Fetches: 1999935
 Total runtime: 942.159 ms
(5 rows)

; Description of the table

a=# \d participants
                                 Table "public.participants"
 Column |          Type          |                         Modifiers
--------+------------------------+-----------------------------------------------------------
 id     | integer                | not null default nextval('participants_id_seq'::regclass)
 email  | character varying(255) |
Indexes:
    "email_idx" btree (email) WHERE email::text = email::text
    "email_idx_2" btree (email) CLUSTER


Summary: query execution time dropped from 3.9 seconds to 900 ms


====================================


The gain from 3.9 seconds to 900 ms is huge in this case, and it would be even more evident on a bigger table (with more rows and more columns).



I did a similar test with around 2 million tuples and work_mem = 250 MB and got the query to respond with a 2x speed-up. But the speed-up from the index-only scan was much larger: the response came back in sub-second time, whereas with the bigger work_mem it took a couple of seconds.
 
This change is almost certainly due to the change from a sort to a hash aggregate, and nothing to do with the index-only scan at all.


I think I did not present things clearly enough. The experiment consisted of two options:

1. Increase work_mem and use the query without any fancy tuning: select count(1) from (select email from participants group by email) x
2. Leave work_mem at its default and just force the index-only scan.

Here is how they performed:

a=# explain analyze select count(1) from (select email from participants group by email) x;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=60087.68..60087.69 rows=1 width=0) (actual time=1256.626..1256.626 rows=1 loops=1)
   ->  HashAggregate  (cost=37738.19..47671.30 rows=993311 width=16) (actual time=871.800..1168.298 rows=1000000 loops=1)
         ->  Seq Scan on participants  (cost=0.00..32738.35 rows=1999935 width=16) (actual time=0.060..216.678 rows=1999935 loops=1)
 Total runtime: 1277.964 ms
(4 rows)

a=# EXPLAIN (ANALYZE) select count(1) from (select email from participants where email=email group by email) x;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3376.63..3376.64 rows=1 width=0) (actual time=942.118..942.119 rows=1 loops=1)
   ->  Group  (cost=0.00..3314.54 rows=4967 width=16) (actual time=0.080..851.315 rows=1000000 loops=1)
         ->  Index Only Scan using email_idx on participants  (cost=0.00..3289.54 rows=10000 width=16) (actual time=0.076..472.101 rows=1999935 loops=1)
               Heap Fetches: 1999935
 Total runtime: 942.159 ms
(5 rows)

The comment I made was that with the index-only scan I was able to get the query to respond in 940 ms with a work_mem of about 1 MB (the default on my system), whereas in the other case (scenario 1) it took a work_mem of 250 MB (I agree that 250 MB might not be optimal; it's just a ballpark number) to respond in 1.2 seconds.

--
Thanks,
M. Varadharajan

------------------------------------------------

"Experience is what you get when you didn't get what you wanted"
               -By Prof. Randy Pausch in "The Last Lecture"

My Journal :- www.thinkasgeek.wordpress.com

Slow Count-Distinct Query

From
Jeff Janes
Date:
On Friday, April 4, 2014, Varadharajan Mukundan <srinathsmn@gmail.com> wrote:
Hi Jeff,

It looks like the original emailer wrote a query that the planner is not smart enough to plan properly (a known limitation of that kind of query).  He then made a bunch of changes, none of which worked.  He then re-wrote the query into a form on which the planner does a better job.  What we do not know is: what would happen if he undid all of those other changes and *just* used the new form of the query?


I was pairing with Chris (the original emailer) on this issue, and to reproduce it I created a two-column table with the following schema and 1.9 million dummy rows:

=================================

a=# \d participants
                                 Table "public.participants"
 Column |          Type          |                         Modifiers
--------+------------------------+-----------------------------------------------------------
 id     | integer                | not null default nextval('participants_id_seq'::regclass)
 email  | character varying(255) |

I've tried out various scenarios on this table and recorded them as a transcript below (please read it from top to bottom, the way you would read a shell script, to get the whole picture):

; Create table and Insert 1.9 Million rows

a=# create table participants(id serial, email varchar(255));
NOTICE:  CREATE TABLE will create implicit sequence "participants_id_seq" for serial column "participants.id"
CREATE TABLE
a=# \d participants
                                 Table "public.participants"
 Column |          Type          |                         Modifiers
--------+------------------------+-----------------------------------------------------------
 id     | integer                | not null default nextval('participants_id_seq'::regclass)
 email  | character varying(255) |

a=# copy participants from '/tmp/a.csv' with csv;
COPY 1999935

Thanks for the detailed response.  I don't have access to your /tmp/a.csv of course, so I just used this:

insert into participants (email) select md5(floor(random()*1000000)::text) from generate_series(1,2000000);

This gives each email showing up about twice.
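For anyone reproducing this, a quick sanity check on the generated data (this query is not in the original thread, just an illustration):

```sql
-- 2,000,000 rows drawn from ~1,000,000 distinct md5 values,
-- so each email should show up about twice on average:
select count(*) as total_rows,
       count(distinct email) as distinct_emails
from participants;
```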

 

a=# EXPLAIN (ANALYZE) select count(1) from (select email from participants where email=email group by email) x;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37874.94..37874.96 rows=1 width=0) (actual time=1549.258..1549.258 rows=1 loops=1)
   ->  HashAggregate  (cost=37763.19..37812.86 rows=4967 width=16) (actual time=1168.114..1461.672 rows=1000000 loops=1)
         ->  Seq Scan on participants  (cost=0.00..37738.19 rows=10000 width=16) (actual time=0.045..411.267 rows=1999935 loops=1)
               Filter: ((email)::text = (email)::text)
 Total runtime: 1567.586 ms
(5 rows)

What you have done here is trick the planner into thinking your query will be 200 times smaller than it actually is, and thus that the hash table will be 200 times smaller than it actually is and will therefore fit in the allowed memory.  This is effective at getting the more efficient hash agg, but it is no safer than just giving the query explicit permission to use that much memory by increasing work_mem 200-fold.

I am kind of surprised that the planner is so easily fooled by that.
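The "explicit permission" alternative would look something like this (the 256MB value is illustrative, not a recommendation):

```sql
-- Grant this session enough memory for the full hash aggregate,
-- instead of tricking the planner's row estimate:
set work_mem = '256MB';
select count(1) from (select email from participants group by email) x;
reset work_mem;
```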
 


; Creation of idx on email field

a=# create index email_idx on participants(email);
CREATE INDEX
 

a=#  EXPLAIN (ANALYZE)  select count(1) from (select email from participants group by email) x;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48622.59..48622.60 rows=1 width=0) (actual time=1242.718..1242.718 rows=1 loops=1)
   ->  HashAggregate  (cost=26273.09..36206.20 rows=993311 width=16) (actual time=855.215..1150.781 rows=1000000 loops=1)
         ->  Seq Scan on participants  (cost=0.00..21273.25 rows=1999935 width=16) (actual time=0.058..217.105 rows=1999935 loops=1)
 Total runtime: 1264.234 ms
(4 rows)


I can't reproduce this at all, except by increasing work_mem.   The hash table needed for this is no smaller than the hash table needed before the index was built.  Did you increase work_mem before the above plan?

Instead what I get is the index only scan (to provide order) feeding into a Group.
 

a=# drop index email_idx;
DROP INDEX

; Creation of partial index on email 

a=# create index email_idx on participants(email) where email=email;
CREATE INDEX

a=#  EXPLAIN (ANALYZE)  select count(distinct email) from participants where email=email;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12949.26..12949.27 rows=1 width=16) (actual time=3465.472..3465.473 rows=1 loops=1)
   ->  Bitmap Heap Scan on participants  (cost=249.14..12924.26 rows=10000 width=16) (actual time=161.776..421.223 rows=1999935 loops=1)
         Recheck Cond: ((email)::text = (email)::text)
         ->  Bitmap Index Scan on email_idx  (cost=0.00..246.64 rows=10000 width=0) (actual time=159.446..159.446 rows=1999935 loops=1)
 Total runtime: 3466.867 ms

I also cannot get this.

 
(5 rows)

a=# set enable_bitmapscan = false;
a=# set seq_page_cost = 0.1;
SET
a=# set random_page_cost = 0.2;
SET


These don't seem to accomplish anything for you.  They switch the slow form of the query between two plans with about the same speed.
  

a=# explain analyze select count(distinct email) from participants where email=email;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1517.16..1517.17 rows=1 width=16) (actual time=3504.310..3504.310 rows=1 loops=1)
   ->  Index Only Scan using email_idx on participants  (cost=0.00..1492.16 rows=10000 width=16) (actual time=0.101..795.595 rows=1999935 loops=1)
         Heap Fetches: 1999935
 Total runtime: 3504.358 ms
(4 rows)

a=# explain analyze select count(1) from (select email from participants where email=email group by email) x;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1579.25..1579.26 rows=1 width=0) (actual time=1193.912..1193.912 rows=1 loops=1)
   ->  Group  (cost=0.00..1517.16 rows=4967 width=16) (actual time=0.096..1101.828 rows=1000000 loops=1)
         ->  Index Only Scan using email_idx on participants  (cost=0.00..1492.16 rows=10000 width=16) (actual time=0.091..719.281 rows=1999935 loops=1)
               Heap Fetches: 1999935
 Total runtime: 1193.963 ms
(5 rows)

; Oh yes, cluster the rows by email

a=# create index email_idx_2 on participants(email)
a-# ;
CREATE INDEX

a=# cluster participants using email_idx_2;
CLUSTER


a=# EXPLAIN (ANALYZE) select count(1) from (select email from participants where email=email group by email) x;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3376.63..3376.64 rows=1 width=0) (actual time=942.118..942.119 rows=1 loops=1)
   ->  Group  (cost=0.00..3314.54 rows=4967 width=16) (actual time=0.080..851.315 rows=1000000 loops=1)
         ->  Index Only Scan using email_idx on participants  (cost=0.00..3289.54 rows=10000 width=16) (actual time=0.076..472.101 rows=1999935 loops=1)
               Heap Fetches: 1999935
 Total runtime: 942.159 ms
(5 rows)

Once I cluster and vacuum and analyze the table, I get this plan without having the partial index (just the regular index), without the "where email=email" and without disabling the bitmap scan or changing the page_cost parameters.

I usually get this plan without the cluster, too.  I think it depends on the luck of the sampling in the autoanalyze.
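For reference, the sequence just described is roughly the following. Note that without a VACUUM the visibility map is empty, so an "index-only" scan still pays for every heap visit, which is what the "Heap Fetches: 1999935" lines above indicate:

```sql
cluster participants using email_idx_2;
-- CLUSTER rewrites the table, so the visibility map must be rebuilt:
vacuum analyze participants;  -- enables true index-only scans (few/no heap fetches)
```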

Cheers,

Jeff

Re: Slow Count-Distinct Query

From
Varadharajan Mukundan
Date:
Hi Jeff,

Instead what I get is the index only scan (to provide order) feeding into a Group.

That's interesting. We tested on two versions of Postgres (9.2 and 9.3) on different Mac machines and got the index-only scan only after creating the partial index. I remember running VACUUM FULL ANALYZE after each and every step.
 
I usually get this plan without the cluster, to.  I think it depends on the luck of the sampling in the autoanalyze.


That's interesting as well. Would something like increasing the sample size make it much better? We had to perform so many steps to get the index-only scan working, even though it's obvious to anyone that it should be the right approach. In the back of my mind I'm also wondering whether any OS-specific parameter (different on your system than on mine) is considered when generating plans and choosing among them.
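One way to raise the sample size for just this column would be something like the following sketch (the value 1000 is arbitrary; the default statistics target is 100):

```sql
-- Collect a larger statistics sample for the email column, so row
-- estimates depend less on the luck of the autoanalyze sample:
alter table participants alter column email set statistics 1000;
analyze participants;  -- re-sample with the larger statistics target
```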

--
Thanks,
M. Varadharajan

------------------------------------------------

"Experience is what you get when you didn't get what you wanted"
               -By Prof. Randy Pausch in "The Last Lecture"

My Journal :- www.thinkasgeek.wordpress.com