Re: Fwd: Slow Count-Distinct Query - Mailing list pgsql-performance

From Varadharajan Mukundan
Subject Re: Fwd: Slow Count-Distinct Query
Date
Msg-id CACKkDGG7kVkPCoE3X+=THR92dAJwC8u606Q8v8dA_BODLJagng@mail.gmail.com
In response to Re: Fwd: Slow Count-Distinct Query  (Jeff Janes <jeff.janes@gmail.com>)
Responses Slow Count-Distinct Query  (Jeff Janes <jeff.janes@gmail.com>)
List pgsql-performance
Hi Jeff,

> It looks like the original emailer wrote a query that the planner is not smart enough to plan properly (a known limitation of that kind of query). He then made a bunch of changes, none of which worked. He then re-wrote the query into a form on which the planner does a better job. What we do not know is: what would happen if he undid all of those other changes and *just* used the new form of the query?


I was also pairing up with Chris (the original emailer) on this issue. To reproduce it, I've created a two-column table with the following schema and loaded it with 1.9 million dummy rows:

=================================

a=# \d participants
                                 Table "public.participants"
 Column |          Type          |                         Modifiers
--------+------------------------+-----------------------------------------------------------
 id     | integer                | not null default nextval('participants_id_seq'::regclass)
 email  | character varying(255) |

I've tried out various scenarios on this table and recorded them as a transcript below (please read it from top to bottom, like a shell script, to follow the whole sequence):

; Create table and insert 1.9 million rows

a=# create table participants(id serial, email varchar(255));
NOTICE:  CREATE TABLE will create implicit sequence "participants_id_seq" for serial column "participants.id"
CREATE TABLE
a=# \d participants
                                 Table "public.participants"
 Column |          Type          |                         Modifiers
--------+------------------------+-----------------------------------------------------------
 id     | integer                | not null default nextval('participants_id_seq'::regclass)
 email  | character varying(255) |

a=# copy participants from '/tmp/a.csv' with csv;
COPY 1999935
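
(Side note: /tmp/a.csv itself is not shown here. For anyone reproducing this, an equivalent data set -- 1,999,935 rows over 1,000,000 distinct emails, matching the actual row counts in the plans below -- can be generated directly; the exact email format is my assumption:)

insert into participants(email)
  select 'user' || (i % 1000000) || '@example.com'   -- i % 1000000 yields exactly 1,000,000 distinct values
  from generate_series(1, 1999935) as s(i);
analyze participants;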


; Queries without any index

a=# EXPLAIN (ANALYZE)  select count(1) from (select email from participants group by email) x;
                                                                QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=263449.51..263449.52 rows=1 width=0) (actual time=3898.329..3898.329 rows=1 loops=1)
   ->  Group  (cost=241033.44..251033.12 rows=993311 width=16) (actual time=2766.805..3807.693 rows=1000000 loops=1)
         ->  Sort  (cost=241033.44..246033.28 rows=1999935 width=16) (actual time=2766.803..3453.922 rows=1999935 loops=1)
               Sort Key: participants.email
               Sort Method: external merge  Disk: 52552kB
               ->  Seq Scan on participants  (cost=0.00..21910.20 rows=1999935 width=16) (actual time=0.013..362.511 rows=1999935 loops=1)
 Total runtime: 3902.460 ms
(7 rows)


a=# EXPLAIN (ANALYZE) select count(distinct email) from participants;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37738.19..37738.20 rows=1 width=16) (actual time=3272.854..3272.855 rows=1 loops=1)
   ->  Seq Scan on participants  (cost=0.00..32738.35 rows=1999935 width=16) (actual time=0.049..236.518 rows=1999935 loops=1)
 Total runtime: 3272.905 ms
(3 rows)

a=# EXPLAIN (ANALYZE) select count(1) from (select email from participants where email=email group by email) x;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37874.94..37874.96 rows=1 width=0) (actual time=1549.258..1549.258 rows=1 loops=1)
   ->  HashAggregate  (cost=37763.19..37812.86 rows=4967 width=16) (actual time=1168.114..1461.672 rows=1000000 loops=1)
         ->  Seq Scan on participants  (cost=0.00..37738.19 rows=10000 width=16) (actual time=0.045..411.267 rows=1999935 loops=1)
               Filter: ((email)::text = (email)::text)
 Total runtime: 1567.586 ms
(5 rows)
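
(What is happening here, as I understand it: the planner cannot simplify email=email, so it falls back to a default equality selectivity of 0.5% -- which matches the rows=10000 estimate on the seq scan -- and that low estimate makes it pick a HashAggregate it believes will fit in work_mem. What the planner actually knows about the column can be inspected with:)

select n_distinct, null_frac from pg_stats
where tablename = 'participants' and attname = 'email';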


; Create an index on the email field

a=# create index email_idx on participants(email);
CREATE INDEX

a=#  EXPLAIN (ANALYZE)  select count(distinct email) from participants;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37738.19..37738.20 rows=1 width=16) (actual time=3305.203..3305.203 rows=1 loops=1)
   ->  Seq Scan on participants  (cost=0.00..32738.35 rows=1999935 width=16) (actual time=0.052..237.409 rows=1999935 loops=1)
 Total runtime: 3305.253 ms
(3 rows)

a=#  EXPLAIN (ANALYZE)  select count(1) from (select email from participants group by email) x;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=48622.59..48622.60 rows=1 width=0) (actual time=1242.718..1242.718 rows=1 loops=1)
   ->  HashAggregate  (cost=26273.09..36206.20 rows=993311 width=16) (actual time=855.215..1150.781 rows=1000000 loops=1)
         ->  Seq Scan on participants  (cost=0.00..21273.25 rows=1999935 width=16) (actual time=0.058..217.105 rows=1999935 loops=1)
 Total runtime: 1264.234 ms
(4 rows)

a=# drop index email_idx;
DROP INDEX

; Create a partial index on email

a=# create index email_idx on participants(email) where email=email;
CREATE INDEX

a=#  EXPLAIN (ANALYZE)  select count(distinct email) from participants where email=email;
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=12949.26..12949.27 rows=1 width=16) (actual time=3465.472..3465.473 rows=1 loops=1)
   ->  Bitmap Heap Scan on participants  (cost=249.14..12924.26 rows=10000 width=16) (actual time=161.776..421.223 rows=1999935 loops=1)
         Recheck Cond: ((email)::text = (email)::text)
         ->  Bitmap Index Scan on email_idx  (cost=0.00..246.64 rows=10000 width=0) (actual time=159.446..159.446 rows=1999935 loops=1)
 Total runtime: 3466.867 ms
(5 rows)

a=# set enable_bitmapscan = false;
SET
a=# set seq_page_cost = 0.1;
SET
a=# set random_page_cost = 0.2;
SET

a=# explain analyze select count(distinct email) from participants where email=email;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1517.16..1517.17 rows=1 width=16) (actual time=3504.310..3504.310 rows=1 loops=1)
   ->  Index Only Scan using email_idx on participants  (cost=0.00..1492.16 rows=10000 width=16) (actual time=0.101..795.595 rows=1999935 loops=1)
         Heap Fetches: 1999935
 Total runtime: 3504.358 ms
(4 rows)

a=# explain analyze select count(1) from (select email from participants where email=email group by email) x;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1579.25..1579.26 rows=1 width=0) (actual time=1193.912..1193.912 rows=1 loops=1)
   ->  Group  (cost=0.00..1517.16 rows=4967 width=16) (actual time=0.096..1101.828 rows=1000000 loops=1)
         ->  Index Only Scan using email_idx on participants  (cost=0.00..1492.16 rows=10000 width=16) (actual time=0.091..719.281 rows=1999935 loops=1)
               Heap Fetches: 1999935
 Total runtime: 1193.963 ms
(5 rows)
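
(Note "Heap Fetches: 1999935" in both plans above: every row still visits the heap, because nothing has set the visibility map since the bulk load. Vacuuming the table should let the index-only scan skip the heap for all-visible pages -- a step I did not capture in this transcript:)

vacuum analyze participants;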

; Oh yes, cluster the rows by email

a=# create index email_idx_2 on participants(email)
a-# ;
CREATE INDEX

a=# cluster participants using email_idx_2;
CLUSTER
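
(The docs advise running ANALYZE after CLUSTER so the planner sees the new physical ordering; that step is also not shown in this transcript:)

analyze participants;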


a=# EXPLAIN (ANALYZE) select count(1) from (select email from participants where email=email group by email) x;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3376.63..3376.64 rows=1 width=0) (actual time=942.118..942.119 rows=1 loops=1)
   ->  Group  (cost=0.00..3314.54 rows=4967 width=16) (actual time=0.080..851.315 rows=1000000 loops=1)
         ->  Index Only Scan using email_idx on participants  (cost=0.00..3289.54 rows=10000 width=16) (actual time=0.076..472.101 rows=1999935 loops=1)
               Heap Fetches: 1999935
 Total runtime: 942.159 ms
(5 rows)

; Description of the table

a=# \d participants
                                 Table "public.participants"
 Column |          Type          |                         Modifiers
--------+------------------------+-----------------------------------------------------------
 id     | integer                | not null default nextval('participants_id_seq'::regclass)
 email  | character varying(255) |
Indexes:
    "email_idx" btree (email) WHERE email::text = email::text
    "email_idx_2" btree (email) CLUSTER


Summary: query execution time dropped from 3.9 seconds to about 940 ms


====================================


The gain from 3.9 seconds to about 940 ms is huge in this case, and it should be even more pronounced on a bigger table (with more rows and more columns).
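
(For reference, here is the whole recipe from the transcript condensed into one block; these are the same commands as above, nothing new:)

create index email_idx on participants(email) where email = email;  -- partial index the rewritten query can use
create index email_idx_2 on participants(email);                    -- regular index; CLUSTER cannot use a partial one
cluster participants using email_idx_2;                             -- physically order the rows by email
set enable_bitmapscan = false;
set seq_page_cost = 0.1;
set random_page_cost = 0.2;
explain (analyze)
  select count(1)
  from (select email from participants where email = email group by email) x;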



>> I did a similar test with around 2 million tuples with work_mem = 250 MB and got the query to respond with a 2x speed-up. But the speed-up obtained with the index-only scan was huge, with a sub-second response, whereas with the work_mem increase the response took a couple of seconds.
 
> This change is almost certainly due to the change from a sort to a hash aggregate, and nothing to do with the index-only scan at all.


I think I did not present things clearly enough. The experiment consisted of two options:

1. Increase work_mem and use the query without any fancy tuning [select count(1) from (select email from participants group by email) x] (setup sketched just after this list)
2. Keep work_mem at its default and just force an index-only scan.
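
(The session setup for option 1 is not shown in the transcript above; it was essentially just the following, with 250 MB being a ballpark value rather than a tuned one:)

set work_mem = '250MB';
explain analyze select count(1) from (select email from participants group by email) x;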

Here is how they performed:

a=# explain analyze select count(1) from (select email from participants group by email) x;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=60087.68..60087.69 rows=1 width=0) (actual time=1256.626..1256.626 rows=1 loops=1)
   ->  HashAggregate  (cost=37738.19..47671.30 rows=993311 width=16) (actual time=871.800..1168.298 rows=1000000 loops=1)
         ->  Seq Scan on participants  (cost=0.00..32738.35 rows=1999935 width=16) (actual time=0.060..216.678 rows=1999935 loops=1)
 Total runtime: 1277.964 ms
(4 rows)

a=# EXPLAIN (ANALYZE) select count(1) from (select email from participants where email=email group by email) x;
                                                                       QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3376.63..3376.64 rows=1 width=0) (actual time=942.118..942.119 rows=1 loops=1)
   ->  Group  (cost=0.00..3314.54 rows=4967 width=16) (actual time=0.080..851.315 rows=1000000 loops=1)
         ->  Index Only Scan using email_idx on participants  (cost=0.00..3289.54 rows=10000 width=16) (actual time=0.076..472.101 rows=1999935 loops=1)
               Heap Fetches: 1999935
 Total runtime: 942.159 ms
(5 rows)

The comment I made was that with the index-only scan I was able to get the query to respond in about 940 ms with a work_mem of about 1 MB (the default on my system), whereas in the other case (option 1) it took a work_mem of 250 MB (I agree that 250 MB might not be optimal, but it's just a ballpark number) to respond in 1.2 seconds.

--
Thanks,
M. Varadharajan

------------------------------------------------

"Experience is what you get when you didn't get what you wanted"
               -By Prof. Randy Pausch in "The Last Lecture"

My Journal :- www.thinkasgeek.wordpress.com
