hash agg is slower on wide tables? - Mailing list pgsql-hackers

From Pavel Stehule
Subject hash agg is slower on wide tables?
Date
Msg-id CAFj8pRA843nYiHS33jx9=EprNwKYSNrPQdRWzHcdQDw=RvCikA@mail.gmail.com
Whole thread Raw
Responses Re: hash agg is slower on wide tables?  (Pavel Stehule <pavel.stehule@gmail.com>)
List pgsql-hackers
Hi

I did some benchmarks and I found some strange numbers.

do $$
begin
  drop table if exists t1;
  execute 'create table t1(' ||
     array_to_string(array(select 'a' || i || ' smallint' from generate_series(1,30) g(i)), ',') || ')';
  -- special column a2, a11
  insert into t1
    select 2008, i % 12 + 1, random()*20, random()*20, random()*20, random()*20, random()*20, random()*20, random()*20, random()*20,
           case when random() < 0.9 then 1 else 0 end, random()*20, random()*20, random()*20, random()*20, random()*20, random()*20, random()*20, random()*20, random()*20,
           random()*20, random()*20, random()*20, random()*20, random()*20, random()*20, random()*20, random()*20, random()*20, random()*20
      from generate_series(1,7009728) g(i);
  drop table if exists t2;
  create table t2 as select a2, a11 from t1;
  analyze t1; analyze t2;
end;
$$;

postgres=# \dt+ t*
                  List of relations
 Schema | Name | Type  | Owner |  Size  | Description
--------+------+-------+-------+--------+-------------
 public | t1   | table | pavel | 622 MB |
 public | t2   | table | pavel | 242 MB |
(2 rows)

postgres=# explain analyze select count(*), a2, a11 from t1 group by 3,2 order by 3,2;
                                                        QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=202327.03..202327.09 rows=24 width=4) (actual time=2609.159..2609.161 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=202326.24..202326.48 rows=24 width=4) (actual time=2609.137..2609.139 rows=24 loops=1) --- grouping 1997 ms
         Group Key: a11, a2
         ->  Seq Scan on t1  (cost=0.00..149753.28 rows=7009728 width=4) (actual time=0.071..616.222 rows=7009728 loops=1)
 Planning time: 0.138 ms
 Execution time: 2609.247 ms
(8 rows)

postgres=# explain analyze select count(*), a2, a11 from t2 group by 3,2 order by 3,2;
                                                        QUERY PLAN                                                        
---------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=153688.03..153688.09 rows=24 width=4) (actual time=2100.058..2100.059 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=153687.24..153687.48 rows=24 width=4) (actual time=2100.037..2100.040 rows=24 loops=1) --- grouping 1567 ms -- 25% faster
         Group Key: a11, a2
         ->  Seq Scan on t2  (cost=0.00..101114.28 rows=7009728 width=4) (actual time=0.043..532.680 rows=7009728 loops=1)
 Planning time: 0.178 ms
 Execution time: 2100.158 ms
(8 rows)

postgres=# \dt+ t*
                   List of relations
 Schema | Name | Type  | Owner |  Size   | Description
--------+------+-------+-------+---------+-------------
 public | t1   | table | pavel | 6225 MB |
 public | t2   | table | pavel | 2423 MB |
(2 rows)

postgres=# explain analyze select count(*), a2, a11 from t1 group by 3,2 order by 3,2;
                                                           QUERY PLAN                                                           
---------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2023263.19..2023263.25 rows=24 width=4) (actual time=99453.272..99453.274 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=2023262.40..2023262.64 rows=24 width=4) (actual time=99453.244..99453.249 rows=24 loops=1) --- 31891 ms
         Group Key: a11, a2
         ->  Seq Scan on t1  (cost=0.00..1497532.80 rows=70097280 width=4) (actual time=16.935..67562.615 rows=70097280 loops=1)
 Planning time: 14.526 ms
 Execution time: 99453.413 ms
(8 rows)

postgres=# explain analyze select count(*), a2, a11 from t2 group by 3,2 order by 3,2;
                                                          QUERY PLAN                                                          
-------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1536868.33..1536868.39 rows=24 width=4) (actual time=20656.397..20656.399 rows=24 loops=1)
   Sort Key: a11, a2
   Sort Method: quicksort  Memory: 26kB
   ->  HashAggregate  (cost=1536867.54..1536867.78 rows=24 width=4) (actual time=20656.375..20656.378 rows=24 loops=1) --- 15248 ms --100% faster
         Group Key: a11, a2
         ->  Seq Scan on t2  (cost=0.00..1011137.88 rows=70097288 width=4) (actual time=0.060..5408.205 rows=70097280 loops=1)
 Planning time: 0.161 ms
 Execution time: 20656.475 ms
(8 rows)

It looks like hah agg is slower when it is based on wide table about 25-100%. Is it - or I don't see something?

Regards

Pavel

pgsql-hackers by date:

Previous
From: Jeff Davis
Date:
Subject: Re: 9.5: Better memory accounting, towards memory-bounded HashAgg
Next
From: Pavel Stehule
Date:
Subject: Re: hash agg is slower on wide tables?