Re: group by of multi columns - Mailing list pgsql-general
From: Andreas Kretschmer
Subject: Re: group by of multi columns
Date:
Msg-id: 20150104115952.GA4700@tux
In response to: group by of multi columns (Flyingfox Lee <flyingfoxlee@gmail.com>)
List: pgsql-general
Flyingfox Lee <flyingfoxlee@gmail.com> wrote:

> I am doing a `group by` on a table with ~ 3 million rows; the query is simply
> `select A, B, C, D, E, count(1) from t group by A, B, C, D, E order by 6`. It
> takes ~ 3 minutes and returns ~ 500 rows. To speed this up, should I add a
> composite index on A, B, C, D, E, or are there parameters in postgresql.conf
> I can tweak? I am new to postgres; all the parameters in postgresql.conf are
> at their defaults.

The only thing you can do is run the query with EXPLAIN ANALYSE and see how it
works. You can tweak work_mem. A simple example:

test=# create table b (a int, b int, c int, d int);
CREATE TABLE
Time: 0,735 ms
test=*# insert into b select (random() * 1000)::int, (random() * 1000)::int,
        (random() * 1000)::int, (random() * 1000)::int
        from generate_series(1,100000) s;
INSERT 0 100000
Time: 332,212 ms
test=*# explain analyse select a,b,c,d, count(1) from b group by a,b,c,d;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2695.53..2791.29 rows=9576 width=16) (actual time=126.904..191.598 rows=100000 loops=1)
   Group Key: a, b, c, d
   ->  Seq Scan on b  (cost=0.00..1498.57 rows=95757 width=16) (actual time=0.012..33.520 rows=100000 loops=1)
 Planning time: 0.095 ms
 Execution time: 214.584 ms
(5 rows)

Time: 215,121 ms
test=*# set work_mem to '64kB';
SET
Time: 0,109 ms
test=*# explain analyse select a,b,c,d, count(1) from b group by a,b,c,d;
                                                      QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=12697.07..14229.18 rows=9576 width=16) (actual time=206.603..388.892 rows=100000 loops=1)
   Group Key: a, b, c, d
   ->  Sort  (cost=12697.07..12936.46 rows=95757 width=16) (actual time=206.577..276.864 rows=100000 loops=1)
         Sort Key: a, b, c, d
         Sort Method: external merge  Disk: 2552kB
         ->  Seq Scan on b  (cost=0.00..1498.57 rows=95757 width=16) (actual time=0.014..33.412 rows=100000 loops=1)
 Planning time: 0.071 ms
 Execution time: 413.876 ms
(8 rows)

Time: 414,246 ms
test=*# set work_mem to '4MB';
SET
Time: 0,059 ms
test=*# explain analyse select a,b,c,d, count(1) from b group by a,b,c,d;
                                                  QUERY PLAN
---------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=2695.53..2791.29 rows=9576 width=16) (actual time=129.093..194.711 rows=100000 loops=1)
   Group Key: a, b, c, d
   ->  Seq Scan on b  (cost=0.00..1498.57 rows=95757 width=16) (actual time=0.014..33.762 rows=100000 loops=1)
 Planning time: 0.067 ms
 Execution time: 219.694 ms
(5 rows)

So, if you see a 'Sort Method: external merge  Disk' in the plan, you should
increase work_mem.

Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°
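[Editor's note: applied to the original poster's query, the advice above amounts to the following sketch. The table and column names (`t`, `A`..`E`) are the OP's; the `64MB` value is purely illustrative, not a recommendation from the thread.]

```sql
-- Check the current plan; look for "Sort Method: external merge  Disk"
-- in the output, which indicates the sort is spilling to disk.
EXPLAIN ANALYSE
SELECT A, B, C, D, E, count(1)
FROM t
GROUP BY A, B, C, D, E
ORDER BY 6;

-- If it spills, raise work_mem for this session only and re-run.
-- 64MB is a hypothetical value; tune to your machine's memory.
SET work_mem TO '64MB';

EXPLAIN ANALYSE
SELECT A, B, C, D, E, count(1)
FROM t
GROUP BY A, B, C, D, E
ORDER BY 6;
```

A session-level `SET` avoids changing postgresql.conf globally while experimenting.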