Re: Optimizer on sort aggregate - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Re: Optimizer on sort aggregate |
Date | |
Msg-id | 20141018.083516.694943555601559481.t-ishii@sraoss.co.jp Whole thread Raw |
In response to | Optimizer on sort aggregate (Feng Tian <fengttt@gmail.com>) |
Responses |
Re: Optimizer on sort aggregate
|
List | pgsql-hackers |
> The query, > select count(distinct j) from t group by t, i; > > runs for 35 seconds. However, if I change the query to, > select count(distinct j) from t group by i, t; -- note switching the > ordering > select count(distinct j) from t group by decode(t, 'escape'), i; -- convert > t to bytea > > Run times are just about 5 and 6.5 seconds. The reason is clear, compare a > string with collation is slow, which is well understood by pg hackers. > However, here, the sorting order is forced by the planner, not user. > Planner can do the following optimizations, Interesting. I got following result: test=# explain analyze select count(distinct j) from t group by t, i; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------GroupAggregate (cost=137519.84..157519.84 rows=1000000 width=22) (actual time=1332.937..2431.238 rows=1000000 loops=1) Group Key: t, i -> Sort (cost=137519.84..140019.84 rows=1000000 width=22) (actual time=1332.922..1507.413 rows=1000000 loops=1) Sort Key: t, i Sort Method: external merge Disk: 33232kB -> Seq Scan on t (cost=0.00..17352.00 rows=1000000width=22) (actual time=0.006..131.406 rows=1000000 loops=1)Planning time: 0.031 msExecution time: 2484.271 ms (8 rows) Time: 2484.520 ms test=# explain analyze select count(distinct j) from t group by i, t; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------GroupAggregate (cost=137519.84..157519.84 rows=1000000 width=22) (actual time=602.510..1632.087 rows=1000000 loops=1) Group Key: i, t -> Sort (cost=137519.84..140019.84 rows=1000000 width=22) (actual time=602.493..703.274 rows=1000000 loops=1) SortKey: i, t Sort Method: external sort Disk: 33240kB -> Seq Scan on t (cost=0.00..17352.00 rows=1000000width=22) (actual time=0.014..129.213 rows=1000000 loops=1)Planning time: 0.176 msExecution time: 1685.575 ms (8 rows) Time: 1687.641 ms Not so big difference here (maybe because I use SSD) but there is still about 50% difference in execution time. Note that I disable locale support. > 1. for the sort we generated for sort agg, planner can switch column > ordering, put int before string, > 2. for the sort we generated for sort agg, use bytea compare instead of > string compare. > > They will bring big improvement to this common query. Is this something > reasonable? Not looking at sort and planner code closely, I guess planner does not recognize the cost difference between "external merge" and "external sort" because cost estimations for sort step in each plan are exactly same (137519.84..140019.84). However I am not sure if we should fix the planner or should fix our sort machinery since it is possible that the sort machinery should not expose such a big difference in the two sort methods. -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp
pgsql-hackers by date: