Re: Additional select fields in a GROUP BY - Mailing list pgsql-performance
From | Vitaly Belman |
---|---|
Subject | Re: Additional select fields in a GROUP BY |
Date | |
Msg-id | fa96e3c604061310354113e9f8@mail.gmail.com Whole thread Raw |
In response to | Re: Additional select fields in a GROUP BY (Bruno Wolff III <bruno@wolff.to>) |
Responses |
Re: Additional select fields in a GROUP BY
|
List | pgsql-performance |
Bruno: It wasn't exactly my case but you did give me an idea by this tip, changing a perspective did quite good to the timing of this query. Tom: Hmm.. I am not sure how I can demonstrate this to you... To see the time differences you'd need the whole table.. That's quite a lot of data to be posted on a mailing list, if you wish to test it on your side, I'll dump this table partly and send them to you somehow. I do stand by what I said though, here's the real query example: Original query (execution time, 800ms): select s.series_id, avg(vote_avg), sum(vote_count) from v_bookseries s, bv_seriesgenres sg where s.series_id = sg.series_id and sg.genre_id = 1 group by s.series_id order by sum(vote_count) desc limit 10 QUERY PLAN: Limit (cost=6523.51..6523.53 rows=10 width=12) -> Sort (cost=6523.51..6566.27 rows=17104 width=12) Sort Key: sum(b.vote_count) -> GroupAggregate (cost=1368.54..5320.92 rows=17104 width=12) -> Merge Join (cost=1368.54..4796.91 rows=58466 width=12) Merge Cond: ("outer".series_id = "inner".series_id) -> Merge Join (cost=0.00..6676.41 rows=65902 width=16) Merge Cond: ("outer".series_id = "inner".series_id) -> Index Scan using bv_series_pkey on bv_series s (cost=0.00..386.83 rows=17104 width=4) -> Index Scan using i_books_series_id on bv_books b (cost=0.00..14148.38 rows=171918 width=12) -> Sort (cost=1368.54..1406.47 rows=15173 width=4) Sort Key: sg.series_id -> Index Scan using i_seriesgenres_genre_id on bv_seriesgenres sg (cost=0.00..314.83 rows=15173 width=4) Index Cond: (genre_id = 1) Query with added GROUP BY members (execution time, 1400ms): select s.series_id, s.series_name, s.series_picture, avg(vote_avg), sum(vote_count) from v_bookseries s, bv_seriesgenres sg where s.series_id = sg.series_id and sg.genre_id = 1 group by s.series_id, s.series_name, s.series_picture order by sum(vote_count) desc limit 10 QUERY PLAN: Limit (cost=12619.76..12619.79 rows=10 width=47) -> Sort (cost=12619.76..12662.52 rows=17104 width=47) Sort Key: sum(b.vote_count) -> GroupAggregate (cost=10454.67..11417.18 rows=17104 width=47) -> Sort (cost=10454.67..10600.83 rows=58466 width=47) Sort Key: s.series_id, s.series_name, s.series_picture -> Merge Join (cost=1368.54..4796.91 rows=58466 width=47) Merge Cond: ("outer".series_id = "inner".series_id) -> Merge Join (cost=0.00..6676.41 rows=65902 width=51) Merge Cond: ("outer".series_id = "inner".series_id) -> Index Scan using bv_series_pkey on bv_series s (cost=0.00..386.83 rows=17104 width=39) -> Index Scan using i_books_series_id on bv_books b (cost=0.00..14148.38 rows=171918 width=12) -> Sort (cost=1368.54..1406.47 rows=15173 width=4) Sort Key: sg.series_id -> Index Scan using i_seriesgenres_genre_id on bv_seriesgenres sg (cost=0.00..314.83 rows=15173 width=4) Index Cond: (genre_id = 1) Notice that the GROUP BY items added the following to the plan: -> GroupAggregate (cost=10454.67..11417.18 rows=17104 width=47) -> Sort (cost=10454.67..10600.83 rows=58466 width=47) Sort Key: s.series_id, s.series_name, s.series_picture Which eventually almost doubles the execution time. On Sun, 13 Jun 2004 08:52:12 -0500, Bruno Wolff III <bruno@wolff.to> wrote: > > On Sun, Jun 13, 2004 at 06:21:17 +0300, > Vitaly Belman <vitalib@012.net.il> wrote: > > > > Consider the following query: > > > > select t1field1, avg(t2fieild2) > > from t1, t2 > > where t1.field1 = t2.field2 > > group by t1field1 > > > > That works fine. But I'd really like to see more fields of t1 in this > > query, however I can't add them into the select because they're not > > part of the GROUP BY, thus I have to add them to there too: > > If t1.field1 is a candiate key for t1, then the normal thing to do is > to group t2 by t2.field1 (assuming you really meant to join on t2.field1, > not t2.field2) and THEN join to t1. That may even be faster than the way you > are doing things now. > > So the query would look like: > > SELECT t1.field1, t1.field2, t1.field3, a.t2avg FROM t1, > (SELECT field1, avg(field2) as t2avg FROM t2 GROUP BY field1) as a > WHERE t1.field1 = a.field1 > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings >
pgsql-performance by date: