Re: Additional select fields in a GROUP BY - Mailing list pgsql-performance

From Vitaly Belman
Subject Re: Additional select fields in a GROUP BY
Date
Msg-id fa96e3c604061310354113e9f8@mail.gmail.com
Whole thread Raw
In response to Re: Additional select fields in a GROUP BY  (Bruno Wolff III <bruno@wolff.to>)
Responses Re: Additional select fields in a GROUP BY
List pgsql-performance
Bruno:

It wasn't exactly my case but you did give me an idea by this tip,
changing a perspective did quite good to the timing of this query.

Tom:

Hmm.. I am not sure how I can demonstrate this to you... To see the
time differences you'd need the whole table.. That's quite a lot of
data to be posted on a mailing list, if you wish to test it on your
side, I'll dump this table partly and send them to you somehow.

I do stand by what I said though, here's the real query example:

Original query (execution time, 800ms):

select s.series_id, avg(vote_avg), sum(vote_count)
from v_bookseries s, bv_seriesgenres sg
where s.series_id = sg.series_id and sg.genre_id = 1
group by s.series_id
order by sum(vote_count) desc
limit 10

QUERY PLAN:

Limit  (cost=6523.51..6523.53 rows=10 width=12)
  ->  Sort  (cost=6523.51..6566.27 rows=17104 width=12)
        Sort Key: sum(b.vote_count)
        ->  GroupAggregate  (cost=1368.54..5320.92 rows=17104 width=12)
              ->  Merge Join  (cost=1368.54..4796.91 rows=58466 width=12)
                    Merge Cond: ("outer".series_id = "inner".series_id)
                    ->  Merge Join  (cost=0.00..6676.41 rows=65902 width=16)
                          Merge Cond: ("outer".series_id = "inner".series_id)
                          ->  Index Scan using bv_series_pkey on
bv_series s  (cost=0.00..386.83 rows=17104 width=4)
                          ->  Index Scan using i_books_series_id on
bv_books b  (cost=0.00..14148.38 rows=171918 width=12)
                    ->  Sort  (cost=1368.54..1406.47 rows=15173 width=4)
                          Sort Key: sg.series_id
                          ->  Index Scan using i_seriesgenres_genre_id
on bv_seriesgenres sg  (cost=0.00..314.83 rows=15173 width=4)
                                Index Cond: (genre_id = 1)


Query with added GROUP BY members (execution time, 1400ms):

select s.series_id, s.series_name, s.series_picture, avg(vote_avg),
sum(vote_count)
from v_bookseries s, bv_seriesgenres sg
where s.series_id = sg.series_id and sg.genre_id = 1
group by s.series_id, s.series_name, s.series_picture
order by sum(vote_count) desc
limit 10

QUERY PLAN:

Limit  (cost=12619.76..12619.79 rows=10 width=47)
  ->  Sort  (cost=12619.76..12662.52 rows=17104 width=47)
        Sort Key: sum(b.vote_count)
        ->  GroupAggregate  (cost=10454.67..11417.18 rows=17104 width=47)
              ->  Sort  (cost=10454.67..10600.83 rows=58466 width=47)
                    Sort Key: s.series_id, s.series_name, s.series_picture
                    ->  Merge Join  (cost=1368.54..4796.91 rows=58466 width=47)
                          Merge Cond: ("outer".series_id = "inner".series_id)
                          ->  Merge Join  (cost=0.00..6676.41
rows=65902 width=51)
                                Merge Cond: ("outer".series_id =
"inner".series_id)
                                ->  Index Scan using bv_series_pkey on
bv_series s  (cost=0.00..386.83 rows=17104 width=39)
                                ->  Index Scan using i_books_series_id
on bv_books b  (cost=0.00..14148.38 rows=171918 width=12)
                          ->  Sort  (cost=1368.54..1406.47 rows=15173 width=4)
                                Sort Key: sg.series_id
                                ->  Index Scan using
i_seriesgenres_genre_id on bv_seriesgenres sg  (cost=0.00..314.83
rows=15173 width=4)
                                      Index Cond: (genre_id = 1)

Notice that the GROUP BY items added the following to the plan:

        ->  GroupAggregate  (cost=10454.67..11417.18 rows=17104 width=47)
              ->  Sort  (cost=10454.67..10600.83 rows=58466 width=47)
                    Sort Key: s.series_id, s.series_name, s.series_picture

Which eventually almost doubles the execution time.


On Sun, 13 Jun 2004 08:52:12 -0500, Bruno Wolff III <bruno@wolff.to> wrote:
>
> On Sun, Jun 13, 2004 at 06:21:17 +0300,
>  Vitaly Belman <vitalib@012.net.il> wrote:
> >
> > Consider the following query:
> >
> > select t1field1, avg(t2fieild2)
> > from t1, t2
> > where t1.field1 = t2.field2
> > group by t1field1
> >
> > That works fine. But I'd really like to see more fields of t1 in this
> > query, however I can't add them into the select because they're not
> > part of the GROUP BY, thus I have to add them to there too:
>
> If t1.field1 is a candiate key for t1, then the normal thing to do is
> to group t2 by t2.field1 (assuming you really meant to join on t2.field1,
> not t2.field2) and THEN join to t1. That may even be faster than the way you
> are doing things now.
>
> So the query would look like:
>
> SELECT t1.field1, t1.field2, t1.field3, a.t2avg FROM t1,
>  (SELECT field1, avg(field2) as t2avg FROM t2 GROUP BY field1) as a
>  WHERE t1.field1 = a.field1
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>

pgsql-performance by date:

Previous
From: Bruno Wolff III
Date:
Subject: Re: Additional select fields in a GROUP BY
Next
From: Tom Lane
Date:
Subject: Re: Additional select fields in a GROUP BY