Thread: GROUP BY Vs. Sub SELECT

GROUP BY Vs. Sub SELECT

From

"Bruno Almeida do Lago"

Date:

23 April 2006, 21:01:07

Hi,

I'm just getting familiar with EXPLAIN ANALYZE output, so I'd like to get
some help to identify which one of the following queries would be better:

teste=# EXPLAIN ANALYZE SELECT aa.avaliacao_id, MAX(aa.avaliacao_versao) AS
avaliacao_versao, a.avaliacao_nome, aa.editar
teste-# FROM teo01tb104_areas_avaliacoes aa, teo01tb201_avaliacoes a
teste-# WHERE aa.avaliacao_id=10
teste-# AND aa.avaliacao_id=a.avaliacao_id
teste-# GROUP BY aa.avaliacao_id, a.avaliacao_nome, aa.editar;
                                                                    QUERY
PLAN
----------------------------------------------------------------------------
-----------------------------------------------------------------------
 HashAggregate  (cost=45.93..45.94 rows=1 width=52) (actual
time=0.466..0.469 rows=1 loops=1)
   ->  Nested Loop  (cost=4.04..45.66 rows=27 width=52) (actual
time=0.339..0.356 rows=1 loops=1)
         ->  Bitmap Heap Scan on teo01tb201_avaliacoes a  (cost=2.01..8.49
rows=3 width=47) (actual time=0.219..0.223 rows=1 loops=1)
               Recheck Cond: (avaliacao_id = 10)
               ->  Bitmap Index Scan on teo01tb201_avaliacoes_pk
(cost=0.00..2.01 rows=3 width=0) (actual time=0.166..0.166 rows=1 loops=1)
                     Index Cond: (avaliacao_id = 10)
         ->  Bitmap Heap Scan on teo01tb104_areas_avaliacoes aa
(cost=2.03..12.30 rows=9 width=9) (actual time=0.060..0.066 rows=1 loops=1)
               Recheck Cond: (avaliacao_id = 10)
               ->  Bitmap Index Scan on teo01tb104_areas_avaliacoes_pk
(cost=0.00..2.03 rows=9 width=0) (actual time=0.040..0.040 rows=1 loops=1)
                     Index Cond: (avaliacao_id = 10)
 Total runtime: 1.339 ms
(11 rows)

teste=# SELECT a.avaliacao_id, a.avaliacao_versao, a.avaliacao_nome,
aa.editar
teste-# FROM teo01tb201_avaliacoes a, teo01tb104_areas_avaliacoes aa
teste-# WHERE a.avaliacao_id=10
teste-# AND a.avaliacao_versao=(SELECT MAX(avaliacao_versao)
teste(#                         FROM teo01tb201_avaliacoes
teste(#                         WHERE avaliacao_id=10)
teste-# AND a.avaliacao_id=aa.avaliacao_id;
 avaliacao_id | avaliacao_versao | avaliacao_nome | editar
--------------+------------------+----------------+--------
           10 |                1 | Teste          | t
(1 row)

teste=# EXPLAIN ANALYZE SELECT a.avaliacao_id, a.avaliacao_versao,
a.avaliacao_nome, aa.editar
teste-# FROM teo01tb201_avaliacoes a, teo01tb104_areas_avaliacoes aa
teste-# WHERE a.avaliacao_id=10
teste-# AND a.avaliacao_versao=(SELECT MAX(avaliacao_versao)
teste(#                         FROM teo01tb201_avaliacoes
teste(#                         WHERE avaliacao_id=10)
teste-# AND a.avaliacao_id=aa.avaliacao_id;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
------------------------
 Nested Loop  (cost=6.20..22.38 rows=9 width=52) (actual time=0.573..0.596
rows=1 loops=1)
   InitPlan
     ->  Result  (cost=4.16..4.17 rows=1 width=0) (actual time=0.315..0.319
rows=1 loops=1)
           InitPlan
             ->  Limit  (cost=0.00..4.16 rows=1 width=4) (actual
time=0.257..0.261 rows=1 loops=1)
                   ->  Index Scan Backward using teo01tb201_avaliacoes_pk on
teo01tb201_avaliacoes  (cost=0.00..12.48 rows=3 width=4) (actual
time=0.245..0.245 rows=1 loops=1)
                         Index Cond: (avaliacao_id = 10)
                         Filter: (avaliacao_versao IS NOT NULL)
   ->  Index Scan using teo01tb201_avaliacoes_pk on teo01tb201_avaliacoes a
(cost=0.00..5.83 rows=1 width=51) (actual time=0.410..0.420 rows=1 loops=1)
         Index Cond: ((avaliacao_id = 10) AND (avaliacao_versao = $1))
   ->  Bitmap Heap Scan on teo01tb104_areas_avaliacoes aa  (cost=2.03..12.30
rows=9 width=5) (actual time=0.110..0.114 rows=1 loops=1)
         Recheck Cond: (avaliacao_id = 10)
         ->  Bitmap Index Scan on teo01tb104_areas_avaliacoes_pk
(cost=0.00..2.03 rows=9 width=0) (actual time=0.074..0.074 rows=1 loops=1)
               Index Cond: (avaliacao_id = 10)
 Total runtime: 1.418 ms
(15 rows)


I think 2nd would be better, since when database grow up the GROUP BY may
become too costly. Is that right?

Regards,
Bruno

Re: GROUP BY Vs. Sub SELECT

From

Tom Lane

Date:

23 April 2006, 23:34:33

"Bruno Almeida do Lago" <teolupus@gmail.com> writes:
> I'm just getting familiar with EXPLAIN ANALYZE output, so I'd like to get
> some help to identify which one of the following queries would be better:

Well, you're breaking one of the first laws of PG performance analysis,
which is to not try to extrapolate the behavior on large tables from the
behavior on toy tables.  You can't really see where the bottlenecks are
on a toy example, and what's more there's no reason to think that the
planner will use the same plan when presented with much larger tables.
So you need to load up a meaningful amount of data (don't forget to
ANALYZE afterward!) and then see what it does.

> I think 2nd would be better, since when database grow up the GROUP BY may
> become too costly. Is that right?

The two queries don't give the same answer, so asking which is faster
is a bit irrelevant.  (When there's more than one group, wouldn't the
per-group MAXes be different?)

            regards, tom lane

Re: GROUP BY Vs. Sub SELECT

From

"Bruno Almeida do Lago"

Date:

24 April 2006, 18:47:35

OK! I totally understand what you said. I'll load this table with a
simulated data and see how PG deals with it.

About the queries being different, yes, I'm sure they are :-) I did not
mention that application is able to handle both.

I'd like to get more info on EXPLAIN ANALYZE output... where can I read more
about it?

Thank you very much for your attention!!

Regards,
Bruno

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Sunday, April 23, 2006 8:34 PM
To: Bruno Almeida do Lago
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] GROUP BY Vs. Sub SELECT

"Bruno Almeida do Lago" <teolupus@gmail.com> writes:
> I'm just getting familiar with EXPLAIN ANALYZE output, so I'd like to get
> some help to identify which one of the following queries would be better:

Well, you're breaking one of the first laws of PG performance analysis,
which is to not try to extrapolate the behavior on large tables from the
behavior on toy tables.  You can't really see where the bottlenecks are
on a toy example, and what's more there's no reason to think that the
planner will use the same plan when presented with much larger tables.
So you need to load up a meaningful amount of data (don't forget to
ANALYZE afterward!) and then see what it does.

> I think 2nd would be better, since when database grow up the GROUP BY may
> become too costly. Is that right?

The two queries don't give the same answer, so asking which is faster
is a bit irrelevant.  (When there's more than one group, wouldn't the
per-group MAXes be different?)

            regards, tom lane

Re: GROUP BY Vs. Sub SELECT

From

Richard Broersma Jr

Date:

24 April 2006, 19:54:52

> I'd like to get more info on EXPLAIN ANALYZE output... where can I read more
> about it?

I believe this link has what you are looking for:
http://www.postgresql.org/docs/8.1/interactive/performance-tips.html


Regards,

Richard Broersma Jr.

Re: GROUP BY Vs. Sub SELECT

From

"Jim C. Nasby"

Date:

24 April 2006, 21:05:53

On Mon, Apr 24, 2006 at 12:07:39PM -0700, Richard Broersma Jr wrote:
>
> > I'd like to get more info on EXPLAIN ANALYZE output... where can I read more
> > about it?
>
> I believe this link has what you are looking for:
> http://www.postgresql.org/docs/8.1/interactive/performance-tips.html

http://www.pervasive-postgres.com/lp/newsletters/2006/Insights_postgres_Apr.asp#4
might also be worth your time to read...
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461