Home > mailing lists

"group by" is quite expensive - Mailing list pgsql-general

From	Albertson, Chris
Subject	"group by" is quite expensive
Date	May 2, 2001 20:17:58
Msg-id	71EFB3F67FADD3119C0A00508B5544419661ED@mailtest123.com Whole thread Raw
Responses	Re: "group by" is quite expensive (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-general

Tree view

What can I do to speed up queries like the following

   select count(*) from ttt group by xxx;

Using Postgres 7.1 on Solaris 8 the above query on a 1M row
database takes 6 times longer than the following

   select count(*) from ttt;

With Postgres "group by" is apparently quite expensive.  Using
Oracle and MySQL the same query and the same data, the first query
takes only 25% longer.  I think both of these DBMSes use a single
sequential scan of the data while Postgresql (using the explain query)
uses a multi pass process.  Yes I did a vacuum analyze.

Is there anything I as user can do.  Build some kind of index?

In absolut times "group by" is a killer.  All querries on a single
1M row table that include "group by" take about 3 minutes.  Oracle
takes about 40 seconds and MySQL about 25 seconds.

Here is what EXPLAIN shows.


alberch=# explain select count(*) from tyc_main group by nphoto;
NOTICE:  QUERY PLAN:

Aggregate  (cost=170404.22..175695.88 rows=105833 width=2)
  ->  Group  (cost=170404.22..173050.05 rows=1058332 width=2)
        ->  Sort  (cost=170404.22..170404.22 rows=1058332 width=2)
              ->  Seq Scan on tyc_main  (cost=0.00..49705.32 rows=1058332
width=2)

EXPLAIN
alberch=# explain select count(*) from tyc_main;
NOTICE:  QUERY PLAN:

Aggregate  (cost=52351.15..52351.15 rows=1 width=0)
  ->  Seq Scan on tyc_main  (cost=0.00..49705.32 rows=1058332 width=0)

EXPLAIN

When I run the queries the last takes 18 sec, the first
a little over 5 _minutes_

pgsql-general by date:

From: "Christian Marschalek"
Date: 02 May 2001, 20:07:21
Subject: Strange SERIAL / Sequence error

From: Andy Koch
Date: 02 May 2001, 20:18:03
Subject: Re: Cannot build PL/Perl ...

"group by" is quite expensive - Mailing list pgsql-general

Previous

Next