Thread: "group by" is quite expensive
What can I do to speed up queries like the following?

    select count(*) from ttt group by xxx;

Using Postgres 7.1 on Solaris 8, the above query on a 1M-row table takes
6 times longer than

    select count(*) from ttt;

With Postgres, "group by" is apparently quite expensive. With Oracle and
MySQL, the same query on the same data takes only 25% longer than the
plain count. I think both of those DBMSes use a single sequential scan of
the data, while Postgres (judging from EXPLAIN) uses a multi-pass
process. Yes, I did a vacuum analyze.

Is there anything I as a user can do? Build some kind of index? In
absolute terms "group by" is a killer: every query on a single 1M-row
table that includes "group by" takes about 3 minutes. Oracle takes about
40 seconds and MySQL about 25 seconds.

Here is what EXPLAIN shows:

    alberch=# explain select count(*) from tyc_main group by nphoto;
    NOTICE:  QUERY PLAN:

    Aggregate  (cost=170404.22..175695.88 rows=105833 width=2)
      ->  Group  (cost=170404.22..173050.05 rows=1058332 width=2)
            ->  Sort  (cost=170404.22..170404.22 rows=1058332 width=2)
                  ->  Seq Scan on tyc_main  (cost=0.00..49705.32 rows=1058332 width=2)

    EXPLAIN
    alberch=# explain select count(*) from tyc_main;
    NOTICE:  QUERY PLAN:

    Aggregate  (cost=52351.15..52351.15 rows=1 width=0)
      ->  Seq Scan on tyc_main  (cost=0.00..49705.32 rows=1058332 width=0)

    EXPLAIN

When I run the queries, the plain count takes 18 seconds; the "group by"
query takes a little over 5 _minutes_.
"Albertson, Chris" <CAlbertson@primeadvantage.com> writes:
> What can I do to speed up queries like the following?
> select count(*) from ttt group by xxx;

Not much at the user level, I'm afraid. Currently GROUP BY requires a
sort on the grouping column, and that's expensive for a big table.

There has been talk of reimplementing GROUP BY to avoid sorting (instead
maintaining a hash table with one entry for each distinct value of the
grouping column), and it might get done for 7.2 or so.

			regards, tom lane
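[Editor's note: the two strategies Tom Lane contrasts can be sketched in a few lines of Python. This is an illustration of the idea only, not PostgreSQL source; the input data is made up, and `count(*) ... group by` is modeled as counting occurrences of each grouping value.]

```python
from collections import defaultdict

def group_count_sort(rows):
    """Sort-based grouping, as 7.1's Sort -> Group -> Aggregate plan does:
    O(n log n) sort, then one pass collapsing runs of equal values."""
    counts = {}
    for value in sorted(rows):          # the expensive step for a big table
        if value in counts:
            counts[value] += 1
        else:
            counts[value] = 1
    return counts

def group_count_hash(rows):
    """Hash-based grouping, as proposed for a later release: a single
    O(n) pass, keeping one hash-table entry per distinct value."""
    counts = defaultdict(int)
    for value in rows:
        counts[value] += 1
    return dict(counts)

# Both strategies produce the same result; they differ only in cost.
rows = ["a", "b", "a", "c", "b", "a"]
print(group_count_sort(rows))   # {'a': 3, 'b': 2, 'c': 1}
print(group_count_hash(rows))   # {'a': 3, 'b': 2, 'c': 1}
```

The hash approach avoids the sort entirely, which is why it behaves like the single-sequential-scan plans the original poster observed in Oracle and MySQL.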