Thread: Poor performance of group by query

Poor performance of group by query

From
"Jim C. Nasby"
Date:
Anyone have any ideas why this query would be so slow?

stats=# explain analyze SELECT work_units, min(raw_rank) AS rank  FROM Trank_work_overall GROUP BY work_units;
                                                          QUERY PLAN
       

-------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=1050.12..1085.98 rows=14347 width=16) (actual time=163149.981..163227.758 rows=17849 loops=1)
   ->  Seq Scan on trank_work_overall  (cost=0.00..804.41 rows=49141 width=16) (actual time=0.071..328.682 rows=49091
loops=1)
 Total runtime: 163296.212 ms

(3 rows)

stats=# \d Trank_work_overall
Table "pg_temp_1.trank_work_overall"
   Column   |  Type  | Modifiers
------------+--------+-----------
 raw_rank   | bigint |
 work_units | bigint |

stats=#

FreeBSD fritz.distributed.net 5.2.1-RELEASE FreeBSD 5.2.1-RELEASE #1:
Wed Apr  7 18:42:52 CDT 2004
root@fritz.distributed.net:/usr/obj/usr/src/sys/FRITZ  amd64

The machine is a dual opteron with 4G of memory. The query in question
was not hitting the disk at all. PostgreSQL 7.4.2 compiled with -O3.

Also, if I set enable_hashagg = false, it runs in less than a second.
--
Jim C. Nasby, Database Consultant                  jim@nasby.net
Member: Triangle Fraternity, Sports Car Club of America
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

Re: Poor performance of group by query

From
Tom Lane
Date:
"Jim C. Nasby" <jim@nasby.net> writes:
> Anyone have any ideas why this query would be so slow?

That seems very bizarre.  Would you be willing to send me a dump of the
table off-list?

            regards, tom lane

Re: Poor performance of group by query

From
Greg Stark
Date:

> stats=# explain analyze SELECT work_units, min(raw_rank) AS rank  FROM Trank_work_overall GROUP BY work_units;
>
> ...
>
>  raw_rank   | bigint |
>  work_units | bigint |


If you create a copy of the same table using regular integers does that run
fast? And a copy of the table using bigints is still slow like the original?

I know bigints are less efficient than integers because they're handled using
dynamically allocated memory. This especially bites aggregate functions. But I
don't see why it would be any slower for a hash aggregate than a regular
aggregate. It's a pretty gross amount of time for 18k records.

There was a thought a while back about making 64-bit machines handle 64-bit
datatypes like bigints without pointers. That would help on your Opteron.


--
greg

Re: Poor performance of group by query

From
Mark Kirkwood
Date:
It might be worth trying out a build with -O2, just to rule out any -O3
oddness.

regards

Mark

Jim C. Nasby wrote:

> PostgreSQL 7.4.2 compiled with -O3.
>
>
>
>