I wrote:
> So the main problem here is we're leaking the arrays made by
> array_agg, and a secondary problem is that that drives the
> cost of hash_agg_check_limits to an unacceptable level.

After further study I no longer think there's a leak. This
test query involves 50000 GROUP BY groups, and we have an
array being accumulated for each one. The difference between
the fast array_agg implementation and the slow one is just
that the fast one keeps all its working state in the aggregate
context, while the slow one makes a separate sub-context for
each "expanded array". v12 creates 50000 "expanded arrays"
too, but it's not noticeably slow.

So the problem is exactly that repeating hash_agg_check_limits
each time we start a new group is O(N^2), because if there is
a sub-context for each group then MemoryContextMemAllocated
requires O(N) time.

The other little problem with this approach is acknowledged in the
comments:
* ... Allocations may happen without adding new groups (for instance,
* if the transition state size grows), so this check is imperfect.

So really the whole thing is kind of unsatisfactory.
I'm not seeing cheap ways to make it better though.

A very quick and dirty idea is to tell MemoryContextMemAllocated
not to recurse, which would restore the check to constant time.
But that would make the results much less accurate in cases like
this one.

regards, tom lane