Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 - Mailing list pgsql-performance

From Scott Carey
Subject Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17
Date
Msg-id CA+vubOH1mcEbsHnkm5L7aJWR+eeiVpOnGfcdWax9td6pyOkMww@mail.gmail.com
Whole thread Raw
In response to Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17
List pgsql-performance


On Thu, Apr 2, 2026, 10:38 Tom Lane <tgl@sss.pgh.pa.us> wrote:

I did some bisecting using the attached simplified test case, and found
that the query execution time jumps from circa 60ms to circa 7500ms here:

1f39bce021540fde00990af55b4432c55ef4b3c7 is the first bad commit
commit 1f39bce021540fde00990af55b4432c55ef4b3c7
Author: Jeff Davis <jdavis@postgresql.org>
Date:   Wed Mar 18 15:42:02 2020 -0700

    Disk-based Hash Aggregation.

    While performing hash aggregation, track memory usage when adding new
    groups to a hash table. If the memory usage exceeds work_mem, enter
    "spill mode".

(Times quoted are on a Mac M4 Pro, but in assert-enabled builds so
maybe not directly comparable to production.)

I'm bemused as to why: the test case has work_mem set high enough that
we shouldn't be triggering spill mode, so why did this change affect
it at all?

Even stranger, the more spills induced via smaller work_mem the faster it goes.

This suggests something getting more expensive as the hash table gets larger.  Significantly more, like O(n^2) or worse.


I wonder if it is the size of the hash table itself (entry count) or the size of the entries?  Does a table with one row matching each entry have the problem or only when the hash bucket is hit multiple times and values aggregated?   Why is the reported size used so much larger with the custom function?

I have some experiments in mind that could answer some of these. 

Tracking hash table memory usage dynamically can be tricky.   I would imagine that user defined aggregates make it more difficult.


                        regards, tom lane

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17
Next
From: Tom Lane
Date:
Subject: Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17