Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 - Mailing list pgsql-performance
| From | David Rowley |
|---|---|
| Subject | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| Date | |
| Msg-id | CAApHDvq5fMzhK7OwGOhOgqRF-wErGNsT75yie5LtobSir1HhnA@mail.gmail.com Whole thread |
| In response to | Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 (Scott Carey <scott.carey@algonomy.com>) |
| Responses |
Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17
|
| List | pgsql-performance |
On Tue, 31 Mar 2026 at 22:29, Scott Carey <scott.carey@algonomy.com> wrote: > A simple aggregate using array_agg goes significantly faster and faster the smaller the (work_mem * hash_mem_multiplier),with the same simple query plan: HashAggregate over a sequential scan. Changing to a simple aggregate,such as max() does not have this behavior and is always fast. Switching to another aggregate that grows in sizefor each element, such as json_agg or string_agg also does not have this behavior. If I add an order by clause insidearray_agg, performance significantly improves as it changes from a HashAggregate of a sequential scan to a GroupAggregateover a sort over a sequential scan. Something seems specifically broken with array_agg + HashAggregate. > > These queries are anywhere from 10x to 1000x slower on Postgres 17.9 than they were on Postgres 12.19 on production data. Some of our OLTP queries have gone from minutes to 6 hours to complete. I do not know if this happens on Postgres18, I can confirm it also happens on Postgres 16.8. I do not know about 13 through 15. I tried and failed to recreate this locally on 17.9. For me the json_agg query is slower than array_agg(). I tried making the table 10x bigger and still don't see the same issue. The one with more work_mem and fewer batches is always faster for me. Is the machine under a lot of memory pressure and swapping pages to disk? Maybe you need to consider running a lower work_mem setting. How much RAM is installed in this machine? > set hash_mem_multiplier = 2; > set work_mem = "100MB"; > > explain (analyze, buffers) select product_id, array_agg(region_id) from array_agg_test group by product_id; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------- > HashAggregate (cost=4274.00..4771.49 rows=49749 width=40) (actual time=4628.278..4643.765 rows=50000 loops=1) > Group Key: product_id > Batches: 1 Memory Usage: 55649kB > Buffers: shared hit=1274 > -> Seq Scan on array_agg_test (cost=0.00..3274.00 rows=200000 width=16) (actual time=0.030..16.694 rows=200000 loops=1) > Buffers: shared hit=1274 > Planning Time: 0.067 ms > Execution Time: 4648.698 ms > Below, note json_agg does not have this problem: > > set hash_mem_multiplier = 2; > set work_mem = "500MB"; > > explain (analyze, buffers) select product_id, json_agg(region_id) from array_agg_test group by product_id; > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------------------- > HashAggregate (cost=4274.00..4895.86 rows=49749 width=40) (actual time=110.975..122.781 rows=50000 loops=1) > Group Key: product_id > Batches: 1 Memory Usage: 67089kB > Buffers: shared read=1274 > -> Seq Scan on array_agg_test (cost=0.00..3274.00 rows=200000 width=16) (actual time=0.034..17.659 rows=200000 loops=1) > Buffers: shared read=1274 > Planning: > Buffers: shared hit=10 > Planning Time: 0.054 ms > Execution Time: 124.929 ms What changed here apart from the aggregate function? Why are the buffers being read on this run and not the previous? Same machine? Was there a restart? json_agg allocates slightly more memory per agg state than array_agg. You can see that in the reported Hash Aggregate memory usage and I expect the actual transition function call between array_agg() and json_agg() not to differ very much in cost, so it very much feels like something else is going on here. David
pgsql-performance by date: