Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 - Mailing list pgsql-performance
| From | David Rowley |
|---|---|
| Subject | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| Date | |
| Msg-id | CAApHDvq6dwbTsPbeEDgEJKnJw6e4Y3uE8ZZbnHCTLJY2hZX+Lw@mail.gmail.com |
| In response to | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 (Jeff Davis <pgsql@j-davis.com>) |
| Responses | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| List | pgsql-performance |
On Wed, 8 Apr 2026 at 09:58, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sat, 2026-04-04 at 13:21 +1300, David Rowley wrote:
> > A slight variation on this that I was thinking of would be to
> > introduce a MemoryPool struct that could be tagged onto a
> > MemoryContext which contains a pool_limit. A child MemoryContext
> > would, by default, inherit its parent's MemoryPool. On malloc/free,
> > if the owning context has a non-null MemoryPool, the MemoryPool's
> > memory_allocated is updated. At a safe point in nodeAgg.c, we'd check
> > if the pool limit has been reached. I assume there's some simple
> > inline function that just checks if memory_allocated is greater than
> > pool_limit. Doing it this way would mean there's no need to
> > recursively propagate the mentioned child_mem_allocated field up the
> > hierarchy, as there is only a single field to update if the
> > MemoryPool field is set.
>
> I like that idea, because it could also be a good place to hold a max
> block size for that tree of contexts. That's important to ensure that
> the block size is significantly less than work_mem.
>
> But it also means there's only one pool in any given subtree (unless
> you mean that we should make that work somehow), which is an awkward
> requirement, especially with MemoryContextSetParent().

I had imagined we'd have an Assert to ensure there's no other MemoryPool in the context hierarchy. I guess it would be possible for a MemoryPool to have a MemoryPool *next field and chain them, but since we've no need for that right now, I imagine it's fine to disallow that. If we did allow chaining, it might get complex as each MemoryPool would be allocated in a different context. Removing items from that linked list might be tricky.
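To make the shape of the idea concrete, here is a minimal self-contained sketch. This is not actual PostgreSQL code: the struct layout, the `MemoryContextSketch` stand-in, and the helper names (`pool_account`, `pool_limit_exceeded`) are assumptions based purely on the description above.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical pool shared by a MemoryContext and all its children. */
typedef struct MemoryPool
{
    size_t pool_limit;       /* soft limit on total malloc'd bytes */
    size_t memory_allocated; /* running total for the whole subtree */
} MemoryPool;

/* Simplified stand-in for a MemoryContext carrying a pool pointer,
 * inherited from its parent at creation time (may be NULL). */
typedef struct MemoryContextSketch
{
    MemoryPool *memory_pool;
} MemoryContextSketch;

/* Called on block malloc/free: a single field to update, with no
 * recursive propagation up the context hierarchy. */
static void
pool_account(MemoryContextSketch *ctx, long delta)
{
    if (ctx->memory_pool != NULL)
        ctx->memory_pool->memory_allocated += delta;
}

/* The "simple inline function" a caller such as nodeAgg.c could
 * check at a safe point. */
static inline bool
pool_limit_exceeded(const MemoryPool *pool)
{
    return pool->memory_allocated > pool->pool_limit;
}
```

A parent and child context sharing one pool would both funnel their block accounting into the same `memory_allocated` counter, so the limit check stays O(1) regardless of how many child contexts exist.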
I did expect that we'd have some function like:

MemoryPool *MemoryContextCreateMemoryPool(MemoryContext ctx, size_t bytes_limit)

and either have that recursively populate the memory_pool field of that context and all child contexts with a new MemoryPool palloc'd in ctx, or just have that function insist that no child contexts have been created yet. When we create a new context, we'd check whether it has a parent, and if that parent has a non-NULL memory_pool, set the new context's memory_pool field to that value.

However, on looking at hash_create_memory() in nodeAgg.c, we create the hash_metacxt and hash_tuplescxt with aggstate->ss.ps.state->es_query_cxt as the parent of both. Since we won't want to add a MemoryPool to the parent of those contexts, we might need some way to have an existing context "join" an existing MemoryPool. That would mean an API more like:

MemoryPool *MemoryPoolCreate(size_t bytes_limit);

then

void MemoryContextJoinPool(MemoryContext ctx, MemoryPool *pool);

All the code that currently does "+= mem_allocated" or "-= mem_allocated" would need extra code to do "if (context->memory_pool != NULL) context->memory_pool->memory_allocated (+/-)= blksize;". I.e., no looping up the hierarchy.

I imagined we'd document that a MemoryPool is designed to more easily keep track of malloc'd memory for a given MemoryContext (or group of them) and all of its child contexts, so that the total memory allocated can easily be checked without recursively visiting the mem_allocated fields in all child contexts. Also, that it is intended for short-lived contexts. Any longer-lived usage would mean we'd have MemoryPools in contexts that live longer than the query or an executor node, and that would mean we'd have to code up the ability to have multiple MemoryPools in the hierarchy, which seems more complex than what we need today.
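A self-contained sketch of what that create/join API could look like follows. Again, this is an assumption rather than actual PostgreSQL code: in a real patch MemoryPoolCreate would presumably palloc the pool in an appropriate context rather than use calloc, and the real accounting would live inside the allocator's block alloc/free paths.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct MemoryPool
{
    size_t pool_limit;
    size_t memory_allocated;
} MemoryPool;

/* Simplified stand-in for a MemoryContext. */
typedef struct MemoryContextSketch
{
    MemoryPool *memory_pool;
} MemoryContextSketch;

/* Create a pool with a byte limit; no context is attached yet. */
static MemoryPool *
MemoryPoolCreate(size_t bytes_limit)
{
    MemoryPool *pool = calloc(1, sizeof(MemoryPool));
    pool->pool_limit = bytes_limit;
    return pool;
}

/* Have an existing context join the pool. Contexts created later
 * as children would inherit ctx->memory_pool from their parent. */
static void
MemoryContextJoinPool(MemoryContextSketch *ctx, MemoryPool *pool)
{
    ctx->memory_pool = pool;
}

/* The extra accounting that would sit alongside the existing
 * "+= mem_allocated" / "-= mem_allocated" code. */
static void
pool_update(MemoryContextSketch *ctx, long blksize)
{
    if (ctx->memory_pool != NULL)
        ctx->memory_pool->memory_allocated += blksize;
}
```

This mirrors the hash_metacxt/hash_tuplescxt case: two sibling contexts can join the same pool while their shared parent (es_query_cxt) stays outside it, so allocations in the parent are not counted against the pool.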
Maybe in the future there'd be some need to have MemoryPools with a hard limit that ERRORs when allocation goes above a threshold, but that's not what we need for nodeAgg.c. That seems to be more along the lines of what Tomas was mentioning in the "Add the ability to limit the amount of memory that can be allocated to backends" thread.

David