Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 - Mailing list pgsql-performance
| From | David Rowley |
|---|---|
| Subject | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| Date | |
| Msg-id | CAApHDvq6dwbTsPbeEDgEJKnJw6e4Y3uE8ZZbnHCTLJY2hZX+Lw@mail.gmail.com |
| In response to | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 (Jeff Davis <pgsql@j-davis.com>) |
| Responses | Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 |
| List | pgsql-performance |
On Wed, 8 Apr 2026 at 09:58, Jeff Davis <pgsql@j-davis.com> wrote:
> On Sat, 2026-04-04 at 13:21 +1300, David Rowley wrote:
> > A slight variation on this that I was thinking of would be to
> > introduce a MemoryPool struct that could be tagged onto a
> > MemoryContext which contains a pool_limit. A child MemoryContext
> > would, by default, inherit its parent's MemoryPool. On malloc/free,
> > if the owning context has a non-null MemoryPool, the MemoryPool's
> > memory_allocated is updated. At a safe point in nodeAgg.c, we'd check
> > if the pool limit has been reached. I assume there's some simple
> > inline function that just checks if memory_allocated is greater than
> > pool_limit. Doing it this way would mean there's no need to
> > recursively propagate the mentioned child_mem_allocated field up the
> > hierarchy, as there is only a single field to update if the
> > MemoryPool field is set.
>
> I like that idea, because it could also be a good place to hold a max
> block size for that tree of contexts. That's important to ensure that
> the block size is significantly less than work_mem.
>
> But it also means there's only one pool in any given subtree (unless
> you mean that we should make that work somehow), which is an awkward
> requirement, especially with MemoryContextSetParent().

I had imagined we'd have an Assert to ensure there's no other MemoryPool in the context hierarchy. I guess it would be possible for a MemoryPool to have a MemoryPool *next field and chain them, but since we've no need for that right now, I imagine it's fine to disallow that. If we did allow chaining, it might get complex as each MemoryPool would be allocated in a different context. Removing items from that linked list might be tricky.
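To make the shape of the idea concrete, here is a minimal self-contained sketch. This is not actual PostgreSQL code: the struct layout, the `MemoryContextSketch` stand-in, and the helper names (`pool_account`, `pool_limit_exceeded`) are assumptions based purely on the description above.

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical pool shared by a MemoryContext and all its children. */
typedef struct MemoryPool
{
    size_t pool_limit;       /* soft limit on total malloc'd bytes */
    size_t memory_allocated; /* running total for the whole subtree */
} MemoryPool;

/* Simplified stand-in for a MemoryContext carrying a pool pointer,
 * inherited from its parent at creation time (may be NULL). */
typedef struct MemoryContextSketch
{
    MemoryPool *memory_pool;
} MemoryContextSketch;

/* Called on block malloc/free: a single field to update, with no
 * recursive propagation up the context hierarchy. */
static void
pool_account(MemoryContextSketch *ctx, long delta)
{
    if (ctx->memory_pool != NULL)
        ctx->memory_pool->memory_allocated += delta;
}

/* The "simple inline function" a caller such as nodeAgg.c could
 * check at a safe point. */
static inline bool
pool_limit_exceeded(const MemoryPool *pool)
{
    return pool->memory_allocated > pool->pool_limit;
}
```

A parent and child context sharing one pool would both funnel their block accounting into the same `memory_allocated` counter, so the limit check stays O(1) regardless of how many child contexts exist.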
I did expect that we'd have some function like:

MemoryPool *MemoryContextCreateMemoryPool(MemoryContext ctx, size_t bytes_limit)

and either have that recursively populate the memory_pool field of that context and all child contexts with a new MemoryPool palloc'd in ctx, or just have that function insist that no child contexts have been created yet. When we create a new context, we'd check whether it has a parent, and if that parent has a non-NULL memory_pool, set the new context's memory_pool field to that value.

However, on looking at hash_create_memory() in nodeAgg.c, we create the hash_metacxt and hash_tuplescxt with aggstate->ss.ps.state->es_query_cxt as the parent of both. Since we won't want to add a MemoryPool to the parent of those contexts, we might need some way to have an existing context "join" an existing MemoryPool. That would mean an API more like:

MemoryPool *MemoryPoolCreate(size_t bytes_limit);

then

void MemoryContextJoinPool(MemoryContext ctx, MemoryPool *pool);

All the code that currently does "+= mem_allocated" or "-= mem_allocated" would need extra code to do "if (context->memory_pool != NULL) context->memory_pool->memory_allocated (+/-)= blksize;". I.e., no looping up the hierarchy.

I imagined we'd document that a MemoryPool is designed to more easily keep track of malloc'd memory for a given MemoryContext (or group of them) and all of its child contexts, so that the total memory allocated can easily be checked without recursively visiting the mem_allocated fields in all child contexts. Also, that it is intended for short-lived contexts. Any longer-lived usage would mean we'd have MemoryPools in contexts that live longer than the query or an executor node, and that would mean we'd have to code up the ability to have multiple MemoryPools in the hierarchy, which seems more complex than what we need today.
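A self-contained sketch of what that create/join API could look like follows. Again, this is an assumption rather than actual PostgreSQL code: in a real patch MemoryPoolCreate would presumably palloc the pool in an appropriate context rather than use calloc, and the real accounting would live inside the allocator's block alloc/free paths.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct MemoryPool
{
    size_t pool_limit;
    size_t memory_allocated;
} MemoryPool;

/* Simplified stand-in for a MemoryContext. */
typedef struct MemoryContextSketch
{
    MemoryPool *memory_pool;
} MemoryContextSketch;

/* Create a pool with a byte limit; no context is attached yet. */
static MemoryPool *
MemoryPoolCreate(size_t bytes_limit)
{
    MemoryPool *pool = calloc(1, sizeof(MemoryPool));
    pool->pool_limit = bytes_limit;
    return pool;
}

/* Have an existing context join the pool. Contexts created later
 * as children would inherit ctx->memory_pool from their parent. */
static void
MemoryContextJoinPool(MemoryContextSketch *ctx, MemoryPool *pool)
{
    ctx->memory_pool = pool;
}

/* The extra accounting that would sit alongside the existing
 * "+= mem_allocated" / "-= mem_allocated" code. */
static void
pool_update(MemoryContextSketch *ctx, long blksize)
{
    if (ctx->memory_pool != NULL)
        ctx->memory_pool->memory_allocated += blksize;
}
```

This mirrors the hash_metacxt/hash_tuplescxt case: two sibling contexts can join the same pool while their shared parent (es_query_cxt) stays outside it, so allocations in the parent are not counted against the pool.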
Maybe in the future there'd be some need to have MemoryPools with a hard limit that ERRORs when allocation goes above a threshold, but that's not what we need for nodeAgg.c. That seems to be more along the lines of what Tomas was mentioning in the "Add the ability to limit the amount of memory that can be allocated to backends" thread.

David