Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17 - Mailing list pgsql-performance

From Jeff Davis
Subject Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17
Date
Msg-id ee1f6701d76b3ef5f511d494c337bc3d76f8bfab.camel@j-davis.com
In response to Re: Significant performance issues with array_agg() + HashAggregate plans on Postgres 17  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-performance
On Sat, 2026-04-11 at 14:42 +1200, David Rowley wrote:
> I had imagined we'd have an Assert to ensure there's no other
> MemoryPool in the context hierarchy. I guess it would be possible for
> a MemoryPool to have a MemoryPool *next field and chain them, but
> since we've no need for that right now, I imagine it's fine to
> disallow that.

The thread started with a user-defined aggregate that creates a memory
context per group. If it were doing something more complex with
MemoryContextSetParent(), are we sure we can just disallow that? It's
hard for me to say for sure what a user-defined aggregate might want to
do.

> Since we won't want to add a MemoryPool to the parent of
> those contexts, we might need some way to have an existing context
> "join" an existing MemoryPool. That would mean an API more like:
> MemoryPool *MemoryPoolCreate(size_t bytes_limit); then void
> MemoryContextJoinPool(MemoryContext ctx, MemoryPool *pool);

Right.

> All the code that currently does "+= mem_allocated" or "-=
> mem_allocated" would need extra code to do "if (context->memory_pool
> != NULL) context->memory_pool->memory_allocated (+/-)= blksize;".
> I.e., no looping up the hierarchy.

OK.

> I imagined we'd document that a MemoryPool is designed to more easily
> keep track of malloc'd memory for a given (or group of) MemoryContext
> and all of its child contexts so that the total memory allocated can
> easily be checked without recursively visiting the memory_used fields
> in all child contexts. Also, that it is intended for short-lived
> contexts. Any longer-lived usages would mean we'd have MemoryPools in
> contexts that live longer than the query, or an executor node and
> that
> would mean we'd have to code up the ability to have multiple
> MemoryPools in the hierarchy, which seems more complex than what we
> need today.

Yeah, I hope we don't need to go that far.

> Maybe in the future there'd be some need to have MemoryPools with a
> hard limit that ERRORs when it goes above a threshold, but that's not
> what we need for nodeAgg.c. That seems to be more along the lines of
> what Tomas was mentioning in regards to the "Add the ability to limit
> the amount of memory that can be allocated to backends" thread.

Agreed.


Attached v2-0001 simply tracks the totals for all contexts, and works
across MemoryContextSetParent(). I was (and still am) slightly worried
that it will cause a regression, but some basic pgbench runs didn't
show any slowdown. Can you suggest some settings that might focus more
on allocation performance?

Assuming we only want to track a subtree, the two approaches suggested
are:

1. Have some "invalid" marker so that memory is tracked upward only as
far as it's needed/enabled, then stops. This adds a bit of complexity
because:
  a. to enable tracking after the context is created, we need a new
memory context method that reports the current allocated total so we
can initialize the size properly
  b. when doing MemoryContextSetParent(), we need to subtract from the
old subtree and add to the new subtree; and if the current subtree
isn't already tracked, we need to recalculate

2. The memory pools, as you suggest. The complexity here is:
  a. in MemoryContextSetParent(), for the same reason as above
  b. mixing subtrees gets messy, and I'm not sure we can just disallow
that. Does that mean MemoryContextSetParent() would just fail?
  c. joining an existing pool -- not terribly complex, but it adds to
the API

I think if we are taking on the complexity of memory pools, we should
do so with the idea that they'll be useful beyond just optimizing the
size calculation. For instance, carrying around the information that
it's part of a "work_mem" context, which could enable things like
limiting the max block size.

Regards,
    Jeff Davis


