On Dec 28, 2024, at 12:26 PM, Jeremy Schneider <schneider@ardentperf.com> wrote:
While I don't have a detailed design in mind, I'd like to add a
strong +1 on the general idea that work_mem is hard to use
effectively because queries can vary so widely in how many nodes
might need work memory.
I'd almost like to have two limits:
First, a hard per-connection limit which could be set very high - we
can track total memory usage across contexts inside palloc and pfree
(and maybe this could also be exposed in pg_stat_activity for easy
visibility into a snapshot of memory use across all backends). If
palloc detects that an allocation would take the total over the hard
limit, it just fails that allocation with an out-of-memory error.
This protects postgres from a memory leak in a single backend
OOM'ing the whole system and taking down the entire database;
failing a single connection is better than failing all of them.
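
To make the palloc part concrete, here is a minimal standalone
sketch of what I mean - not actual PostgreSQL source, and names like
backend_mem_total, backend_hard_limit and tracked_alloc are made up
for illustration. Every allocation and free updates a per-backend
running total, and an allocation that would push the total past the
hard limit fails for that backend only:

#include <stdio.h>
#include <stdlib.h>

static size_t backend_mem_total = 0;                    /* bytes currently allocated */
static size_t backend_hard_limit = 512 * 1024 * 1024;   /* hard per-connection cap */

/* Each chunk carries its size so the free path can subtract it again. */
typedef struct ChunkHeader { size_t size; } ChunkHeader;

static void *
tracked_alloc(size_t size)
{
    ChunkHeader *hdr;

    if (backend_mem_total + size > backend_hard_limit)
    {
        /* In a real backend this would go through the normal error
         * machinery, aborting only this connection's query rather
         * than the whole instance. */
        fprintf(stderr, "out of memory: per-connection limit exceeded\n");
        return NULL;
    }

    hdr = malloc(sizeof(ChunkHeader) + size);
    if (hdr == NULL)
        return NULL;

    hdr->size = size;
    backend_mem_total += size;
    return hdr + 1;
}

static void
tracked_free(void *ptr)
{
    ChunkHeader *hdr = (ChunkHeader *) ptr - 1;

    backend_mem_total -= hdr->size;
    free(hdr);
}

int
main(void)
{
    void *a = tracked_alloc(1024);
    printf("allocated, total now %zu bytes\n", backend_mem_total);
    tracked_free(a);
    printf("freed, total now %zu bytes\n", backend_mem_total);
    return 0;
}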
Second, a soft per-connection "total_connection_work_mem_target"
which could be set lower. The planner can look at the total number
of nodes it expects to allocate work memory for, divide the target
by that count, and set the work_mem for that query accordingly.
There should be a reasonable floor (minimum) for work_mem - maybe
work_mem itself becomes the floor, so the new target never does
anything besides increase the runtime work_mem.
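
Roughly the arithmetic I have in mind, again just a sketch with
made-up names (total_connection_work_mem_target, effective_work_mem)
rather than a real planner change - the target only ever raises the
effective per-node value, never lowers it below work_mem:

#include <stdio.h>

static int work_mem = 4096;                              /* kB, the usual GUC */
static int total_connection_work_mem_target = 262144;    /* kB, hypothetical GUC */

static int
effective_work_mem(int num_work_mem_nodes)
{
    int per_node;

    if (num_work_mem_nodes <= 0)
        num_work_mem_nodes = 1;

    /* Split the per-connection target across the nodes expected to
     * need work memory. */
    per_node = total_connection_work_mem_target / num_work_mem_nodes;

    /* work_mem itself acts as the floor; the target can only raise it. */
    return (per_node > work_mem) ? per_node : work_mem;
}

int
main(void)
{
    /* e.g. a plan with 8 sorts/hashes gets 32MB each; one with 128
     * falls back to the plain work_mem floor of 4MB. */
    printf("8 nodes   -> %d kB per node\n", effective_work_mem(8));
    printf("128 nodes -> %d kB per node\n", effective_work_mem(128));
    return 0;
}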
Maybe we could even have a "total_instance_work_mem_target" that
gets divided by the average number of active connections, or
something along those lines.
IMHO none of this will be very sane until we actually have cluster-level limits. One sudden burst in active connections and you still OOM the instance. And while we could have such a mechanism do something clever like dynamically lowering every session's query_mem/work_mem/whatever, ultimately I think it would also need the ability to deny or delay sessions starting new transactions.