On Sat, 28 Dec 2024 15:57:44 +0100
Tomas Vondra <tomas@vondra.me> wrote:
>
> Not sure a simple memory limit like in the patch (which just adds
> memory accounting + OOM after hitting the limit) can work as anything
> but a the last safety measure. It seems to me the limit would have to
> be set to a value that's much higher than any backend would
> realistically need.
>
> The first challenge I envision is that without any feedback (either to
> the planner or executor), it may break queries quite easily. It just
> takes the planner to add one more sort / hash join / materialize
> (which it can do arbitrarily, as it has no concept of the memory
> limit), and now the query can't run.
>
> And secondly, there are allocations that we don't restrict by
> work_mem, but this memory limit would include them, ofc. The main
> example that I can think of is hash join, where we (currently) don't
> account for the BufFile arrays, and that can already lead to annoying
> OOM issues, see e.g. [1] [2] and [3] (I'm sure there are more threads
> about the issue).
Hi James!
While I don't have a detailed design in mind, I'd like to add a strong
+1 on the general idea that work_mem is hard to effectively use because
queries can vary so widely in how many nodes might need work memory.
I'd almost like to have two limits:
First, a hard per-connection limit which could be set very high - we
can track total memory usage across contexts inside of palloc and pfree
(and maybe this could also be exposed in pg_stat_activity for easy
visibility into a snapshot of memory use across all backends). If
palloc detects that an allocation would take the total over the hard
limit, then you just fail the palloc with an OOM. This protects
postgres from a memory leak in a single backend OOM'ing the whole
system and restarting the whole database; failing a single connection
is better than failing all of them.
Second, a soft per-connection "total_connection_work_mem_target" which
could be set lower. The planner can just look at the total number of
nodes that it expects to allocate work memory, divide the target by
this and then set the work_mem for that query. There should be a
reasonable floor (minimum) for work_mem - maybe the value of work_mem
itself becomes this and the new target doesn't do anything besides
increasing runtime work_mem.
Maybe even could do a "total_instance_work_mem_target" where it's
divided by the number of average active connections or something.
In practice this target idea doesn't guarantee anything about total work
memory used, but it's the tool I'd most like as an admin. And the
per-connection hard limit is the tool I'd like to have for protecting
myself against memory leaks or individual connections going bonkers and
killing all my connections for an instance restart.
-Jeremy