
From: James Hunter
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.
Msg-id: CAJVSvF6NRsnAqOYSkveiBERU+QyHV1vBrjd0ORFD+kw2N5-rEQ@mail.gmail.com
In response to: Re: Add the ability to limit the amount of memory that can be allocated to backends. (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List: pgsql-hackers
Reviving this thread, because I am thinking about something related --
please ignore the "On Fri, Dec 27, 2024" date below; it is an artifact
of my re-sending the message from the list archive. The original
message was from January 28, 2024.

On Fri, Dec 27, 2024 at 11:02 AM Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> Firstly, I agree with the goal of having a way to account for memory
> used by the backends, and also ability to enforce some sort of limit.
> It's difficult to track the memory at the OS level (interpreting RSS
> values is not trivial), and work_mem is not sufficient to enforce a
> backend-level limit, not even talking about a global limit.
>
> But as I said earlier, it seems quite strange to start by introducing
> some sort of global limit, combining memory for all backends. I do
> understand that the intent is to have such global limit in order to
> prevent issues with the OOM killer and/or not to interfere with other
> stuff running on the same machine. And while I'm not saying we should
> not have such limit, every time I wished to have a memory limit it was a
> backend-level one. Ideally a workmem-like limit that would "adjust" the
> work_mem values used by the optimizer (but that's not what this patch
> aims to do), or at least a backstop in case something goes wrong (say, a
> memory leak, OLTP application issuing complex queries, etc.).

I think what Tomas suggests is the right strategy. I am thinking of
something like:

1. Say we have a backend_work_mem limit. Then the total amount of
work_mem available on the entire system, across all queries, would be
backend_work_mem * max_connections.
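(For example -- numbers purely illustrative -- with backend_work_mem =
4 MB and max_connections = 100, the system as a whole would never use
more than 400 MB as work_mem.)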

2. We use this backend_work_mem to "adjust" the work_mem values used by
the executor. (I don't care about the optimizer right now -- the
optimizer just does its best to predict what will happen at runtime.)

At runtime, every node that uses work_mem currently checks its memory
usage against the session's work_mem (and possibly hash_mem_multiplier)
GUC(s). Under this proposal, every node would instead check against its
node-specific, "adjusted" work_mem. If it exceeds that limit, it
spills, using the existing logic.
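
To make this concrete, here is a rough sketch (not actual PostgreSQL
code -- the types and the "bytes_used" bookkeeping are simplified
stand-ins) of the changed check:

#include <stdbool.h>
#include <stdint.h>

static int work_mem = 4096;     /* stand-in for the global GUC, in kB */

typedef struct PlanState
{
    int64_t bytes_used;         /* memory currently used by this node */
    int     work_mem;           /* proposed: node-local limit, in kB,
                                 * "adjusted" from backend_work_mem */
} PlanState;

/* Existing behavior, conceptually: compare against the global GUC. */
static bool
should_spill_old(const PlanState *node)
{
    return node->bytes_used > (int64_t) work_mem * 1024;
}

/* Proposed behavior: compare against the node-local limit. */
static bool
should_spill_new(const PlanState *node)
{
    return node->bytes_used > (int64_t) node->work_mem * 1024;
}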

In other words: the existing logic spills based on a comparison to the
global work_mem GUC. Let's make it spill, instead, based on a
comparison to an operator-local "PlanState.work_mem" field.

And then let's set that "PlanState.work_mem" field based on a new
"backend_work_mem" GUC. Then no query will run OOM (at least, not due
to work_mem -- we'll address other memory usage separately), so queries
won't need to be canceled. Instead, they'll spill.
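
For the GUC itself, I'd expect an entry modeled on the existing
work_mem entry in guc_tables.c -- something like the following,
assuming an "int backend_work_mem" variable declared alongside
work_mem (the boot/min/max values below are placeholders):

{
    {"backend_work_mem", PGC_USERSET, RESOURCES_MEM,
        gettext_noop("Sets the total working memory available to a single "
                     "query, summed across all of its operators."),
        NULL,
        GUC_UNIT_KB
    },
    &backend_work_mem,
    262144, 64, MAX_KILOBYTES,
    NULL, NULL, NULL
},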

This strategy solves the ongoing problem of how to set work_mem when
some queries have many operators and others have few -- now we just
set backend_work_mem, as a limit on the entire query's total work_mem.
And a bit of integration with the optimizer will allow us to
distribute the total backend_work_mem to individual execution nodes,
with the goal of minimizing spilling without exceeding the
backend_work_mem limit.
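
As a strawman, reusing the PlanState stand-in from the sketch above,
the simplest possible distribution would divide backend_work_mem
evenly among the query's memory-consuming nodes; a real implementation
would instead weight nodes by the planner's memory estimates:

/* Strawman: even split of the per-backend budget across nodes. */
static void
assign_node_work_mem(PlanState **nodes, int nnodes, int backend_work_mem)
{
    int per_node = (nnodes > 0) ? backend_work_mem / nnodes : 0;

    for (int i = 0; i < nnodes; i++)
        nodes[i]->work_mem = per_node;  /* in kB, like work_mem */
}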

Anyway, I revived this thread to see if there's interest in this sort
of strategy --

Thanks,
James


