Re: Add the ability to limit the amount of memory that can be allocated to backends. - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Add the ability to limit the amount of memory that can be allocated to backends.
Msg-id 6be20166-734d-432f-a2d2-40f405517cd1@vondra.me
In response to Re: Add the ability to limit the amount of memory that can be allocated to backends.  (Jim Nasby <jnasby@upgrade.com>)
List pgsql-hackers
On 1/2/25 22:09, Jim Nasby wrote:
> 
>> On Dec 31, 2024, at 5:41 PM, Tomas Vondra <tomas@vondra.me> wrote:
>>
>> On 12/31/24 21:46, Jim Nasby wrote:
>>> On Dec 30, 2024, at 7:05 PM, James Hunter <james.hunter.pg@gmail.com>
>>> wrote:
>>>>
>>>> On Sat, Dec 28, 2024 at 11:24 PM Jim Nasby <jnasby@upgrade.com> wrote:
>>>>>
>>>>> IMHO none of this will be very sane until we actually have cluster-
>>>>> level limits. One sudden burst in active connections and you still
>>>>> OOM the instance.
>>>>
>>>> Fwiw, PG does support "max_connections" GUC, so a backend/connection -
>>>> level limit, times "max_connections", yields a cluster-level limit.
>>>
>>> max_connections is useless here, for two reasons:
>>>
>>> 1. Changing it requires a restart. That’s at *best* a real PITA in
>>> production. [1]
>>> 2. It still doesn’t solve the actual problem. Unless your workload
>>> *and* your data are extremely homogeneous you can’t simply limit the
>>> number of connections and call it a day. A slight change in incoming
>>> queries, OR in the data that the queries are looking at and you go
>>> from running fine to meltdown. You don’t even need a plan flip for
>>> this to happen, just the same plan run at the same rate but now
>>> accessing more data than before.
>>>
>>
>> I really don't follow your argument ...
>>
>> Yes, changing max_connections requires a restart - so what? AFAIK the
>> point James was making is that if you multiply max_connections by the
>> per-backend limit, you get a cluster-wide limit. And presumably the
>> per-backend limit would be a GUC not requiring a restart.
>>
>> Yes, high values of max_connections are problematic. I don't see how a
>> global limit would fundamentally change that. In fact, it could
>> introduce yet more weird failures because some unrelated backend did
>> something weird.
> 
> That’s basically my argument for having workload management. If a system
> becomes loaded enough for the global limit to start kicking in it’s
> likely that query response time is increasing, which means you will soon
> have more and more active backends trying to run queries. That’s just
> going to make the situation even worse. You’d either have to start
> trying to “take memory away” from already running backends or backends
> that are just starting would have such a low limit as to cause them to
> spill very quickly, creating further load on the system.
> 

I'm not opposed to having some sort of "workload management" (similar
to what's available in some databases), but my guess is that it's (at
least) an order of magnitude more complex than introducing the memory
limit discussed here. I can only guess, because no one has really
explained what it would include or how it would be implemented. Which
makes it easy to dream about a solution that would fix the problem ...

What I'm afraid will happen is that everyone mostly agrees a
comprehensive workload management system would be better than a memory
limit (be it per-backend or a global one), but without a workable
proposal for how to implement it, no one ends up working on it. And no
one gets to work on a memory limit because the imagined workload
management would be better. So we get nothing ...

FWIW I have a hard time imagining a workload management system without
some sort of memory limit.

>> FWIW I'm not opposed to having some global memory limit, but as I
>> explained earlier, I don't see a way to do that sensibly without having
>> a per-backend limit first. Because if you have a global limit, a single
>> backend consuming memory could cause all kinds of weird failures in
>> random other backends.
> 
> I agree, but I’m also not sure how much a per-backend limit would
> actually help on its own, especially in OLTP environments.
> 

So what if it doesn't help in every possible case? It'd be valuable even
for OLAP/mixed workloads with sensible max_connections values, because
it'd allow setting work_mem more aggressively.
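
To illustrate with made-up numbers: right now the only defense is a
conservative work_mem, because a single query may use up to work_mem per
sort/hash node (and hash-based operations get hash_mem_multiplier on
top), so the true worst case is hard to bound:

  max_connections = 100, work_mem = 64MB
  a 10-node plan in every backend  =>  up to ~64GB, not 100 x 64MB = 6.4GB

With a (hypothetical) per-backend cap of, say, 512MB, the aggregate worst
case stays around 50GB no matter what the plans look like, so work_mem
could be raised for the common simple query without betting on plan shape.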

>>> Most of what I’ve seen on this thread is discussing ways to
>>> *optimize* how much memory the set of running backends can consume.
>>> Adjusting how you slice the memory pie across backends, or even
>>> within a single backend, is optimization. While that’s a great goal
>>> that I do support, it will never fully fix the problem. At some point
>>> you need to either throw your hands in the air and start tossing
>>> memory errors, because you don’t have control over how much work is
>>> being thrown at the engine. The only way that the engine can exert
>>> control over that would be to hold new transactions from starting
>>> when the system is under duress (ie, workload management). While
>>> workload managers can be quite sophisticated (aka, complex), the nice
>>> thing about limiting this scope to work_mem, and only as a means to
>>> prevent complete overload, is that the problem becomes a lot simpler
>>> since you’re only looking at one metric and not trying to support any
>>> kind of priority system. The only fanciness I think an MVP would need
>>> is a GUC to control how long a transaction can sit waiting before it
>>> throws an error. Frankly, that sounds a lot less complex and much
>>> easier for DBAs to adjust than trying to teach the planner how to
>>> apportion out per-node work_mem limits.
>>>
>>> As I said, I’m not opposed to optimizations, I just think they’re
>>> very much cart-before-the-horse.
>>>
>>
>> What optimization? I didn't notice anything like that. I don't see how
>> "adjusting how you slice the memory pie across backends" counts as an
>> optimization. I mean, that's exactly what a memory limit is meant to do.
>>
>> Similarly, there was a proposal to do planning with work_mem, and then
>> go back and adjust the per-node limits to impose a global limit. That
>> does not seem like an optimization either ... (more an opposite of it).
> 
> It’s optimization in that you’re trying to increase how many active
> backends you can have before getting memory errors. It’s an alternative
> to throwing more memory at the problem or limiting the rate of incoming
> workload.
> 

I don't see it that way. I see it as a "safety measure" for queries that
happen to be more complex than expected, or something. Yes, it may allow
you to use a higher max_connections, or higher work_mem. But it's still
just a safety measure.

>>> 1: While it’d be a lot of work to make max_connections dynamic one
>>> thing we could do fairly easily would be to introduce another GUC
>>> (max_backends?) that actually controls the total number of allowed
>>> backends for everything. The sum of max_backends + autovac workers +
>>> background workers + whatever else I’m forgetting would have to be
>>> less than that. The idea here is that you’d normally run with
>>> max_connections set significantly lower than max_backends. That means
>>> that if you need to adjust any of these GUCs (other than
>>> max_backends) you don’t need to restart - the new limits would just
>>> apply to new connection requests.
>>
>> I don't quite understand how max_backends helps with anything except
>> allowing the connection limit to be changed without a restart, or why
>> it would be needed for introducing a memory limit. To me those seem
>> very much like two separate features.
> 
> It’s related to this because the number of active backends is directly
> related to memory consumption. Yet because max_connections requires a
> restart it’s very hard to actually manage how many active backends you
> have. Your only option is a single-point connection pool, but that
> introduces its own problems.
> 

I think you need to choose. Either problems with many active backends,
or problems with connection pools ...

> That said, I do think a workload manager would be more effective than
> trying to limit total connections.

The "workload management" concept is so abstract I find it very
difficult to discuss without much more detail about how would it
actually work / be implemented.

I believe implementing some rudimentary "global" memory accounting would
not be *that* hard (possibly along the lines of the patches early in
this thread), and adding some sort of dynamic connection limit would not
be much harder I think. But then comes the hard part of actually doing
the "workload management" part, which seems pretty comparable to what a
QoS / scheduler needs to do. With all the weird corner cases.
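
Just to make the "rudimentary accounting" part a bit more concrete, here
is a rough, self-contained C sketch of the logic I have in mind (this is
not actual PostgreSQL code; the names, the limits and the idea of simply
failing the reservation are made up for illustration). Each backend
tracks what it has allocated locally, and a counter that would live in
shared memory tracks the cluster-wide total, checked before the
allocation actually happens:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* hypothetical limits; in a real patch these would come from GUCs */
static size_t backend_mem_limit = (size_t) 512 * 1024 * 1024;
static size_t cluster_mem_limit = (size_t) 32 * 1024 * 1024 * 1024;

/* per-backend usage (backend-local) and cluster-wide usage (shared) */
static size_t backend_mem_used = 0;
static _Atomic size_t cluster_mem_used = 0;

/*
 * Called by the allocator wrapper before requesting 'size' bytes.
 * Returns false (after undoing the reservation) if either limit would
 * be exceeded; the caller would then throw the usual OOM error.
 */
static bool
reserve_memory(size_t size)
{
    if (backend_mem_used + size > backend_mem_limit)
        return false;

    size_t total = atomic_fetch_add(&cluster_mem_used, size) + size;

    if (total > cluster_mem_limit)
    {
        atomic_fetch_sub(&cluster_mem_used, size);
        return false;
    }

    backend_mem_used += size;
    return true;
}

/* Called when the corresponding chunk is freed. */
static void
release_memory(size_t size)
{
    backend_mem_used -= size;
    atomic_fetch_sub(&cluster_mem_used, size);
}

Note the per-backend check comes first, which is exactly the earlier
point: with only the global check, one runaway backend pushing the total
over the limit would make the failures show up in whichever innocent
backend happens to allocate next.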


regards

-- 
Tomas Vondra



