Re: Limiting memory allocation - Mailing list pgsql-hackers
From: Stephen Frost
Subject: Re: Limiting memory allocation
Msg-id: CAOuzzgquG2+ROK6iMJNOuqWP7V7cYYzeD0Wpneeo2QhDDkcQoQ@mail.gmail.com
In response to: Re: Limiting memory allocation (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Greetings,
On Tue, May 17, 2022 at 18:12 Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jan Wieck <jan@wi3ck.info> writes:
>> On 5/17/22 15:42, Stephen Frost wrote:
>>> Thoughts?
>
>> Using cgroups one can actually force a certain process (or user, or
>> service) to use swap if and when that service is using more memory than
>> it was "expected" to use.
>
> I wonder if we shouldn't just provide documentation pointing to OS-level
> facilities like that one. The kernel has a pretty trivial way to check
> the total memory used by a process. We don't: it'd require tracking total
> space used in all our memory contexts, and then extracting some number out
> of our rear ends for allocations made directly from malloc. In short,
> anything we do here will be slow and unreliable, unless you want to depend
> on platform-specific things like looking at /proc/self/maps.
This isn’t actually a solution though, and that’s the problem: you end up using swap, but if you use more than was “expected” the OOM killer comes in and happily blows you up anyway. Cgroups are what containers are built on, and exactly what kube is already doing.
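Just to make the mechanics concrete, here’s a minimal sketch of the cgroup-v2 knobs in play; the /sys/fs/cgroup/postgres path and the 8G/9G figures are invented for the example (memory.high is the soft limit that forces reclaim and swap, memory.max is the hard cap where the OOM killer steps in):

/*
 * Hypothetical example: set cgroup-v2 memory limits on a group the
 * postmaster has been placed in.  The group path and the limit values
 * are made up for illustration.
 */
#include <stdio.h>

static int
write_cgroup_file(const char *path, const char *value)
{
    FILE   *f = fopen(path, "w");

    if (f == NULL)
        return -1;
    if (fprintf(f, "%s\n", value) < 0)
    {
        fclose(f);
        return -1;
    }
    return fclose(f);
}

int
main(void)
{
    /* Soft limit: above this the kernel reclaims aggressively and
     * pushes the group out to swap... */
    if (write_cgroup_file("/sys/fs/cgroup/postgres/memory.high", "8G") != 0)
        perror("memory.high");

    /* ...but blow past the hard limit and the OOM killer still fires. */
    if (write_cgroup_file("/sys/fs/cgroup/postgres/memory.max", "9G") != 0)
        perror("memory.max");
    return 0;
}

Which is exactly my point: this is what kube is doing under the hood already, and it doesn’t get you a graceful error from PG, just throttling and then a SIGKILL.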
I agree with the general statement that it would be better for the kernel to do this; a patch was written for it, but it was rejected by the kernel folks. I’m hoping to push on that with the kernel developers, but they seemed pretty set against it, which is quite unfortunate.
As for the performance concern and the direct mallocs:

For the former, thanks to our memory contexts I don’t expect it to be all that much of an issue, as the underlying allocations we make aren’t all that frequent. Apparently a relatively trivial implementation was done and performance-tested, and the claim was that the impact was basically negligible. Sadly that code isn’t open (yet… this is under discussion, supposedly), but my understanding is that they just used a simple bit of shared memory to keep the count.

For the latter, we could measure the difference between our count and the memory actually allocated and see how big that difference is in some testing (which might be enlightening anyway..), and review our direct mallocs to see if there’s a real concern there.

Naturally this approach would necessitate capping PG at somewhat less than the total memory available anyway, but that could certainly be desirable in scenarios where other processes are running, or to ensure that not all of the filesystem cache is evicted.
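To sketch the shape of what I understand they did (all of the names below are invented, and I’m using plain C11 atomics rather than our shmem machinery just to keep the example self-contained):

/*
 * Rough sketch only: a cluster-wide allocation counter kept in shared
 * memory, consulted by a wrapper around the block-level malloc() calls
 * the memory-context code makes.  None of these names exist in PG.
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct SharedAllocState
{
    _Atomic uint64_t total_allocated;   /* bytes, across all backends */
    uint64_t    limit;                  /* 0 means "no limit" */
} SharedAllocState;

/* In PG this would live in shared memory set up by the postmaster. */
static SharedAllocState shared_state;
static SharedAllocState *alloc_state = &shared_state;

static void *
limited_malloc(size_t size)
{
    if (alloc_state->limit != 0)
    {
        uint64_t    new_total;

        new_total = atomic_fetch_add(&alloc_state->total_allocated,
                                     size) + size;
        if (new_total > alloc_state->limit)
        {
            /*
             * Over budget: back the bytes out and fail the request, so
             * the backend can throw a normal out-of-memory error rather
             * than being picked off by the OOM killer.
             */
            atomic_fetch_sub(&alloc_state->total_allocated, size);
            return NULL;
        }
    }
    else
        atomic_fetch_add(&alloc_state->total_allocated, size);
    return malloc(size);
}

static void
limited_free(void *ptr, size_t size)
{
    free(ptr);
    atomic_fetch_sub(&alloc_state->total_allocated, size);
}

A single atomic add per underlying block allocation, with no locking, is presumably why the reported overhead was negligible.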
> ulimit might be interesting to check into as well. The last time I
> looked, it wasn't too helpful for this on Linux, but that was years ago.
Unfortunately I really don’t think anything there has materially changed in a way which would help us. A ulimit would also apply across all of PG’s processes, and it would be nice to be able to differentiate between user backends running away and sucking up a ton of memory, and background processes that shouldn’t be constrained in this way.
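To illustrate why (the 8GB figure below is made up): the only real handle it gives us is something like this, set before forking children, and every process inherits it identically.

/*
 * Sketch of what ulimit offers: a per-process address-space cap,
 * inherited by every child of the postmaster alike.
 */
#include <stdio.h>
#include <sys/resource.h>

int
main(void)
{
    struct rlimit rl;

    rl.rlim_cur = (rlim_t) 8 * 1024 * 1024 * 1024;  /* soft limit */
    rl.rlim_max = rl.rlim_cur;                      /* hard limit */

    /*
     * RLIMIT_AS caps total virtual address space per process, not in
     * aggregate: one backend hitting it sees malloc()/mmap() fail with
     * ENOMEM, while fifty backends each just under the cap sail through.
     */
    if (setrlimit(RLIMIT_AS, &rl) != 0)
        perror("setrlimit");
    return 0;
}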
Thanks,
Stephen