Re: Failing to allocate memory when I think it shouldn't - Mailing list pgsql-general
From | Thomas Ziegler
---|---
Subject | Re: Failing to allocate memory when I think it shouldn't
Date |
Msg-id | f8abb93a-5b03-4c0e-a69f-1f3cdfd4c4d4@holmsecurity.com
In response to | Re: Failing to allocate memory when I think it shouldn't (Christoph Moench-Tegeder <cmt@burggraben.net>)
Responses | Re: Failing to allocate memory when I think it shouldn't
List | pgsql-general
Hello Christoph,

Thanks for your answer and the suggestions, it already helped me out a lot!

On 2024-09-14 22:11, Christoph Moench-Tegeder wrote:
> Hi,
>
> ## Thomas Ziegler (thomas.ziegler@holmsecurity.com):
>
> There's a lot of information missing here. Let's start from the top.
>
>> I have had my database killed by the kernel oom-killer. After that I
>> set turned off memory over-committing and that is where things got weird.
>
> What exactly did you set? When playing with vm.overcommit, did you
> understand "Committed Address Space" and the workings of the
> overcommit accounting? This is the document:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/mm/overcommit-accounting.rst
> Hint: when setting overcommit_memory=2 you might end up with way
> less available adress space than you thought you would. Also keep
> an eye on /proc/meminfo - it's sometimes hard to estimate "just off
> your cuff" what's in memory and how it's mapped. (Also, anything
> else on that machine which might hog memory?)

I set overcommit_memory=2, but completely missed 'overcommit_ratio'. That is
most probably why the database was denied RAM a lot sooner than I expected.

> Finally, there's this:
>
>> 2024-09-12 05:18:36.073 UTC [1932776] LOG: background worker "parallel worker" (PID 3808076) exited with exit code 1
>> terminate called after throwing an instance of 'std::bad_alloc'
>> what(): std::bad_alloc
>> 2024-09-12 05:18:36.083 UTC [1932776] LOG: background worker "parallel worker" (PID 3808077) was terminated by signal 6: Aborted
>
> That "std::bad_alloc" sounds a lot like C++ and not like the C our
> database is written in. My first suspicion would be that you're using
> LLVM-JIT (unless you have other - maybe even your own - C++ extensions
> in the database?) and that in itself can use a good chunk of memory.
> And it looks like that exception bubbled up as a signal 6 (SIGABRT)
> which made the process terminate immediately without any cleanup,
> and after that the server has no other chance than to crash-restart.

Except for pgAudit, I don't have any extensions, so it is probably the JIT.
I had no idea there was a JIT, even though it should have been obvious.
Thanks for pointing it out! Is the memory the JIT takes limited by
'work_mem', or will it just take as much memory as it needs?

> I recommend starting with understanding the actual memory limits
> as set by your configuration (personally I believe that memory
> overcommit is less evil than some people think). Have a close look
> at /proc/meminfo and if possible disable JIT and check if it changes
> anything. Also if possible try starting with only a few active
> connections and increase load carefully once a steady state (in
> terms of memory usage) has been reached.

Yes, understanding the memory limits is what I was trying to do. I was
questioning my understanding of the database, but it turns out it was my
lack of understanding of Linux memory management that tripped me up.
/proc/meminfo still manages to confuse me.

Again, thanks for your help!

Cheers,
Thomas

p.s.: To anybody who stumbles upon this in the future: if you set
`overcommit_memory=2`, don't forget `overcommit_ratio`.
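p.p.s.: To make the `overcommit_ratio` interaction concrete, here is a
minimal sketch of the strict-overcommit accounting described in the kernel
document Christoph linked. The RAM and swap figures are made-up examples,
not numbers from this thread:

```shell
# With vm.overcommit_memory=2, the kernel caps total committed address
# space at: CommitLimit = swap + (physical RAM * overcommit_ratio / 100).
# overcommit_ratio defaults to 50, so on a swapless box only half of RAM
# is available to commit -- allocations fail long before RAM is full.

ram_kb=16777216        # example: 16 GiB of physical RAM
swap_kb=0              # example: no swap configured
ratio=50               # the default vm.overcommit_ratio

commit_limit_kb=$(( swap_kb + ram_kb * ratio / 100 ))
echo "CommitLimit: ${commit_limit_kb} kB"    # 8388608 kB, i.e. 8 GiB

# On a live system, compare against the kernel's own accounting:
#   grep -E 'CommitLimit|Committed_AS' /proc/meminfo
```

So in this example the server would start refusing allocations once the
committed address space (Committed_AS) reaches 8 GiB, even with plenty of
physical RAM still free.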
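And since the work_mem question came up: as far as the standard settings go,
JIT compilation is governed by its own GUCs (`jit`, `jit_above_cost`, and
friends), not by `work_mem`. A sketch of how one might disable it to test
Christoph's suggestion; the database name `mydb` is a placeholder:

```shell
# Turn JIT off cluster-wide without hand-editing postgresql.conf,
# then reload the configuration. "mydb" is a placeholder database.
psql -d mydb -c "ALTER SYSTEM SET jit = off;"
psql -d mydb -c "SELECT pg_reload_conf();"

# Alternatively, raise the cost threshold instead of disabling outright,
# so only very expensive plans get JIT-compiled:
#   ALTER SYSTEM SET jit_above_cost = 1000000;

# Verify the setting took effect:
psql -d mydb -c "SHOW jit;"

# Or test just one session, leaving the cluster untouched:
#   SET jit = off;
```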