Re: Failing to allocate memory when I think it shouldn't - Mailing list pgsql-general

From Thomas Ziegler
Subject Re: Failing to allocate memory when I think it shouldn't
Date
Msg-id f8abb93a-5b03-4c0e-a69f-1f3cdfd4c4d4@holmsecurity.com
In response to Re: Failing to allocate memory when I think it shouldn't  (Christoph Moench-Tegeder <cmt@burggraben.net>)
Responses Re: Failing to allocate memory when I think it shouldn't
List pgsql-general
Hello Christoph,

Thanks for your answer and the suggestions, it already helped me out a lot!

On 2024-09-14 22:11, Christoph Moench-Tegeder wrote:
> Hi,
>
> ## Thomas Ziegler (thomas.ziegler@holmsecurity.com):
>
> There's a lot of information missing here. Let's start from the top.
>
>> I have had my database killed by the kernel oom-killer. After that I
>> turned off memory overcommitting and that is where things got weird.
> What exactly did you set? When playing with vm.overcommit, did you
> understand "Committed Address Space" and the workings of the
> overcommit accounting? This is the document:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/Documentation/mm/overcommit-accounting.rst
> Hint: when setting overcommit_memory=2 you might end up with way
> less available address space than you thought you would. Also keep
> an eye on /proc/meminfo - it's sometimes hard to estimate "off the
> cuff" what's in memory and how it's mapped. (Also, anything
> else on that machine which might hog memory?).

I set overcommit_memory=2, but completely missed 'overcommit_ratio'. 
That is most probably why the database was denied memory much sooner 
than I expected.
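For later readers: per the kernel's overcommit-accounting and proc documentation, the ceiling under `overcommit_memory=2` is roughly (RAM minus huge pages) * `overcommit_ratio` / 100 + swap, with `overcommit_ratio` defaulting to 50, or `overcommit_kbytes` + swap when `overcommit_kbytes` is set instead. A minimal Python sketch of that arithmetic follows. The function name and the machine sizes are hypothetical, and the kernel works in pages, so treat the result as an approximation.

```python
def commit_limit_kib(
    mem_total_kib: int,
    swap_total_kib: int,
    overcommit_ratio: int = 50,   # vm.overcommit_ratio (kernel default 50)
    hugetlb_kib: int = 0,         # RAM reserved for huge pages, excluded from the ratio
    overcommit_kbytes: int = 0,   # vm.overcommit_kbytes; when non-zero it replaces the ratio
) -> int:
    """Approximate CommitLimit (in KiB) enforced when vm.overcommit_memory=2.

    Hypothetical helper: the kernel computes this in pages, so the real value
    can differ slightly from this KiB-based integer arithmetic.
    """
    if overcommit_kbytes:
        return overcommit_kbytes + swap_total_kib
    return (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_total_kib


# Hypothetical server: 64 GiB RAM, no swap.
ram_kib = 64 * 1024 * 1024
print(commit_limit_kib(ram_kib, 0) // 1024 // 1024)       # 32 (GiB) with the default ratio
print(commit_limit_kib(ram_kib, 0, overcommit_ratio=90))  # limit in KiB at 90% of RAM
```

With no swap and the default ratio, only half of RAM can be committed, which matches the "denied memory sooner than expected" symptom described in this thread.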

> Finally, there's this:
>> 2024-09-12 05:18:36.073 UTC [1932776] LOG:  background worker "parallel worker" (PID 3808076) exited with exit code 1
>> terminate called after throwing an instance of 'std::bad_alloc'
>>    what():  std::bad_alloc
>> 2024-09-12 05:18:36.083 UTC [1932776] LOG:  background worker "parallel worker" (PID 3808077) was terminated by signal 6: Aborted

> That "std::bad_alloc" sounds a lot like C++ and not like the C our
> database is written in. My first suspicion would be that you're using
> LLVM-JIT (unless you have other - maybe even your own - C++ extensions
> in the database?) and that in itself can use a good chunk of memory.
> And it looks like that exception bubbled up as a signal 6 (SIGABRT)
> which made the process terminate immediately without any cleanup,
> and after that the server has no other chance than to crash-restart.

Except for pgAudit, I don't have any extensions, so it is probably the 
JIT. I had no idea there was a JIT, even though it should have been obvious. 
Thanks for pointing this out!
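For later readers who, like me, didn't know about the JIT: since PostgreSQL 12 `jit` is on by default, but a query is only JIT-compiled when its estimated total plan cost exceeds `jit_above_cost` (default 100000). Above `jit_optimize_above_cost` and `jit_inline_above_cost` (both default 500000) it also gets optimization and inlining. `SET jit = off` disables it entirely. The Python below is only a simplified model of that planner decision, not PostgreSQL code; `JitSettings` and `jit_decision` are made-up names.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class JitSettings:
    """Hypothetical container for the relevant settings, at their documented defaults."""
    jit: bool = True                          # on by default since PostgreSQL 12
    jit_above_cost: float = 100_000           # -1 disables JIT compilation
    jit_optimize_above_cost: float = 500_000  # -1 disables expensive optimization
    jit_inline_above_cost: float = 500_000    # -1 disables inlining


def jit_decision(total_cost: float, s: Optional[JitSettings] = None) -> Set[str]:
    """Return the JIT steps a plan with this estimated total cost would get.

    Simplified model: ignores whether the server was built with LLVM support
    and the developer-only jit_expressions / jit_tuple_deforming settings.
    """
    if s is None:
        s = JitSettings()
    steps: Set[str] = set()
    # No JIT at all unless enabled and the plan is more expensive than jit_above_cost.
    if not s.jit or s.jit_above_cost < 0 or total_cost <= s.jit_above_cost:
        return steps
    steps.add("perform")
    if s.jit_optimize_above_cost >= 0 and total_cost > s.jit_optimize_above_cost:
        steps.add("optimize")
    if s.jit_inline_above_cost >= 0 and total_cost > s.jit_inline_above_cost:
        steps.add("inline")
    return steps


print(sorted(jit_decision(50_000)))     # []
print(sorted(jit_decision(200_000)))    # ['perform']
print(sorted(jit_decision(1_000_000)))  # ['inline', 'optimize', 'perform']
```

In other words, cheap queries never touch LLVM. It is the expensive plans (often the parallel ones) that pay the extra JIT memory.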

Is the memory the JIT takes limited by 'work_mem' or will it just take 
as much memory as it needs?

> I recommend starting with understanding the actual memory limits
> as set by your configuration (personally I believe that memory
> overcommit is less evil than some people think). Have a close look
> at /proc/meminfo and if possible disable JIT and check if it changes
> anything. Also if possible try starting with only a few active
> connections and increase load carefully once a steady state (in
> terms of memory usage) has been reached.

Yes, understanding the memory limits is what I was trying to do.
I was questioning my understanding, but it seems it was Linux that 
tripped me up (or rather my lack of understanding of it), not the database.
Memory management and /proc/meminfo still manage to confuse me.
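Two /proc/meminfo fields matter most with `overcommit_memory=2`. `CommitLimit` is the ceiling the kernel enforces in that mode, and `Committed_AS` is the address space currently committed. An allocation that would push `Committed_AS` past the limit (minus some small kernel reserves) fails. Below is a small sketch that parses meminfo-style text and reports the remaining headroom. The function names are illustrative and the sample numbers are made up.

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field name: numeric value}.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields


def commit_headroom_kib(fields: Dict[str, int]) -> int:
    """kB still committable before CommitLimit is reached (ignores kernel reserves)."""
    return fields["CommitLimit"] - fields["Committed_AS"]


# Made-up sample in the /proc/meminfo format.
sample = """MemTotal:       65843104 kB
SwapTotal:             0 kB
CommitLimit:    32921552 kB
Committed_AS:   31000000 kB
HugePages_Total:       0
"""
print(commit_headroom_kib(parse_meminfo(sample)))  # 1921552

# On a Linux host, the same check against live values:
if os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as f:
        print(commit_headroom_kib(parse_meminfo(f.read())))
```

The headroom can go negative in overcommit modes 0 and 1, because `CommitLimit` is only enforced in mode 2.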

Again, thanks for your help!

Cheers,
Thomas

p.s.: To anybody who stumbles upon this in the future,
if you set `overcommit_memory=2`, don't forget `overcommit_ratio`.