Thread: Getting OOM errors from PostgreSQL

Getting OOM errors from PostgreSQL

From

Chris Lajoie

Date:

05 March 2016, 00:23:07

In my log I have been getting errors that look like this:
LOG:  could not fork new process for connection: Cannot allocate memory

This seems to only happen during periods of high activity. Shortly after the last one occurred I checked the free mem
andgot this: 
# free -h
              total        used        free      shared  buff/cache   available
Mem:           9.4G        737M        283M        2.1G        8.4G        6.3G
Swap:          2.0G        8.3M        2.0G

From this I see the vast majority of the 10GB of memory is being used by the OS cache. If I understand how it works
correctly,this means it is basically free memory because if an application needs it the kernel will free some of it for
use...is my understanding correct? 

Here are the memory-related settings I am using:

max_connections = 50
shared_buffers = 2000MB
maintenance_work_mem = 500MB
autovacuum_work_mem = 500MB
work_mem = 50MB
temp_buffers = 16MB

Can anyone help me determine what is causing postgresql to be unable to fork a new process because it can't allocate
memory?

Chris

Re: Getting OOM errors from PostgreSQL

From

Kevin Grittner

Date:

05 March 2016, 19:12:49

On Fri, Mar 4, 2016 at 3:23 PM, Chris Lajoie <chris@etriptrader.com> wrote:
> In my log I have been getting errors that look like this:
> LOG:  could not fork new process for connection: Cannot allocate memory
>
> This seems to only happen during periods of high activity.
> Shortly after the last one occurred I checked the free mem and got
> this:
> # free -h
>               total        used        free      shared    buff/cache   available
> Mem:           9.4G        737M        283M        2.1G          8.4G        6.3G
> Swap:          2.0G        8.3M        2.0G
>
> From this I see the vast majority of the 10GB of memory is being
> used by the OS cache. If I understand how it works correctly, this
> means it is basically free memory because if an application needs
> it the kernel will free some of it for use... is my understanding
> correct?

That depends.  In general, this report is light on useful detail;
you might want to review this page and fill in some of the missing
info, like OS version:

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems

The most important missing information can be captured by running
these shell statements:

cat /proc/sys/vm/overcommit_memory
cat /proc/sys/vm/overcommit_ratio

If overcommit_memory is 2, you will get "out of memory" when an
allocation would push allocated memory past swap space (2GB in your
case) plus the percentage of machine RAM indicated by
overcommit_ratio.  Effectively, the difference between
overcommit_ratio and 100 is the percentage of RAM you are reserving
for the OS to keep things memory-resident (like buffers and cache).
A value of 20, for example, would mean that an allocation that
would reduce OS memory-resident pages below 80% of machine RAM will
trigger an "out of memory" error; so when you set
vm.overcommit_memory = 2 (which is generally a very good thing to
do when running PostgreSQL on the machine) you should raise
vm.overcommit_ratio = 80 or 90.  You don't want to totally blow
away OS cache, but you probably want to survive other demands on
RAM that bring it down quite a bit.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Getting OOM errors from PostgreSQL

From

Chris Lajoie

Date:

07 March 2016, 21:30:41

On 03/05/2016 09:12 AM, Kevin Grittner wrote:
> On Fri, Mar 4, 2016 at 3:23 PM, Chris Lajoie <chris@etriptrader.com> wrote:
>> In my log I have been getting errors that look like this:
>> LOG:  could not fork new process for connection: Cannot allocate memory
>>
>> This seems to only happen during periods of high activity.
>> Shortly after the last one occurred I checked the free mem and got
>> this:
>> # free -h
>>                total        used        free      shared    buff/cache   available
>> Mem:           9.4G        737M        283M        2.1G          8.4G        6.3G
>> Swap:          2.0G        8.3M        2.0G
>>
>>  From this I see the vast majority of the 10GB of memory is being
>> used by the OS cache. If I understand how it works correctly, this
>> means it is basically free memory because if an application needs
>> it the kernel will free some of it for use... is my understanding
>> correct?
> That depends.  In general, this report is light on useful detail;
> you might want to review this page and fill in some of the missing
> info, like OS version:
>
> https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
Appologies for not providing more useful info. I will endeavor to do so
in the future.
> The most important missing information can be captured by running
> these shell statements:
>
> cat /proc/sys/vm/overcommit_memory
> cat /proc/sys/vm/overcommit_ratio
>
> If overcommit_memory is 2, you will get "out of memory" when an
> allocation would push allocated memory past swap space (2GB in your
> case) plus the percentage of machine RAM indicated by
> overcommit_ratio.  Effectively, the difference between
> overcommit_ratio and 100 is the percentage of RAM you are reserving
> for the OS to keep things memory-resident (like buffers and cache).
> A value of 20, for example, would mean that an allocation that
> would reduce OS memory-resident pages below 80% of machine RAM will
> trigger an "out of memory" error; so when you set
> vm.overcommit_memory = 2 (which is generally a very good thing to
> do when running PostgreSQL on the machine) you should raise
> vm.overcommit_ratio = 80 or 90.  You don't want to totally blow
> away OS cache, but you probably want to survive other demands on
> RAM that bring it down quite a bit.
Despite not having provided more information you seem to have gotten to
the crux of my problem anyway. As I was a little desperate to resolve
the problem, I ended up changing various things in an attempt to just
"make it go away" for now. I set overcommit_memory back to 0 and
shared_buffers = 500MB. I believe this has the effect of just allowing
the OS to cache things as it sees fit, and only reserving 500MB for
postgres to explicitly cache whatever it wants. But I did not have any
more OOM errors over the weekend.

Your explanation of how overcommit_ratio affects what the OS will allow
when overcommit_memory=2 is enlightening, but I am still not clear on
some specifics. For example I thought postgres allocates the shared
buffers on startup. If this is the case, why did I get OOM errors on
connections? Was that particular connection simply pushing the memory
past the amount established by the overcommit_ratio? overcommit_ratio
was at its default (which is apparently 50 looking at
/proc/sys/vm/overcommit_ratio).

At this point I am not sure if I should change anything... I have seen
conflicting advice at various places on the internet. Some suggest
(assuming this is a dedicated database system) to use up a large chunk
of RAM with shared_buffers, leaving enough for work_mem*max_connections
+ some extra. Others suggest allowing the OS to cache what it wants and
set shared_buffers to a smaller amount, allowing postgres to permanently
cache only the most frequently accessed smaller tables.

I would greatly appreciate your advice on this.
I am running PostgreSQL 9.4.6 on Linux kernel 3.10 (CentOS 7).

Thank you,
Chris

Re: Getting OOM errors from PostgreSQL

From

Scott Marlowe

Date:

09 March 2016, 20:00:37

On Mon, Mar 7, 2016 at 11:30 AM, Chris Lajoie <chris@etriptrader.com> wrote:
> On 03/05/2016 09:12 AM, Kevin Grittner wrote:
>>
>> On Fri, Mar 4, 2016 at 3:23 PM, Chris Lajoie <chris@etriptrader.com>
>> wrote:
>>>
>>> In my log I have been getting errors that look like this:
>>> LOG:  could not fork new process for connection: Cannot allocate memory
>>>
>>> This seems to only happen during periods of high activity.
>>> Shortly after the last one occurred I checked the free mem and got
>>> this:
>>> # free -h
>>>                total        used        free      shared    buff/cache
>>> available
>>> Mem:           9.4G        737M        283M        2.1G          8.4G
>>> 6.3G
>>> Swap:          2.0G        8.3M        2.0G
>>>
>>>  From this I see the vast majority of the 10GB of memory is being
>>> used by the OS cache. If I understand how it works correctly, this
>>> means it is basically free memory because if an application needs
>>> it the kernel will free some of it for use... is my understanding
>>> correct?
>>
>> That depends.  In general, this report is light on useful detail;
>> you might want to review this page and fill in some of the missing
>> info, like OS version:
>>
>> https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
>
> Appologies for not providing more useful info. I will endeavor to do so in
> the future.
>>
>> The most important missing information can be captured by running
>> these shell statements:
>>
>> cat /proc/sys/vm/overcommit_memory
>> cat /proc/sys/vm/overcommit_ratio
>>
>> If overcommit_memory is 2, you will get "out of memory" when an
>> allocation would push allocated memory past swap space (2GB in your
>> case) plus the percentage of machine RAM indicated by
>> overcommit_ratio.  Effectively, the difference between
>> overcommit_ratio and 100 is the percentage of RAM you are reserving
>> for the OS to keep things memory-resident (like buffers and cache).
>> A value of 20, for example, would mean that an allocation that
>> would reduce OS memory-resident pages below 80% of machine RAM will
>> trigger an "out of memory" error; so when you set
>> vm.overcommit_memory = 2 (which is generally a very good thing to
>> do when running PostgreSQL on the machine) you should raise
>> vm.overcommit_ratio = 80 or 90.  You don't want to totally blow
>> away OS cache, but you probably want to survive other demands on
>> RAM that bring it down quite a bit.
>
> Despite not having provided more information you seem to have gotten to the
> crux of my problem anyway. As I was a little desperate to resolve the
> problem, I ended up changing various things in an attempt to just "make it
> go away" for now. I set overcommit_memory back to 0 and shared_buffers =
> 500MB. I believe this has the effect of just allowing the OS to cache things
> as it sees fit, and only reserving 500MB for postgres to explicitly cache
> whatever it wants. But I did not have any more OOM errors over the weekend.
>
> Your explanation of how overcommit_ratio affects what the OS will allow when
> overcommit_memory=2 is enlightening, but I am still not clear on some
> specifics. For example I thought postgres allocates the shared buffers on
> startup. If this is the case, why did I get OOM errors on connections? Was
> that particular connection simply pushing the memory past the amount
> established by the overcommit_ratio? overcommit_ratio was at its default
> (which is apparently 50 looking at /proc/sys/vm/overcommit_ratio).
>
> At this point I am not sure if I should change anything... I have seen
> conflicting advice at various places on the internet. Some suggest (assuming
> this is a dedicated database system) to use up a large chunk of RAM with
> shared_buffers, leaving enough for work_mem*max_connections + some extra.
> Others suggest allowing the OS to cache what it wants and set shared_buffers
> to a smaller amount, allowing postgres to permanently cache only the most
> frequently accessed smaller tables.
>
> I would greatly appreciate your advice on this.
> I am running PostgreSQL 9.4.6 on Linux kernel 3.10 (CentOS 7).


Note that running with overcommit_memory = 0 if you do start to run
out of memory, the oom killer will often kill the postmaster. This is
bad. If this happens make sure to kill all postgres children before
trying to restart the db, as starting a new postmaster with children
still running will instantly and permanently corrupt / destroy your
db.

The eventual state you want is to be able to run with overcommit = 2
and settings that prevent out of memory allocations.

Note too that if you never seem to actually run out of memory, but get
allocation errors, it can also be a lack of file handles (I think
that's what caused it in the past for me. Been a while) Point being
that you can get a failure to allocate memory when there's plenty of
memory due to other settings on your server.

Re: Getting OOM errors from PostgreSQL

From

Kevin Grittner

Date:

09 March 2016, 22:05:24

On Wed, Mar 9, 2016 at 11:00 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

> Note that running with overcommit_memory = 0 if you do start to run
> out of memory, the oom killer will often kill the postmaster. This is
> bad. If this happens make sure to kill all postgres children before
> trying to restart the db, as starting a new postmaster with children
> still running will instantly and permanently corrupt / destroy your
> db.

And even if it kills one of the "normal" backends (perhaps even the
one responsible for some excessive allocations) it causes a PANIC,
which is a crash and restart of the entire database service.  It
will usually do this on a reference to memory which appeared to be
successfully allocated, which can be fairly confusing.  With
overcommit_memory = 2 you usually get just a FATAL error (loss of
connection) on the one connection doing the memory allocation which
puts things over the top, with a dump of space used by memory
contexts in the log.

> The eventual state you want is to be able to run with overcommit = 2
> and settings that prevent out of memory allocations.

Agreed.

> Note too that if you never seem to actually run out of memory, but get
> allocation errors, it can also be a lack of file handles (I think
> that's what caused it in the past for me. Been a while) Point being
> that you can get a failure to allocate memory when there's plenty of
> memory due to other settings on your server.

Yes, I have seen that, too.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company