Re: Getting OOM errors from PostgreSQL - Mailing list pgsql-admin

From Chris Lajoie
Subject Re: Getting OOM errors from PostgreSQL
Date
Msg-id 56DDC8CB.8050204@etriptrader.com
Whole thread Raw
In response to Re: Getting OOM errors from PostgreSQL  (Kevin Grittner <kgrittn@gmail.com>)
Responses Re: Getting OOM errors from PostgreSQL
List pgsql-admin
On 03/05/2016 09:12 AM, Kevin Grittner wrote:
> On Fri, Mar 4, 2016 at 3:23 PM, Chris Lajoie <chris@etriptrader.com> wrote:
>> In my log I have been getting errors that look like this:
>> LOG:  could not fork new process for connection: Cannot allocate memory
>>
>> This seems to only happen during periods of high activity.
>> Shortly after the last one occurred I checked the free mem and got
>> this:
>> # free -h
>>                total        used        free      shared    buff/cache   available
>> Mem:           9.4G        737M        283M        2.1G          8.4G        6.3G
>> Swap:          2.0G        8.3M        2.0G
>>
>>  From this I see the vast majority of the 10GB of memory is being
>> used by the OS cache. If I understand how it works correctly, this
>> means it is basically free memory because if an application needs
>> it the kernel will free some of it for use... is my understanding
>> correct?
> That depends.  In general, this report is light on useful detail;
> you might want to review this page and fill in some of the missing
> info, like OS version:
>
> https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
Appologies for not providing more useful info. I will endeavor to do so
in the future.
> The most important missing information can be captured by running
> these shell statements:
>
> cat /proc/sys/vm/overcommit_memory
> cat /proc/sys/vm/overcommit_ratio
>
> If overcommit_memory is 2, you will get "out of memory" when an
> allocation would push allocated memory past swap space (2GB in your
> case) plus the percentage of machine RAM indicated by
> overcommit_ratio.  Effectively, the difference between
> overcommit_ratio and 100 is the percentage of RAM you are reserving
> for the OS to keep things memory-resident (like buffers and cache).
> A value of 20, for example, would mean that an allocation that
> would reduce OS memory-resident pages below 80% of machine RAM will
> trigger an "out of memory" error; so when you set
> vm.overcommit_memory = 2 (which is generally a very good thing to
> do when running PostgreSQL on the machine) you should raise
> vm.overcommit_ratio = 80 or 90.  You don't want to totally blow
> away OS cache, but you probably want to survive other demands on
> RAM that bring it down quite a bit.
Despite not having provided more information you seem to have gotten to
the crux of my problem anyway. As I was a little desperate to resolve
the problem, I ended up changing various things in an attempt to just
"make it go away" for now. I set overcommit_memory back to 0 and
shared_buffers = 500MB. I believe this has the effect of just allowing
the OS to cache things as it sees fit, and only reserving 500MB for
postgres to explicitly cache whatever it wants. But I did not have any
more OOM errors over the weekend.

Your explanation of how overcommit_ratio affects what the OS will allow
when overcommit_memory=2 is enlightening, but I am still not clear on
some specifics. For example I thought postgres allocates the shared
buffers on startup. If this is the case, why did I get OOM errors on
connections? Was that particular connection simply pushing the memory
past the amount established by the overcommit_ratio? overcommit_ratio
was at its default (which is apparently 50 looking at
/proc/sys/vm/overcommit_ratio).

At this point I am not sure if I should change anything... I have seen
conflicting advice at various places on the internet. Some suggest
(assuming this is a dedicated database system) to use up a large chunk
of RAM with shared_buffers, leaving enough for work_mem*max_connections
+ some extra. Others suggest allowing the OS to cache what it wants and
set shared_buffers to a smaller amount, allowing postgres to permanently
cache only the most frequently accessed smaller tables.

I would greatly appreciate your advice on this.
I am running PostgreSQL 9.4.6 on Linux kernel 3.10 (CentOS 7).

Thank you,
Chris





pgsql-admin by date:

Previous
From: Stephen Kuntz
Date:
Subject: Re: WAL replay asking for very old WAL
Next
From: HEMPLEMAN Matthew
Date:
Subject: Pgpool failing to connect to standby postgres server