Re: Getting OOM errors from PostgreSQL - Mailing list pgsql-admin
From | Chris Lajoie |
---|---|
Subject | Re: Getting OOM errors from PostgreSQL |
Date | |
Msg-id | 56DDC8CB.8050204@etriptrader.com Whole thread Raw |
In response to | Re: Getting OOM errors from PostgreSQL (Kevin Grittner <kgrittn@gmail.com>) |
Responses |
Re: Getting OOM errors from PostgreSQL
|
List | pgsql-admin |
On 03/05/2016 09:12 AM, Kevin Grittner wrote: > On Fri, Mar 4, 2016 at 3:23 PM, Chris Lajoie <chris@etriptrader.com> wrote: >> In my log I have been getting errors that look like this: >> LOG: could not fork new process for connection: Cannot allocate memory >> >> This seems to only happen during periods of high activity. >> Shortly after the last one occurred I checked the free mem and got >> this: >> # free -h >> total used free shared buff/cache available >> Mem: 9.4G 737M 283M 2.1G 8.4G 6.3G >> Swap: 2.0G 8.3M 2.0G >> >> From this I see the vast majority of the 10GB of memory is being >> used by the OS cache. If I understand how it works correctly, this >> means it is basically free memory because if an application needs >> it the kernel will free some of it for use... is my understanding >> correct? > That depends. In general, this report is light on useful detail; > you might want to review this page and fill in some of the missing > info, like OS version: > > https://wiki.postgresql.org/wiki/Guide_to_reporting_problems Appologies for not providing more useful info. I will endeavor to do so in the future. > The most important missing information can be captured by running > these shell statements: > > cat /proc/sys/vm/overcommit_memory > cat /proc/sys/vm/overcommit_ratio > > If overcommit_memory is 2, you will get "out of memory" when an > allocation would push allocated memory past swap space (2GB in your > case) plus the percentage of machine RAM indicated by > overcommit_ratio. Effectively, the difference between > overcommit_ratio and 100 is the percentage of RAM you are reserving > for the OS to keep things memory-resident (like buffers and cache). > A value of 20, for example, would mean that an allocation that > would reduce OS memory-resident pages below 80% of machine RAM will > trigger an "out of memory" error; so when you set > vm.overcommit_memory = 2 (which is generally a very good thing to > do when running PostgreSQL on the machine) you should raise > vm.overcommit_ratio = 80 or 90. You don't want to totally blow > away OS cache, but you probably want to survive other demands on > RAM that bring it down quite a bit. Despite not having provided more information you seem to have gotten to the crux of my problem anyway. As I was a little desperate to resolve the problem, I ended up changing various things in an attempt to just "make it go away" for now. I set overcommit_memory back to 0 and shared_buffers = 500MB. I believe this has the effect of just allowing the OS to cache things as it sees fit, and only reserving 500MB for postgres to explicitly cache whatever it wants. But I did not have any more OOM errors over the weekend. Your explanation of how overcommit_ratio affects what the OS will allow when overcommit_memory=2 is enlightening, but I am still not clear on some specifics. For example I thought postgres allocates the shared buffers on startup. If this is the case, why did I get OOM errors on connections? Was that particular connection simply pushing the memory past the amount established by the overcommit_ratio? overcommit_ratio was at its default (which is apparently 50 looking at /proc/sys/vm/overcommit_ratio). At this point I am not sure if I should change anything... I have seen conflicting advice at various places on the internet. Some suggest (assuming this is a dedicated database system) to use up a large chunk of RAM with shared_buffers, leaving enough for work_mem*max_connections + some extra. Others suggest allowing the OS to cache what it wants and set shared_buffers to a smaller amount, allowing postgres to permanently cache only the most frequently accessed smaller tables. I would greatly appreciate your advice on this. I am running PostgreSQL 9.4.6 on Linux kernel 3.10 (CentOS 7). Thank you, Chris
pgsql-admin by date: