Thread: Getting OOM errors from PostgreSQL
In my log I have been getting errors that look like this: LOG: could not fork new process for connection: Cannot allocate memory This seems to only happen during periods of high activity. Shortly after the last one occurred I checked the free mem andgot this: # free -h total used free shared buff/cache available Mem: 9.4G 737M 283M 2.1G 8.4G 6.3G Swap: 2.0G 8.3M 2.0G From this I see the vast majority of the 10GB of memory is being used by the OS cache. If I understand how it works correctly,this means it is basically free memory because if an application needs it the kernel will free some of it for use...is my understanding correct? Here are the memory-related settings I am using: max_connections = 50 shared_buffers = 2000MB maintenance_work_mem = 500MB autovacuum_work_mem = 500MB work_mem = 50MB temp_buffers = 16MB Can anyone help me determine what is causing postgresql to be unable to fork a new process because it can't allocate memory? Chris
On Fri, Mar 4, 2016 at 3:23 PM, Chris Lajoie <chris@etriptrader.com> wrote: > In my log I have been getting errors that look like this: > LOG: could not fork new process for connection: Cannot allocate memory > > This seems to only happen during periods of high activity. > Shortly after the last one occurred I checked the free mem and got > this: > # free -h > total used free shared buff/cache available > Mem: 9.4G 737M 283M 2.1G 8.4G 6.3G > Swap: 2.0G 8.3M 2.0G > > From this I see the vast majority of the 10GB of memory is being > used by the OS cache. If I understand how it works correctly, this > means it is basically free memory because if an application needs > it the kernel will free some of it for use... is my understanding > correct? That depends. In general, this report is light on useful detail; you might want to review this page and fill in some of the missing info, like OS version: https://wiki.postgresql.org/wiki/Guide_to_reporting_problems The most important missing information can be captured by running these shell statements: cat /proc/sys/vm/overcommit_memory cat /proc/sys/vm/overcommit_ratio If overcommit_memory is 2, you will get "out of memory" when an allocation would push allocated memory past swap space (2GB in your case) plus the percentage of machine RAM indicated by overcommit_ratio. Effectively, the difference between overcommit_ratio and 100 is the percentage of RAM you are reserving for the OS to keep things memory-resident (like buffers and cache). A value of 20, for example, would mean that an allocation that would reduce OS memory-resident pages below 80% of machine RAM will trigger an "out of memory" error; so when you set vm.overcommit_memory = 2 (which is generally a very good thing to do when running PostgreSQL on the machine) you should raise vm.overcommit_ratio = 80 or 90. You don't want to totally blow away OS cache, but you probably want to survive other demands on RAM that bring it down quite a bit. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 03/05/2016 09:12 AM, Kevin Grittner wrote: > On Fri, Mar 4, 2016 at 3:23 PM, Chris Lajoie <chris@etriptrader.com> wrote: >> In my log I have been getting errors that look like this: >> LOG: could not fork new process for connection: Cannot allocate memory >> >> This seems to only happen during periods of high activity. >> Shortly after the last one occurred I checked the free mem and got >> this: >> # free -h >> total used free shared buff/cache available >> Mem: 9.4G 737M 283M 2.1G 8.4G 6.3G >> Swap: 2.0G 8.3M 2.0G >> >> From this I see the vast majority of the 10GB of memory is being >> used by the OS cache. If I understand how it works correctly, this >> means it is basically free memory because if an application needs >> it the kernel will free some of it for use... is my understanding >> correct? > That depends. In general, this report is light on useful detail; > you might want to review this page and fill in some of the missing > info, like OS version: > > https://wiki.postgresql.org/wiki/Guide_to_reporting_problems Appologies for not providing more useful info. I will endeavor to do so in the future. > The most important missing information can be captured by running > these shell statements: > > cat /proc/sys/vm/overcommit_memory > cat /proc/sys/vm/overcommit_ratio > > If overcommit_memory is 2, you will get "out of memory" when an > allocation would push allocated memory past swap space (2GB in your > case) plus the percentage of machine RAM indicated by > overcommit_ratio. Effectively, the difference between > overcommit_ratio and 100 is the percentage of RAM you are reserving > for the OS to keep things memory-resident (like buffers and cache). > A value of 20, for example, would mean that an allocation that > would reduce OS memory-resident pages below 80% of machine RAM will > trigger an "out of memory" error; so when you set > vm.overcommit_memory = 2 (which is generally a very good thing to > do when running PostgreSQL on the machine) you should raise > vm.overcommit_ratio = 80 or 90. You don't want to totally blow > away OS cache, but you probably want to survive other demands on > RAM that bring it down quite a bit. Despite not having provided more information you seem to have gotten to the crux of my problem anyway. As I was a little desperate to resolve the problem, I ended up changing various things in an attempt to just "make it go away" for now. I set overcommit_memory back to 0 and shared_buffers = 500MB. I believe this has the effect of just allowing the OS to cache things as it sees fit, and only reserving 500MB for postgres to explicitly cache whatever it wants. But I did not have any more OOM errors over the weekend. Your explanation of how overcommit_ratio affects what the OS will allow when overcommit_memory=2 is enlightening, but I am still not clear on some specifics. For example I thought postgres allocates the shared buffers on startup. If this is the case, why did I get OOM errors on connections? Was that particular connection simply pushing the memory past the amount established by the overcommit_ratio? overcommit_ratio was at its default (which is apparently 50 looking at /proc/sys/vm/overcommit_ratio). At this point I am not sure if I should change anything... I have seen conflicting advice at various places on the internet. Some suggest (assuming this is a dedicated database system) to use up a large chunk of RAM with shared_buffers, leaving enough for work_mem*max_connections + some extra. Others suggest allowing the OS to cache what it wants and set shared_buffers to a smaller amount, allowing postgres to permanently cache only the most frequently accessed smaller tables. I would greatly appreciate your advice on this. I am running PostgreSQL 9.4.6 on Linux kernel 3.10 (CentOS 7). Thank you, Chris
On Mon, Mar 7, 2016 at 11:30 AM, Chris Lajoie <chris@etriptrader.com> wrote: > On 03/05/2016 09:12 AM, Kevin Grittner wrote: >> >> On Fri, Mar 4, 2016 at 3:23 PM, Chris Lajoie <chris@etriptrader.com> >> wrote: >>> >>> In my log I have been getting errors that look like this: >>> LOG: could not fork new process for connection: Cannot allocate memory >>> >>> This seems to only happen during periods of high activity. >>> Shortly after the last one occurred I checked the free mem and got >>> this: >>> # free -h >>> total used free shared buff/cache >>> available >>> Mem: 9.4G 737M 283M 2.1G 8.4G >>> 6.3G >>> Swap: 2.0G 8.3M 2.0G >>> >>> From this I see the vast majority of the 10GB of memory is being >>> used by the OS cache. If I understand how it works correctly, this >>> means it is basically free memory because if an application needs >>> it the kernel will free some of it for use... is my understanding >>> correct? >> >> That depends. In general, this report is light on useful detail; >> you might want to review this page and fill in some of the missing >> info, like OS version: >> >> https://wiki.postgresql.org/wiki/Guide_to_reporting_problems > > Appologies for not providing more useful info. I will endeavor to do so in > the future. >> >> The most important missing information can be captured by running >> these shell statements: >> >> cat /proc/sys/vm/overcommit_memory >> cat /proc/sys/vm/overcommit_ratio >> >> If overcommit_memory is 2, you will get "out of memory" when an >> allocation would push allocated memory past swap space (2GB in your >> case) plus the percentage of machine RAM indicated by >> overcommit_ratio. Effectively, the difference between >> overcommit_ratio and 100 is the percentage of RAM you are reserving >> for the OS to keep things memory-resident (like buffers and cache). >> A value of 20, for example, would mean that an allocation that >> would reduce OS memory-resident pages below 80% of machine RAM will >> trigger an "out of memory" error; so when you set >> vm.overcommit_memory = 2 (which is generally a very good thing to >> do when running PostgreSQL on the machine) you should raise >> vm.overcommit_ratio = 80 or 90. You don't want to totally blow >> away OS cache, but you probably want to survive other demands on >> RAM that bring it down quite a bit. > > Despite not having provided more information you seem to have gotten to the > crux of my problem anyway. As I was a little desperate to resolve the > problem, I ended up changing various things in an attempt to just "make it > go away" for now. I set overcommit_memory back to 0 and shared_buffers = > 500MB. I believe this has the effect of just allowing the OS to cache things > as it sees fit, and only reserving 500MB for postgres to explicitly cache > whatever it wants. But I did not have any more OOM errors over the weekend. > > Your explanation of how overcommit_ratio affects what the OS will allow when > overcommit_memory=2 is enlightening, but I am still not clear on some > specifics. For example I thought postgres allocates the shared buffers on > startup. If this is the case, why did I get OOM errors on connections? Was > that particular connection simply pushing the memory past the amount > established by the overcommit_ratio? overcommit_ratio was at its default > (which is apparently 50 looking at /proc/sys/vm/overcommit_ratio). > > At this point I am not sure if I should change anything... I have seen > conflicting advice at various places on the internet. Some suggest (assuming > this is a dedicated database system) to use up a large chunk of RAM with > shared_buffers, leaving enough for work_mem*max_connections + some extra. > Others suggest allowing the OS to cache what it wants and set shared_buffers > to a smaller amount, allowing postgres to permanently cache only the most > frequently accessed smaller tables. > > I would greatly appreciate your advice on this. > I am running PostgreSQL 9.4.6 on Linux kernel 3.10 (CentOS 7). Note that running with overcommit_memory = 0 if you do start to run out of memory, the oom killer will often kill the postmaster. This is bad. If this happens make sure to kill all postgres children before trying to restart the db, as starting a new postmaster with children still running will instantly and permanently corrupt / destroy your db. The eventual state you want is to be able to run with overcommit = 2 and settings that prevent out of memory allocations. Note too that if you never seem to actually run out of memory, but get allocation errors, it can also be a lack of file handles (I think that's what caused it in the past for me. Been a while) Point being that you can get a failure to allocate memory when there's plenty of memory due to other settings on your server.
On Wed, Mar 9, 2016 at 11:00 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > Note that running with overcommit_memory = 0 if you do start to run > out of memory, the oom killer will often kill the postmaster. This is > bad. If this happens make sure to kill all postgres children before > trying to restart the db, as starting a new postmaster with children > still running will instantly and permanently corrupt / destroy your > db. And even if it kills one of the "normal" backends (perhaps even the one responsible for some excessive allocations) it causes a PANIC, which is a crash and restart of the entire database service. It will usually do this on a reference to memory which appeared to be successfully allocated, which can be fairly confusing. With overcommit_memory = 2 you usually get just a FATAL error (loss of connection) on the one connection doing the memory allocation which puts things over the top, with a dump of space used by memory contexts in the log. > The eventual state you want is to be able to run with overcommit = 2 > and settings that prevent out of memory allocations. Agreed. > Note too that if you never seem to actually run out of memory, but get > allocation errors, it can also be a lack of file handles (I think > that's what caused it in the past for me. Been a while) Point being > that you can get a failure to allocate memory when there's plenty of > memory due to other settings on your server. Yes, I have seen that, too. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company