Thread: Really out of memory?
I have a Linux Postgres server in the field. Its version is:

PostgreSQL 8.2.4 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)

(aka postgresql-8.2.4-1PGDG)

A few days ago, its log started showing this:

May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR:  out of memory
May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL:  Failed on request of size 16777212.
May 31 03:02:40 sfmelwss postgres[31490]: [1-1] ERROR:  out of memory
May 31 03:02:40 sfmelwss postgres[31490]: [1-2] DETAIL:  Failed on request of size 16777212.
May 31 03:05:40 sfmelwss postgres[31913]: [1-1] ERROR:  out of memory
May 31 03:05:40 sfmelwss postgres[31913]: [1-2] DETAIL:  Failed on request of size 16777212.

That seems pretty self-explanatory. But I'm not so sure, because SAR says:

02:30:01 AM  kbmemfree  kbmemused  %memused  kbbuffers  kbcached  kbswpfree  kbswpused  %swpused  kbswpcad
02:40:01 AM      13332    1003316     98.69     130448    198188    1034572      13996      1.33        32
02:50:01 AM      17116     999532     98.32     128708    196384    1034596      13972      1.33        44
03:00:01 AM      16372    1000276     98.39     129128    196388    1034596      13972      1.33        44
03:10:01 AM      17220     999428     98.31     128268    196828    1034736      13832      1.32       132
03:20:01 AM      14416    1002232     98.58     130464    197348    1035224      13344      1.27       152
03:30:01 AM      16292    1000356     98.40     127604    196684    1035700      12868      1.23       168

...which indicates there was still plenty of space left in swap. Now, I realize I don't want to be actually using my swap, but I'm wondering if the out of memory messages are a red herring. Should I be looking at something else, like the number of processes, open files, or shared memory segments?

FWIW, I have disabled the OOM killer (but not, as I understand it, my swap space) by setting:

vm.overcommit_memory = 2
vm.overcommit_ratio = 100
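In case it helps, here's how I've been sanity-checking the kernel's commit accounting. This is just a sketch, assuming a 2.6-era kernel that exposes these fields in /proc/meminfo; with vm.overcommit_memory = 2, the kernel enforces CommitLimit = swap + RAM * overcommit_ratio / 100 and rejects any allocation that would push Committed_AS past it, no matter how empty swap is:

    # How much address space is already promised, vs. the enforced ceiling.
    # An allocation fails once Committed_AS would exceed CommitLimit, even
    # if swap itself is nearly unused.
    grep -E 'CommitLimit|Committed_AS' /proc/meminfo

    # The overcommit settings currently in effect.
    sysctl vm.overcommit_memory vm.overcommit_ratio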
On Tue, Jun 02, 2009 at 11:10:04AM -0700, Ben Chobot wrote:
> I have a Linux Postgres server in the field. Its version is:
>
> PostgreSQL 8.2.4 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)
>
> (aka postgresql-8.2.4-1PGDG)
>
> A few days ago, its log started showing this:
>
> May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
> May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.

Add even more swap. By turning overcommit off you make the kernel really pessimistic about how much memory is in use.

> ...which indicates there was still plenty of space left in swap. Now, I
> realize I don't want to be actually using my swap, but I'm wondering if
> the out of memory messages are a red herring. Should I be looking at
> something else, like the number of processes, open files, or shared
> memory segments?

You got as much swap as memory; try doubling it.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
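A quick way to try that, if you don't have a spare partition, is a temporary swap file. A sketch only, and /swapfile2 is just an example path:

    # Create a 1GB file, format it as swap, and enable it.
    dd if=/dev/zero of=/swapfile2 bs=1M count=1024
    chmod 600 /swapfile2
    mkswap /swapfile2
    swapon /swapfile2

    # Verify the new total.
    swapon -s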
Ben Chobot wrote:
> May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
> May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on
> request of size 16777212.

That's a 16MB request. Is that your work_mem size or something, by any chance?

> 02:30:01 AM kbmemfree kbmemused %memused kbbuffers kbcached
> kbswpfree kbswpused %swpused kbswpcad
> 02:40:01 AM 13332 1003316 98.69 130448 198188
> 1034572 13996 1.33 32

So you only have 13MB of memory free. You *do* have free swap, however.

Hey, is any ulimit in effect for the postgres process?
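If you're not sure what the server is actually running with, it will tell you. A sketch, assuming you can connect as the postgres superuser:

    # Ask the running server for its memory settings.
    psql -U postgres -c "SHOW work_mem;"
    psql -U postgres -c "SHOW maintenance_work_mem;"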
On Tue, 2 Jun 2009, Martijn van Oosterhout wrote:

> On Tue, Jun 02, 2009 at 11:10:04AM -0700, Ben Chobot wrote:
>> May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
>> May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.
>
> Add even more swap. By turning overcommit off you make the kernel
> really pessimistic about how much memory is in use.

Is it so pessimistic that it won't try to swap out 16MB into almost 1GB of free swap? That seems surprising to me.
Ben Chobot <bench@silentmedia.com> writes:
> May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
> May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.

So the kernel isn't letting PG have any more memory.

> That seems pretty self-explanatory. But I'm not so sure, because SAR
> says:
> ...
> ...which indicates there was still plenty of space left in swap.

Which the kernel isn't letting us use. Check the "ulimit" settings that the postmaster is being started with. On a Linux box, any of the -d, -m, or -v settings might cause this.

It's possible you are running out of 32-bit address space in the backend process, but what seems more likely is that the per-process ulimit is unreasonably small.

			regards, tom lane
On Tue, 2 Jun 2009, John R Pierce wrote:

> Ben Chobot wrote:
>> May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
>> May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.
>
> That's a 16MB request. Is that your work_mem size or something, by any chance?

work_mem is 1MB, but maintenance_work_mem is 16MB. So it's probably autovacuum kicking off most of these messages.

>> 02:30:01 AM kbmemfree kbmemused %memused kbbuffers kbcached kbswpfree
>> kbswpused %swpused kbswpcad
>> 02:40:01 AM 13332 1003316 98.69 130448 198188 1034572
>> 13996 1.33 32
>
> So you only have 13MB of memory free. You *do* have free swap, however.
>
> Hey, is any ulimit in effect for the postgres process?

Not that I can tell. There's nothing special in /etc/init.d/postgresql or /etc/sysconfig/pgsql/postgresql, and ulimit -a shows:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 16127
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
max rt priority                 (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16127
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Is there a way to see what the limits are for a given pid? I don't see anything obviously relevant in /proc/<pid>/....
On Tue, 2 Jun 2009, Tom Lane wrote:

> It's possible you are running out of 32-bit address space in the backend
> process, but what seems more likely is that the per-process ulimit is
> unreasonably small.

I only have 1GB in the machine, and another 1GB of swap, so running out of 32-bit address space seems unlikely. Is there any way to rule it out?
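The closest thing I've found to checking it is watching a backend's address space directly. A sketch; <pid> here stands for the process id of one of the failing backends:

    # VmPeak/VmSize are the high-water mark and current size of the
    # process's virtual address space. On a 32-bit box, values pushing
    # toward ~3GB would suggest address-space exhaustion; a few hundred
    # MB would not.
    grep -E 'VmPeak|VmSize|VmData' /proc/<pid>/status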
Ben Chobot <bench@silentmedia.com> writes:
>> Hey, is any ulimit in effect for the postgres process?

> Not that I can tell. There's nothing special in /etc/init.d/postgresql or
> /etc/sysconfig/pgsql/postgresql, and ulimit -a shows:

That tells you the limits for your interactive shell, but a daemon might be started under some other set of limits.

> Is there a way to see what the limits are for a given pid? I don't see
> anything obviously relevant in /proc/<pid>/....

You don't have /proc/<pid>/limits?

			regards, tom lane
On Tue, 2 Jun 2009, Tom Lane wrote:

>> Is there a way to see what the limits are for a given pid? I don't see
>> anything obviously relevant in /proc/<pid>/....
>
> You don't have /proc/<pid>/limits?

Nope. I'd like to believe I would consider that "obviously relevant." :)

This server is running 2.6.20-1.2962.fc6, but should be upgraded to 2.6.26.8-57.fc8 in a month or two, which does provide that file. I was hoping not to have to wait till then to understand what's going wrong, though.
Ben Chobot <bench@silentmedia.com> writes:
> On Tue, 2 Jun 2009, Tom Lane wrote:
>> You don't have /proc/<pid>/limits?

> Nope. I'd like to believe I would consider that "obviously relevant." :)

Next best thing I can think of is to stick "ulimit -a >/tmp/mylimits" into the postgres initscript and restart. If the initscript is starting postgres via "su -l", it might be better to add the command in postgres' ~/.bashrc or some such place. You have to consider the possibility that the su is changing the ulimit environment.

			regards, tom lane
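Concretely, that might look like the following. A sketch only; the dump-file names are made up, and the second line captures what survives the "su -l" environment reset:

    # Near the top of start() in /etc/init.d/postgresql:
    ulimit -a > /tmp/mylimits.initscript              # limits the initscript runs with
    su -l postgres -c 'ulimit -a' > /tmp/mylimits.su  # limits after su -l resets things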
On Tue, Jun 02, 2009 at 11:45:11AM -0700, Ben Chobot wrote:
> On Tue, 2 Jun 2009, Martijn van Oosterhout wrote:
>
>> On Tue, Jun 02, 2009 at 11:10:04AM -0700, Ben Chobot wrote:
>>> May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR: out of memory
>>> May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL: Failed on request of size 16777212.
>>
>> Add even more swap. By turning overcommit off you make the kernel
>> really pessimistic about how much memory is in use.
>
> Is it so pessimistic that it won't try to swap out 16MB into almost 1GB
> of free swap? That seems surprising to me.

It's got nothing to do with how much swap is in use. It's preventing you from allocating memory that *hypothetically* might not be available if every byte of allocated memory were actually used.

For example, on my desktop I have 1GB of RAM, of which about 600MB is free, yet there is 1.4GB committed. With overcommit off, my machine might not boot. As you can see, only 25% of committed memory is actually needed, because lots of pages are blank or shared. Of course, all those copies of libc are realistically always going to be shared, so it's a good bet.

But with overcommit off, you can see that you might want to have double or triple the amount of swap to handle the hypothetical case. I'm not saying this is necessarily the case for you, but it's the first thing that came to mind and it's relatively easy to check.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
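To see the committed-versus-used gap on your own box, here is one rough way. A sketch; summing VSZ overstates the truth because of shared pages, which is exactly the point:

    # Total virtual size every process has been promised, versus the
    # resident memory actually in use. The difference is what strict
    # overcommit accounting has to be able to honor.
    ps -eo vsz=,rss= | awk '{v += $1; r += $2} END {printf "promised: %d MB, resident: %d MB\n", v/1024, r/1024}'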
On Tue, 2 Jun 2009, Martijn van Oosterhout wrote:

> It's got nothing to do with how much swap is in use. It's preventing
> you from allocating memory that *hypothetically* might not be available
> if every byte of allocated memory were actually used.
>
> For example, on my desktop I have 1GB of RAM, of which about 600MB is
> free, yet there is 1.4GB committed. With overcommit off, my machine
> might not boot. As you can see, only 25% of committed memory is actually
> needed, because lots of pages are blank or shared. Of course, all those
> copies of libc are realistically always going to be shared, so it's a
> good bet.
>
> But with overcommit off, you can see that you might want to have double
> or triple the amount of swap to handle the hypothetical case.

No, sorry, I don't see why I would need more swap when I've disabled memory overcommit. As I understand it, the kernel should be able to allocate (swap + (physical * overcommit_ratio)), which in my case is just swap + physical, and it seems to not want to do that.
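For the record, here's the arithmetic as I understand it, using the SAR numbers from my first message (total swap = kbswpfree + kbswpused, total RAM = kbmemfree + kbmemused):

    # With vm.overcommit_ratio = 100, the commit limit should be swap + RAM:
    echo $(( (1034572 + 13996) + (13332 + 1003316) ))
    # => 2065216 kB, i.e. roughly 2GB of allocatable commit

So a 16MB request should only fail if nearly all of that 2GB is already committed.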