Thread: Memory settings, vm.overcommit, how to get it really safe?
As probably many people, I've been running PostgreSQL on Linux with default overcommit settings and a lot of swap space for safety, though disabling overcommit is recommended by recent documentation.

PG's memory usage is not exactly predictable. For settings like work_mem I always monitored production load and tried to find a safe compromise, so that the box under typical load would never go into swap, while on the other hand users don't need to raise it too often just to get a few OLAP queries to perform OK.

What I'm trying now is to get a safe configuration for vm.overcommit_memory = 2 and, if possible, run with much less or no swap space.

On a clone box I disabled overcommit, lowered PG's memory settings a bit, disabled swap, mirrored production load to it and monitored how it would behave. As I more or less expected, it got into trouble after about 6 hours. All memory was exhausted; it was even unable to fork bash again. To my surprise I haven't found any evidence of the OOM killer going active in the logs.

I blamed this behaviour on the swap space I had taken away, and not on disabling overcommit. However, I just enabled overcommit again and tried to reproduce the behaviour. I was unable to get it into trouble again, even with artificially high load.

Now I have a few questions:

1.) Why does it behave differently when only changing overcommit? To my understanding it should have run out of memory in both cases, or can PG benefit from enabled overcommit? It's a minimal setup with PG being the only one using any noticeable amount of resources.

2.) Is it possible at all to put a cap on the memory PG uses in total from the OS side? kernel.shmmax etc. only limit some of the ways PG might use memory? Of course excluding OS/FS buffers etc.

3.) Can PG be made to use its own temp files when it runs out of memory, without setting memory settings so low that performance for typical load will be worse?
I think it would be nice if I wouldn't need a lot of swap just to be safe under any load. Shouldn't that be more efficient than using paged-out memory anyway?

Currently it seems to me that I have to sacrifice the performance of typical load when disabling overcommit and/or reducing swap, as I have to push PG's memory settings lower to be safe.

What might make my case a little bit more predictable is that the number of backend processes / concurrent connections is fixed at 32. There will never be more or less.

Thanks for any guidance / clarification.

--
Best regards,
Hannes Dorbath
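With a fixed pool of 32 backends, the worst case can at least be estimated on the back of an envelope. A minimal sketch of that arithmetic follows; all figures are hypothetical, and note that work_mem is a per-operation limit, so one complex query can use several multiples of it:

```python
def worst_case_pg_memory(shared_buffers_mb, n_backends, work_mem_mb,
                         ops_per_query, maintenance_work_mem_mb):
    """Very rough upper bound on PostgreSQL's memory footprint, in MB.

    work_mem applies per sort/hash operation, so each backend is charged
    work_mem times an assumed number of concurrent operations per query.
    Ignores temp_buffers and per-backend overhead, so this is only a
    ballpark, not an exact bound.
    """
    return (shared_buffers_mb
            + n_backends * work_mem_mb * ops_per_query
            + maintenance_work_mem_mb)

# Fixed pool of 32 backends as in the post; all other numbers made up:
budget = worst_case_pg_memory(shared_buffers_mb=1024, n_backends=32,
                              work_mem_mb=16, ops_per_query=4,
                              maintenance_work_mem_mb=256)
print(budget)  # 1024 + 32*16*4 + 256 = 3328 MB
```

Keeping such an estimate below what the OS will actually let you commit is the crux of running with overcommit disabled.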
Hannes Dorbath wrote:
> As probably many people I've been running PostgreSQL on Linux with
> default overcommit settings and a lot of swap space for safety, though
> disabling overcommit is recommended by recent documentation.
>
> PG's memory usage is not exactly predictable, for settings like work_mem
> I always monitored production load and tried to find a safe compromise,
> so that the box under typical load would never go into swap and on the
> other hand users don't need to raise it too often just to get a few OLAP
> queries to perform OK.
>
> What I'm trying now is to get a safe configuration for
> vm.overcommit_memory = 2 and if possible run with much less or no swap
> space.

What distro / kernel version of Linux are you running? We have a similar issue with late-model hardware and RHEL4 recently here at work, where our workstations are running out of memory. They aren't running PostgreSQL, they're Java dev workstations, and it appears to be a RHEL4 on 64 bit problem, so that's why I ask.
Scott Marlowe wrote:
> What distro / kernel version of Linux are you running? We have a
> similar issue with late-model hardware and RHEL4 recently here at work,
> where our workstations are running out of memory. They aren't running
> PostgreSQL, they're Java dev workstations and it appears to be a RHEL4
> on 64 bit problem, so that's why I ask.

Linux 2.6.21-gentoo #2 SMP x86_64 Intel(R) Xeon(R) CPU 5130 GNU/Linux

--
Best regards,
Hannes Dorbath
On Thu, May 17, 2007 at 03:46:59PM +0200, Hannes Dorbath wrote:
> On a clone box I disabled overcommit, lowered PG's memory settings a
> bit, disabled swap, mirrored production load to it and monitored how
> it would behave. As I more or less expected, it got into trouble after
> about 6 hours. All memory was exhausted, it was even unable to fork bash
> again. To my surprise I haven't found any evidence of OOM going active
> in the logs.

I think you are misunderstanding what overcommit does. Normally when you're running programs and they fork(), the memory gets marked copy-on-write. The data exists only once in memory, but if it is written by one of the programs, that program gets its own copy. Thus it's memory allocated but not actually used -- hence "overcommit". Normally this is never a problem, but say that some unusual load happens and every process with shared usage actually wants its own copy, and there's not enough memory+swap to hold it: something has to give. Thus the OOM killer.

By disabling overcommit, all that happens is that in the above situation, if the kernel sees it's overcommitting total memory+swap, it returns ENOMEM instead. So instead of an unpredictable OOM failure, you get unpredictable fork()/malloc()/exec() failures. For example, you can't start any more processes.

> I blamed this behaviour on the swap space I've taken away, and not on
> disabling overcommit. However I just enabled overcommit again and tried
> to reproduce the behaviour. I was unable to get it into trouble again,
> even with artificially high load.

The default setting under Linux with overcommit off is that the total "allocated" pages in the system cannot exceed swap + 50% of memory. Thus by removing the swap you severely limited the amount of memory that could be used by programs. You need to give at least as much swap as memory, otherwise you'll never get the most out of your machine.

> 1.) Why does it behave differently when only changing overcommit?
> To my
> understanding it should have run out of memory in both cases, or can PG
> benefit from enabled overcommit? It's a minimal setup with PG being the
> only one using any noticeable amount of resources.

I hope I've answered your question above. Personally I don't disable overcommit, as I find the OOM killer less irritating than not being able to log in when the machine is in trouble.

> 2.) Is it possible at all to put a cap on the memory PG uses in total
> from the OS side? kernel.shmmax, etc. only limit some of the ways PG
> might use memory? Of course excluding OS/FS buffers etc.

You can set limits per process (see ulimit) and limit the number of connections. If a postgres process runs out of memory, it aborts the query.

> 3.) Can PG be made to use its own temp files when it runs out of memory
> without setting memory settings so low that performance for typical load
> will be worse? I think it would be nice if I wouldn't need a lot of
> swap, just to be safe under any load. Shouldn't that be more efficient
> than using paged-out memory anyway?

Nope, letting the OS page is far more efficient than anything postgres can do.

> Currently it seems to me that I have to sacrifice the performance of
> typical load, when disabling overcommit and / or reducing swap, as I
> have to push PG's memory settings lower to be safe.

Make lots and lots of swap. You'll probably never use it, but at least it won't get in your way. I'd say 1-1.5 times your memory at least if you want overcommit off.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
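The "swap + 50% of memory" figure above is the kernel's CommitLimit under vm.overcommit_memory = 2 (the 50% comes from vm.overcommit_ratio, which defaults to 50; the value is visible as CommitLimit in /proc/meminfo). A small sketch of that formula, with made-up machine sizes, shows why removing swap was so punishing:

```python
def commit_limit_mb(ram_mb, swap_mb, overcommit_ratio=50):
    """CommitLimit as Linux computes it with vm.overcommit_memory = 2:
    swap + overcommit_ratio% of RAM. Allocations beyond this fail with
    ENOMEM instead of triggering the OOM killer."""
    return swap_mb + ram_mb * overcommit_ratio // 100

ram = 4096  # hypothetical 4 GB box
print(commit_limit_mb(ram, swap_mb=0))     # 2048 -- no swap: only half the RAM is committable
print(commit_limit_mb(ram, swap_mb=6144))  # 8192 -- 1.5x RAM of swap, as suggested above
```

So with swap removed entirely, processes could commit only about half of physical memory before fork()/malloc() started failing, which matches the clone box falling over without the OOM killer ever firing.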
Hannes Dorbath wrote:
> Scott Marlowe wrote:
>> What distro / kernel version of linux are you running? We have a
>> similar issue with late model hardware and RHEL4 recently here at work,
>> where our workstations are running out of memory. They aren't running
>> postgresql, they're java dev workstations and it appears to be a RHEL4
>> on 64 bit problem, so that's why I ask.
>
> Linux 2.6.21-gentoo #2 SMP x86_64 Intel(R) Xeon(R) CPU 5130 GNU/Linux

I wonder if you could try it with the uniprocessor kernel and see if your problem goes away. FYI, the machines we're having the problem with are RHEL4 with a kernel of:

Linux 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

Note that 2.6.9-55 in RHEL speak is probably closer to 2.6.21 than 2.6.9, since they back-port tons of stuff but keep the same version number.
Martijn van Oosterhout wrote:
> Make lots and lots of swap. You'll probably never use it, but at least
> it won't get in your way. I'd say 1-1.5 times your memory at least if
> you want overcommit off.

Thanks for your detailed explanations. I had indeed misunderstood how overcommit interacts with shared memory. So I just keep what I always had -- lots of swap and overcommit off.

--
Best regards,
Hannes Dorbath
Hannes Dorbath wrote: > So I just keep what I always had -- lots of swap and overcommit off. Ehrm, overcommit on I mean. -- Best regards, Hannes Dorbath
* Scott Marlowe:
> What distro / kernel version of linux are you running? We have a
> similar issue with late model hardware and RHEL4 recently here at
> work, where our workstations are running out of memory. They aren't
> running postgresql, they're java dev workstations and it appears to be
> a RHEL4 on 64 bit problem, so that's why I ask.

When Java sees that your machine has got plenty of RAM and more than one CPU, it assumes that it's a server and you want to run just a single VM, and configures itself to use a fair chunk of available RAM.

This is more or less a Sun-specific issue. Other Java implementations make different choices.

--
Florian Weimer <fweimer@bfk.de>    BFK edv-consulting GmbH    http://www.bfk.de/
Kriegsstraße 100    tel: +49-721-96201-1
D-76133 Karlsruhe   fax: +49-721-96201-99
Florian Weimer wrote:
> * Scott Marlowe:
>> What distro / kernel version of linux are you running? We have a
>> similar issue with late model hardware and RHEL4 recently here at
>> work, where our workstations are running out of memory. They aren't
>> running postgresql, they're java dev workstations and it appears to be
>> a RHEL4 on 64 bit problem, so that's why I ask.
>
> When Java sees that your machine has got plenty of RAM and more than
> one CPU, it assumes that it's a server and you want to run just a
> single VM, and configures itself to use a fair chunk of available RAM.
>
> This is more or less a Sun-specific issue. Other Java implementations
> make different choices.

Yeah, but these boxes run out of free mem, buffer, cache and swap. On a machine with 4 gigs of RAM and 2 gigs of swap, something is seriously wrong when all that RAM just disappears, and it isn't just with Java apps, though most of what the developers run are Java apps / servers. We've had a machine with Mozilla (no Java extension in it) run out of memory just sitting idle.