Thread: Really out of memory?

Really out of memory?

From
Ben Chobot
Date:
I have a linux postgres server in the field. Its version is:

PostgreSQL 8.2.4 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)

(aka postgresql-8.2.4-1PGDG)

A few days ago, its log started showing this:

May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR:  out of memory
May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL:  Failed on request of size 16777212.
May 31 03:02:40 sfmelwss postgres[31490]: [1-1] ERROR:  out of memory
May 31 03:02:40 sfmelwss postgres[31490]: [1-2] DETAIL:  Failed on request of size 16777212.
May 31 03:05:40 sfmelwss postgres[31913]: [1-1] ERROR:  out of memory
May 31 03:05:40 sfmelwss postgres[31913]: [1-2] DETAIL:  Failed on request of size 16777212.

That seems pretty self-explanatory. But I'm not so sure, because SAR
says:

02:30:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
02:40:01 AM     13332   1003316     98.69    130448    198188   1034572     13996      1.33        32
02:50:01 AM     17116    999532     98.32    128708    196384   1034596     13972      1.33        44
03:00:01 AM     16372   1000276     98.39    129128    196388   1034596     13972      1.33        44
03:10:01 AM     17220    999428     98.31    128268    196828   1034736     13832      1.32       132
03:20:01 AM     14416   1002232     98.58    130464    197348   1035224     13344      1.27       152
03:30:01 AM     16292   1000356     98.40    127604    196684   1035700     12868      1.23       168

...which indicates there was still plenty of space left in swap. Now, I
realize I don't want to be actually using my swap, but I'm wondering if
the out of memory messages are a red herring. Should I be looking at
something else, like the number of processes, open files, or shared memory
segments?

FWIW, I have disabled the OOM killer (but not, as I understand it, my
swap space) by setting:
vm.overcommit_memory = 2
vm.overcommit_ratio = 100

Re: Really out of memory?

From
Martijn van Oosterhout
Date:
On Tue, Jun 02, 2009 at 11:10:04AM -0700, Ben Chobot wrote:
> I have a linux postgres server in the field. Its version is:
>
> PostgreSQL 8.2.4 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-51)
>
> (aka postgresql-8.2.4-1PGDG)
>
> A few days ago, its log started showing this:
>
> May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR:  out of memory
> May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL:  Failed on request of size 16777212.

Add even more swap. By turning overcommit off you make the kernel
really pessimistic about how much memory is in use.

> ...which indicates there was still plenty of space left in swap. Now, I
> realize I don't want to be actually using my swap, but I'm wondering if
> the out of memory messages are a red herring. Should I be looking at
> something else, like the number of processes, open files, or shared
> memory segments?

You've got as much swap as memory; try doubling it.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: Really out of memory?

From
John R Pierce
Date:
Ben Chobot wrote:
> May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR:  out of memory
> May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL:  Failed on request of size 16777212.

That's a 16MB request. Is that your work_mem size or something, by any chance?


> 02:30:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
> 02:40:01 AM     13332   1003316     98.69    130448    198188   1034572     13996      1.33        32

So you only have 13MB of memory free. You -do- have free swap, however.


Hey, is any ulimit in effect for the postgres process?

Re: Really out of memory?

From
Ben Chobot
Date:
On Tue, 2 Jun 2009, Martijn van Oosterhout wrote:

> On Tue, Jun 02, 2009 at 11:10:04AM -0700, Ben Chobot wrote:

>> May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR:  out of memory
>> May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL:  Failed on request of size 16777212.
>
> Add even more swap. By turning overcommit off you make the kernel
> really pessimistic about how much memory is in use.

Is it so pessimistic that it won't try to swap out 16MB into almost 1GB of
free swap? That seems surprising to me.

Re: Really out of memory?

From
Tom Lane
Date:
Ben Chobot <bench@silentmedia.com> writes:
> May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR:  out of memory
> May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL:  Failed on request of size 16777212.

So the kernel isn't letting PG have any more memory.

> That seems pretty self-explanatory. But I'm not so sure, because SAR
> says:
> ...
> ...which indicates there was still plenty of space left in swap.

Which the kernel isn't letting us use.  Check the "ulimit" settings
that the postmaster is being started with.  On a Linux box, any of
the -d, -m, or -v settings might cause this.

It's possible you are running out of 32-bit address space in the backend
process, but what seems more likely is that the per-process ulimit is
unreasonably small.
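On Linux, the three ulimit flags named above correspond to setrlimit(2) resources: -d is RLIMIT_DATA, -m is RLIMIT_RSS, and -v is RLIMIT_AS. A minimal Python sketch that reads them with the standard `resource` module; the helper names are made up for illustration, and it reports the limits of whatever process runs it, not the postmaster's:

```python
import resource

# ulimit flags named above, mapped to the setrlimit(2) resources behind them.
# (Per setrlimit(2), RLIMIT_RSS is only enforced by Linux 2.4 kernels before 2.4.30.)
MEMORY_LIMITS = {
    "-d data seg size": resource.RLIMIT_DATA,
    "-m max memory size": resource.RLIMIT_RSS,
    "-v virtual memory": resource.RLIMIT_AS,
}


def describe(value: int) -> str:
    """Render a limit in kbytes, as `ulimit -a` does, or as 'unlimited'."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} kB"


def memory_limits() -> dict:
    """Return {flag: (soft, hard)} for the process running this code."""
    return {
        flag: tuple(describe(v) for v in resource.getrlimit(res))
        for flag, res in MEMORY_LIMITS.items()
    }


if __name__ == "__main__":
    for flag, (soft, hard) in memory_limits().items():
        print(f"{flag:20} soft={soft:>12} hard={hard:>12}")
```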

            regards, tom lane

Re: Really out of memory?

From
Ben Chobot
Date:
On Tue, 2 Jun 2009, John R Pierce wrote:

> Ben Chobot wrote:
>>  May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR:  out of memory
>>  May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL:  Failed on request of size 16777212.
>
> That's a 16MB request. Is that your work_mem size or something, by any chance?

work_mem is 1MB, but maintenance_work_mem is 16MB. So it's probably
autovacuum kicking off most of these messages.

>>  02:30:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
>>  02:40:01 AM     13332   1003316     98.69    130448    198188   1034572     13996      1.33        32
>
> So you only have 13MB of memory free. You -do- have free swap, however.
>
>
> Hey, is any ulimit in effect for the postgres process?

Not that I can tell. There's nothing special in /etc/init.d/postgresql or
/etc/sysconfig/pgsql/postgresql, and ulimit -a shows:

core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
max nice                        (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 16127
max locked memory       (kbytes, -l) 32
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
max rt priority                 (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 16127
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Is there a way to see what the limits are for a given pid? I don't see
anything obviously relevant in /proc/<pid>/....

Re: Really out of memory?

From
Ben Chobot
Date:
On Tue, 2 Jun 2009, Tom Lane wrote:

> It's possible you are running out of 32-bit address space in the backend
> process, but what seems more likely is that the per-process ulimit is
> unreasonably small.

I only have 1GB in the machine, and another 1GB of swap, so running out of
32-bit address space seems unlikely. Is there any way to rule it out?
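One rough way to check is to add up the mappings listed in a backend's /proc/<pid>/maps and compare the total with the user address space. The following is a sketch that assumes the default 3 GiB user space of an i686 kernel with the standard 3G/1G split; the function and constant names are hypothetical. A small total does not fully rule the problem out, because a 16 MB request needs one contiguous free range, but a total far below 3 GiB makes address-space exhaustion unlikely.

```python
import re

# Assumption: default i686 kernel with the 3G/1G split, i.e. 3 GiB of user space.
USER_SPACE_BYTES = 3 * 1024**3

MAPPING = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s")


def mapped_bytes(maps_text: str) -> int:
    """Sum the sizes of all address ranges listed in a /proc/<pid>/maps dump."""
    total = 0
    for line in maps_text.splitlines():
        m = MAPPING.match(line)
        if m:
            start, end = (int(x, 16) for x in m.groups())
            total += end - start
    return total


def address_space_left(maps_text: str) -> int:
    """Bytes of user address space not covered by any mapping (ignores fragmentation)."""
    return USER_SPACE_BYTES - mapped_bytes(maps_text)
```

For a real backend, pass in the text of /proc/<pid>/maps read with that backend's PID.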

Re: Really out of memory?

From
Tom Lane
Date:
Ben Chobot <bench@silentmedia.com> writes:
>> Hey, is any ulimit in effect for the postgres process?

> Not that I can tell. There's nothing special in /etc/init.d/postgresql or
> /etc/sysconfig/pgsql/postgresql, and ulimit -a shows:

That tells you the limits for your interactive shell, but a daemon might
be started under some other set of limits.
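Resource limits belong to each process and are inherited across fork and exec. A daemon therefore sees whatever its parent (an init script, su, and so on) set up, not what your login shell reports. The small Python sketch below demonstrates the effect using the harmless open-files limit in place of a memory limit. `child_nofile_soft_limit` is a hypothetical helper, and `preexec_fn` requires a POSIX system.

```python
import resource
import subprocess
import sys

# The child just reports the open-files soft limit it inherited.
CHILD_CODE = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"


def child_nofile_soft_limit(soft: int) -> int:
    """Start a child whose open-files soft limit is changed before exec,
    the way an init script or `su` can change limits before a daemon starts,
    and return the soft limit the child reports."""
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = soft if hard == resource.RLIM_INFINITY else min(soft, hard)

    def set_child_limit() -> None:
        # Runs in the forked child only; the parent's limits are untouched.
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))

    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        preexec_fn=set_child_limit,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    print("this process:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
    print("child       :", child_nofile_soft_limit(256))
```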

> Is there a way to see what the limits are for a given pid? I don't see
> anything obviously relevant in /proc/<pid>/....

You don't have /proc/<pid>/limits ?
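On kernels that provide it, /proc/<pid>/limits shows the soft and hard limits of a running process such as the postmaster. A minimal parser sketch follows. It assumes the usual layout: a header line, then columns separated by runs of spaces. The function names are hypothetical.

```python
import re

# Rows of /proc/<pid>/limits that correspond to ulimit -d, -m and -v.
MEMORY_ROWS = ("Max data size", "Max resident set", "Max address space")


def parse_proc_limits(text: str) -> dict:
    """Parse /proc/<pid>/limits into {limit name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        # Columns are padded with spaces; names contain only single spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits


def memory_rows(text: str) -> dict:
    """Keep only the memory-related limits."""
    limits = parse_proc_limits(text)
    return {name: limits[name] for name in MEMORY_ROWS if name in limits}
```

Feed it the contents of /proc/<pid>/limits for the postmaster's PID.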

            regards, tom lane

Re: Really out of memory?

From
Ben Chobot
Date:
On Tue, 2 Jun 2009, Tom Lane wrote:

>> Is there a way to see what the limits are for a given pid? I don't see
>> anything obviously relevant in /proc/<pid>/....
>
> You don't have /proc/<pid>/limits ?

Nope. I'd like to believe I would consider that "obviously relevant." :)

This server is running 2.6.20-1.2962.fc6, but should be upgraded to
2.6.26.8-57.fc8 in a month or two, which does provide that file. I was
hoping to not have to wait till then to understand what's going wrong
though.

Re: Really out of memory?

From
Tom Lane
Date:
Ben Chobot <bench@silentmedia.com> writes:
> On Tue, 2 Jun 2009, Tom Lane wrote:
>> You don't have /proc/<pid>/limits ?

> Nope. I'd like to believe I would consider that "obviously relevant." :)

Next best thing I can think of is to stick "ulimit -a >/tmp/mylimits"
into the postgres initscript and restart.

If the initscript is starting postgres via "su -l", it might be better
to add the command in postgres' ~/.bashrc or some such place.  You
have to consider the possibility that the su is changing the ulimit
environment.
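Once the initscript has written /tmp/mylimits, the memory-related lines can be extracted mechanically. The sketch below assumes the file holds bash's `ulimit -a` output in the format shown earlier in the thread. The function names are hypothetical.

```python
import re

# Flags singled out above: -d (data segment), -m (resident set), -v (virtual memory).
MEMORY_FLAGS = ("d", "m", "v")

# Matches the tail of a line such as "data seg size   (kbytes, -d) unlimited".
FLAG_VALUE = re.compile(r"-(\w)\)\s+(\S+)\s*$")


def memory_limits_from_ulimit(dump: str) -> dict:
    """Extract the -d/-m/-v values from saved `ulimit -a` output."""
    values = {}
    for line in dump.splitlines():
        m = FLAG_VALUE.search(line)
        if m and m.group(1) in MEMORY_FLAGS:
            values[m.group(1)] = m.group(2)
    return values


def restrictive(dump: str) -> dict:
    """Return only the memory limits that are not 'unlimited'."""
    return {flag: value
            for flag, value in memory_limits_from_ulimit(dump).items()
            if value != "unlimited"}
```

An empty result from `restrictive()` on the daemon's dump would point away from ulimit as the cause.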

            regards, tom lane

Re: Really out of memory?

From
Martijn van Oosterhout
Date:
On Tue, Jun 02, 2009 at 11:45:11AM -0700, Ben Chobot wrote:
> On Tue, 2 Jun 2009, Martijn van Oosterhout wrote:
>
>> On Tue, Jun 02, 2009 at 11:10:04AM -0700, Ben Chobot wrote:
>
>>> May 31 02:59:40 sfmelwss postgres[30103]: [1-1] ERROR:  out of memory
>>> May 31 02:59:40 sfmelwss postgres[30103]: [1-2] DETAIL:  Failed on request of size 16777212.
>>
>> Add even more swap. By turning overcommit off you make the kernel
>> really pessimistic about how much memory is in use.
>
> Is it so pessimistic that it won't try to swap out 16MB into almost 1GB
> of free swap? That seems surprising to me.

It's got nothing to do with how much swap is in use. It's preventing
you from allocating memory that *hypothetically* might not be available
if every byte of allocated memory were actually used.
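With vm.overcommit_memory = 2, the kernel keeps a running total of committed address space and refuses any allocation that would push that total past CommitLimit. The simplified Python model below assumes the kernel exports both figures in /proc/meminfo as Committed_AS and CommitLimit. It ignores details such as page rounding and any reserves the kernel may hold back. `would_fit` is a hypothetical helper.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo lines such as 'CommitLimit:   2065216 kB' into numbers (kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


def would_fit(info: dict, request_bytes: int) -> bool:
    """Approximate the strict-overcommit check: a new allocation is refused if it
    would push the committed total past CommitLimit, regardless of how much RAM
    or swap is actually in use at that moment."""
    request_kb = -(-request_bytes // 1024)  # round up to whole kB
    return info["Committed_AS"] + request_kb <= info["CommitLimit"]
```

On a live system, the input would be the text of /proc/meminfo, and the question is whether `would_fit(info, 16777212)` holds.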

For example, on my desktop I have 1GB of RAM of which about 600MB is
free, yet there is 1.4GB committed. With overcommit off my machine
may not boot. As you can see, only 25% of committed memory is actually
needed, because lots of pages are blank or shared. Of course, all those
copies of libc are realistically always going to be shared, so it's a
good bet.

But with overcommit off you can see that you might want to have double
or triple the amount of swap to handle the hypothetical case.

I'm not saying this is necessarily the case for you, but it's the first
thing that came to mind and relatively easy to check.

Have a nice day,

Re: Really out of memory?

From
Ben Chobot
Date:
On Tue, 2 Jun 2009, Martijn van Oosterhout wrote:

> It's got nothing to do with how much swap is in use. It's preventing
> you from allocating memory that *hypothetically* might not be available
> if every byte of allocated memory were actually used.
>
> For example, on my desktop I have 1GB of RAM of which about 600MB is
> free, yet there is 1.4GB committed. With overcommit off my machine
> may not boot. As you can see, only 25% of committed memory is actually
> needed, because lots of pages are blank or shared. Of course, all those
> copies of libc are realistically always going to be shared, so it's a
> good bet.
>
> But with overcommit off you can see that you might want to have double
> or triple the amount of swap to handle the hypothetical case.

No, sorry, I don't see why I would need more swap when I've disabled
memory overcommit. As I understand it, the kernel should be able to
allocate (swap + physical * overcommit_ratio / 100), which in my case is
just swap + physical, and it doesn't seem to want to do that.
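Plugging the totals from the sar output in the first message into that formula gives the ceiling the kernel enforces. The worked sketch below assumes no huge pages are reserved and reads the ratio as a percentage (CommitLimit = swap + RAM * overcommit_ratio / 100).

```python
def commit_limit_kb(mem_total_kb: int, swap_total_kb: int, overcommit_ratio: int) -> int:
    """CommitLimit under vm.overcommit_memory = 2, ignoring huge pages:
    swap + RAM * overcommit_ratio / 100."""
    return swap_total_kb + mem_total_kb * overcommit_ratio // 100


# Totals derived from the 02:40:01 row of the sar -r output:
mem_total_kb = 13332 + 1003316    # kbmemfree + kbmemused = 1016648
swap_total_kb = 1034572 + 13996   # kbswpfree + kbswpused = 1048568
limit = commit_limit_kb(mem_total_kb, swap_total_kb, overcommit_ratio=100)
print(limit, "kB")  # 2065216 kB, about 1.97 GiB
```

Following Martijn's explanation, the comparison that matters is between this limit and the total address space committed by all processes, not free swap. That is why adding swap raises the ceiling even when little of it is actually used.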