Re: configurability of OOM killer - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: configurability of OOM killer
Date
Msg-id 1202151446.10057.759.camel@dogma.ljc.laika.com
In response to Re: configurability of OOM killer  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: configurability of OOM killer  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Fri, 2008-02-01 at 19:08 -0500, Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > This page
> > http://linux-mm.org/OOM_Killer
> 
> Egad.  Whoever thought *this* was a good idea should be taken out
> and shot:

+1

>         /*
>          * Processes which fork a lot of child processes are likely
>          * a good choice. We add the vmsize of the childs if they
>          * have an own mm. This prevents forking servers to flood the
>          * machine with an endless amount of childs
>          */
> 
> In other words, server daemons are preferentially killed, and the parent
> will *always* get zapped in place of its child (since the child cannot
> have a higher score).  No wonder we have to turn off OOM kill.
> 

Technically, the child could have a higher score, because the parent's
score only counts half of the total vm size of each child. At first
glance it's not that bad an idea, except that it takes into account the
total vm size (including shared memory), not only the memory that is
exclusive to the process in question.

It's pretty easy to see that badness() (the function that determines
which process is killed when the OOM killer is invoked) will count the
same byte of memory many times over when calculating the "badness" of a
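process like the postgres daemon. If you have shared_buffers=1GB on a
4GB box, and 100 connections open, badness() apparently thinks
PostgreSQL is using about 50GB of memory: the same 1GB of shared memory
is counted once for the postmaster and again, at half weight, for each
of the 100 backends. Oops. One would think a VM hacker would know
better.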
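
The arithmetic behind that figure can be sketched as follows. This is a rough model of the double counting, not the kernel's badness() code: the function name is hypothetical, it assumes each backend's total vm size is roughly equal to shared_buffers (private memory is ignored), and it leaves out the CPU-time and run-time adjustments that badness() also applies.

```python
# Rough model of the double counting described above; NOT the kernel's code.
# Assumption: every process that maps the shared segment has a total vm size
# of about shared_buffers. Private (non-shared) memory is ignored.

def apparent_vm_gb(shared_buffers_gb: float, backends: int) -> float:
    """Approximate vm size (GB) a badness()-style score charges to the postmaster."""
    parent_vm = shared_buffers_gb   # the postmaster maps the shared segment
    child_vm = shared_buffers_gb    # and so does every backend
    # The parent is charged its own vm size plus half of each child's vm size.
    return parent_vm + backends * child_vm / 2
```

With shared_buffers=1GB and 100 backends, apparent_vm_gb(1, 100) gives 51, although the shared segment only occupies 1GB of real memory.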

I tried bringing this up on LKML several times (Ron Mayer linked to one
of my posts: http://lkml.org/lkml/2007/2/9/275). If anyone has an inside
connection to the Linux developer community, I suggest that they raise
this issue.

If you want to experiment, start a postgres process with shared_buffers
set to 25% of the available memory, and then start about 100 idle
connections. Then, start a process that just slowly eats memory, such
that it will invoke the OOM killer after a couple of minutes (badness()
takes into account the time the process has been alive, as well, so you
can't just eat memory in a tight loop).
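
A slow memory eater for this experiment might look like the sketch below. The function name, chunk size, interval, and page size are all assumptions chosen for illustration; tune them so that memory runs out after a couple of minutes on the test box.

```python
import time

PAGE_SIZE = 4096  # assumed page size; used only to touch every page


def eat_memory(chunk_mb=10, interval_s=1.0, max_chunks=None):
    """Allocate chunk_mb MB every interval_s seconds and keep it all referenced.

    Runs until killed when max_chunks is None; otherwise stops after
    max_chunks allocations and returns the list of buffers.
    """
    hoard = []
    while max_chunks is None or len(hoard) < max_chunks:
        buf = bytearray(chunk_mb * 1024 * 1024)
        # Write one byte per page so the memory is really backed by RAM
        # rather than just reserved address space.
        for offset in range(0, len(buf), PAGE_SIZE):
            buf[offset] = 1
        hoard.append(buf)
        time.sleep(interval_s)
    return hoard
```

Calling eat_memory() with the defaults grows by about 10MB per second until something kills it, so it belongs on a scratch machine only.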

The postgres process will always be killed first. The OOM killer will
then realize that killing it didn't alleviate the memory pressure much,
and go on to kill the runaway process.

Regards,
	Jeff Davis


