Thread: PostgreSQL oom_adj postmaster process to -17

PostgreSQL oom_adj postmaster process to -17

From

Radovan Jablonovsky

Date:

24 July 2012, 16:19:00

Hello,

We are running PostgreSQL version 9.1.1 with 32GB of RAM, 32GB of SWAP and during high load we could reach (swap + RAM) memory limit. In this case OOM-killer kills postgresql process(es). (Out of Memory: Killed process 12345 (postgres)). As admin I would like to exclude postgresql system processes from being chosen by OOM-killer. Based on the PostgreSQL documentation it could be done by properly setting vm.overcommit_memory=2, vm.overcommit_ratio=(probably between 50 and 90) and set the process-specific oom_adj value for the postmaster process to -17, thereby guaranteeing it will not be targeted by the OOM killer. The PostgreSQL should build with -DLINUX_OOM_ADJ=0 added to CPPFLAGS to have child processes oom_adj equal to 0.

1) Will this setting means other system processes (logger process, writer process, stats collector process, wal receiver process, etc..), which are children of postmaster process, will be running with oom_adj set to 0 too?

2) Should the proper Postgresql compilation configuration be: ./configure ... CPPFLAGS="-DLINUX_OOM_ADJ=0" ...

3) Does somebody has experience with this solution for CentOS kernel 2.6.18-308.el5 x86_64?

Sincerely,

Radovan Jablonovsky

Re: PostgreSQL oom_adj postmaster process to -17

From

"Kevin Grittner"

Date:

01 August 2012, 14:49:08

Radovan Jablonovsky <radovan.jablonovsky@replicon.com> wrote:

> We are running PostgreSQL version 9.1.1

You should apply the latest bug fixes by updating to 9.1.4.

http://www.postgresql.org/support/versioning/

> with 32GB of RAM, 32GB of SWAP and during high load we could reach
> (swap + RAM) memory limit.

If you're even *starting* to swap you're doing something wrong, much
less exhausting swap space equal to actual RAM.  What is your
configuration?

http://wiki.postgresql.org/wiki/Server_Configuration

While it's probably a good idea to configure the OOM killer to
behave more sanely, we tend to ignore it in favor of ensuring that
it never comes into play.  We run about 200 databases 24/7 and I
think I've seen it kick in about twice -- when we ran queries that
leaked memory on each row in a big query.

-Kevin

Re: PostgreSQL oom_adj postmaster process to -17

From

Radovan Jablonovsky

Date:

03 August 2012, 15:09:05

Thanks you for your response.

Database config:

shared_buffers = 8GB

temp_buffers = 32MB

work_mem = 64MB

maintenance_work_mem = 512MB

effective_cache_size = 16GB

In usual load there are not much pressure on memory, but it is possible to have all clients start using heavy reports. They are valid requests and could consume all memory. In this border and not likely but possible scenario it could be useful to let OOM killer to kill client's processes/connections but leave PostgreSQL system processes (postmaster, writer, stat, log, streaming, ...) excluded from reach of OOM killer.

On Wed, Aug 1, 2012 at 11:48 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:

Radovan Jablonovsky <radovan.jablonovsky@replicon.com> wrote:

> We are running PostgreSQL version 9.1.1

You should apply the latest bug fixes by updating to 9.1.4.

http://www.postgresql.org/support/versioning/

> with 32GB of RAM, 32GB of SWAP and during high load we could reach
> (swap + RAM) memory limit.

If you're even *starting* to swap you're doing something wrong, much
less exhausting swap space equal to actual RAM. What is your
configuration?

http://wiki.postgresql.org/wiki/Server_Configuration

While it's probably a good idea to configure the OOM killer to
behave more sanely, we tend to ignore it in favor of ensuring that
it never comes into play. We run about 200 databases 24/7 and I
think I've seen it kick in about twice -- when we ran queries that
leaked memory on each row in a big query.

-Kevin

Radovan

Re: PostgreSQL oom_adj postmaster process to -17

From

Tom Lane

Date:

03 August 2012, 15:33:10

Radovan Jablonovsky <radovan.jablonovsky@replicon.com> writes:
> In usual load there are not much pressure on memory, but it is possible to
> have all clients start using heavy reports. They are valid requests and
> could consume all memory. In this border and not likely but possible
> scenario it could be useful to let OOM killer to kill client's
> processes/connections but leave PostgreSQL system processes (postmaster,
> writer, stat, log, streaming, ...) excluded  from reach of OOM killer.

The code already supports excluding the postmaster itself from OOM kills
while letting its children be subject to them.  Being selective about
which children are subject is pointless, though: if any child is kill
-9'd, we have to zap the rest and restart, because there is no way to be
sure that the victim left shared memory in a consistent state.

            regards, tom lane

Re: PostgreSQL oom_adj postmaster process to -17

From

"Kevin Grittner"

Date:

03 August 2012, 16:03:45

Radovan Jablonovsky <radovan.jablonovsky@replicon.com> wrote:

> In usual load there are not much pressure on memory, but it is
> possible to have all clients start using heavy reports. They are
> valid requests and could consume all memory.

Your clients will get their results back faster if you can arrange
some way to queue these sorts of requests when they get beyond some
reasonable limit.  You might be able to do that using a connection
pool, or you might want to create some sort of job queue which
releases a limited number of such jobs at a time; but I guarantee
that every single person to submit a job to such a queue, including
the last person, will get their results sooner with such queuing
than turning loose a "thundering herd" of requests which puts the
system into swapping.  I guarantee it.

-Kevin

Re: PostgreSQL oom_adj postmaster process to -17

From

Scott Marlowe

Date:

03 August 2012, 17:05:43

On Fri, Aug 3, 2012 at 12:08 PM, Radovan Jablonovsky
<radovan.jablonovsky@replicon.com> wrote:
> Thanks you for your response.
>
> Database config:
> shared_buffers = 8GB
> temp_buffers = 32MB
> work_mem = 64MB
> maintenance_work_mem = 512MB
> effective_cache_size = 16GB
>
> In usual load there are not much pressure on memory, but it is possible to
> have all clients start using heavy reports. They are valid requests and
> could consume all memory. In this border and not likely but possible
> scenario it could be useful to let OOM killer to kill client's
> processes/connections but leave PostgreSQL system processes (postmaster,
> writer, stat, log, streaming, ...) excluded  from reach of OOM killer.

You're only realistic solution is to either limit the incoming
connections via a connection pooler like pgbouncer or to lower your
work_mem to something smaller. What's you're current max connections
setting?

Re: PostgreSQL oom_adj postmaster process to -17

From

Radovan Jablonovsky

Date:

08 August 2012, 17:04:49

Currently there are maximum 600 connections.

On Fri, Aug 3, 2012 at 2:05 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:

On Fri, Aug 3, 2012 at 12:08 PM, Radovan Jablonovsky
<radovan.jablonovsky@replicon.com> wrote:
> Thanks you for your response.
>
> Database config:
> shared_buffers = 8GB
> temp_buffers = 32MB
> work_mem = 64MB
> maintenance_work_mem = 512MB
> effective_cache_size = 16GB
>
> In usual load there are not much pressure on memory, but it is possible to
> have all clients start using heavy reports. They are valid requests and
> could consume all memory. In this border and not likely but possible
> scenario it could be useful to let OOM killer to kill client's
> processes/connections but leave PostgreSQL system processes (postmaster,
> writer, stat, log, streaming, ...) excluded from reach of OOM killer.

You're only realistic solution is to either limit the incoming
connections via a connection pooler like pgbouncer or to lower your
work_mem to something smaller. What's you're current max connections
setting?

Radovan Jablonovsky | SaaS DBA | Phone 1-403-262-6519 (ext. 256) | Fax 1-403-233-8046

We are hiring! | search jobs

Re: PostgreSQL oom_adj postmaster process to -17

From

"Kevin Grittner"

Date:

08 August 2012, 17:25:09

Radovan Jablonovsky <radovan.jablonovsky@replicon.com> wrote:

> PostgreSQL version 9.1.1 with 32GB of RAM

> shared_buffers = 8GB
> temp_buffers = 32MB
> work_mem = 64MB
> maintenance_work_mem = 512MB

> Currently there are maximum 600 connections.

Please read:

http://wiki.postgresql.org/wiki/Number_Of_Database_Connections

-Kevin

Re: PostgreSQL oom_adj postmaster process to -17

From

Craig Ringer

Date:

08 August 2012, 22:34:36

On 08/09/2012 04:24 AM, Kevin Grittner wrote:
> http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
>
Can we please please PLEASE link to that as a comment above max_connections?

Last time this came up nobody was happy with wording of a comment so
nothing got done. It's a real usability wart - causing real-world
performance and reliability problems - that people unwittingly raise
max_connections to absurd levels because they get no warnings, hints or
guidance of any sort.

--
Craig Ringer

Re: PostgreSQL oom_adj postmaster process to -17

From

"Kevin Grittner"

Date:

09 August 2012, 11:09:58

Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 08/09/2012 04:24 AM, Kevin Grittner wrote:
>> http://wiki.postgresql.org/wiki/Number_Of_Database_Connections
>>
> Can we please please PLEASE link to that as a comment above
> max_connections?
>
> Last time this came up nobody was happy with wording of a comment
> so nothing got done. It's a real usability wart - causing
> real-world performance and reliability problems - that people
> unwittingly raise max_connections to absurd levels because they
> get no warnings, hints or guidance of any sort.

I see that we currently have five links to wiki.postgresql.org in
release notes and four more in the rest of the docs.  Are people OK
with adding this link to the docs on max_connections?  (Feel free to
improve it before answering if you have qualms about the specifics
on that page.)

We do seem to get an awful lot of posts (between here and
StackOverflow) from people who assume they need one database
connection per active user, and then are surprised that performance
is horrible.

If we get consensus on linking to this I'll put together a patch to
make a brief comment in the docs with a link to the Wiki.

-Kevin