Re: Getting out ahead of OOM - Mailing list pgsql-admin

From Joe Conway
Subject Re: Getting out ahead of OOM
Date
Msg-id efcca885-a954-43e2-99b8-8b993678c72c@joeconway.com
In response to Re: Getting out ahead of OOM  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-admin
On 3/7/25 14:26, Tom Lane wrote:
> Joseph Hammerman <joe.hammerman@datadoghq.com> writes:
>> We run Postgres in a Kubernetes environment, and we have not to date been
>> able to convince our Compute team to create a class of Kubernetes hosts
>> that have memory overcommit disabled.
> 
> :-(
> 
>> Has anyone had success tracking all the Postgres memory allocation
>> configurables and using that to administratively prevent OOMing?
> 
> I doubt anyone has tried that.  I would look into whether running
> the postmaster under a suitable ulimit helps.  I seem to recall
> discussions that in Linux, "ulimit -v" works better than the other
> likely-looking options.  But that might be stale information.

Problem with ulimit is that it is per process, but within a Kubernetes 
pod the memory accounting is for all the pod's processes.
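
For anyone who can run on a dedicated host rather than in a pod, Tom's 
suggestion would look something like this (the 8 GiB cap is purely 
illustrative):

    # ulimit -v takes KiB; cap virtual memory at 8 GiB per process
    ulimit -v 8388608
    pg_ctl -D "$PGDATA" -l logfile start

Inside a pod, though, it is the cgroup-level accounting described 
below that bites first.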

>> Alternatively, has anyone has success implementing an extension or periodic
>> process to monitor the memory consumption of the Postgres children and
>> killing them before the OOM event occurs?
> 
> That's not going to be noticeably nicer than the kernel-induced
> OOM, I think.  The one thing it might do for you is ensure that
> the kill happens to a child process and not the postmaster; but
> you can already use PG_OOM_ADJUST_VALUE and PG_OOM_ADJUST_FILE
> to manage that if it's a problem.  (Recent kernels are alleged
> to usually do the right thing without that, though.)
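
(As an aside, for anyone going that route: those are environment 
variables read by the postmaster at startup. The conventional setup 
per the PostgreSQL docs is to start the postmaster itself with a 
negative OOM score, e.g. via systemd's OOMScoreAdjust, and have the 
child processes reset themselves to something killable:

    # in the postmaster's startup environment; the values shown are
    # the conventional ones from the docs, adjust to taste
    export PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
    export PG_OOM_ADJUST_VALUE=0
)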

Actually, the problem here is likely that the Kubernetes Postgres pod 
was started with a memory limit. Disabling memory overcommit at the 
host level will not help you if there is a memory limit set for the 
pod, because that in turn sets memory.limit_in_bytes for the cgroup 
related to the pod, and the OOM killer will strike when 
memory.usage_in_bytes exceeds that value, irrespective of the free 
memory at the host level. In these cases the oom_score_adj values 
don't end up mattering much.
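
You can see what limit a pod is actually running under from inside 
the container (exact paths vary with the cgroup version and the 
container runtime; this is the common layout):

    # cgroup v1
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes
    cat /sys/fs/cgroup/memory/memory.usage_in_bytes

    # cgroup v2 (same information, different file names)
    cat /sys/fs/cgroup/memory.max
    cat /sys/fs/cgroup/memory.current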

This is a fairly complex topic -- I wrote a blog post a few years ago 
which may or may not be out of date at this point:

https://www.crunchydata.com/blog/deep-postgresql-thoughts-the-linux-assassin

Additionally, Jeremy Schneider wrote a more recent one that you might 
find helpful:

https://ardentperf.com/2024/09/22/kubernetes-requests-and-limits-for-postgres/

My quick and dirty recommendations:
1. Use cgroup v2 on the host if at all possible.
2. Do not, under any circumstances, disable swap on the host. Running
    without swap is an anti-pattern that was unfortunately still
    widespread the last time I looked.
3. If nothing else, avoid setting a memory limit on the pod's cgroup.
    That will at least get you back to not getting whacked unless there
    is host-level memory pressure. The blogs discuss how to do that
    with Kube pod settings; see the sketch below.

HTH,

-- 
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


