Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet - Mailing list pgsql-performance

From Joe Conway
Subject Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
Date
Msg-id 3633df47-5693-4871-9e12-bfbb68dd3313@joeconway.com
Whole thread Raw
In response to Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet  (Priya V <mailme0216@gmail.com>)
Responses Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
List pgsql-performance
On 8/5/25 13:01, Priya V wrote:
> *Environment:*
>     *PostgreSQL Versions:* Mix of 13.13 and 15.12 (upgrades in progress
>     to be at 15.12 currently both are actively in use)

PostgreSQL 13 end of life after November 13, 2025

>     *OS / Kernel:* RHEL 7 & RHEL 8 variants, kernels in the 4.14–4.18 range

RHEL 7 has been EOL for quite a while now. Note that you have to watch 
out for collation issues/corrupted indexes after OS upgrades due to 
collations changing with newer glibc versions.

>     *Swap:* Currently none

bad idea

>     *Workload:* Highly mixed — OLTP-style internal apps with
>     unpredictable query patterns and connection counts
> 
>     *Goal:* Uniform, safe memory settings across the fleet to avoid
>     kernel or database instability

> We’re considering:
>     *|vm.overcommit_memory = 2|* for strict accounting

yes

>     Increasing |vm.overcommit_ratio| from 50 → 80 or 90 to better
>     reflect actual PostgreSQL usage (e.g., |work_mem| reservations that
>     aren’t fully used)

work_mem does not reserve memory -- it is a maximum that might be used 
in memory for a particular operation

> *Our questions for those running large PostgreSQL fleets:*
>  1.
>     What |overcommit_ratio| do you find safe for PostgreSQL without
>     causing kernel memory crunches?

Read this:
https://www.cybertec-postgresql.com/en/what-you-should-know-about-linux-memory-overcommit-in-postgresql/

>  2.
>     Do you prefer |overcommit_memory = 1| or |= 2| for production stability?

Use overcommit_memory = 2 for production stability

>  3.
>     How much swap (if any) do you keep in large-memory servers where
>     PostgreSQL is the primary workload? Is having swap configured a good
>     idea or not ?

You don't necessary need a large amount of swap, but you definitely 
should not disable it.

Some background on that:
https://chrisdown.name/2018/01/02/in-defence-of-swap.html

>  4.
>     Any real-world cases where kernel accounting was too strict or too
>     loose for PostgreSQL?

In my experience the biggest issues are when postgres is running in a 
memory constrained cgroup. If you want to constrain memory with cgroups, 
use cgroup v2 (not 1) and use memory.high to constrain it, not memory.max.

>  5. What settings to go with if we are not planning on using swap ?

IMHO do not disable swap on Linux, at least not on production, ever.

> We’d like to avoid both extremes:
>     Too low a ratio → PostgreSQL backends failing allocations even with
>     free RAM

Have you actually seen this or are you theorizing?

>     Too high a ratio → OOM killer terminating PostgreSQL under load spikes

If overcommit_memory = 2, overcommit_ratio is reasonable (less than 100, 
maybe 80 or so as you suggested), and swap is not disabled, and you are 
not running in a memory constrained cgroup, I would be very surprised if 
you will ever get hit by the OOM killer. And if you do, things are so 
bad the database was probably dying anyway.

HTH,

-- 
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com



pgsql-performance by date:

Previous
From: Priya V
Date:
Subject: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
Next
From: Frits Hoogland
Date:
Subject: Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet