Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet - Mailing list pgsql-performance
From: Priya V
Subject: Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
Msg-id: CAFsZ43xF00OHNg6d8iD0zd0OCpWe8+r9p4A+HAupo0jxOdYgpQ@mail.gmail.com
In response to: Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet (Frits Hoogland <frits.hoogland@gmail.com>)
List: pgsql-performance
Hi Frits, Joe,
Thank you both for your insights.
Current situation:

$ cat /proc/sys/vm/overcommit_memory
0
$ cat /proc/sys/vm/overcommit_ratio
50
$ cat /proc/sys/vm/swappiness
60
Workload: Multi-tenant PostgreSQL
$ uname -r
4.18.0-477.83.1.el8_8.x86_64
$ free -h
              total        used        free      shared  buff/cache   available
Mem:          249Gi       4.3Gi       1.7Gi        22Gi       243Gi       221Gi
Swap:            0B          0B          0B
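For completeness, the kernel's own commit accounting can be read from /proc/meminfo (the CommitLimit shown is just what swap + 50% of this host's RAM works out to at the current defaults, and is only informational while overcommit_memory=0; Committed_AS varies moment to moment):

$ grep -E '^Commit' /proc/meminfo
CommitLimit:   130547712 kB
Committed_AS:        ... kB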
If we set overcommit_memory = 2, what should we set the overcommit_ratio value to? Can you please suggest?
Is there a rule of thumb to go with?
Our goal is to avoid OOM issues without wasting memory or starving the kernel.
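For reference, my understanding of how the limit is computed under overcommit_memory = 2 (ignoring huge pages, which are subtracted from MemTotal first):

CommitLimit = swap + (overcommit_ratio / 100) * MemTotal

So on a 249Gi host with no swap:

ratio 50 → ~124Gi committable (the current default)
ratio 80 → ~199Gi committable
ratio 90 → ~224Gi committable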
Thanks!
On Wed, Aug 6, 2025 at 3:47 AM Frits Hoogland <frits.hoogland@gmail.com> wrote:
Joe,

Can you name any technical reason why not having swap for a database is actually a bad idea?

Memory is always limited. Swap was invented for the situation where the (incidental) amount of paged-in memory could (regularly) grow beyond what physical memory allows, with swap as the (clear) workaround: it 'cushions' the memory shortage by providing a "second level" of memory storage on disk.

Still, this does not make memory unlimited. Swap extends the available physical memory by the amount of swap. There is still a situation where you can run out of memory with swap added, simply by paging in more memory than physical memory plus swap.

Today, most systems are not memory constrained anymore, or rather: it is possible to get a server with enough physical memory to hold your common total memory need. And given the latency-sensitive nature of databases in general, which includes postgres, for any serious deployment you should get a server with enough memory to host your workload, and configure postgres not to overload the memory.

If you do oversubscribe on (physical) memory, you will get pain somewhere; there is no way around that. The article in defence of swap is in essence saying that if you happen to oversubscribe on memory, sharing the pain between anonymous and file pages is better. I would say you are already in a bad place if that happens, which is especially bad for databases, and databases should allow you to make memory usage predictable.

However, what I found is that with 4+ kernels (4.18 to be precise; RHEL 8), the kernel can try to favour file pages in certain situations, making anonymous memory get paged out even if swappiness is set to 1 or 0 and there is a wealth of inactive file memory. It seems to have to do with workingset protection(?) mechanisms, but given the lack of clear statistics I can't be sure about that. What it leads to in my situations is a constant rate of swapping in and out, whilst there is no technical reason for linux to do so, because there is enough available memory.

My point of view has been that vm.overcommit_memory set to 2 was the way to go, because that allows linux to enforce a fixed limit at allocation time; it does guarantee that linux never runs out of memory, absolutely.

However, this limit is hard, and is applied to a process in both user mode and system mode (kernel level), and thus can refuse memory at times when it is not safe to do so, and thus corrupt execution. I have to be honest, I have not seen this myself, but trustworthy sources have reported it repeatedly, and I am inclined to believe them. This means postgres execution can corrupt/terminate in unlucky situations, which impacts availability.

Frits Hoogland

On 5 Aug 2025, at 20:52, Joe Conway <mail@joeconway.com> wrote:

On 8/5/25 13:01, Priya V wrote:

*Environment:*
*PostgreSQL Versions:* Mix of 13.13 and 15.12 (upgrades to 15.12 in progress; currently both are actively in use)
PostgreSQL 13 is end of life after November 13, 2025.

*OS / Kernel:* RHEL 7 & RHEL 8 variants, kernels in the 4.14–4.18 range
RHEL 7 has been EOL for quite a while now. Note that you have to watch out for collation issues/corrupted indexes after OS upgrades due to collations changing with newer glibc versions.

*Swap:* Currently none
bad idea

*Workload:* Highly mixed — OLTP-style internal apps with
unpredictable query patterns and connection counts
*Goal:* Uniform, safe memory settings across the fleet to avoid
kernel or database instability

We’re considering:
*|vm.overcommit_memory = 2|* for strict accounting
yes

Increasing |vm.overcommit_ratio| from 50 → 80 or 90 to better
reflect actual PostgreSQL usage (e.g., |work_mem| reservations that
aren’t fully used)
work_mem does not reserve memory -- it is a maximum that might be used in memory for a particular operation
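An easy way to see that behaviour (table and column names below are made up for illustration): each sort or hash node in a plan may use up to work_mem, so one query can consume several multiples of it, and nothing is held when it isn't needed:

$ psql -c "SET work_mem = '4MB'; EXPLAIN ANALYZE SELECT * FROM big_table ORDER BY some_col;"
...
  Sort Method: external merge  Disk: 812304kB
...

The "external merge" line means the sort spilled to disk once it hit work_mem; raise work_mem and the same plan reports an in-memory quicksort instead.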
*Our questions for those running large PostgreSQL fleets:*

1. What |overcommit_ratio| do you find safe for PostgreSQL without causing kernel memory crunches?
Read this:
https://www.cybertec-postgresql.com/en/what-you-should-know-about-linux-memory-overcommit-in-postgresql/

2. Do you prefer |overcommit_memory = 1| or |= 2| for production stability?
Use overcommit_memory = 2 for production stability.
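If you do go that route, one minimal way to make it persistent (the file name and the ratio value below are illustrative, not a recommendation for your exact number):

$ cat /etc/sysctl.d/99-postgres-overcommit.conf
vm.overcommit_memory = 2
vm.overcommit_ratio = 80
$ sudo sysctl --system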
3. How much swap (if any) do you keep in large-memory servers where PostgreSQL is the primary workload? Is having swap configured a good idea or not?
You don't necessarily need a large amount of swap, but you definitely should not disable it.
Some background on that:
https://chrisdown.name/2018/01/02/in-defence-of-swap.html
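Even a small swap file gives the kernel somewhere to park cold anonymous pages (size, path, and the swappiness value below are illustrative; a low swappiness is a common choice for database hosts):

$ sudo fallocate -l 8G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
$ sudo sysctl vm.swappiness=10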
4. Any real-world cases where kernel accounting was too strict or too loose for PostgreSQL?
In my experience the biggest issues are when postgres is running in a memory constrained cgroup. If you want to constrain memory with cgroups, use cgroup v2 (not 1) and use memory.high to constrain it, not memory.max.
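On a systemd-managed host that looks something like this (unit name illustrative; assumes cgroup v2 is mounted):

$ sudo systemctl set-property postgresql.service MemoryHigh=200G
# or written directly into the cgroup v2 tree:
$ echo 200G | sudo tee /sys/fs/cgroup/system.slice/postgresql.service/memory.high

memory.high throttles and reclaims when the service crosses the threshold instead of OOM-killing it outright, which is why it is preferable to memory.max here.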
5. What settings to go with if we are not planning on using swap?

IMHO do not disable swap on Linux, at least not on production, ever.

We’d like to avoid both extremes:
Too low a ratio → PostgreSQL backends failing allocations even with
free RAM
Have you actually seen this or are you theorizing?

Too high a ratio → OOM killer terminating PostgreSQL under load spikes
If overcommit_memory = 2, overcommit_ratio is reasonable (less than 100, maybe 80 or so as you suggested), swap is not disabled, and you are not running in a memory-constrained cgroup, I would be very surprised if you ever get hit by the OOM killer. And if you do, things are so bad the database was probably dying anyway.
HTH,
--
Joe Conway
PostgreSQL Contributors Team
Amazon Web Services: https://aws.amazon.com