Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet - Mailing list pgsql-performance

From Priya V
Subject Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
Date
Msg-id CAFsZ43xFxjSiONwRccXBQXZrPRd+Lh7XAkSVEG1ai165xPcoDA@mail.gmail.com
Whole thread Raw
Responses Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet
List pgsql-performance
Hello Postgres community,

We operate a large PostgreSQL fleet (~15,000 databases) on dedicated Linux hosts.
Each host runs multiple PostgreSQL instances (multi-instance setup, not just multiple DBs inside one instance).

Environment:

  • PostgreSQL Versions: Mix of 13.13 and 15.12 (upgrades in progress to be at 15.12 currently both are actively in use)

  • OS / Kernel: RHEL 7 & RHEL 8 variants, kernels in the 4.14–4.18 range

  • RAM: 256 GiB (varies slightly)

  • Swap: Currently none

  • Workload: Highly mixed — OLTP-style internal apps with unpredictable query patterns and connection counts

  • Goal: Uniform, safe memory settings across the fleet to avoid kernel or database instability

We’re reviewing vm.overcommit_* settings because we’ve seen conflicting guidance:

  • vm.overcommit_memory = 2 gives predictability but can reject allocations early

  • vm.overcommit_memory = 1 is more flexible but risks OOM kills if many backends hit peak memory usage at once

We’re considering:

  • vm.overcommit_memory = 2 for strict accounting

  • Increasing vm.overcommit_ratio from 50 → 80 or 90 to better reflect actual PostgreSQL usage (e.g., work_mem reservations that aren’t fully used)

Our questions for those running large PostgreSQL fleets:

  1. What overcommit_ratio do you find safe for PostgreSQL without causing kernel memory crunches?

  2. Do you prefer overcommit_memory = 1 or = 2 for production stability?

  3. How much swap (if any) do you keep in large-memory servers where PostgreSQL is the primary workload? Is having swap configured a good idea or not ? 

  4. Any real-world cases where kernel accounting was too strict or too loose for PostgreSQL?

  5. What settings to go with if we are not planning on using swap ?

We’d like to avoid both extremes:

  • Too low a ratio → PostgreSQL backends failing allocations even with free RAM

  • Too high a ratio → OOM killer terminating PostgreSQL under load spikes

Any operational experiences, tuning recommendations, or kernel/PG interaction pitfalls would be very helpful.

TIA



pgsql-performance by date:

Previous
From: Bruce Momjian
Date:
Subject: Making Postgres slower
Next
From: Joe Conway
Date:
Subject: Re: Safe vm.overcommit_ratio for Large Multi-Instance PostgreSQL Fleet