Re: Changing shared_buffers without restart - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Changing shared_buffers without restart
Msg-id bwfpldgn2eugxkgmbxh5fypcmv766s7yhklxny2dydiflyhcuy@u5rzqddrvksr
In response to Re: Changing shared_buffers without restart  (Jim Nasby <jnasby@upgrade.com>)
List pgsql-hackers
Hi,

On 2025-07-14 17:55:13 -0500, Jim Nasby wrote:
> I say that because unless you're running a workload that entirely fits in
> shared buffers, or a *really* small shared buffers compared to system
> memory, increasing shared buffers quickly gets into diminishing returns.

I don't think that's true, at all, today. And it certainly won't be true in a
world where we will be able to use direct_io for real workloads.

Particularly for write-heavy workloads, the difference between a small buffer
pool and a large one can be *dramatic*, because the large buffer pool allows
most writes to be done by checkpointer (and thus largely sequentially) rather
than by backends and bgwriter (and thus largely randomly). Doing more writes
sequentially helps with short-term performance, but *particularly* helps with
sustained performance on SSDs. A larger buffer pool also reduces the *total*
number of writes dramatically, because the same buffer will often be dirtied
repeatedly within one checkpoint window.
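
One way to see that split on a running instance (16+, where pg_stat_io
exists; this query is just illustrative, it's not from the benchmark below):

    -- block writes to relations, summed per process type
    SELECT backend_type, sum(writes) AS relation_writes
    FROM pg_stat_io
    WHERE object = 'relation'
    GROUP BY backend_type
    ORDER BY relation_writes DESC;

With a too-small shared_buffers the bulk of those writes shows up under
'client backend' and 'background writer' rather than 'checkpointer'.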


r/w pgbench is a workload that *undersells* the benefit of a larger
shared_buffers, as each transaction is unusually small, making WAL flushes
much more of a bottleneck (and the access pattern is too uniform as well). But
even for that the difference can be massive:


A scale 500 pgbench with 48 clients:
s_b= 512MB:
     averages 390MB/s of writes in steady state
     average TPS: 25072
s_b=8192MB:
     averages  48MB/s of writes in steady state
     average TPS: 47901
Nearly an order of magnitude difference in writes and nearly a 2x difference
in TPS.
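
For reference, a run like the above can be reproduced with roughly the
following (client count and scale match the numbers above; thread count,
duration and the database name are just illustrative placeholders):

    # initialize a scale-500 database (roughly 7-8GB of data)
    pgbench -i -s 500 bench
    # standard read/write run with 48 clients
    pgbench -c 48 -j 48 -T 1800 -P 10 bench

At s_b=8192MB the dataset largely fits in shared buffers, at 512MB it does
not, which is where the difference in write volume comes from.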


25%, the advice we commonly give for shared_buffers, is close to the worst
possible value. About the only thing it maximizes is double buffering, while
depriving both postgres and the OS of useful information about what to cache
and for how long, leading to reduced cache hit rates.
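
The postgres-side half of that is easy to observe (this only shows postgres'
own buffer hit rate, not the OS page cache; purely an illustrative query):

    SELECT datname, blks_hit, blks_read,
           round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS hit_ratio
    FROM pg_stat_database
    WHERE datname = current_database();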


> But since the default size for the other fixed sized areas is so much
> smaller than normal values for shared_buffers, increasing those areas can
> have a much, much larger impact on performance. (Especially for something
> like the MXID SLRUs.) I would certainly consider focusing on one of those
> areas before trying to tackle shared buffers.

I think that'd be a bad idea. There's simply no point in having the complexity
in place to allow dynamically resizing a few megabytes of buffers. You can
just configure them large enough (including probably increasing some of the
defaults one of these years). Whereas you can't just do that for
shared_buffers, as we're talking about real amounts of memory: ahead of time
you do not know how much memory backends themselves will need, and the amount
of memory in the system may change.
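
Today, of course, shared_buffers is a postmaster-context GUC, so any change
only takes effect after a full restart, which is exactly what this thread is
trying to address:

    SELECT name, setting, unit, context
    FROM pg_settings
    WHERE name = 'shared_buffers';
    -- context = 'postmaster': changes require a full server restart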

Resizing shared_buffers is particularly relevant because it's becoming more
important to be able to dynamically increase/decrease the resources of a
running postgres instance to adjust for system load. Memory and CPUs can be
hot-added/removed from VMs, but we need to be able to actually utilize them...

Greetings,

Andres Freund


