Re: Changing shared_buffers without restart - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: Changing shared_buffers without restart
Date
Msg-id CAMFBP2o9LXeB=znFzz8_0jL30jw=grd6m91Jv94yL6axpiJRgg@mail.gmail.com
In response to Re: Changing shared_buffers without restart  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-hackers
On Fri, Jul 4, 2025 at 9:42 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> On Fri, Jul 04, 2025 at 02:06:16AM +0200, Tomas Vondra wrote:

...
 
> 10) what to do about stuck resize?
>
> AFAICS the resize can get stuck for various reasons, e.g. because it
> can't evict pinned buffers, possibly indefinitely. Not great, it's not
> clear to me if there's a way out (canceling the resize) after a timeout,
> or something like that? Not great to start an "online resize" only to
> get stuck with all activity blocked for indefinite amount of time, and
> get to restart anyway.
>
> Seems related to Thomas' message [2], but AFAICS the patch does not do
> anything about this yet, right? What's the plan here?

It's another open discussion right now, with an idea to eventually allow
canceling after a timeout. I think canceling when stuck on buffer
eviction should be pretty straightforward (the eviction must take place
before actual shared memory resize, so we know nothing has changed yet),
but in some other failure scenarios it would be harder (e.g. if one
backend is stuck resizing, while others have succeeded -- this would
require another round of synchronization and some way to figure out what
is the current status).
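
The "cancel while stuck on eviction" case can be illustrated with a minimal sketch. This is a toy model and not the patch's code: the buffer representation (dicts with `pinned` and `valid` flags), the function name `evict_for_shrink`, and the polling interval are all assumptions made for illustration. The point it shows is that eviction runs before any shared memory is remapped, so giving up at the deadline needs no rollback.

```python
import time


def evict_for_shrink(buffers, new_size, timeout_s,
                     clock=time.monotonic, sleep=time.sleep):
    """Try to empty every buffer at index >= new_size before a shrink.

    Hypothetical model: each buffer is a dict with 'pinned' and 'valid'
    flags. Returns True when the tail is empty (safe to resize), or False
    if the deadline passes first. On False the pool size is unchanged, so
    the caller can simply cancel the resize.
    """
    deadline = clock() + timeout_s
    while True:
        # Evict whatever is currently evictable in the region to be removed.
        for buf in buffers[new_size:]:
            if buf["valid"] and not buf["pinned"]:
                buf["valid"] = False

        if all(not buf["valid"] for buf in buffers[new_size:]):
            return True   # nothing left in the tail, resize may proceed
        if clock() >= deadline:
            return False  # still blocked by pinned buffers, give up
        sleep(0.01)       # wait for pins to be released, then retry


pool = [
    {"pinned": False, "valid": True},
    {"pinned": True, "valid": True},   # a pinned buffer blocks the shrink
    {"pinned": False, "valid": True},
]
if not evict_for_shrink(pool, new_size=1, timeout_s=0.05):
    print("resize cancelled: pinned buffers remain, pool size unchanged")
```

The harder scenarios mentioned above (some backends already resized, one stuck) are not covered by this sketch, since they need an extra synchronization round.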

From a user standpoint, I would expect any kind of resize like this to be an online operation that happens in the background. If this is driven by a GUC I don't see how it could be anything else, but if something else is decided on I think it'd just be a pain to require a session to stay connected until a resize was complete. (Of course we'd need to provide some means of monitoring a resize that was in progress, perhaps via a pg_stat_progress view or a system function.)

Also, while I haven't fully followed the discussion about how to synchronize backends, I will say that I don't think it's at all unreasonable if a resize doesn't take full effect until every backend has at minimum ended any running transaction, or potentially even returned to the equivalent of `PostgresMain()` for that type of backend. Obviously it'd be nicer to be more responsive than that, but I don't think the first version of the feature has to accomplish that.
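
To make the "takes effect only once every backend has reached a safe point" idea concrete, here is a rough sketch. It is a single-process toy, not a proposal for the actual synchronization mechanism: the class `ResizeCoordinator`, its method names, and the notion of a backend "reporting" a safe point are all hypothetical. It also assumes a grow-only first version and exposes a progress counter of the kind a monitoring view could report.

```python
class ResizeCoordinator:
    """Toy model: a requested size becomes active only after every
    backend has reported reaching a safe point (e.g. transaction end)."""

    def __init__(self, backend_ids, current_size):
        self.backends = set(backend_ids)
        self.active_size = current_size
        self.pending_size = None
        self.waiting_on = set()

    def request_resize(self, new_size):
        # Assumption: the first version only supports growing.
        if new_size < self.active_size:
            raise ValueError("shrinking is not supported in this sketch")
        self.pending_size = new_size
        self.waiting_on = set(self.backends)
        self._maybe_apply()

    def backend_reached_safe_point(self, backend_id):
        # Called when a backend ends its transaction or returns to its
        # main loop; until then it keeps using the old size.
        self.waiting_on.discard(backend_id)
        self._maybe_apply()

    def progress(self):
        # (backends that acknowledged, total backends)
        return (len(self.backends) - len(self.waiting_on), len(self.backends))

    def _maybe_apply(self):
        if self.pending_size is not None and not self.waiting_on:
            self.active_size = self.pending_size
            self.pending_size = None


coord = ResizeCoordinator(backend_ids=[101, 102], current_size=128)
coord.request_resize(256)
coord.backend_reached_safe_point(101)
print(coord.active_size, coord.progress())
coord.backend_reached_safe_point(102)
print(coord.active_size, coord.progress())
```

Nothing here requires the requesting session to stay connected: the request is recorded once and applied whenever the last backend acknowledges it.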

For that matter, I also feel it'd be fine if the first version didn't even support shrinking shared buffers.

Finally, while shared buffers is the most visible target here, there are other shared memory settings that have a *much* smaller surface area, and in my experience are going to be much more valuable from a tuning perspective; notably wal_buffers and the MXID SLRUs (and possibly CLOG and subtrans). I say that because unless you're running a workload that entirely fits in shared buffers, or shared buffers is *really* small compared to system memory, increasing shared buffers quickly gets into diminishing returns. But since the default size for the other fixed-size areas is so much smaller than normal values for shared_buffers, increasing those areas can have a much, much larger impact on performance. (Especially for something like the MXID SLRUs.) I would certainly consider focusing on one of those areas before trying to tackle shared buffers.
