Re: Changing shared_buffers without restart - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Changing shared_buffers without restart |
Date | |
Msg-id | CAExHW5uJujFkJzk_YgvFDKWknBBGoBpjvWB1n+k3szSN-xwN5Q@mail.gmail.com |
In response to | Re: Changing shared_buffers without restart (Dmitry Dolgov <9erthalion6@gmail.com>) |
List | pgsql-hackers |
On Tue, Feb 25, 2025 at 3:22 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On Fri, Oct 18, 2024 at 09:21:19PM GMT, Dmitry Dolgov wrote:
> > TL;DR A PoC for changing shared_buffers without PostgreSQL restart, via
> > changing shared memory mapping layout. Any feedback is appreciated.
>
> Hi,
>
> Here is a new version of the patch, which contains a proposal about how to
> coordinate shared memory resizing between backends. The rest is more or
> less the same; feedback about the coordination is appreciated. It's a lot
> to read, but the main differences are:

Thanks, Dmitry, for the summary.

> 1. Allowing a GUC value change to be decoupled from actually applying it,
> sort of a "pending" change. The idea is to let custom logic be triggered
> in an assign hook, which then takes responsibility for what happens later
> and how the change is applied. This allows using the regular GUC
> infrastructure in cases where a value change requires some complicated
> processing. I was trying to keep the change unintrusive, and it's still
> missing GUC reporting.
>
> 2. The shared memory resizing patch became more complicated because of the
> coordination between backends. The current implementation was chosen from
> a few more or less equal alternatives, which evolve along the following
> lines:
>
> * There should be one "coordinator" process overseeing the change. Having
> the postmaster fulfill this role, as in this patch, seems like a natural
> idea, but it poses certain challenges since the postmaster doesn't have
> locking infrastructure. Another option would be to elect a single backend
> as coordinator, which would have to handle the postmaster as a special
> case. If there is ever a "coordinator" worker in Postgres, it would be
> useful here.
>
> * The coordinator uses EmitProcSignalBarrier to reach out to all other
> backends and trigger the resize process. Backends join a Barrier to
> synchronize and wait until everyone is finished.
>
> * There is some resizing state stored in shared memory, which is there to
> handle backends that were late for some reason or didn't receive the
> signal. What to store there is open for discussion.
>
> * Since we want to make sure all processes share the same understanding of
> what the NBuffers value is, any failure is mostly a hard stop: rolling the
> change back would require coordination as well, which sounds a bit too
> complicated for now.

I think we should add a way to monitor the progress of resizing; at least
whether resizing is complete and whether the new GUC value is in effect.

> We've tested this change manually for now, although it might be useful to
> try out injection points. The testing strategy, which has caught plenty of
> bugs, was simply to run a pgbench workload against a running instance and
> change shared_buffers on the fly. Some more subtle cases were verified by
> manually injecting delays to trigger the expected scenarios.

I have shared a script with my changes, but it's far from full test
coverage. We will need to use injection points to test specific scenarios.

--
Best Wishes,
Ashutosh Bapat
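[Editor's note] To make point 1 above more concrete, here is a minimal sketch of what a "pending" assign hook for shared_buffers could look like. It is an illustration only, not code from the patch: the names pending_nbuffers, RequestShmemResize and assign_shared_buffers are hypothetical, and it glosses over the fact that the stock GUC machinery writes the new value into the underlying variable itself right after the hook runs.

```c
/*
 * Editor's sketch, not code from the patch: the "pending" GUC change idea
 * from point 1.  pending_nbuffers and RequestShmemResize() are hypothetical.
 */
#include "postgres.h"

#include "miscadmin.h"			/* NBuffers */
#include "utils/guc.h"

static int	pending_nbuffers = -1;	/* -1 means no resize in flight */

/* hypothetical: asks the coordinator to start resizing shared memory */
extern void RequestShmemResize(int newval);

/*
 * Assign hook for shared_buffers (GucIntAssignHook shape: it cannot fail,
 * the value is already validated).  Instead of treating the assignment as
 * applied immediately, remember it as pending and trigger the resize;
 * NBuffers itself would only be updated once every backend has remapped,
 * which is the "decoupling" described above.
 */
void
assign_shared_buffers(int newval, void *extra)
{
	if (newval == NBuffers)
	{
		pending_nbuffers = -1;	/* nothing to do */
		return;
	}

	pending_nbuffers = newval;
	RequestShmemResize(newval);
}
```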
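[Editor's note] Likewise, the coordination in point 2 roughly follows the existing ProcSignalBarrier/Barrier pattern. The sketch below is an assumption-laden illustration, not the patch: it assumes a regular backend (not the postmaster) acts as coordinator, and PROCSIGNAL_BARRIER_SHMEM_RESIZE, ShmemResizeState and RemapSharedMemory are invented names, while EmitProcSignalBarrier, WaitForProcSignalBarrier and the Barrier API are existing primitives.

```c
/*
 * Editor's sketch, not code from the patch: one coordinator publishes the
 * target NBuffers and signals everyone; each backend remaps and then waits
 * on a Barrier so nobody proceeds while others still use the old layout.
 */
#include "postgres.h"

#include "storage/barrier.h"
#include "storage/procsignal.h"

typedef struct ShmemResizeState
{
	int			target_nbuffers;	/* NBuffers value being applied */
	Barrier		barrier;			/* backends synchronize on this */
} ShmemResizeState;

static ShmemResizeState *resize_state;	/* assumed to live in shared memory */

/* hypothetical: each backend adjusts its shared memory mapping layout */
extern void RemapSharedMemory(int new_nbuffers);

/*
 * Coordinator side: publish the target value, then signal every process.
 * EmitProcSignalBarrier() returns a generation number, and waiting for it
 * guarantees all processes have absorbed the barrier, i.e. have finished
 * the handler below.
 */
static void
CoordinateShmemResize(int new_nbuffers)
{
	uint64		generation;

	resize_state->target_nbuffers = new_nbuffers;
	generation = EmitProcSignalBarrier(PROCSIGNAL_BARRIER_SHMEM_RESIZE);
	WaitForProcSignalBarrier(generation);
}

/*
 * Backend side: would be called from ProcessProcSignalBarrier() when the
 * (hypothetical) barrier type above is seen.  Returning true means the
 * barrier was absorbed successfully.
 */
static bool
ProcessBarrierShmemResize(void)
{
	BarrierAttach(&resize_state->barrier);

	RemapSharedMemory(resize_state->target_nbuffers);

	BarrierArriveAndWait(&resize_state->barrier, 0 /* dedicated wait event */ );
	BarrierDetach(&resize_state->barrier);

	return true;
}
```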