Re: Changing shared_buffers without restart - Mailing list pgsql-hackers

From Ni Ku
Subject Re: Changing shared_buffers without restart
Date
Msg-id CAPuPUJz4EB5NU1ah3NH9HjBaq-dCfJMAgmou7YY5sofQ-xBbQQ@mail.gmail.com
In response to Re: Changing shared_buffers without restart  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses Re: Changing shared_buffers without restart
List pgsql-hackers
Dmitry / Ashutosh,
Thanks for the patch set. I've been doing some testing with it, and in particular I want to see whether this solution would work with a hugepage-backed buffer pool.

I ran some simple tests (outside of PG) on Linux kernel v6.1, which includes a commit that added some hugepage support to mremap (
https://patchwork.kernel.org/project/linux-mm/patch/20211013195825.3058275-1-almasrymina@google.com/).

From reading the kernel code and from testing, it seems that for a hugepage-backed mapping mremap supports only shrinking, not growing. Further, for shrinking, what I observed is that after mremap returns, the hugepage memory
is not released back to the OS; rather, it is released only when the fd is closed (or when the memory is unmapped, for a mapping created with MAP_ANONYMOUS).
I'm not sure if this behavior is expected, but being able to release memory back to the OS immediately after mremap would be important for use cases such as supporting "serverless" PG instances on the cloud.
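
To illustrate, here is a minimal standalone sketch of the kind of test I mean (this is not the actual attached program; it assumes 2 MB hugepages are reserved, e.g. via vm.nr_hugepages, and a kernel with the commit above):

#define _GNU_SOURCE             /* for mremap */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define HPAGE_SZ (2UL * 1024 * 1024)   /* assumed 2 MB hugepage size */
#define OLD_SZ   (8 * HPAGE_SZ)        /* start with 8 hugepages */
#define NEW_SZ   (4 * HPAGE_SZ)        /* shrink to 4 hugepages */

int
main(void)
{
    char *p = mmap(NULL, OLD_SZ, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) { perror("mmap"); return 1; }

    /* Touch every hugepage so they are actually allocated. */
    for (size_t i = 0; i < OLD_SZ; i += HPAGE_SZ)
        p[i] = 1;

    /* Shrink the mapping in place; growing is expected to fail here. */
    if (mremap(p, OLD_SZ, NEW_SZ, 0) == MAP_FAILED)
        perror("mremap shrink");

    /* Inspect HugePages_Free in /proc/meminfo at this point: in the tests
     * described above, the shrunk pages only reappear after munmap(). */
    printf("shrunk; press Enter to unmap\n");
    getchar();

    munmap(p, NEW_SZ);
    return 0;
}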

I'm no expert in the Linux kernel, so I could be missing something. It'd be great if you or somebody else could comment on these observations and whether this mremap-based solution would work with a hugepage-backed buffer pool.

I also attached the test program in case someone can spot something I did wrong.

Regards,

Jack Ng

On Tue, Mar 18, 2025 at 11:02 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Tue, Feb 25, 2025 at 3:22 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > On Fri, Oct 18, 2024 at 09:21:19PM GMT, Dmitry Dolgov wrote:
> > TL;DR A PoC for changing shared_buffers without PostgreSQL restart, via
> > changing shared memory mapping layout. Any feedback is appreciated.
>
> Hi,
>
> Here is a new version of the patch, which contains a proposal about how to
> coordinate shared memory resizing between backends. The rest is more or less
> the same; feedback on the coordination is appreciated. It's a lot to read, but
> the main difference is about:

Thanks Dmitry for the summary.

>
> 1. Allowing a GUC value change to be decoupled from actually applying it, sort of a
> "pending" change. The idea is to let custom logic be triggered on an assign
> hook, which then takes responsibility for what happens later and how the change is going to
> be applied. This allows using the regular GUC infrastructure in cases where a value
> change requires some complicated processing. I was trying to make the change
> not so invasive, plus it's still missing GUC reporting.
>
> 2. The shared memory resizing patch became more complicated thanks to some
> coordination between backends. The current implementation was chosen from a few
> more or less equal alternatives, which evolve along the following lines:
>
> * There should be one "coordinator" process overseeing the change. Having the
> postmaster fulfill this role, as in this patch, seems like a natural idea,
> but it poses certain challenges since it doesn't have locking infrastructure.
> Another option would be to elect a single backend as the coordinator, which
> would handle the postmaster as a special case. If there is ever a
> "coordinator" worker in Postgres, it would be useful here.
>
> * The coordinator uses EmitProcSignalBarrier to reach out to all other backends
> and trigger the resize process. Backends join a Barrier to synchronize and wait
> until everyone is finished.
>
> * There is some resizing state stored in shared memory, which is there to
> handle backends that were for some reason late or didn't receive the signal.
> What to store there is open for discussion.
>
> * Since we want to make sure all processes share the same understanding of what
> the NBuffers value is, any failure is mostly a hard stop: rolling back the
> change would require coordination as well, which sounds a bit too complicated for now.
>

I think we should add a way to monitor the progress of resizing; at
least whether resizing is complete and whether the new GUC value is in
effect.

> We've tested this change manually for now, although it might be useful to try
> out injection points. The testing strategy, which has caught plenty of bugs,
> was simply to run a pgbench workload against a running instance and change
> shared_buffers on the fly. Some more subtle cases were verified by manually
> injecting delays to trigger expected scenarios.

I have shared a script with my changes, but it's far from complete testing. We
will need to use injection points to test specific scenarios.

--
Best Wishes,
Ashutosh Bapat




