Re: Changing shared_buffers without restart - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: Changing shared_buffers without restart
Date
Msg-id CAExHW5s1b-_9GFGGOxSVimrt8oe2Y0a7vRhENrd2BSo+0Tkp-A@mail.gmail.com
In response to Re: Changing shared_buffers without restart  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Changing shared_buffers without restart
List pgsql-hackers
On Fri, Apr 11, 2025 at 8:31 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > > I think a relatively elegant solution is to extend ProcSignalBarrier
> > > mechanism to track not only pss_barrierGeneration, as a sign that
> > > everything was processed, but also something like
> > > pss_barrierReceivedGeneration, indicating that the message was received
> > > everywhere but not processed yet. That would be enough to allow
> > > processes to wait until the resize message was received everywhere, then
> > > use a global Barrier to wait until all processes are finished.  It's
> > > somehow similar to your proposal to use two signals, but has less
> > > implementation overhead.
> >
> > The way it's implemented in v4 still has the disjoint group problem.
> > Assume backends p1, p2, p3. All three of them are executing
> > ProcessProcSignalBarrier(). All three of them updated
> > pss_barrierReceivedGeneration
> >
> > /* The message is observed, record that */
> > pg_atomic_write_u64(&MyProcSignalSlot->pss_barrierReceivedGeneration,
> > shared_gen);
> >
> > p1, p2 moved faster and reached following code from ProcessBarrierShmemResize()
> > if (BarrierAttach(barrier) == SHMEM_RESIZE_REQUESTED)
> >   WaitForProcSignalBarrierReceived(pg_atomic_read_u64(&ShmemCtrl->Generation));
> >
> > Since all the processes have received the barrier message, p1, p2 move
> > ahead and go through all the next phases and finish resizing even
> > before p3 gets a chance to call ProcessBarrierShmemResize() and attach
> > itself to Barrier. This could happen because it processed some other
> > ProcSignalBarrier message. p1 and p2 won't wait for p3 since it has
> > not attached itself to the barrier. Once p1, p2 finish, p3 will attach
> > itself to the barrier and resize buffers again - reinitializing the
> > shared memory, which might have already been modified by p1 or p2. Boom
> > - there's memory corruption.
>
> It won't reinitialize anything, since this logic is controlled by the
> ShmemCtrl->NSharedBuffers, if it's already updated nothing will be
> changed.

Ah, I see it now:

    if (pg_atomic_read_u32(&ShmemCtrl->NSharedBuffers) != NBuffers)
    {

Thanks for the clarification.

However, once we put back the patches to shrink buffers, we will evict
the extra buffers and then shrink the pool. If some processes haven't
participated in the barrier by that point, they may still try to access
those buffers, re-installing them, and then bad things can happen.

>
> About the race condition you mention, there is indeed a window between
> receiving the ProcSignalBarrier and attaching to the global Barrier in
> > resize, but I don't think any process will be able to touch the buffer
> > pool while inside this window. Even if the remapping itself is so
> > blazing fast that this window is enough to make one process late
> > (e.g. if it was busy handling some other signal as you mention), as
> > I've shown above it shouldn't be a problem.
>
> I can experiment with this case though; maybe there is a way to
> completely close this window so we don't have to think about even
> potential scenarios.

The window may be small today, but we have to make this future-proof.
Multiple ProcSignalBarrier messages may be processed in a single call
to ProcessProcSignalBarrier(), and if each of those takes as long as
buffer resizing, the window will get bigger and bigger. So we have to
close this window.

>
> > > * Shared memory address space is now reserved for future usage, making
> > >   shared memory segments clash (e.g. due to memory allocation)
> > >   impossible.  There is a new GUC to control how much space to reserve,
> > >   which is called max_available_memory -- on the assumption that most of
> > >   the time it would make sense to set its value to the total amount of
> > >   memory on the machine. I'm open for suggestions regarding the name.
> >
> > With 0006 applied
> > + /* Clean up some reserved space to resize into */
> > + if (munmap(m->shmem + m->shmem_size, new_size - m->shmem_size) == -1)
> > ... snip ...
> > + ptr = mremap(m->shmem, m->shmem_size, new_size, 0);
> >
> > We unmap the portion of reserved address space where the existing
> > segment would expand into. As long as we are just expanding this will
> > work. I am wondering how would this work for shrinking buffers? What
> > scheme do you have in mind?
>
> I didn't like this part originally, and after changes to support hugetlb
> I think it's worth it just to replace mremap with munmap/mmap. That way
> there will be no such question, e.g. if a segment is getting shrunk
> the unmapped area will again become a part of the reserved space.
>

I might have missed it, but are we putting two mappings, one reserved
and one allocated, in the same address range, so that when the
allocated mapping shrinks or expands, the reserved mapping continues
to prohibit any other mapping from appearing there? I looked at some
of the previous emails, but didn't find anything that describes how
the reserved mapped space is managed.

--
Best Wishes,
Ashutosh Bapat


