Re: Changing shared_buffers without restart - Mailing list pgsql-hackers
From: Ashutosh Bapat
Subject: Re: Changing shared_buffers without restart
Msg-id: CAExHW5s1b-_9GFGGOxSVimrt8oe2Y0a7vRhENrd2BSo+0Tkp-A@mail.gmail.com
In response to: Re: Changing shared_buffers without restart (Dmitry Dolgov <9erthalion6@gmail.com>)
List: pgsql-hackers
On Fri, Apr 11, 2025 at 8:31 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
>
> > > I think a relatively elegant solution is to extend ProcSignalBarrier
> > > mechanism to track not only pss_barrierGeneration, as a sign that
> > > everything was processed, but also something like
> > > pss_barrierReceivedGeneration, indicating that the message was received
> > > everywhere but not processed yet. That would be enough to allow
> > > processes to wait until the resize message was received everywhere, then
> > > use a global Barrier to wait until all processes are finished. It's
> > > somehow similar to your proposal to use two signals, but has less
> > > implementation overhead.
> >
> > The way it's implemented in v4 still has the disjoint group problem.
> > Assume backends p1, p2, p3. All three of them are executing
> > ProcessProcSignalBarrier(). All three of them updated
> > pss_barrierReceivedGeneration
> >
> >     /* The message is observed, record that */
> >     pg_atomic_write_u64(&MyProcSignalSlot->pss_barrierReceivedGeneration,
> >                         shared_gen);
> >
> > p1, p2 moved faster and reached the following code from ProcessBarrierShmemResize()
> >
> >     if (BarrierAttach(barrier) == SHMEM_RESIZE_REQUESTED)
> >         WaitForProcSignalBarrierReceived(pg_atomic_read_u64(&ShmemCtrl->Generation));
> >
> > Since all the processes have received the barrier message, p1, p2 move
> > ahead and go through all the next phases and finish resizing even
> > before p3 gets a chance to call ProcessBarrierShmemResize() and attach
> > itself to the Barrier. This could happen because it processed some other
> > ProcSignalBarrier message. p1 and p2 won't wait for p3 since it has
> > not attached itself to the barrier. Once p1, p2 finish, p3 will attach
> > itself to the barrier and resize buffers again - reinitializing the
> > shared memory, which might have been already modified by p1 or p2. Boom
> > - there's memory corruption.
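For reference, the two-generation idea quoted above can be modeled in a few lines. This is only a simplified, single-process sketch - the names are stand-ins for the real shared-memory atomics (pss_barrierGeneration, pss_barrierReceivedGeneration), not the actual implementation:

```c
#include <stdbool.h>
#include <stdint.h>

#define NSLOTS 3    /* pretend we have three backends */

/* Simplified stand-in for a ProcSignal slot: one generation counter for
 * "message fully processed", one for "message merely observed". */
typedef struct
{
    uint64_t barrier_generation;          /* processed up to this generation */
    uint64_t barrier_received_generation; /* observed, possibly not processed */
} SignalSlot;

static SignalSlot slots[NSLOTS];
static uint64_t shared_generation;

/* Emit a new barrier message: bump the shared generation. */
static uint64_t
emit_barrier(void)
{
    return ++shared_generation;
}

/* A backend records that it has *seen* the message, before doing any work. */
static void
observe_barrier(int i)
{
    slots[i].barrier_received_generation = shared_generation;
}

/* A backend records that it has *finished* the associated work. */
static void
absorb_barrier(int i)
{
    slots[i].barrier_generation = shared_generation;
}

/* True once every backend has at least received generation `gen`; this is
 * what a WaitForProcSignalBarrierReceived()-style wait would poll for. */
static bool
all_received(uint64_t gen)
{
    for (int i = 0; i < NSLOTS; i++)
        if (slots[i].barrier_received_generation < gen)
            return false;
    return true;
}
```

The point of the split is visible in the model: all_received() can become true while some slot's barrier_generation still lags, which is exactly the window in which a late backend has acknowledged the message but not yet attached to the global Barrier.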
>
> It won't reinitialize anything, since this logic is controlled by the
> ShmemCtrl->NSharedBuffers, if it's already updated nothing will be
> changed.

Ah, I see it now:

    if (pg_atomic_read_u32(&ShmemCtrl->NSharedBuffers) != NBuffers)
    {

Thanks for the clarification. However, when we put back the patches to
shrink buffers, we will evict the extra buffers and then shrink the pool.
If some processes haven't participated in the barrier by then, they may
try to access those evicted buffers, re-installing them, and then bad
things can happen.

>
> About the race condition you mention, there is indeed a window between
> receiving the ProcSignalBarrier and attaching to the global Barrier in
> resize, but I don't think any process will be able to touch buffer pool
> while inside this window. Even if it happens that the remapping itself
> was blazing fast that this window was enough to make one process late
> (e.g. if it was busy handling some other signal as you mention), as I've
> showed above it shouldn't be a problem.
>
> I can experiment with this case though, maybe there is a way to
> completely close this window to not think about even potential
> scenarios.

The window may be small today, but we have to make this future-proof.
Multiple ProcSignalBarrier messages may be processed in a single call to
ProcessProcSignalBarrier(), and if each of those takes as long as buffer
resizing, the window will keep getting bigger. So we have to close this
window.

>
> > > * Shared memory address space is now reserved for future usage, making
> > > shared memory segments clash (e.g. due to memory allocation)
> > > impossible. There is a new GUC to control how much space to reserve,
> > > which is called max_available_memory -- on the assumption that most of
> > > the time it would make sense to set its value to the total amount of
> > > memory on the machine. I'm open for suggestions regarding the name.
>
> > With 0006 applied
> >
> >     +   /* Clean up some reserved space to resize into */
> >     +   if (munmap(m->shmem + m->shmem_size, new_size - m->shmem_size) == -1)
> >     ze, m->shmem)));
> >     ... snip ...
> >     +   ptr = mremap(m->shmem, m->shmem_size, new_size, 0);
> >
> > We unmap the portion of reserved address space where the existing
> > segment would expand into. As long as we are just expanding, this will
> > work. I am wondering how this would work for shrinking buffers. What
> > scheme do you have in mind?
>
> I didn't like this part originally, and after changes to support hugetlb
> I think it's worth it just to replace mremap with munmap/mmap. That way
> there will be no such question, e.g. if a segment is getting shrinked
> the unmapped area will again become a part of the reserved space.

I might have not noticed it, but are we putting two mappings, one
reserved and one allocated, in the same address range, so that when the
allocated mapping shrinks or expands, the reserved mapping continues to
prohibit any other mapping from appearing there? I looked at some of the
previous emails, but didn't find anything that describes how the reserved
mapped space is managed.

--
Best Wishes,
Ashutosh Bapat