Re: Changing shared_buffers without restart - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: Changing shared_buffers without restart |
Date | |
Msg-id | CAExHW5s=0Yz7ZhN6J+ORUBEpme-ZkfOC=KaDuD6LHtcwC8Abvg@mail.gmail.com Whole thread Raw |
In response to | Re: Changing shared_buffers without restart (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
List | pgsql-hackers |
Hi Tomas, Thanks for your detailed feedback. Sorry for replying late. On Fri, Jul 4, 2025 at 5:36 AM Tomas Vondra <tomas@vondra.me> wrote: > > v5-0008-Support-shrinking-shared-buffers.patch > > 1) Why is ShmemCtrl->evictor_pid reset in AnonymousShmemResize? Isn't > there a place starting it and waiting for it to complete? Why > shouldn't it do EvictExtraBuffers itself? > > 3) Seems a bit strange to do it from a random backend. Shouldn't it > be the responsibility of a process like checkpointer/bgwriter, or > maybe a dedicated dynamic bgworker? Can we even rely on a backend > to be available? I will answer these two together. I don't think we should rely on a random backend. But that's what the rest of the patches did and patches to support shrinking followed them. But AFAIK, Dmitry is working on a set of changes which will make a non-postmaster backend to be a coordinator for buffer pool resizing process. When that happens the same backend which initializes the expanded memory when expanding the buffer pool should also be responsible for evicting the buffers when shrinking the buffer pool. Will wait for Dmitry's next set of patches before making this change. > > 4) Unsolved issues with buffers pinned for a long time. Could be an > issue if the buffer is pinned indefinitely (e.g. cursor in idle > connection), and the resizing blocks some activity (new connections > or stuff like that). In such cases we should cancel the operation or kill that backend (per user preference) after a timeout with (user specified) timeout >= 0. We haven't yet figured out the details. I think the first version of the feature would just cancel the operation, if it encounters a pinned buffer. > 2) Isn't the change to BufferManagerShmemInit wrong? How do we know the > last buffer is still at the end of the freelist? Seems unlikely. > 6) It's not clear to me in what situations this triggers (in the call > to BufferManagerShmemInit) > > if (FirstBufferToInit < NBuffers) ... > Will answer these two together. As the comment says FirstBufferToInit < NBuffers indicates two situations: When FirstBufferToInit = 0, it's the first time the buffer pool is being initialized. Otherwise it indicates expanding the buffer pool, in which case the last buffer will be a newly initialized buffer. All newly initialized buffers are linked into the freelist one after the other in the increasing order of their buffer ids by code a few lines above. Now that the free buffer list has been removed, we don't need to worry about it. In the next set of patches, I have removed this code. > > v5-0009-Reinitialize-StrategyControl-after-resizing-buffe.patch > > 1) IMHO this should be included in the earlier resize/shrink patches, > I don't see a reason to keep it separate (assuming this is the > correct way, and the "init" is not). These patches are separate just because me and Dmitry developed them respectively. Once they are reviewed by Dmitry, we will squash them into a single patch. I am expecting that Dmitry's next patchset which will do significant changes to the synchronization will have a single patch for all code related to and consequential to resizing. > > 5) Funny that "AI suggests" something, but doesn't the block fail to > reset nextVictimBuffer of the clocksweep? It may point to a buffer > we're removing, and it'll be invalid, no? > The TODO no more applies. There's code to reset the clocksweep in a separate patch. Sorry for not removing it earlier. It will be removed in the next set of patches. > > 2) Doesn't StrategyPurgeFreeList already do some of this for the case > of shrinking memory? > > 3) Not great adding a bunch of static variables to bufmgr.c. Why do we > need to make "everything" static global? Isn't it enough to make > only the "valid" flag global? The rest can stay local, no? > > If everything needs to be global for some reason, could we at least > make it a struct, to group the fields, not just separate random > variables? And maybe at the top, not half-way throught the file? > > 4) Isn't the name BgBufferSyncAdjust misleading? It's not adjusting > anything, it's just invalidating the info about past runs. I think there's a bit of refactoring possible here. Setting up the BgBufferSync state, resetting it when bgwriter_lru_maxpages <= 0 and then re initializing it when bgwriter_lru_maxpages > 0, and actually performing the buffer sync is all packed into the same function BgBufferSync() right now. It makes this function harder to read. I think these functionalities should be separated into their own functions and use the appropriate one instead of BgBufferSyncAdjust(), whose name is misleading. The static global variables should all be packed into a structure which is passed as an argument to these functions. I need more time to study the code and refactor it that way. For now I have added a note to the commit message of this patch so that I will revisit it. I have renamed BgBufferSyncAdjust() to BgBufferSyncReset(). > > 5) I don't quite understand why BufferSync needs to do the dance with > delay_shmem_resize. I mean, we certainly should not run BufferSync > from the code that resizes buffers, right? Certainly not after the > eviction, from the part that actually rebuilds shmem structs etc. Right. But let me answer all three questions together. > So perhaps something could trigger resize while we're running the > BufferSync()? Isn't that a bit strange? If this flag is needed, it > seems more like a band-aid for some issue in the architecture. > > 6) Also, why should it be fine to get into situation that some of the > buffers might not be valid, during shrinking? I mean, why should > this check (pg_atomic_read_u32(&ShmemCtrl->NSharedBuffers) != NBuffers). > It seems better to ensure we never get into "sync" in a way that > might lead some of the buffers invalid. Seems way too lowlevel to > care about whether resize is happening. > > 7) I don't understand the new condition for "Execute the LRU scan". > Won't this stop LRU scan even in cases when we want it to happen? > Don't we want to scan the buffers in the remaining part (after > shrinking), for example? Also, we already checked this shmem flag at > the beginning of the function - sure, it could change (if some other > process modifies it), but does that make sense? Wouldn't it cause > problems if it can change at an arbitrary point while running the > BufferSync? IMHO just another sign it may not make sense to allow > this, i.e. buffer sync should not run during the "actual" resize. > ProcessBarrierShmemResize() which does the resizing is part of ProcessProcSignalBarrier() which in turn gets called from CHECK_FOR_INTERRUPTS(), which is called from multiple places, even from elog(). I am not able to find a call stack linking BgBufferSync() and ProcessProcSignalBarrier(). But I couldn't convince myself that it is true and will remain true in the future. I mean, the function loops through a large number of buffers and performs IO, both avenues to call CHECK_FOR_INTERRUPTS(). Hence that flag. Do you know what (part of code) guarantees that ProcessProcSignalBarrier() will never be called from BgBufferSync()? Note, resizing can not begin till delay_shmem_resize is cleared, so while BgBufferSync is executing, no buffer can be invalidated or no new buffers could be added. But at the cost of all other backends to wait till BgBufferSync finishes. We want to avoid that. The idea here is to make BgBufferSync stop as soon as it realises that the buffer resizing is "about to begin". But I think the condition looks wrong. I think the right condition would be NBufferPending != NBuffers or NBuffersOld. AFAIK, Dmitry is working on consolidating NBuffers* variables as you have requested elsewhere. Better even if we could somehow set a flag in shared memory indicating that the buffer resizing is "about to begin" and BgBufferSync() checks that flag. So I will wait for him to make that change and then change this condition. > > v5-0010-Additional-validation-for-buffer-in-the-ring.patch > > 1) So the problem is we might create a ring before shrinking shared > buffers, and then GetBufferFromRing will see bogus buffers? OK, but > we should be more careful with these checks, otherwise we'll miss > real issues when we incorrectly get an invalid buffer. Can't the > backends do this only when they for sure know we did shrink the > shared buffers? Or maybe even handle that during the barrier? > > 2) IMHO a sign there's the "transitions" between different NBuffers > values may not be clear enough, and we're allowing stuff to happen > in the "blurry" area. I think that's likely to cause bugs (it did > cause issues for the online checksums patch, I think). > I think you are right, that this might hide some bugs. Just like we remove buffers to be shrunk from freelist only once, I wanted each backend to remove them buffer rings only once. But I couldn't find a way to make all the buffer rings for a given backend available to the barrier handling code. The rings are stored in Scan objects, which seem local to the executor nodes. Is there a way to make them available to barrier handling code (even if it has to walk an execution tree, let's say)? If there would have been only one scan, we could have set a flag after shrinking, let GetBufferFromRing() purge all invalid buffers once when flag is true and reset the flag. But there can be more than one scan happening and we don't know how many there are and when all of them have finished calling GetBufferFromRing() after shrinking. Suggestions to do this only once are welcome. I will send the next set of patches with my next email. -- Best Wishes, Ashutosh Bapat
pgsql-hackers by date: