Re: Adding basic NUMA awareness - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Adding basic NUMA awareness |
Date | |
Msg-id | b58e54ad-d9bb-435d-8dab-145619322e3a@vondra.me |
In response to | Re: Adding basic NUMA awareness (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>) |
Responses | Re: Adding basic NUMA awareness |
List | pgsql-hackers |
On 7/2/25 13:37, Ashutosh Bapat wrote:
> On Wed, Jul 2, 2025 at 12:37 AM Tomas Vondra <tomas@vondra.me> wrote:
>>
>>
>> 3) v1-0003-freelist-Don-t-track-tail-of-a-freelist.patch
>>
>> Minor optimization. Andres noticed we're tracking the tail of buffer
>> freelist, without using it. So the patch removes that.
>>
>
> The patches for resizing buffers use the lastFreeBuffer to add new
> buffers to the end of free list when expanding it. But we could as
> well add it at the beginning of the free list.
>
> This patch seems almost independent of the rest of the patches. Do you
> need it in the rest of the patches? I understand that those patches
> don't need to worry about maintaining lastFreeBuffer after this patch.
> Is there any other effect?
>
> If we are going to do this, let's do it earlier so that buffer
> resizing patches can be adjusted.
>

My patches don't particularly rely on this bit; they would work even with
lastFreeBuffer. I believe Andres simply noticed the current code does
not use lastFreeBuffer, it just maintains it, so he removed that as an
optimization. I don't know how significant the improvement is, but if
it's measurable we could just do that independently of our patches.

>>
>> There's also the question how this is related to other patches affecting
>> shared memory - I think the most relevant one is the "shared buffers
>> online resize" by Ashutosh, simply because it touches the shared memory.
>
> I have added Dmitry to this thread since he has written most of the
> shared memory handling code.
>

Thanks.

>>
>> I don't think the splitting would actually make some things simpler, or
>> maybe more flexible - in particular, it'd allow us to enable huge pages
>> only for some regions (like shared buffers), and keep the small pages
>> e.g. for PGPROC. So that'd be good.
>
> The resizing patches split the shared buffer related structures into
> separate memory segments. I think that itself will help enabling huge
> pages for some regions. Would that help in your case?
>

Indirectly. My patch can work just fine with a single segment, but being
able to enable huge pages only for some of the segments seems better.

>>
>> But there'd also need to be some logic to "rework" how shared buffers
>> get mapped to NUMA nodes after resizing. It'd be silly to start with
>> memory on 4 nodes (25% each), resize shared buffers to 50% and end up
>> with memory only on 2 of the nodes (because the other 2 nodes were
>> originally assigned the upper half of shared buffers).
>>
>> I don't have a clear idea how this would be done, but I guess it'd
>> require a bit of code invoked sometime after the resize. It'd already
>> need to rebuild the freelists in some way, I guess.
>
> Yes, there's code to build the free list. I think we will need code to
> remap the buffers and buffer descriptor.
>

Right. The good thing is that's just "advisory" information, it doesn't
break anything if it's temporarily out of sync. We don't need to "stop"
everything to remap the buffers to other nodes, or anything like that.
Or at least I think so.

It's one thing to "flip" the target mapping (determining which node a
buffer should be on), and another to actually migrate the buffers. The
first part can be done instantaneously, the second part can happen in
the background over a longer time period.

I'm not sure how you're rebuilding the freelist. Presumably it can
contain buffers that are no longer valid (after shrinking). How is that
handled to not break anything? I think the NUMA variant would do exactly
the same thing, except that there are multiple lists.

regards

-- 
Tomas Vondra
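
To make the resize example above concrete (4 nodes at 25% each, then shrinking shared buffers to 50%), here is a minimal sketch. It assumes buffers are split into equal contiguous chunks, one chunk per node, which is a simplification for illustration and not necessarily how the patches assign memory. The names target_node and buffers_to_migrate are hypothetical and do not come from the patch series.

```c
/*
 * Hypothetical helper: the NUMA node a buffer "should" be on when
 * nbuffers are split into num_nodes equal contiguous chunks.
 * This is only the target mapping; it moves no memory.
 */
int
target_node(int buf_id, int nbuffers, int num_nodes)
{
    /* chunk size rounded up, so the last node may get fewer buffers */
    int chunk = (nbuffers + num_nodes - 1) / num_nodes;

    return buf_id / chunk;
}

/*
 * Hypothetical helper: count the surviving buffers whose target node
 * differs between the old and the new size. These are the buffers a
 * background process would have to migrate after the mapping is
 * "flipped".
 */
int
buffers_to_migrate(int old_nbuffers, int new_nbuffers, int num_nodes)
{
    int surviving = (old_nbuffers < new_nbuffers) ? old_nbuffers : new_nbuffers;
    int count = 0;

    for (int i = 0; i < surviving; i++)
    {
        if (target_node(i, old_nbuffers, num_nodes) !=
            target_node(i, new_nbuffers, num_nodes))
            count++;
    }

    return count;
}
```

With 16 buffers on 4 nodes, buffers 0..7 sit on nodes 0 and 1. After shrinking to 8 buffers, recomputing target_node spreads those same buffers over all 4 nodes. Flipping the mapping is only a recomputation, while buffers_to_migrate gives a rough idea of how much memory would have to move in the background.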