Re: Adding basic NUMA awareness - Mailing list pgsql-hackers

From Burd, Greg
Subject Re: Adding basic NUMA awareness
Date
Msg-id 7AA27B4C-E1C0-4608-AC59-25218C59991A@burd.me
In response to Re: Adding basic NUMA awareness  ("Burd, Greg" <greg@burd.me>)
List pgsql-hackers

> On Jul 10, 2025, at 8:13 AM, Burd, Greg <greg@burd.me> wrote:
>
>
>> On Jul 9, 2025, at 1:23 PM, Andres Freund <andres@anarazel.de> wrote:
>>
>> Hi,
>>
>> On 2025-07-09 12:55:51 -0400, Greg Burd wrote:
>>> On Jul 9 2025, at 12:35 pm, Andres Freund <andres@anarazel.de> wrote:
>>>
>>>> FWIW, I've started to wonder if we shouldn't just get rid of the freelist
>>>> entirely. While clock sweep is perhaps minutely slower in a single thread than
>>>> the freelist, clock sweep scales *considerably* better [1]. As it's rather
>>>> rare to be bottlenecked on clock sweep speed for a single thread (rather than
>>>> IO or memory copy overhead), I think it's worth favoring clock sweep.
>>>
>>> Hey Andres, thanks for spending time on this.  I've worked before on
>>> freelist implementations (last one in LMDB) and I think you're onto
>>> something.  I think it's an innovative idea and that the speed
>>> difference will either be lost in the noise or potentially entirely
>>> mitigated by avoiding duplicate work.
>>
>> Agreed. FWIW, just using clock sweep actually makes things like DROP TABLE
>> perform better because it doesn't need to maintain the freelist anymore...
>>
>>
>>>> Also needing to switch between getting buffers from the freelist and the sweep
>>>> makes the code more expensive.  I think just having the buffer in the sweep,
>>>> with a refcount / usagecount of zero would suffice.
>>>
>>> If you're not already coding this, I'll jump in. :)
>>
>> My experimental patch is literally a four character addition ;), namely adding
>> "0 &&" to the relevant code in StrategyGetBuffer().
>>
>> Obviously a real patch would need to do some more work than that.  Feel free
>> to take on that project, I am not planning on tackling that in near term.
>>
>
> I started on this last night, making good progress.  Thanks for the inspiration.  I'll create a new thread to track the work and cross-reference when I have something reasonable to show (hopefully later today).
>
>> There's other things around this that could use some attention. It's not hard
>> to see clock sweep be a bottleneck in concurrent workloads - partially due to
>> the shared maintenance of the clock hand. A NUMAed clock sweep would address
>> that.
>
> Working on it.

For the archives, and to tie up loose ends, I'll link from here to a new thread I just started that proposes the removal of the freelist and the buffer_strategy_lock [1].

That patch set doesn't address any NUMA-related tasks directly, but it should remove some pain when working in that direction by removing code that requires partitioning and locking and...

best.

-greg

[1] https://postgr.es/m/E2D6FCDC-BE98-4F95-B45E-699C3E17BA10@burd.me



