On 7/9/25 19:23, Andres Freund wrote:
> Hi,
>
> On 2025-07-09 12:55:51 -0400, Greg Burd wrote:
>> On Jul 9 2025, at 12:35 pm, Andres Freund <andres@anarazel.de> wrote:
>>
>>> FWIW, I've started to wonder if we shouldn't just get rid of the freelist
>>> entirely. While clocksweep is perhaps minutely slower in a single thread than
>>> the freelist, clock sweep scales *considerably* better [1]. As it's rather
>>> rare to be bottlenecked on clock sweep speed for a single thread (rather than
>>> IO or memory copy overhead), I think it's worth favoring clock sweep.
>>
>> Hey Andres, thanks for spending time on this. I've worked before on
>> freelist implementations (last one in LMDB) and I think you're onto
>> something. I think it's an innovative idea and that the speed
>> difference will either be lost in the noise or potentially entirely
>> mitigated by avoiding duplicate work.
>
> Agreed. FWIW, just using clock sweep actually makes things like DROP TABLE
> perform better because it doesn't need to maintain the freelist anymore...
>
>
>>> Also needing to switch between getting buffers from the freelist and the sweep
>>> makes the code more expensive. I think just having the buffer in the sweep,
>>> with a refcount / usagecount of zero would suffice.
>>
>> If you're not already coding this, I'll jump in. :)
>
> My experimental patch is literally a four character addition ;), namely adding
> "0 &&" to the relevant code in StrategyGetBuffer().
>
> Obviously a real patch would need to do some more work than that. Feel free
> to take on that project, I am not planning on tackling that in near term.
>
>
> There's other things around this that could use some attention. It's not hard
> to see clock sweep be a bottleneck in concurrent workloads - partially due to
> the shared maintenance of the clock hand. A NUMAed clock sweep would address
> that. However, we also maintain StrategyControl->numBufferAllocs, which is a
> significant contention point and would not necessarily be removed by a
> NUMAification of the clock sweep.
>
Wouldn't it make sense to partition numBufferAllocs too, though? I don't
remember if my hacky experimental NUMA-partitioning patch did that, or if
I just thought about doing it, but why wouldn't that be enough? Places
that need the "total" count would have to sum the counters, but it seemed
to me most places would be fine with the "local" count for their
partition. If we also make sure to "sync" the clocksweeps so as to not
work on just a single partition, that might be enough ...
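
To make the counter idea concrete, here is a rough standalone sketch of
what I mean. It is not PostgreSQL code: it uses plain C11 atomics rather
than pg_atomic_*, and NUM_PARTITIONS, CACHE_LINE_SIZE and all the
function names are made up for illustration. Each partition gets its own
cache-line-padded counter, allocations bump only the local one, and a
caller that needs the "total" sums (or drains) all of them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_PARTITIONS  4   /* hypothetical, e.g. one per NUMA node */
#define CACHE_LINE_SIZE 64  /* assumed cache line size */

/* One counter per partition, aligned so partitions never share a line. */
typedef struct AllocCounterSlot
{
    _Alignas(CACHE_LINE_SIZE) atomic_uint_fast64_t allocs;
} AllocCounterSlot;

/* Static storage, so every counter starts at zero. */
static AllocCounterSlot alloc_counters[NUM_PARTITIONS];

/* Hot path: only touches the caller's own partition. */
static inline void
count_buffer_alloc(int partition)
{
    assert(partition >= 0 && partition < NUM_PARTITIONS);
    atomic_fetch_add_explicit(&alloc_counters[partition].allocs, 1,
                              memory_order_relaxed);
}

/* "Local" count, for callers that only care about one partition. */
static inline uint64_t
local_buffer_allocs(int partition)
{
    assert(partition >= 0 && partition < NUM_PARTITIONS);
    return atomic_load_explicit(&alloc_counters[partition].allocs,
                                memory_order_relaxed);
}

/* "Total" count: sum of all partitions, not an atomic snapshot. */
static uint64_t
total_buffer_allocs(void)
{
    uint64_t    total = 0;

    for (int i = 0; i < NUM_PARTITIONS; i++)
        total += local_buffer_allocs(i);
    return total;
}

/* Read-and-reset variant: drains every partition and returns the sum. */
static uint64_t
consume_buffer_allocs(void)
{
    uint64_t    total = 0;

    for (int i = 0; i < NUM_PARTITIONS; i++)
        total += atomic_exchange_explicit(&alloc_counters[i].allocs, 0,
                                          memory_order_relaxed);
    return total;
}
```

The summed value is only approximate under concurrency, since other
backends can keep incrementing while the loop runs, but I'd expect that
to be fine for a statistic like this one.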

regards
--
Tomas Vondra