On Aug 17 2025, at 12:57 am, Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sun, Aug 17, 2025 at 4:34 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Or if you don't like those odds, maybe it'd be OK to keep % but use it
>> rarely and without the CAS that can fail.
>
> ... or if we wanted to try harder to avoid %, could we relegate it to
> the unlikely CLOCK-went-all-the-way-around-again-due-to-unlucky-scheduling
> case, but use subtraction for the expected periodic overshoot?
>
> if (hand >= NBuffers)
> {
>     hand = hand < NBuffers * 2 ? hand - NBuffers : hand % NBuffers;
>     /* Base value advanced by backend that overshoots by one tick. */
>     if (hand == 0)
>         pg_atomic_fetch_add_u64(&StrategyControl->ticks_base, NBuffers);
> }
>
Hi Thomas,
Thanks for all the ideas. I have tried out a few of them, along with a
number of others. I've done a lot of measurement and had a few off-channel
discussions about this, and I think the best way forward is to
focus on removing the freelist and to leave the lock and the clock-sweep
algorithm largely alone for now. So, the attached patch set
keeps the first two patches from the last set but drops the rest.
But wait, there's more...
As a *bonus* I've added a new third patch with some proposed changes to
spark discussion. As I researched experiences in the field at scale, a
few other buffer management issues came to light. The one in particular
that I tried to address in this new patch 0003 has to do with very large
shared_buffers (NBuffers) and very large active datasets causing most
buffer usage counts to be at or near the max value (5). In these cases
the clock-sweep algorithm can need up to NBuffers * 5 "ticks" before
identifying a buffer to evict. This also pollutes the completePasses
value used to inform the bgwriter where to start working.
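To make that cost concrete, here is a toy model of the sweep. It is only a sketch and not code from the patch set: the names sweep_until_victim and ticks_when_saturated are made up for illustration, MAX_USAGE_COUNT stands in for BM_MAX_USAGE_COUNT, and pins, locking, and atomics are ignored entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_USAGE_COUNT 5		/* stand-in for BM_MAX_USAGE_COUNT */

/*
 * Toy single-threaded clock sweep (hypothetical helper).  Advances the hand,
 * decrementing each nonzero usage count by one, until it lands on a buffer
 * whose count is already zero.  Returns the number of decrementing ticks.
 */
static uint64_t
sweep_until_victim(uint8_t *usage, uint32_t nbuffers, uint32_t *victim)
{
	uint64_t	ticks = 0;
	uint32_t	hand = 0;

	for (;;)
	{
		if (usage[hand] == 0)
		{
			*victim = hand;
			return ticks;
		}
		usage[hand]--;
		ticks++;
		hand = (hand + 1) % nbuffers;
	}
}

/* Ticks spent when every buffer starts at the maximum usage count. */
static uint64_t
ticks_when_saturated(uint32_t nbuffers)
{
	uint8_t    *usage;
	uint32_t	victim;
	uint64_t	ticks;

	assert(nbuffers > 0);
	usage = malloc(nbuffers);
	assert(usage != NULL);
	memset(usage, MAX_USAGE_COUNT, nbuffers);

	ticks = sweep_until_victim(usage, nbuffers, &victim);
	free(usage);
	return ticks;
}
```

In this model a fully saturated pool costs nbuffers * MAX_USAGE_COUNT decrementing ticks, i.e. five complete passes of the hand, before the first victim turns up.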
So, in this patch I add per-backend buffer usage tracking and proactive
pressure management. Each tick of the hand can now decrement usage by a
calculated amount, not just 1, based on /hand-wavy-first-attempt at magic/.
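To illustrate the general shape of "decrement by more than 1", here is a minimal sketch. The formula is purely hypothetical and is not the one in patch 0003: it assumes some pressure estimate in the range 0..100 and scales the decrement linearly from 1 up to MAX_USAGE_COUNT. The names usage_decrement, apply_tick, and pressure_pct are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_USAGE_COUNT 5		/* stand-in for BM_MAX_USAGE_COUNT */

/*
 * Hypothetical mapping from a pressure estimate (0..100) to a per-tick
 * decrement: 1 when there is no pressure, MAX_USAGE_COUNT at full pressure.
 */
static uint8_t
usage_decrement(uint32_t pressure_pct)
{
	if (pressure_pct > 100)
		pressure_pct = 100;
	return (uint8_t) (1 + (pressure_pct * (MAX_USAGE_COUNT - 1)) / 100);
}

/* Apply one tick of the hand to a usage count, clamping at zero. */
static uint8_t
apply_tick(uint8_t usage_count, uint32_t pressure_pct)
{
	uint8_t		dec = usage_decrement(pressure_pct);

	return usage_count > dec ? (uint8_t) (usage_count - dec) : 0;
}
```

With no pressure this behaves like the existing one-step decrement; at full pressure a single tick takes a buffer from the max count straight to zero, which is exactly why it risks evicting hot buffers too eagerly.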
The thing I'm sure this doesn't help with, and may in fact hurt, is
keeping frequently accessed buffers in the buffer pool. I imagine a
two-tier approach to this, where some small subset of buffers that are
reused frequently enough is not even considered by the clock-sweep algorithm.
Regardless, I feel the first two patches in this set address the
intention of this thread. I added patch 0003 just to start a
conversation; please chime in if any of this interests you. Maybe this
new patch should take on a life of its own in a new thread? If anyone
thinks this approach has some merit, I'll do that.
I look forward to thoughts on these ideas, and hopefully to finding
someone willing to help me get the first two over the line.
best.
-greg