Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock - Mailing list pgsql-hackers

From Greg Burd
Subject Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock
Date
Msg-id 70C6A5B5-2A20-4D0B-BC73-EB09DD62D61C@getmailspring.com
In response to Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock  (Thomas Munro <thomas.munro@gmail.com>)
List pgsql-hackers
On Aug 17 2025, at 12:57 am, Thomas Munro <thomas.munro@gmail.com> wrote:

> On Sun, Aug 17, 2025 at 4:34 PM Thomas Munro <thomas.munro@gmail.com> wrote:
>> Or if you don't like those odds, maybe it'd be OK to keep % but use it
>> rarely and without the CAS that can fail.
>
> ... or if we wanted to try harder to avoid %, could we relegate it to
> the unlikely CLOCK-went-all-the-way-around-again-due-to-unlucky-scheduling
> case, but use subtraction for the expected periodic overshoot?
>
>    if (hand >= NBuffers)
>    {
>        hand = hand < NBuffers * 2 ? hand - NBuffers : hand % NBuffers;
>        /* Base value advanced by backend that overshoots by one tick. */
>        if (hand == 0)
>            pg_atomic_fetch_add_u64(&StrategyControl->ticks_base, NBuffers);
>    }
>

Hi Thomas,

Thanks for all the ideas; I tried out a few of them along with a number
of others.  After a lot of measurement and a few off-channel discussions
about this, I think the best way forward is to focus solely on removing
the freelist and not to bother much with the lock or with changing
clock-sweep right now.  So, the attached patch set keeps the first two
patches from the last set but drops the rest.

But wait, there's more...

As a *bonus* I've added a new third patch with some proposed changes to
spark discussion.  While researching experiences in the field at scale,
a few other buffer management issues came to light.  The one I try to
address in this new patch 0003 has to do with very large shared_buffers
(NBuffers) combined with very large active datasets, which drive most
buffer usage counts to or near the maximum value (5).  In that state the
clock-sweep algorithm must perform roughly NBuffers * 5 "ticks" before
it identifies a buffer to evict.  This also pollutes the completePasses
value used to tell the bgwriter where to start working.

So, in this patch I add per-backend buffer usage tracking and proactive
pressure management.  Each tick of the hand can now decrement usage by a
calculated amount, not just 1, based on /hand-wavy-first-attempt at magic/.

The thing I'm sure this doesn't help with, and may in fact hurt, is
keeping frequently accessed buffers in the buffer pool.  I imagine a
two-tier approach to this where some small subset of buffers that are
reused frequently enough are not considered by the clock-sweep algorithm
at all.

Regardless, I feel the first two patches in this set address the
intention of this thread.  I added patch 0003 just to start a
conversation; please chime in if any of this interests you.  Maybe this
new patch should take on a life of its own in a new thread?  If anyone
thinks this approach has merit, I'll do that.

I look forward to thoughts on these ideas, and hopefully to finding
someone willing to help me get the first two over the line.

best.

-greg

Attachment
