Hi,
On 2025-08-07 11:24:18 +0200, Tomas Vondra wrote:
> The patch does a much simpler thing - treat the weight as a "budget",
> i.e. number of buffers to allocate before proceeding to the "next"
> partition. So it allocates 55 buffers from P1, then 45 buffers from P2,
> and then goes back to P1 in a round-robin way. The advantage is it can
> do without a PRNG.
I think that's a good plan.
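For my own understanding, the scheme in rough code (all names here are made
up, not from the patch) - with weights {55, 45}, the first 55 ticks of each
round go to the first partition and the next 45 to the second, then it wraps
around:

static int
PartitionForTick(uint64 tick, const int *weights, int nparts)
{
    uint64      total = 0;
    uint64      pos;

    /* the weights act as per-partition budgets within one round */
    for (int i = 0; i < nparts; i++)
        total += weights[i];

    pos = tick % total;         /* position within the current round */

    for (int i = 0; i < nparts; i++)
    {
        if (pos < (uint64) weights[i])
            return i;           /* still within partition i's budget */
        pos -= weights[i];
    }
    pg_unreachable();
}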
A few comments about the clock sweep patch:
- It'd be easier to review if BgBufferSync() weren't basically re-indented
wholesale. Maybe you could instead move the relevant code to a helper
function that's called by BgBufferSync() for each clock?
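I.e. something like this (hypothetical names, just to show the shape I mean):

static bool
BgBufferSyncOne(BufferClock *clock, WritebackContext *wb_context)
{
    /*
     * Today's BgBufferSync() body, scoped to a single clock.  Returns true
     * if this clock saw no activity, i.e. the bgwriter could hibernate as
     * far as this clock is concerned.
     */
    return true;
}

bool
BgBufferSync(WritebackContext *wb_context)
{
    bool        all_idle = true;

    /* BufferClock, NUM_CLOCKS, ClockSweeps are placeholders */
    for (int i = 0; i < NUM_CLOCKS; i++)
    {
        if (!BgBufferSyncOne(&ClockSweeps[i], wb_context))
            all_idle = false;
    }

    return all_idle;
}

That way the per-clock logic moves without being reindented, and the loop
itself is trivial to review.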
- I think choosing a clock sweep partition on every tick would likely show up
in profiles of workloads that do a lot of buffer replacement, particularly if
buffers in the workload often have a high usagecount (and thus more ticks are
needed).
Given that your balancing approach "sticks" with a partition for a while,
could we perhaps only choose the partition after exhausting that budget?
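Roughly like this (again, made-up names - ChooseBalancedPartition() and
PartitionBudget() stand in for whatever the patch does):

static int      my_partition = -1;  /* backend-local cache */
static uint32   my_budget = 0;

static inline int
ClockSweepPartition(void)
{
    /* only run the partition-choosing logic once per budget */
    if (my_budget == 0)
    {
        my_partition = ChooseBalancedPartition();   /* the expensive bit */
        my_budget = PartitionBudget(my_partition);
    }
    my_budget--;
    return my_partition;
}

That'd amortize the choosing cost over the whole budget instead of paying it
on every tick.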
- I don't really understand what
> + /*
> + * Buffers that should have been allocated in this partition (but might
> + * have been redirected to keep allocations balanced).
> + */
> + pg_atomic_uint32 numRequestedAllocs;
> +
is intended for.
Adding yet another atomic increment for every clock sweep tick seems rather
expensive...
- I wonder if the relatively low balancing budgets will be good enough. It's
not too hard to imagine that this frequent "partition choosing" will hurt in
buffer-access-heavy workloads. But it's probably the right approach until
we've measured it actually being a problem.
- It'd be interesting to do some very simple evaluation, like a single
pg_prewarm() of a relation that's close to the size of shared buffers, and
verify that we don't end up evicting newly read-in buffers. I think your
approach should work, but it'd be good to verify that.
I wonder if we could turn some of this into tests somehow. It's pretty easy
to break this kind of thing without noticing, as everything just continues
to work, just a tad slower.
Greetings,
Andres Freund