Re: Randomize B-Tree page split location to avoid oscillating patterns - Mailing list pgsql-hackers

From Dmitry Dolgov
Subject Re: Randomize B-Tree page split location to avoid oscillating patterns
Date
Msg-id uqcsdatuokodytg7dcan75hbutv4erwdmzqaxjwhakmvfhy7vf@fxlpamb3opzr
Whole thread
In response to Re: Randomize B-Tree page split location to avoid oscillating patterns  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Randomize B-Tree page split location to avoid oscillating patterns
List pgsql-hackers
> On Mon, Apr 27, 2026 at 02:07:14PM -0400, Peter Geoghegan wrote:

Sorry for the delay.

> My interpretation is that randomization can be combined with the usual
> suffix truncation split point choosing heuristics; the paper even
> recommends this at one point. That is, you randomly pick from a small
> number of candidate split points that are all approximately equally
> good according to the traditional criteria. It looks like your patch
> doesn't account for suffix truncation at all, though.
> 
> Your patch should initially look for several split points that are
> approximately equal in quality according to the current criteria -- a
> separate, initial pass. You'd then randomly pick a final split point
> from those gathered during this initial pass -- not from the original
> fillfactormult-sorted list of split points. What you have in v1 will
> make suffix truncation much less effective, which seems unacceptable.

The paper suggest combining randomization and suffix truncation via
randomly shifting the interval for truncation. I'm not sure a separate
pass is needed for that, looks like it should be enough to add random
shift to lowsplit / highsplit in _bt_bestsplitloc, when the penalty is
calculated.

The interesting part here is that it seems the current split interval
might be too small for such randomization to make a significant impact
-- few experiments show that with the current value the result looks
more like changing the fillfactor, waves are just shifting to the right.
Increasing the split interval helps in this case.

Such approach (randomized suffix truncation interval) produces the same
effect in a test with single column index as in the patch above. In a
test with a wider index and significant impact of truncation, there is
no visible degradation (and no oscillations either, as expected suffix
truncation introduces some randomness on it's own).

> Another issue is that nbtsort.c doesn't have any ability to pick among
> split points; it focuses entirely on keeping free space balanced. To
> get much benefit from this, I think you'd have to teach nbtsort.c to
> also pick split points using approximately the same algorithm as
> nbtsplitloc.c would (assuming retail inserts in ascending key space
> order). I've been meaning to add proper suffix truncation to the
> CREATE INDEX case.

Good point, I need to look into this.

> > The unfortunate part is that I couldn't get clear numbers for the performance
> > impact. Turns out the disk in my experimental setup is not good enough to get a
> > sufficient number of inserts to trigger the issue, and to get nice graphs I was
> > running everything either on a RAM disk or on an unlogged table -- in both
> > cases it's easy to observe oscillations of page splits, but their impact is not
> > large enough since only so much IO is happening.
> >
> > But anyway, any thoughts / commentaries on that?
> 
> I wouldn't expect the patch to increase absolute throughput
> significantly, if at all. Its value comes from making the *rate* of
> splits over time more consistent, for a given fixed workload. You
> might notice a more interesting effect if you look at latency,
> particularly worst case latency.

That's exactly what I'm talking about. In those experiments I was trying
to get any visible changes in latency variability, but in this
particular setup I could spot nothing beyond regular noise, while the
number of page splits was obviously heavily oscillating.



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Why is_admin_of_role() use ROLERECURSE_MEMBERS rather than ROLERECURSE_PRIVS?
Next
From: Paul A Jungwirth
Date:
Subject: Re: FOR PORTION OF does not recompute GENERATED STORED columns that depend on the range column