Re: Randomize B-Tree page split location to avoid oscillating patterns - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Randomize B-Tree page split location to avoid oscillating patterns
Date
Msg-id CAH2-WznzR_q9Q+5kZ8xbjaqt0c0LmXnv_wqQfzOX-g5cVgZYbg@mail.gmail.com
Whole thread
In response to Re: Randomize B-Tree page split location to avoid oscillating patterns  (Dmitry Dolgov <9erthalion6@gmail.com>)
Responses Re: Randomize B-Tree page split location to avoid oscillating patterns
List pgsql-hackers
On Wed, May 6, 2026 at 1:10 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > On Mon, Apr 27, 2026 at 02:07:14PM -0400, Peter Geoghegan wrote:
> > Your patch should initially look for several split points that are
> > approximately equal in quality according to the current criteria -- a
> > separate, initial pass. You'd then randomly pick a final split point
> > from those gathered during this initial pass -- not from the original
> > fillfactormult-sorted list of split points. What you have in v1 will
> > make suffix truncation much less effective, which seems unacceptable.
>
> The paper suggest combining randomization and suffix truncation via
> randomly shifting the interval for truncation. I'm not sure a separate
> pass is needed for that, looks like it should be enough to add random
> shift to lowsplit / highsplit in _bt_bestsplitloc, when the penalty is
> calculated.

I can't see why that would make much difference. Increasing the split
interval usually won't affect the final chosen split point at all.
Decreasing is more likely to change the split point, but that's
*precisely* because it makes suffix truncation less effective.

In general it's quite unlikely that an expanded interval (e.g.,
doubling LEAF_SPLIT_DISTANCE) will include a split point that
truncates additional index attributes compared to the best split point
available within the current calculated split interval/current
LEAF_SPLIT_DISTANCE. For example, with the TPC-C indexes, I'd expect
only a tiny minority of all page splits to be affected by doubling
LEAF_SPLIT_DISTANCE to make nbtsplitloc.c consider ~20% of all
possible split points around the middle of the page. Whereas *halving*
LEAF_SPLIT_DISTANCE will indeed make suffix truncation worse.

This asymmetry matters. I believe that it'll make it impossible for
space utilization to average out to the target leaffillfactor% over
time. After all, this approach doesn't actually target space
utilization. This is also why it will make literally no difference at
all with a single column unique index.

> The interesting part here is that it seems the current split interval
> might be too small for such randomization to make a significant impact
> -- few experiments show that with the current value the result looks
> more like changing the fillfactor, waves are just shifting to the right.
> Increasing the split interval helps in this case.

The point of gathering a list of "equally good" split points in an
initial pass is that it removes the danger of making the split
interval less effective in terms of final split penalty. The random
choice becomes a choice among split points that all truncate away the
same number of suffix attributes (all of which must still be
sufficiently close to the space-optimal split location to ensure that
no split is ever wildly unbalanced). This prevents the variable/random
choice from affecting the suffix truncation/split penalty.

With a single column unique index, all split points will have an equal
penalty under this scheme (implying that suffix truncation will be
equally effective). The initial list gathered in the first pass is
exactly all of the split points within the split interval in that
case. Whereas with other types of indexes the initial list gathered is
some subset of all the split points within the interval, without
regard for how balanced the post-split space utilization will be (all
splits within the interval are assumed to be "good enough" from a
space utilization perspective). Over time, space utilization should
reach fillfactor% -- without it negatively affecting suffix truncation.

It's possible that increasing the split interval would also make
sense. That seems like a follow-up question to me.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Paul A Jungwirth
Date:
Subject: Re: FOR PORTION OF does not recompute GENERATED STORED columns that depend on the range column
Next
From: Jacob Champion
Date:
Subject: Re: Why is_admin_of_role() use ROLERECURSE_MEMBERS rather than ROLERECURSE_PRIVS?