Re: Randomize B-Tree page split location to avoid oscillating patterns - Mailing list pgsql-hackers
| From | Dmitry Dolgov |
|---|---|
| Subject | Re: Randomize B-Tree page split location to avoid oscillating patterns |
| Date | |
| Msg-id | noovhlxtwa64do7isbbvl2f6clqukujb7hexw3zi2rn3ssbmrw@3yvb4ghojg4b |
| In response to | Re: Randomize B-Tree page split location to avoid oscillating patterns (Peter Geoghegan <pg@bowt.ie>) |
| List | pgsql-hackers |
> On Wed, May 06, 2026 at 01:49:10PM -0400, Peter Geoghegan wrote:
> On Wed, May 6, 2026 at 1:10 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > > On Mon, Apr 27, 2026 at 02:07:14PM -0400, Peter Geoghegan wrote:
> > > Your patch should initially look for several split points that are
> > > approximately equal in quality according to the current criteria -- a
> > > separate, initial pass. You'd then randomly pick a final split point
> > > from those gathered during this initial pass -- not from the original
> > > fillfactormult-sorted list of split points. What you have in v1 will
> > > make suffix truncation much less effective, which seems unacceptable.
> >
> > The paper suggests combining randomization and suffix truncation via
> > randomly shifting the interval for truncation. I'm not sure a separate
> > pass is needed for that, looks like it should be enough to add random
> > shift to lowsplit / highsplit in _bt_bestsplitloc, when the penalty is
> > calculated.
>
> I can't see why that would make much difference. Increasing the split
> interval usually won't affect the final chosen split point at all.
> Decreasing is more likely to change the split point, but that's
> *precisely* because it makes suffix truncation less effective.
I assume by increasing/decreasing the split interval you mean growing or
shrinking it around the split point, but I was talking about shifting the whole
interval to the right. More about this in the next part.
> The point of gathering a list of "equally good" split points in an
> initial pass is that it removes the danger of making the split
> interval less effective in terms of final split penalty. The random
> choice becomes a choice among split points that all truncate away the
> same number of suffix attributes (all of which must still be
> sufficiently close to the space-optimal split location to ensure that
> no split is ever wildly unbalanced). This prevents the variable/random
> choice from affecting the suffix truncation/split penalty.
I see what you mean, but I'm concerned about the resulting overhead. Here is my
understanding of how it would work, let me know if something is missing in the
chain of thought:
* What happens currently is that we collect all possible split points, sort
them by delta, and calculate the split interval. Then we iterate over the
candidate split points in the interval [0, split interval), trying to find the
one with the lowest penalty. If the penalty we found is the best possible one,
we stop early -- for unique indexes this means that virtually all the time the
search is over after a single iteration, since the first attribute is enough to
distinguish the split.
* If we're going to pick at random from a list of "equally good" split points,
everything up to the split interval calculation is the same. Then we iterate
over the candidate split points in the interval [0, split interval) and collect
all the points whose penalty equals the current lowest value (the list is reset
whenever we find a lower value). We search this way until we've found enough
for the chosen randomization interval (those 20% from the paper) or until we're
out of split points. In the end we've got a list of split points, each with the
same lowest penalty, and we choose a split point at random from this list (see
the first sketch below, after the list). For a unique index it means we have to
do at least as many iterations as the size of the randomization interval,
instead of one iteration as before -- this is the overhead I'm concerned about.
* If we're going to shift the split interval, everything is also the same up to
the split interval calculation. Now we introduce a
delta = random from (0, randomization interval)
and consider two intervals of available split points:
[0, delta) and [delta, split interval + delta)
We first search [0, delta) as before for the lowest possible penalty, just to
make sure we don't miss it in case it's not present in the second interval. The
main part is the search in [delta, split interval + delta) for a split point
with the same or lower penalty as the one found in the first interval, which is
then used as the final result (see the second sketch below, after the list).
There are a few possible outcomes:
1. There are one or more lowest-penalty split points in [0, delta) and one or
more in [delta, split interval + delta). Since delta is randomized, we pick
one of the "equally good" split points at random.
2. There are one or more lowest-penalty split points in [0, delta) and for
some reason none in [delta, split interval + delta). In this case we forgo
randomization and go with the lowest-penalty split point from [0, delta).
3. There are no lowest-penalty split points in [0, delta) and one or more
in [delta, split interval + delta). In this case we still pick one of the
"equally good" split points, still with some randomization.
4. There are no lowest-penalty split points in [0, randomization interval)
and one or more in [randomization interval, split interval + randomization
interval). In this case we still pick one of the "equally good" split
points, but without randomization, since the best point lies beyond the
allowed randomization range, and we hit it every time.
Ultimately we would like to apply split point randomization to mostly-unique
indexes, since otherwise suffix truncation already does the job. With this
approach the randomization interval serves as a threshold to distinguish such
indexes -- if the index has mostly unique values, the optimal split point will
be located close to the start of the interval and multiple "equally good"
points will fall within the randomization interval. If on the other hand the
index has many duplicate values, the optimal split point may be located outside
the randomization interval and we would choose it every time -- but thanks to
suffix truncation that doesn't matter.
From the overhead perspective, with this approach we do:
- one search in [0, delta), which will end after one iteration for unique
indexes.
- then a "jump" of random distance and one more search in [delta, split
interval + delta) under the same conditions. It will end after one iteration
for a unique index, and will process a similar number of iterations otherwise.
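To make the comparison concrete, here is a rough standalone sketch of the
second approach (not actual nbtsplitloc.c code -- candidate penalties are
modeled as a plain int array, lower is better, and all names are made up for
illustration):

#include <stdio.h>
#include <stdlib.h>

/*
 * Approach 2 (sketch): scan the candidates in [0, splitinterval), already
 * sorted by delta, collect the ones tied at the lowest penalty seen so far
 * (resetting the list whenever a strictly lower penalty shows up), stop once
 * there are enough of them for the randomization interval, then pick the
 * final split point at random from that list.
 */
static int
choose_split_equal_list(const int *penalty, int splitinterval, int randinterval)
{
    int     best[32];       /* candidates tied at the lowest penalty */
    int     nbest = 0;
    int     lowest = penalty[0];

    for (int i = 0; i < splitinterval; i++)
    {
        if (penalty[i] < lowest)
        {
            /* strictly better candidate found, reset the list */
            lowest = penalty[i];
            nbest = 0;
        }
        if (penalty[i] == lowest && nbest < 32)
            best[nbest++] = i;

        /* enough "equally good" candidates for the randomization interval */
        if (nbest >= randinterval)
            break;
    }

    return best[rand() % nbest];
}

int
main(void)
{
    /* a "mostly unique" page: almost every candidate has the lowest penalty */
    int     penalty[] = {1, 1, 1, 2, 1, 1, 3, 1, 2, 1};

    srand(42);
    for (int i = 0; i < 5; i++)
        printf("split point: %d\n", choose_split_equal_list(penalty, 10, 4));
    return 0;
}

The overhead concern is visible here: even in the best case the loop has to
run randinterval times before it can stop, instead of stopping at the first
candidate with the best possible penalty.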
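And a similar sketch of the third, shifted-interval variant under the same
simplified model (again with made-up names rather than the real
_bt_bestsplitloc interface); this is my reading of the outcomes above, where
the second interval takes the first candidate that is at least as good as the
best of the first one:

#include <stdio.h>
#include <stdlib.h>

/*
 * Approach 3 (sketch): shift the search interval by a random delta drawn
 * from (0, randinterval) (assumes randinterval >= 2).  First find the lowest
 * penalty in [0, delta), then return the first candidate in
 * [delta, splitinterval + delta) whose penalty is equal to or lower than
 * that (outcomes 1, 3 and 4 above).  If no such candidate exists, fall back
 * to the best point of the first interval and forgo randomization
 * (outcome 2).
 */
static int
choose_split_shifted(const int *penalty, int ncandidates,
                     int splitinterval, int randinterval)
{
    int     delta = 1 + rand() % (randinterval - 1);
    int     lowest = penalty[0];
    int     lowestidx = 0;

    /* first interval [0, delta): remember the lowest penalty found there */
    for (int i = 1; i < delta && i < ncandidates; i++)
    {
        if (penalty[i] < lowest)
        {
            lowest = penalty[i];
            lowestidx = i;
        }
    }

    /* second interval [delta, splitinterval + delta): first equal-or-better */
    for (int i = delta; i < splitinterval + delta && i < ncandidates; i++)
    {
        if (penalty[i] <= lowest)
            return i;
    }

    /* nothing equally good after the shift, forgo randomization */
    return lowestidx;
}

int
main(void)
{
    int     penalty[] = {1, 1, 1, 2, 1, 1, 3, 1, 2, 1, 1, 2};

    srand(42);
    for (int i = 0; i < 5; i++)
        printf("split point: %d\n", choose_split_shifted(penalty, 12, 8, 4));
    return 0;
}

For a unique index both loops terminate after one iteration, which is where
the lower overhead comes from compared to the previous sketch; the price is
the extra bookkeeping around delta.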
To summarize, in comparison with the second approach the third one seems to
have less overhead and the same guarantees regarding penalty, at the cost of a
more complex implementation. I'm fine with going with the second approach, but
I'd first like to discuss the alternatives and make sure we're on the same
page.