Re: Randomize B-Tree page split location to avoid oscillating patterns - Mailing list pgsql-hackers
| From | Dmitry Dolgov |
|---|---|
| Subject | Re: Randomize B-Tree page split location to avoid oscillating patterns |
| Date | |
| Msg-id | noovhlxtwa64do7isbbvl2f6clqukujb7hexw3zi2rn3ssbmrw@3yvb4ghojg4b |
| In response to | Re: Randomize B-Tree page split location to avoid oscillating patterns (Peter Geoghegan <pg@bowt.ie>) |
| List | pgsql-hackers |
> On Wed, May 06, 2026 at 01:49:10PM -0400, Peter Geoghegan wrote:
> On Wed, May 6, 2026 at 1:10 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> > > On Mon, Apr 27, 2026 at 02:07:14PM -0400, Peter Geoghegan wrote:
> > > Your patch should initially look for several split points that are
> > > approximately equal in quality according to the current criteria -- a
> > > separate, initial pass. You'd then randomly pick a final split point
> > > from those gathered during this initial pass -- not from the original
> > > fillfactormult-sorted list of split points. What you have in v1 will
> > > make suffix truncation much less effective, which seems unacceptable.
> >
> > The paper suggests combining randomization and suffix truncation via
> > randomly shifting the interval for truncation. I'm not sure a separate
> > pass is needed for that, looks like it should be enough to add random
> > shift to lowsplit / highsplit in _bt_bestsplitloc, when the penalty is
> > calculated.
>
> I can't see why that would make much difference. Increasing the split
> interval usually won't affect the final chosen split point at all.
> Decreasing is more likely to change the split point, but that's
> *precisely* because it makes suffix truncation less effective.
I assume by increasing/decreasing the split interval you mean growing or
shrinking it around the split point, but I was talking about shifting the whole
interval to the right. More about this in the next part.
> The point of gathering a list of "equally good" split points in an
> initial pass is that it removes the danger of making the split
> interval less effective in terms of final split penalty. The random
> choice becomes a choice among split points that all truncate away the
> same number of suffix attributes (all of which must still be
> sufficiently close to the space-optimal split location to ensure that
> no split is ever wildly unbalanced). This prevents the variable/random
> choice from affecting the suffix truncation/split penalty.
I see what you mean, but I'm concerned about the resulting overhead. Here is my
understanding of how it would work, let me know if something is missing in the
chain of thought:
* What happens currently is that we collect all possible split points, sort
them by delta, and calculate the split interval. Then we iterate over the
candidate split points in the interval [0, split interval), trying to find the
one with the lowest penalty. If the penalty we found is the best possible one,
we stop early -- for unique indexes this means that virtually all the time the
search is over after a single iteration, since the first attribute is enough to
distinguish the split.
* If we're going to pick at random from a list of "equally good" split points,
everything up to the split interval calculation is the same. Then we iterate
over the candidate split points in the interval [0, split interval) and collect
all the points whose penalty equals the current lowest value (the list is reset
whenever we find a lower value). We search this way until we've found enough
for the chosen randomization interval (those 20% from the paper) or until we're
out of split points. In the end we've got a list of split points, each with the
same lowest penalty, and we choose a split point at random from this list (see
the first sketch below, after the list). For a unique index it means we have to
do at least as many iterations as the size of the randomization interval,
instead of one iteration as before -- this is the overhead I'm concerned about.
* If we're going to shift the split interval, everything is also the same up to
the split interval calculation. Now we introduce a
delta = random from (0, randomization interval)
and consider two intervals of available split points:
[0, delta) and [delta, split interval + delta)
We first search [0, delta) as before for the lowest possible penalty, just to
make sure we don't miss it in case it's not present in the second interval. The
main part is the search in [delta, split interval + delta) for a split point
with the same or lower penalty as the one found in the first interval, which is
then used as the final result (see the second sketch below, after the list).
There are a few possible outcomes:
1. There are one or more lowest-penalty split points in [0, delta) and one or
more in [delta, split interval + delta). Since delta is randomized, we pick
one of the "equally good" split points at random.
2. There are one or more lowest-penalty split points in [0, delta) and for
some reason none in [delta, split interval + delta). In this case we forgo
randomization and go with the lowest-penalty split point from [0, delta).
3. There are no lowest-penalty split points in [0, delta) and one or more
in [delta, split interval + delta). In this case we still pick one of the
"equally good" split points, still with some randomization.
4. There are no lowest-penalty split points in [0, randomization interval)
and one or more in [randomization interval, split interval + randomization
interval). In this case we still pick one of the "equally good" split
points, but without randomization, since the best point lies beyond the
allowed randomization range, and we hit it every time.
Ultimately we would like to apply split point randomization to mostly-unique
indexes, since otherwise suffix truncation already does the job. With this
approach the randomization interval serves as a threshold to distinguish such
indexes -- if the index has mostly unique values, the optimal split point will
be located close to the start of the interval and multiple "equally good"
points will fall within the randomization interval. If on the other hand the
index has many duplicate values, the optimal split point may be located outside
the randomization interval and we would choose it every time -- but thanks to
suffix truncation that doesn't matter.
From the overhead perspective, with this approach we do:
- one search in [0, delta), which will end after one iteration for unique
indexes.
- then a "jump" of random distance and one more search in [delta, split
interval + delta) under the same conditions. It will end after one iteration
for a unique index, and will process a similar number of iterations otherwise.
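To make the comparison concrete, here is a rough standalone sketch of the
second approach (not actual nbtsplitloc.c code -- candidate penalties are
modeled as a plain int array, lower is better, and all names are made up for
illustration):

#include <stdio.h>
#include <stdlib.h>

/*
 * Approach 2 (sketch): scan the candidates in [0, splitinterval), already
 * sorted by delta, collect the ones tied at the lowest penalty seen so far
 * (resetting the list whenever a strictly lower penalty shows up), stop once
 * there are enough of them for the randomization interval, then pick the
 * final split point at random from that list.
 */
static int
choose_split_equal_list(const int *penalty, int splitinterval, int randinterval)
{
    int     best[32];       /* candidates tied at the lowest penalty */
    int     nbest = 0;
    int     lowest = penalty[0];

    for (int i = 0; i < splitinterval; i++)
    {
        if (penalty[i] < lowest)
        {
            /* strictly better candidate found, reset the list */
            lowest = penalty[i];
            nbest = 0;
        }
        if (penalty[i] == lowest && nbest < 32)
            best[nbest++] = i;

        /* enough "equally good" candidates for the randomization interval */
        if (nbest >= randinterval)
            break;
    }

    return best[rand() % nbest];
}

int
main(void)
{
    /* a "mostly unique" page: almost every candidate has the lowest penalty */
    int     penalty[] = {1, 1, 1, 2, 1, 1, 3, 1, 2, 1};

    srand(42);
    for (int i = 0; i < 5; i++)
        printf("split point: %d\n", choose_split_equal_list(penalty, 10, 4));
    return 0;
}

The overhead concern is visible here: even in the best case the loop has to
run randinterval times before it can stop, instead of stopping at the first
candidate with the best possible penalty.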
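And a similar sketch of the third, shifted-interval variant under the same
simplified model (again with made-up names rather than the real
_bt_bestsplitloc interface); this is my reading of the outcomes above, where
the second interval takes the first candidate that is at least as good as the
best of the first one:

#include <stdio.h>
#include <stdlib.h>

/*
 * Approach 3 (sketch): shift the search interval by a random delta drawn
 * from (0, randinterval) (assumes randinterval >= 2).  First find the lowest
 * penalty in [0, delta), then return the first candidate in
 * [delta, splitinterval + delta) whose penalty is equal to or lower than
 * that (outcomes 1, 3 and 4 above).  If no such candidate exists, fall back
 * to the best point of the first interval and forgo randomization
 * (outcome 2).
 */
static int
choose_split_shifted(const int *penalty, int ncandidates,
                     int splitinterval, int randinterval)
{
    int     delta = 1 + rand() % (randinterval - 1);
    int     lowest = penalty[0];
    int     lowestidx = 0;

    /* first interval [0, delta): remember the lowest penalty found there */
    for (int i = 1; i < delta && i < ncandidates; i++)
    {
        if (penalty[i] < lowest)
        {
            lowest = penalty[i];
            lowestidx = i;
        }
    }

    /* second interval [delta, splitinterval + delta): first equal-or-better */
    for (int i = delta; i < splitinterval + delta && i < ncandidates; i++)
    {
        if (penalty[i] <= lowest)
            return i;
    }

    /* nothing equally good after the shift, forgo randomization */
    return lowestidx;
}

int
main(void)
{
    int     penalty[] = {1, 1, 1, 2, 1, 1, 3, 1, 2, 1, 1, 2};

    srand(42);
    for (int i = 0; i < 5; i++)
        printf("split point: %d\n", choose_split_shifted(penalty, 12, 8, 4));
    return 0;
}

For a unique index both loops terminate after one iteration, which is where
the lower overhead comes from compared to the previous sketch; the price is
the extra bookkeeping around delta.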
To summarize, in comparison with the second approach the third one seems to
have less overhead and the same guarantees regarding penalty, at the cost of a
more complex implementation. I'm fine with going with the second approach, but
I'd first like to discuss the alternatives and make sure we're on the same
page.