Thread: "Classic" nbtree suffix truncation prototype

"Classic" nbtree suffix truncation prototype

From
Peter Geoghegan
Date:
I wrote a prototype quality patch that implements what I call
"classic" suffix truncation. That is, the patch makes nbtree leaf page
splits generate a short prefix datum for the new high key for
variable-length datatypes, generated using new opclass infrastructure.
This is useful for string-like datatypes such as text, especially when
we happen to be indexing long text strings, where we don't truly need
to use long strings in internal pages. The feature is "classic" in the
sense that it works in more or less the way that Bayer and Unterauer
anticipate in the "Prefix B-Trees" paper. We already have a limited
form of suffix truncation that only works at the whole-column
granularity. To some degree, I approached writing this patch as
finishing off the work that was started in Postgres 12. It started out
that way, at least.

I believe that the prototype is representative of the benefits that
we'd get from a high quality implementation of the same feature. Those
benefits turn out to be relatively small, which is discouraging. I'm
not going to pursue the project any further in the near-term, because
there is more important work to be done, but I thought that I'd share
what I found here. Highlights:

* There were changes required to nbtsplitloc.c to make it work well,
unsurprisingly. It has to be sensitive to the final size of the new
high key, rather than just caring about the number of columns that can
be truncated. At the same time, it's important to not break the
duplicate handling stuff. I ended up changing the penalty logic for
leaf splits, making it give primary consideration to the discrete cost
that is the total number of columns that are left in the new high key,
and secondary consideration to the continuous cost that is the exact
final size of the new high key with classic suffix truncation. This
seemed to work well, though I didn't get as far as worrying about the
extra cycles (I only implemented classic suffix truncation for "C"
locale text).

* I saw a tiny reduction in the number of leaf pages with affected
indexes in my test suite, and small reductions in the number of
internal pages.

* There were 2 or 3 existing indexes from my test suite that happened
to index very large strings, such as an index on a "notes" column.
These indexes had massive reductions in the number of internal pages
(e.g. a 5x+ reduction), along with a very small reduction in the
number of leaf pages. Users that happen to have a lot of indexes that
look like this are likely to find classic suffix truncation
compelling, but that doesn't seem like a good enough reason to push
ahead with the patch.

--
Peter Geoghegan