Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
Date
Msg-id CAH2-WzkZY5QUnKfjTyVjUcwUrDOA3yedjn1YOfcPYj_e75kTeA@mail.gmail.com
Whole thread Raw
In response to Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
List pgsql-hackers
On Thu, Apr 15, 2021 at 11:06 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> I've noticed there are a lot of places in the btree index
> infrastructure (and also some other index AMs) that effectively
> iterate over the attributes of the index tuple, but won't use
> index_deform_tuple for reasons. However, this implies that they must
> repeatedly call index_getattr, which in the worst case is O(n) for the
> n-th attribute, slowing down extraction of multi-column indexes
> significantly. As such, I've added some API that allows for iteration
> (ish) over index attributes.

Interesting approach. I think that in an ideal world we would have a
tuple format with attribute lengths/offsets right in the header. But
it's too late for that, so other approaches seem worth considering.

> Also attached is 0002, which dynamically truncates attribute prefixes
> of tuples whilst _binsrch-ing through a nbtree page. It greatly uses
> the improved performance of 0001; they work very well together. The
> problems that Peter (cc-ed) mentions in [0] only result in invalid
> search bounds when traversing the tree, but on the page level valid
> bounds can be constructed.
>
> This is patchset 1 of a series of patches I'm starting for eventually
> adding static prefix truncation into nbtree infrastructure in
> PostgreSQL. I've put up a wiki page [1] with my current research and
> thoughts on that topic.

The idea of making _bt_truncate() produce new leaf page high keys
based on the lastleft tuple rather than the firstright tuple (i.e.
+inf truncated attribute values rather than the current -inf) seems
like a non-starter. As you point out in "1.) Suffix-truncation; -INF
in high keys" on the Postgres wiki page, the current approach
truncates firstright (not lastleft), making the left page's new high
key contain what you call a 'foreign' value. But I see that as a big
advantage of the current approach.

Consider, for example, the nbtree high key "continuescan" optimization
added by commit 29b64d1d. The fact that leaf page high keys are
generated in this way kind of allows us to "peak" on the page to the
immediate right before actually visiting it -- possibly without ever
visiting it (which is where the benefit of that particular
optimization lies). _bt_check_unique() uses a similar trick. After the
Postgres 12 work, _bt_check_unique() will only visit a second page in
the extreme case where we cannot possibly fit all of the relevant
version duplicates on even one whole leaf page (i.e. practically
never). There is also cleverness inside _bt_compare() to make sure
that we handle the boundary cases perfectly while descending the tree.

You might also consider how the nbtsplitloc.c code works with
duplicates, and how that would be affected by +inf truncated
attributes. The leaf-page-packing performed in the SPLIT_SINGLE_VALUE
case only goes ahead when the existing high key confirms that this
must be the rightmost page. Now, I guess that you could still do
something like that if we switched to +inf semantics. But, the fact
that the new right page will have a 'foreign' value in the
SPLIT_SINGLE_VALUE-split case is also of benefit -- it is practically
empty right after the split (since the original/left page is packed
full), and we want this empty space to be eligible to either take more
duplicates, or to take values that may happen to fit between the
highly duplicated value and the original foreign high key value. We
want that flexibility, I think.

I also find -inf much more natural. If in the future we teach nbtree
to truncate "inside" text attributes (say text columns), you'd pretty
much be doing the same thing at the level of characters rather than
whole columns. The -inf semantics are like strcmp() semantics.

If you're going to pursue full prefix compression anyway, maybe you
should use a low key on the leaf level in cases where the optimization
is in use. This creates complexity during page deletion, because the
low key in the subtree to the right of the deletion target subtree may
need to be updated. Perhaps you can find a way to make that work that
isn't too complicated.

> I've run some tests with regards to performance on my laptop; which
> tests nbtree index traversal. The test is based on a recent UK land
> registry sales prices dataset (25744780 rows), being copied from one
> table into an unlogged table with disabled autovacuum, with one index
> as specified by the result. Master @ 99964c4a, patched is with both
> 0001 and 0002. The results are averages over 3 runs, with plain
> configure, compiled by gcc (Debian 6.3.0-18+deb9u1).

You should probably account for index size here. I have lots of my own
tests for space utilization, using data from a variety of sources.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Julien Rouhaud
Date:
Subject: Re: Bogus collation version recording in recordMultipleDependencies
Next
From: Tom Lane
Date:
Subject: Re: Bogus collation version recording in recordMultipleDependencies