Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation - Mailing list pgsql-hackers

From Matthias van de Meent
Subject Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
Date
Msg-id CAEze2WjiXKrpLyk0tyhSeZds93KfkWPZLaUUuauETFm5W_aATA@mail.gmail.com
Whole thread Raw
In response to Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Fri, 16 Apr 2021 at 18:03, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Apr 15, 2021 at 11:06 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > I've noticed there are a lot of places in the btree index
> > infrastructure (and also some other index AMs) that effectively
> > iterate over the attributes of the index tuple, but won't use
> > index_deform_tuple for reasons. However, this implies that they must
> > repeatedly call index_getattr, which in the worst case is O(n) for the
> > n-th attribute, slowing down extraction of multi-column indexes
> > significantly. As such, I've added some API that allows for iteration
> > (ish) over index attributes.
>
> Interesting approach. I think that in an ideal world we would have a
> tuple format with attribute lengths/offsets right in the header.

I believe that that would indeed be ideal w.r.t. access speed, but
also quite expensive w.r.t. amount of data stored. This would add 2
bytes per attribute in the current infrastructure (11 bits at least
for each attribute to store offsets), on the current 12 bytes of
overhead per indextuple (= 8 for IndexTuple + 4 for ItemId). That is
probably always going to be a non-starter, seeing that we can
relatively easily optimize our current attribute access patterns.

> But
> it's too late for that, so other approaches seem worth considering.

Yep.

> > Also attached is 0002, which dynamically truncates attribute prefixes
> > of tuples whilst _binsrch-ing through a nbtree page. It greatly uses
> > the improved performance of 0001; they work very well together. The
> > problems that Peter (cc-ed) mentions in [0] only result in invalid
> > search bounds when traversing the tree, but on the page level valid
> > bounds can be constructed.
> >
> > This is patchset 1 of a series of patches I'm starting for eventually
> > adding static prefix truncation into nbtree infrastructure in
> > PostgreSQL. I've put up a wiki page [1] with my current research and
> > thoughts on that topic.
>
> The idea of making _bt_truncate() produce new leaf page high keys
> based on the lastleft tuple rather than the firstright tuple (i.e.
> +inf truncated attribute values rather than the current -inf) seems
> like a non-starter. As you point out in "1.) Suffix-truncation; -INF
> in high keys" on the Postgres wiki page, the current approach
> truncates firstright (not lastleft), making the left page's new high
> key contain what you call a 'foreign' value. But I see that as a big
> advantage of the current approach.
>
> Consider, for example, the nbtree high key "continuescan" optimization
> added by commit 29b64d1d. The fact that leaf page high keys are
> generated in this way kind of allows us to "peak" on the page to the
> immediate right before actually visiting it -- possibly without ever
> visiting it (which is where the benefit of that particular
> optimization lies). _bt_check_unique() uses a similar trick. After the
> Postgres 12 work, _bt_check_unique() will only visit a second page in
> the extreme case where we cannot possibly fit all of the relevant
> version duplicates on even one whole leaf page (i.e. practically
> never). There is also cleverness inside _bt_compare() to make sure
> that we handle the boundary cases perfectly while descending the tree.

I understand and appreciate that the "-INF" truncation that is
currently in place is being relied upon in quite some places. Part of
the effort for "+INF" truncation would be determining where and how to
keep the benefits of the "-INF" truncation. I also believe that for
internal pages truncating to "+INF" would be perfectly fine; the
optimizations that I know of only rely on it at the leaf level.
Completely seperate from that, there's no reason (except for a
potential lack of unused bits) we can't flag suffix-truncated columns
as either "+INF" or "-INF" - that would allow us to apply each where
useful.

> You might also consider how the nbtsplitloc.c code works with
> duplicates, and how that would be affected by +inf truncated
> attributes. The leaf-page-packing performed in the SPLIT_SINGLE_VALUE
> case only goes ahead when the existing high key confirms that this
> must be the rightmost page. Now, I guess that you could still do
> something like that if we switched to +inf semantics. But, the fact
> that the new right page will have a 'foreign' value in the
> SPLIT_SINGLE_VALUE-split case is also of benefit -- it is practically
> empty right after the split (since the original/left page is packed
> full), and we want this empty space to be eligible to either take more
> duplicates, or to take values that may happen to fit between the
> highly duplicated value and the original foreign high key value. We
> want that flexibility, I think.
>
> I also find -inf much more natural. If in the future we teach nbtree
> to truncate "inside" text attributes (say text columns), you'd pretty
> much be doing the same thing at the level of characters rather than
> whole columns. The -inf semantics are like strcmp() semantics.

Yes, I also read and appreciate your comments on +inf vs -inf when
this came up in [0]. However, if we could choose, I think that having
both options could be quite beneficial, especially when dealing with
many duplicates or duplicate prefixes.

> If you're going to pursue full prefix compression anyway, maybe you
> should use a low key on the leaf level in cases where the optimization
> is in use. This creates complexity during page deletion, because the
> low key in the subtree to the right of the deletion target subtree may
> need to be updated. Perhaps you can find a way to make that work that
> isn't too complicated.

That would be an interesting research path as well, the cost/benefit
analysis would be much trickier when comparing to the status quo.

> You should probably account for index size here. I have lots of my own
> tests for space utilization, using data from a variety of sources.

I'd like to mention that the current (and measured) patchset only does
_logical_ dynamic prefix truncation, not the physical prefix
truncation that is described on the wiki page. Physical prefix
truncation will probably be a summer / fall project, and I will indeed
at some point need to build a test suite that would measure the
benefits, but for this patch I do not see the need for benchmarks on
size, as that is not the point of these patches. These patches are
useful on their own for multi-key-column btree performance (and some
GIST), regardless of later patches implementing physical dynamic
prefix truncation in the btree AM.


With regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CAH2-Wzm_Kxm26E_DwK7AR%2BZB_-B50OMpGoO%3Dn08tD%2BqH%3DMD-zw%40mail.gmail.com
[1] https://www.postgresql.org/message-id/CAH2-Wzn_NAyK4pR0HRWO0StwHmxjP5qyu+X8vppt030XpqrO6w@mail.gmail.com



pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: Re: default_tablespace doc and partitioned rels
Next
From: Thomas Munro
Date:
Subject: Re: Bogus collation version recording in recordMultipleDependencies