Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation - Mailing list pgsql-hackers
From | Matthias van de Meent |
---|---|
Subject | Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation |
Date | |
Msg-id | CAEze2WjiXKrpLyk0tyhSeZds93KfkWPZLaUUuauETFm5W_aATA@mail.gmail.com Whole thread Raw |
In response to | Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
|
List | pgsql-hackers |
On Fri, 16 Apr 2021 at 18:03, Peter Geoghegan <pg@bowt.ie> wrote: > > On Thu, Apr 15, 2021 at 11:06 AM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > I've noticed there are a lot of places in the btree index > > infrastructure (and also some other index AMs) that effectively > > iterate over the attributes of the index tuple, but won't use > > index_deform_tuple for reasons. However, this implies that they must > > repeatedly call index_getattr, which in the worst case is O(n) for the > > n-th attribute, slowing down extraction of multi-column indexes > > significantly. As such, I've added some API that allows for iteration > > (ish) over index attributes. > > Interesting approach. I think that in an ideal world we would have a > tuple format with attribute lengths/offsets right in the header. I believe that that would indeed be ideal w.r.t. access speed, but also quite expensive w.r.t. amount of data stored. This would add 2 bytes per attribute in the current infrastructure (11 bits at least for each attribute to store offsets), on the current 12 bytes of overhead per indextuple (= 8 for IndexTuple + 4 for ItemId). That is probably always going to be a non-starter, seeing that we can relatively easily optimize our current attribute access patterns. > But > it's too late for that, so other approaches seem worth considering. Yep. > > Also attached is 0002, which dynamically truncates attribute prefixes > > of tuples whilst _binsrch-ing through a nbtree page. It greatly uses > > the improved performance of 0001; they work very well together. The > > problems that Peter (cc-ed) mentions in [0] only result in invalid > > search bounds when traversing the tree, but on the page level valid > > bounds can be constructed. > > > > This is patchset 1 of a series of patches I'm starting for eventually > > adding static prefix truncation into nbtree infrastructure in > > PostgreSQL. I've put up a wiki page [1] with my current research and > > thoughts on that topic. > > The idea of making _bt_truncate() produce new leaf page high keys > based on the lastleft tuple rather than the firstright tuple (i.e. > +inf truncated attribute values rather than the current -inf) seems > like a non-starter. As you point out in "1.) Suffix-truncation; -INF > in high keys" on the Postgres wiki page, the current approach > truncates firstright (not lastleft), making the left page's new high > key contain what you call a 'foreign' value. But I see that as a big > advantage of the current approach. > > Consider, for example, the nbtree high key "continuescan" optimization > added by commit 29b64d1d. The fact that leaf page high keys are > generated in this way kind of allows us to "peak" on the page to the > immediate right before actually visiting it -- possibly without ever > visiting it (which is where the benefit of that particular > optimization lies). _bt_check_unique() uses a similar trick. After the > Postgres 12 work, _bt_check_unique() will only visit a second page in > the extreme case where we cannot possibly fit all of the relevant > version duplicates on even one whole leaf page (i.e. practically > never). There is also cleverness inside _bt_compare() to make sure > that we handle the boundary cases perfectly while descending the tree. I understand and appreciate that the "-INF" truncation that is currently in place is being relied upon in quite some places. Part of the effort for "+INF" truncation would be determining where and how to keep the benefits of the "-INF" truncation. I also believe that for internal pages truncating to "+INF" would be perfectly fine; the optimizations that I know of only rely on it at the leaf level. Completely seperate from that, there's no reason (except for a potential lack of unused bits) we can't flag suffix-truncated columns as either "+INF" or "-INF" - that would allow us to apply each where useful. > You might also consider how the nbtsplitloc.c code works with > duplicates, and how that would be affected by +inf truncated > attributes. The leaf-page-packing performed in the SPLIT_SINGLE_VALUE > case only goes ahead when the existing high key confirms that this > must be the rightmost page. Now, I guess that you could still do > something like that if we switched to +inf semantics. But, the fact > that the new right page will have a 'foreign' value in the > SPLIT_SINGLE_VALUE-split case is also of benefit -- it is practically > empty right after the split (since the original/left page is packed > full), and we want this empty space to be eligible to either take more > duplicates, or to take values that may happen to fit between the > highly duplicated value and the original foreign high key value. We > want that flexibility, I think. > > I also find -inf much more natural. If in the future we teach nbtree > to truncate "inside" text attributes (say text columns), you'd pretty > much be doing the same thing at the level of characters rather than > whole columns. The -inf semantics are like strcmp() semantics. Yes, I also read and appreciate your comments on +inf vs -inf when this came up in [0]. However, if we could choose, I think that having both options could be quite beneficial, especially when dealing with many duplicates or duplicate prefixes. > If you're going to pursue full prefix compression anyway, maybe you > should use a low key on the leaf level in cases where the optimization > is in use. This creates complexity during page deletion, because the > low key in the subtree to the right of the deletion target subtree may > need to be updated. Perhaps you can find a way to make that work that > isn't too complicated. That would be an interesting research path as well, the cost/benefit analysis would be much trickier when comparing to the status quo. > You should probably account for index size here. I have lots of my own > tests for space utilization, using data from a variety of sources. I'd like to mention that the current (and measured) patchset only does _logical_ dynamic prefix truncation, not the physical prefix truncation that is described on the wiki page. Physical prefix truncation will probably be a summer / fall project, and I will indeed at some point need to build a test suite that would measure the benefits, but for this patch I do not see the need for benchmarks on size, as that is not the point of these patches. These patches are useful on their own for multi-key-column btree performance (and some GIST), regardless of later patches implementing physical dynamic prefix truncation in the btree AM. With regards, Matthias van de Meent [0] https://www.postgresql.org/message-id/CAH2-Wzm_Kxm26E_DwK7AR%2BZB_-B50OMpGoO%3Dn08tD%2BqH%3DMD-zw%40mail.gmail.com [1] https://www.postgresql.org/message-id/CAH2-Wzn_NAyK4pR0HRWO0StwHmxjP5qyu+X8vppt030XpqrO6w@mail.gmail.com
pgsql-hackers by date: