Thread: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
From
Matthias van de Meent
Date:
-- Targeting PG15; if too early / noise then please ignore.

I've noticed there are a lot of places in the btree index infrastructure (and also some other index AMs) that effectively iterate over the attributes of the index tuple, but won't use index_deform_tuple for reasons. However, this implies that they must repeatedly call index_getattr, which in the worst case is O(n) for the n-th attribute, slowing down extraction of multi-column indexes significantly. As such, I've added some API that allows for iteration (ish) over index attributes.

Please find attached patch 0001, which improves the runtime complexity of many of these places by storing and reusing the offset of the last extracted attribute. This improves the worst-case runtime of extracting all attributes incrementally to O(n), down from O(n*n). Note that finding the first offset is still an O(n) worst case when starting at the n-th attribute, but nothing can be done about that. Almost all workloads on multi-column nbtree indexes that cannot use attcacheoff should see a benefit from this patch; only those that exclusively use row scans cannot use this optimization. Additionally, multi-column GiST indexes could also see some (albeit limited) benefit, which is useful considering the newly added INCLUDE support in the GiST AM.

Also attached is 0002, which dynamically truncates attribute prefixes of tuples whilst _binsrch-ing through an nbtree page. It makes heavy use of the improved performance of 0001; they work very well together. The problems that Peter (cc-ed) mentions in [0] only result in invalid search bounds when traversing the tree, but on the page level valid bounds can be constructed.

This is patchset 1 of a series of patches I'm starting for eventually adding static prefix truncation into nbtree infrastructure in PostgreSQL. I've put up a wiki page [1] with my current research and thoughts on that topic.

Performance
-----------

I've run some performance tests on my laptop; they test nbtree index traversal. The test is based on a recent UK land registry sales prices dataset (25744780 rows), being copied from one table into an unlogged table with disabled autovacuum, with one index as specified by the result. Master @ 99964c4a, patched is with both 0001 and 0002. The results are averages over 3 runs, with plain configure, compiled by gcc (Debian 6.3.0-18+deb9u1).

INSERT (index definition)                  | master (s) | patched (s) | improv (%)
-------------------------------------------+------------+-------------+-----------
UNIQUE (transaction)                       |     256851 |      251705 |       2.00
(county, city, locality)                   |     154529 |      147495 |       4.55
(county COLLATE "en_US", city, locality)   |     174028 |      164165 |       5.67
(always_null, county, city, locality)      |     173090 |      166851 |       3.60

Some testing for REINDEX indicates improvements there as well. Same compiled version; all indexes on an unlogged table; REINDEX run 4 times on each index, with the last 3 runs averaged.

REINDEX (index definition)                 | master (s) | patched (s) | improv (%)
-------------------------------------------+------------+-------------+-----------
UNIQUE (transaction)                       |      11623 |       11692 |       -0.6
(county, city, locality)                   |      58299 |       54770 |        6.1
(county COLLATE "en_US", city, locality)   |      61790 |       55887 |        9.6
(always_null, county, city, locality)      |      69703 |       63925 |        8.3

I am quite surprised by the results for the single-column unique index insertions, as that was one of the points where I was expecting a slight decrease in performance for inserts.
I haven't really checked why the performance increased, but I suspect it has to do with an improved fast path for finding the first attribute (we know it always starts at offset 0 of the data section), though it might just as well be due to throttling (sadly, I do not have a stable benchmarking machine, so my laptop will do).

I'm also slightly disappointed with the results of the always_null insert load; I had hoped for better results there, seeing the results for the other two multi-column indexes.

With regards,

Matthias van de Meent.

[0] https://www.postgresql.org/message-id/CAH2-Wzn_NAyK4pR0HRWO0StwHmxjP5qyu+X8vppt030XpqrO6w@mail.gmail.com
[1] https://wiki.postgresql.org/wiki/NBTree_Prefix_Truncation
Attachment
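To make the "store and reuse the offset of the last extracted attribute" idea concrete, here is a minimal standalone sketch. The tuple layout (each attribute prefixed by a one-byte length) and all names are invented for illustration; this is not the API from the attached patch nor PostgreSQL's IndexTuple format. The point is only that an iterator which carries the running offset extracts attribute k in O(1) amortized time, whereas a from-scratch getattr(k) would re-walk attributes 1..k-1 on every call, giving O(n*n) for a full extraction.

/*
 * Toy model of incremental attribute extraction: remember where the
 * previous attribute ended so the next one is found without rescanning.
 */
#include <stdint.h>
#include <stdio.h>

typedef struct AttrIter
{
    const uint8_t *data;        /* start of the tuple's data area */
    size_t      next_off;       /* where the next attribute begins */
} AttrIter;

static void
attr_iter_init(AttrIter *it, const uint8_t *data)
{
    it->data = data;
    it->next_off = 0;
}

/* Return a pointer to the next attribute's payload, set *len to its size. */
static const uint8_t *
attr_iter_next(AttrIter *it, uint8_t *len)
{
    const uint8_t *payload;

    *len = it->data[it->next_off];
    payload = it->data + it->next_off + 1;

    /* remember where this attribute ended: the next one starts there */
    it->next_off += 1 + *len;
    return payload;
}

int
main(void)
{
    /* three variable-length attributes: "ab", "cde", "f" */
    const uint8_t tuple[] = {2, 'a', 'b', 3, 'c', 'd', 'e', 1, 'f'};
    AttrIter    it;
    uint8_t     len;

    attr_iter_init(&it, tuple);
    for (int attnum = 1; attnum <= 3; attnum++)
    {
        const uint8_t *p = attr_iter_next(&it, &len);

        printf("attribute %d: %.*s\n", attnum, (int) len, (const char *) p);
    }
    return 0;
}

This is essentially the shape of optimization the first message describes for index_getattr-heavy code paths that cannot rely on attcacheoff.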
Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
From
Peter Geoghegan
Date:
On Thu, Apr 15, 2021 at 11:06 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> I've noticed there are a lot of places in the btree index infrastructure (and also some other index AMs) that effectively iterate over the attributes of the index tuple, but won't use index_deform_tuple for reasons. However, this implies that they must repeatedly call index_getattr, which in the worst case is O(n) for the n-th attribute, slowing down extraction of multi-column indexes significantly. As such, I've added some API that allows for iteration (ish) over index attributes.

Interesting approach. I think that in an ideal world we would have a tuple format with attribute lengths/offsets right in the header. But it's too late for that, so other approaches seem worth considering.

> Also attached is 0002, which dynamically truncates attribute prefixes of tuples whilst _binsrch-ing through an nbtree page. It makes heavy use of the improved performance of 0001; they work very well together. The problems that Peter (cc-ed) mentions in [0] only result in invalid search bounds when traversing the tree, but on the page level valid bounds can be constructed.
>
> This is patchset 1 of a series of patches I'm starting for eventually adding static prefix truncation into nbtree infrastructure in PostgreSQL. I've put up a wiki page [1] with my current research and thoughts on that topic.

The idea of making _bt_truncate() produce new leaf page high keys based on the lastleft tuple rather than the firstright tuple (i.e. +inf truncated attribute values rather than the current -inf) seems like a non-starter. As you point out in "1.) Suffix-truncation; -INF in high keys" on the Postgres wiki page, the current approach truncates firstright (not lastleft), making the left page's new high key contain what you call a 'foreign' value. But I see that as a big advantage of the current approach.

Consider, for example, the nbtree high key "continuescan" optimization added by commit 29b64d1d. The fact that leaf page high keys are generated in this way kind of allows us to "peek" at the page to the immediate right before actually visiting it -- possibly without ever visiting it (which is where the benefit of that particular optimization lies). _bt_check_unique() uses a similar trick. After the Postgres 12 work, _bt_check_unique() will only visit a second page in the extreme case where we cannot possibly fit all of the relevant version duplicates on even one whole leaf page (i.e. practically never). There is also cleverness inside _bt_compare() to make sure that we handle the boundary cases perfectly while descending the tree.

You might also consider how the nbtsplitloc.c code works with duplicates, and how that would be affected by +inf truncated attributes. The leaf-page-packing performed in the SPLIT_SINGLE_VALUE case only goes ahead when the existing high key confirms that this must be the rightmost page. Now, I guess that you could still do something like that if we switched to +inf semantics. But the fact that the new right page will have a 'foreign' value in the SPLIT_SINGLE_VALUE-split case is also of benefit -- it is practically empty right after the split (since the original/left page is packed full), and we want this empty space to be eligible to either take more duplicates, or to take values that may happen to fit between the highly duplicated value and the original foreign high key value. We want that flexibility, I think.

I also find -inf much more natural.
If in the future we teach nbtree to truncate "inside" attributes (say text columns), you'd pretty much be doing the same thing at the level of characters rather than whole columns. The -inf semantics are like strcmp() semantics.

If you're going to pursue full prefix compression anyway, maybe you should use a low key on the leaf level in cases where the optimization is in use. This creates complexity during page deletion, because the low key in the subtree to the right of the deletion target subtree may need to be updated. Perhaps you can find a way to make that work that isn't too complicated.

> I've run some performance tests on my laptop; they test nbtree index traversal. The test is based on a recent UK land registry sales prices dataset (25744780 rows), being copied from one table into an unlogged table with disabled autovacuum, with one index as specified by the result. Master @ 99964c4a, patched is with both 0001 and 0002. The results are averages over 3 runs, with plain configure, compiled by gcc (Debian 6.3.0-18+deb9u1).

You should probably account for index size here. I have lots of my own tests for space utilization, using data from a variety of sources.

--
Peter Geoghegan
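The -inf semantics being discussed can be illustrated with a small standalone sketch (integer keys only, invented code; not the real _bt_compare()): a suffix-truncated pivot tuple behaves as if its missing attributes were minus infinity, so a search key that ties on the stored prefix is treated as greater than the pivot.

/*
 * Toy illustration of "-inf" suffix truncation semantics.
 */
#include <stdio.h>

/* Returns <0, 0 or >0, comparing the search key against a pivot tuple. */
static int
compare_key_to_pivot(const int *key, int nkeyatts,
                     const int *pivot, int npivotatts)
{
    for (int i = 0; i < npivotatts; i++)
    {
        if (key[i] != pivot[i])
            return (key[i] < pivot[i]) ? -1 : 1;
    }

    /*
     * All stored pivot attributes compare equal.  If the pivot was
     * truncated, its missing attributes count as -inf, so the fully
     * specified search key is the greater of the two.
     */
    return (npivotatts < nkeyatts) ? 1 : 0;
}

int
main(void)
{
    int         searchkey[] = {42, 7};
    int         highkey[] = {42};   /* truncated high key: (42, -inf) */

    /* prints 1: (42, 7) sorts after (42, -inf), i.e. to its right */
    printf("%d\n", compare_key_to_pivot(searchkey, 2, highkey, 1));
    return 0;
}

Under the +inf semantics debated in this thread, the tie case would return -1 instead, which is the behavioral difference at stake.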
Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
From
Matthias van de Meent
Date:
On Fri, 16 Apr 2021 at 18:03, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Apr 15, 2021 at 11:06 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> > I've noticed there are a lot of places in the btree index infrastructure (and also some other index AMs) that effectively iterate over the attributes of the index tuple, but won't use index_deform_tuple for reasons. However, this implies that they must repeatedly call index_getattr, which in the worst case is O(n) for the n-th attribute, slowing down extraction of multi-column indexes significantly. As such, I've added some API that allows for iteration (ish) over index attributes.
>
> Interesting approach. I think that in an ideal world we would have a tuple format with attribute lengths/offsets right in the header.

I believe that that would indeed be ideal w.r.t. access speed, but also quite expensive w.r.t. amount of data stored. This would add 2 bytes per attribute in the current infrastructure (at least 11 bits per attribute to store offsets), on top of the current 12 bytes of overhead per index tuple (= 8 for IndexTuple + 4 for ItemId). That is probably always going to be a non-starter, seeing that we can relatively easily optimize our current attribute access patterns.

> But it's too late for that, so other approaches seem worth considering.

Yep.

> > Also attached is 0002, which dynamically truncates attribute prefixes of tuples whilst _binsrch-ing through an nbtree page. It makes heavy use of the improved performance of 0001; they work very well together. The problems that Peter (cc-ed) mentions in [0] only result in invalid search bounds when traversing the tree, but on the page level valid bounds can be constructed.
> >
> > This is patchset 1 of a series of patches I'm starting for eventually adding static prefix truncation into nbtree infrastructure in PostgreSQL. I've put up a wiki page [1] with my current research and thoughts on that topic.
>
> The idea of making _bt_truncate() produce new leaf page high keys based on the lastleft tuple rather than the firstright tuple (i.e. +inf truncated attribute values rather than the current -inf) seems like a non-starter. As you point out in "1.) Suffix-truncation; -INF in high keys" on the Postgres wiki page, the current approach truncates firstright (not lastleft), making the left page's new high key contain what you call a 'foreign' value. But I see that as a big advantage of the current approach.
>
> Consider, for example, the nbtree high key "continuescan" optimization added by commit 29b64d1d. The fact that leaf page high keys are generated in this way kind of allows us to "peek" at the page to the immediate right before actually visiting it -- possibly without ever visiting it (which is where the benefit of that particular optimization lies). _bt_check_unique() uses a similar trick. After the Postgres 12 work, _bt_check_unique() will only visit a second page in the extreme case where we cannot possibly fit all of the relevant version duplicates on even one whole leaf page (i.e. practically never). There is also cleverness inside _bt_compare() to make sure that we handle the boundary cases perfectly while descending the tree.

I understand and appreciate that the "-INF" truncation that is currently in place is being relied upon in quite some places. Part of the effort for "+INF" truncation would be determining where and how to keep the benefits of the "-INF" truncation.
I also believe that for internal pages truncating to "+INF" would be perfectly fine; the optimizations that I know of only rely on it at the leaf level. Completely separate from that, there's no reason (except for a potential lack of unused bits) we can't flag suffix-truncated columns as either "+INF" or "-INF" - that would allow us to apply each where useful.

> You might also consider how the nbtsplitloc.c code works with duplicates, and how that would be affected by +inf truncated attributes. The leaf-page-packing performed in the SPLIT_SINGLE_VALUE case only goes ahead when the existing high key confirms that this must be the rightmost page. Now, I guess that you could still do something like that if we switched to +inf semantics. But the fact that the new right page will have a 'foreign' value in the SPLIT_SINGLE_VALUE-split case is also of benefit -- it is practically empty right after the split (since the original/left page is packed full), and we want this empty space to be eligible to either take more duplicates, or to take values that may happen to fit between the highly duplicated value and the original foreign high key value. We want that flexibility, I think.
>
> I also find -inf much more natural. If in the future we teach nbtree to truncate "inside" attributes (say text columns), you'd pretty much be doing the same thing at the level of characters rather than whole columns. The -inf semantics are like strcmp() semantics.

Yes, I also read and appreciate your comments on +inf vs -inf when this came up in [0]. However, if we could choose, I think that having both options could be quite beneficial, especially when dealing with many duplicates or duplicate prefixes.

> If you're going to pursue full prefix compression anyway, maybe you should use a low key on the leaf level in cases where the optimization is in use. This creates complexity during page deletion, because the low key in the subtree to the right of the deletion target subtree may need to be updated. Perhaps you can find a way to make that work that isn't too complicated.

That would be an interesting research path as well, the cost/benefit analysis would be much trickier when comparing to the status quo.

> You should probably account for index size here. I have lots of my own tests for space utilization, using data from a variety of sources.

I'd like to mention that the current (and measured) patchset only does _logical_ dynamic prefix truncation, not the physical prefix truncation that is described on the wiki page. Physical prefix truncation will probably be a summer / fall project, and I will indeed at some point need to build a test suite that would measure the benefits, but for this patch I do not see the need for benchmarks on size, as that is not the point of these patches. These patches are useful on their own for multi-key-column btree performance (and some GiST), regardless of later patches implementing physical dynamic prefix truncation in the btree AM.

With regards,

Matthias van de Meent

[0] https://www.postgresql.org/message-id/CAH2-Wzm_Kxm26E_DwK7AR%2BZB_-B50OMpGoO%3Dn08tD%2BqH%3DMD-zw%40mail.gmail.com
[1] https://www.postgresql.org/message-id/CAH2-Wzn_NAyK4pR0HRWO0StwHmxjP5qyu+X8vppt030XpqrO6w@mail.gmail.com
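The page-level dynamic prefix truncation being discussed can be sketched in isolation (invented toy code, not the attached patch): during a binary search over lexicographically sorted multi-attribute rows, once the rows at both search bounds are known to agree with the search key on some leading attributes, comparisons against rows between those bounds may skip that shared prefix, because every row in between must agree on it too.

/*
 * Toy sketch of prefix skipping during a page-level binary search.
 * Rows are arrays of NATTS integers, sorted lexicographically; this is
 * not real nbtree code, only an illustration of the invariant it uses.
 */
#include <stdio.h>

#define NATTS 3

/*
 * Compare key against row, starting at attribute 'skip' (the caller
 * guarantees the first 'skip' attributes are equal).  On return, *neq
 * holds the number of leading attributes now known to be equal.
 */
static int
tuple_cmp(const int *key, const int *row, int skip, int *neq)
{
    for (int i = skip; i < NATTS; i++)
    {
        if (key[i] != row[i])
        {
            *neq = i;
            return (key[i] < row[i]) ? -1 : 1;
        }
    }
    *neq = NATTS;
    return 0;
}

/* Return the index of the first row >= key. */
static int
binsrch(const int rows[][NATTS], int nrows, const int *key)
{
    int         lo = 0, hi = nrows;
    int         lo_eq = 0, hi_eq = 0;   /* prefix known equal at each bound */

    while (lo < hi)
    {
        int         mid = lo + (hi - lo) / 2;
        int         skip = (lo_eq < hi_eq) ? lo_eq : hi_eq;
        int         neq;
        int         cmp = tuple_cmp(key, rows[mid], skip, &neq);

        if (cmp > 0)
        {
            lo = mid + 1;       /* rows[lo - 1] now matches key on neq attrs */
            lo_eq = neq;
        }
        else
        {
            hi = mid;           /* rows[hi] now matches key on neq attrs */
            hi_eq = neq;
        }
    }
    return lo;
}

int
main(void)
{
    const int   rows[][NATTS] = {
        {5, 2, 0}, {5, 2, 1}, {5, 2, 3}, {5, 2, 9}, {5, 3, 0}
    };
    const int   key[NATTS] = {5, 2, 4};

    /* prints 3: the first row >= (5, 2, 4) is (5, 2, 9) */
    printf("%d\n", binsrch(rows, 5, key));
    return 0;
}

The invariant only holds between two concrete tuples that were actually compared on the same page, which matches the thread's point that bounds established while descending the tree cannot be reused this way, while page-local bounds can.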
Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
From
Peter Geoghegan
Date:
On Fri, Apr 16, 2021 at 2:20 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> > Interesting approach. I think that in an ideal world we would have a tuple format with attribute lengths/offsets right in the header.
>
> I believe that that would indeed be ideal w.r.t. access speed, but also quite expensive w.r.t. amount of data stored. This would add 2 bytes per attribute in the current infrastructure (at least 11 bits per attribute to store offsets), on top of the current 12 bytes of overhead per index tuple (= 8 for IndexTuple + 4 for ItemId). That is probably always going to be a non-starter, seeing that we can relatively easily optimize our current attribute access patterns.

I don't think that that's why it's a non-starter. This design assumes a world in which everything has already been optimized for this layout. You no longer get to store the varlena header inline, which would break a lot of things in Postgres were it ever to be attempted. The space efficiency issues don't really apply because you have an offset for fixed-length types -- their presence is always implied. I think that you need to encode NULLs differently, which is a lot less space efficient when there are a lot of NULLs. But on the whole this design seems more efficient than what we have currently.

This representation of index tuples would be a totally reasonable design were we in a green field situation. (Which is pretty far from the situation we're actually in, of course.)

> I understand and appreciate that the "-INF" truncation that is currently in place is being relied upon in quite some places. Part of the effort for "+INF" truncation would be determining where and how to keep the benefits of the "-INF" truncation. I also believe that for internal pages truncating to "+INF" would be perfectly fine; the optimizations that I know of only rely on it at the leaf level.

I don't doubt that there is nothing special about -inf from a key space point of view. Actually... you could say that -inf is special to the limited extent that we know it only appears in pivot tuples, and exploit that property when the !pivotsearch case/optimization is used. But that isn't much of an exception at a high level, so whatever.

Anyway, it follows that +inf could in principle be used instead in some or all cases -- all that is truly essential for correctness is that the invariants always be respected. We're still in agreement up until here.

> Yes, I also read and appreciate your comments on +inf vs -inf when this came up in [0].

I'm impressed that you've done your homework on this.

> However, if we could choose, I think that having both options could be quite beneficial, especially when dealing with many duplicates or duplicate prefixes.

This is where things are much less clear -- maybe we're not in agreement here. Who knows, though -- maybe you're right. But you haven't presented any kind of argument. I understand that it's hard to articulate what effects might be in play with stuff like this, so I won't force the issue now. Strong evidence is of course the only way that you'll reliably convince me of this.

I should point out that I am a little confused about how this +inf business could be both independently useful and pivotal to implementing [dynamic] prefix truncation/compression. Seems... weird to discuss them together, except maybe to mention in passing that this +inf thing is notable for particularly helping dynamic prefix stuff -- which is it?
It is my strong preference that nbtsplitloc.c continue to know approximately nothing about compression or deduplication. While it is true that nbtsplitloc.c's _bt_recsplitloc() is aware of posting lists, this is strictly an optimization that is only justified by the fact that posting lists are sometimes very large, and therefore worth considering directly -- just to get a more accurate idea of how a relevant split point choice affects the balance of free space (we don't bother to do the same thing with non-key INCLUDE columns because they're generally small and equi-sized). And so this _bt_recsplitloc() thing is no exception to the general rule, which is: deduplication/posting list maintenance should be *totally* orthogonal to the page split choice logic (the design of posting list splits helps a lot with that). We can afford to have complicated split point choice logic because the question of which split point is optimal is totally decoupled from the question of which are correct -- in particular, from the correctness of the space accounting used to generate candidate split points.

It may interest you to know that I once thought that it would be nice to have the *option* of +inf too, so that we could use it in very rare cases like the pathological SPLIT_MANY_DUPLICATES case that _bt_bestsplitloc() has some defenses against. It would perhaps be nice if we could use +inf selectively in that case. I never said anything about this publicly before now, mostly because it wasn't that important -- pathological behaviors like this have never been reported on by users a full 18 months after the release of 12.0, so it's unlikely to be a real concern.

> > If you're going to pursue full prefix compression anyway, maybe you should use a low key on the leaf level in cases where the optimization is in use. This creates complexity during page deletion, because the low key in the subtree to the right of the deletion target subtree may need to be updated. Perhaps you can find a way to make that work that isn't too complicated.
>
> That would be an interesting research path as well, the cost/benefit analysis would be much trickier when comparing to the status quo.

I'd say that's unclear right now.

> > You should probably account for index size here. I have lots of my own tests for space utilization, using data from a variety of sources.
>
> I'd like to mention that the current (and measured) patchset only does _logical_ dynamic prefix truncation, not the physical prefix truncation that is described on the wiki page.

If you change how _bt_truncate() behaves in any way (e.g. sometimes it's lastleft/+inf based now), and nothing else, you're still bound to change the space utilization with the tests that I maintain -- though perhaps only at the level of noise. I sometimes call these tests "wind tunnel tests". It turns out that you can simulate rather a lot about a real complicated workload with simple, deterministic, serial test cases -- provided you're only interested in the space utilization. This helped a lot for both the Postgres 12 and Postgres 13 stuff (though not the Postgres 14 stuff).

> These patches are useful on their own for multi-key-column btree performance (and some GiST), regardless of later patches implementing physical dynamic prefix truncation in the btree AM.

Have you isolated the performance impact of the first patch at all? Can you characterize how well it works on its own, perhaps just informally?
It would be convenient if the first patch could be treated as an independent thing. -- Peter Geoghegan
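For readers unfamiliar with the "green field" format being debated, here is a rough, purely hypothetical sketch of a tuple header that stores per-attribute offsets up front. The struct name and field sizes are invented for illustration and this is not PostgreSQL's on-disk IndexTuple layout; it only shows where the roughly 2 extra bytes per attribute would go, and why NULLs would need a different encoding than the current null bitmap.

#include <stdint.h>

/*
 * Hypothetical index tuple header with attribute offsets in the header,
 * so that any attribute can be located in O(1).  Illustration only.
 */
typedef struct GreenFieldIndexTuple
{
    uint8_t     t_tid[6];       /* heap item pointer: block number + offset */
    uint16_t    t_info;         /* flag bits and attribute count */

    /*
     * One entry per attribute: byte offset of that attribute's data
     * relative to the start of the data area.  A reserved value such as
     * 0xFFFF could stand in for NULL, replacing the null bitmap.  These
     * entries are the extra per-attribute overhead discussed above.
     */
    uint16_t    t_attoffsets[]; /* C99 flexible array member */
} GreenFieldIndexTuple;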
Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
From
Matthias van de Meent
Date:
On Sat, 17 Apr 2021 at 01:05, Peter Geoghegan <pg@bowt.ie> wrote:
> On Fri, Apr 16, 2021 at 2:20 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> > > Interesting approach. I think that in an ideal world we would have a tuple format with attribute lengths/offsets right in the header.
> >
> > I believe that that would indeed be ideal w.r.t. access speed, but also quite expensive w.r.t. amount of data stored. This would add 2 bytes per attribute in the current infrastructure (at least 11 bits per attribute to store offsets), on top of the current 12 bytes of overhead per index tuple (= 8 for IndexTuple + 4 for ItemId). That is probably always going to be a non-starter, seeing that we can relatively easily optimize our current attribute access patterns.
>
> I don't think that that's why it's a non-starter. This design assumes a world in which everything has already been optimized for this layout. You no longer get to store the varlena header inline, which would break a lot of things in Postgres were it ever to be attempted. The space efficiency issues don't really apply because you have an offset for fixed-length types -- their presence is always implied. I think that you need to encode NULLs differently, which is a lot less space efficient when there are a lot of NULLs. But on the whole this design seems more efficient than what we have currently.

I believe that that depends on your definition of 'efficiency'. For storage efficiency, the current design is quite good: except for the varlena header size of 4 bytes for attributes > 127 bytes (which could be 2 bytes, because pages cannot be larger than 64 KiB -- actually 32 KiB -- with our current design), all attributes use just about the least data possible. For access efficiency / code complexity, you're probably right that storing attribute offsets in the tuple header is preferable, but such a design would still need some alignment calls, or would need to store attribute lengths as well to prevent reading the next attribute's alignment padding into a variable-length attribute, at an additional overhead of up to 2 bytes per attribute.

> This representation of index tuples would be a totally reasonable design were we in a green field situation. (Which is pretty far from the situation we're actually in, of course.)

That might indeed be the case, assuming a green field with different or no processing architecture or storage limitations. CPU-to-storage bandwidth can be (and often is) a bottleneck, as well as alignment.

> > I understand and appreciate that the "-INF" truncation that is currently in place is being relied upon in quite some places. Part of the effort for "+INF" truncation would be determining where and how to keep the benefits of the "-INF" truncation. I also believe that for internal pages truncating to "+INF" would be perfectly fine; the optimizations that I know of only rely on it at the leaf level.
>
> I don't doubt that there is nothing special about -inf from a key space point of view. Actually... you could say that -inf is special to the limited extent that we know it only appears in pivot tuples, and exploit that property when the !pivotsearch case/optimization is used. But that isn't much of an exception at a high level, so whatever.
>
> Anyway, it follows that +inf could in principle be used instead in some or all cases -- all that is truly essential for correctness is that the invariants always be respected.
> We're still in agreement up until here.

Agreed.

> > Yes, I also read and appreciate your comments on +inf vs -inf when this came up in [0].
>
> I'm impressed that you've done your homework on this.
>
> > However, if we could choose, I think that having both options could be quite beneficial, especially when dealing with many duplicates or duplicate prefixes.
>
> This is where things are much less clear -- maybe we're not in agreement here. Who knows, though -- maybe you're right. But you haven't presented any kind of argument. I understand that it's hard to articulate what effects might be in play with stuff like this, so I won't force the issue now. Strong evidence is of course the only way that you'll reliably convince me of this.
>
> I should point out that I am a little confused about how this +inf business could be both independently useful and pivotal to implementing [dynamic] prefix truncation/compression. Seems... weird to discuss them together, except maybe to mention in passing that this +inf thing is notable for particularly helping dynamic prefix stuff -- which is it?

I agree that my reasoning might have been unclear and confusing. I mean that most benefits that we could receive from +inf would be in improving the ability to apply [dynamic] prefix truncation on a page, by limiting the keyspace of that page to 'local' values. If prefix truncation is impossible / does not apply for some index (a single unique column !allequalimage index is a likely worst-case scenario), then applying +inf would potentially be detrimental to the performance of certain other optimizations (e.g. the continuescan optimization), in which case using -inf would probably be preferable. Ergo, I'm planning on making _bt_recsplitloc aware of +inf and -inf after implementing physical prefix truncation, and allowing it to decide if and when each should be applied, if it turns out it consistently improves space and/or time performance without significantly decreasing either.

> It is my strong preference that nbtsplitloc.c continue to know approximately nothing about compression or deduplication. While it is true that nbtsplitloc.c's _bt_recsplitloc() is aware of posting lists, this is strictly an optimization that is only justified by the fact that posting lists are sometimes very large, and therefore worth considering directly -- just to get a more accurate idea of how a relevant split point choice affects the balance of free space (we don't bother to do the same thing with non-key INCLUDE columns because they're generally small and equi-sized). And so this _bt_recsplitloc() thing is no exception to the general rule, which is: deduplication/posting list maintenance should be *totally* orthogonal to the page split choice logic (the design of posting list splits helps a lot with that). We can afford to have complicated split point choice logic because the question of which split point is optimal is totally decoupled from the question of which are correct -- in particular, from the correctness of the space accounting used to generate candidate split points.

I would argue that it also knows about duplicate attributes and shared prefixes? It already optimizes (unintentionally?) for deduplication by choosing split points between two runs of equal values. I believe that implementing the same for prefixes (if not already in place) would not stand out too much. I think we can discuss that more extensively when we actually have code that would benefit from that.
> It may interest you to know that I once thought that it would be nice to have the *option* of +inf too, so that we could use it in very rare cases like the pathological SPLIT_MANY_DUPLICATES case that _bt_bestsplitloc() has some defenses against. It would perhaps be nice if we could use +inf selectively in that case. I never said anything about this publicly before now, mostly because it wasn't that important -- pathological behaviors like this have never been reported on by users a full 18 months after the release of 12.0, so it's unlikely to be a real concern.

I do not per se disagree, but I should note that the amazing work on btree page split prevention through 'heapkeyspace', deduplication and eager tuple deletion has changed some key behaviours of btree index pages. The same would likely occur once physical prefix truncation is implemented, and in that case I believe that some decisions that were previously non-problematic might need to be re-examined.

> > > If you're going to pursue full prefix compression anyway, maybe you should use a low key on the leaf level in cases where the optimization is in use. This creates complexity during page deletion, because the low key in the subtree to the right of the deletion target subtree may need to be updated. Perhaps you can find a way to make that work that isn't too complicated.
> >
> > That would be an interesting research path as well, the cost/benefit analysis would be much trickier when comparing to the status quo.
>
> I'd say that's unclear right now.

I agree. My 'trickier' pointed to the fact that "adding an extra non-key tuple to the page" needs solid understanding and reasoning about the use of the AM to prove that it's worth the extra metadata on the page. Proving that is, in my opinion, difficult.

> > > You should probably account for index size here. I have lots of my own tests for space utilization, using data from a variety of sources.
> >
> > I'd like to mention that the current (and measured) patchset only does _logical_ dynamic prefix truncation, not the physical prefix truncation that is described on the wiki page.
>
> If you change how _bt_truncate() behaves in any way (e.g. sometimes it's lastleft/+inf based now), and nothing else, you're still bound to change the space utilization with the tests that I maintain -- though perhaps only at the level of noise. I sometimes call these tests "wind tunnel tests". It turns out that you can simulate rather a lot about a real complicated workload with simple, deterministic, serial test cases -- provided you're only interested in the space utilization. This helped a lot for both the Postgres 12 and Postgres 13 stuff (though not the Postgres 14 stuff).

I would be interested in running these benchmarks when I get to updating the physical format. Good to know there are relatively easy tests available.

> > These patches are useful on their own for multi-key-column btree performance (and some GiST), regardless of later patches implementing physical dynamic prefix truncation in the btree AM.
>
> Have you isolated the performance impact of the first patch at all? Can you characterize how well it works on its own, perhaps just informally?

The REINDEX performance results are where attribute iteration shines best, as the hot path in reindex is tuple comparison, which used index_getattr heavily, and dynamic prefix truncation is not applicable there (yet?).
Its time spent went down by over 6% for the indexes with 3 key columns of variable length, whereas the indexes with only a single fixed-size attribute took only slightly longer (+0.6% avg in 3 runs on a laptop, high variance). I have not tested it with GiST, but I believe that similar results are realistic there as well for varlen attributes.

> It would be convenient if the first patch could be treated as an independent thing.

Patch 0002 was the reason for writing 0001, and uses the performance improvements of 0001 to show its worth. As such, I submitted them as a set. If you'd like, I could submit 0002 separately?

With regards,

Matthias van de Meent

[+] Instead of starting _binsrch with only the high key compare result, we could also eagerly compare the search key to the lowest key. This way, we have high+low bounds for the whole page, instead of having that only after finding a key < searchkey on the page. The effort might just as well not be worth it, as it is one extra key compare (out of a max of 9 on a page, plus one highkey).
Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
From
Matthias van de Meent
Date:
On Fri, 23 Apr 2021 at 12:45, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>
> On Sat, 17 Apr 2021 at 01:05, Peter Geoghegan <pg@bowt.ie> wrote:
> > It would be convenient if the first patch could be treated as an independent thing.
>
> Patch 0002 was the reason for writing 0001, and uses the performance improvements of 0001 to show its worth. As such, I submitted them as a set. If you'd like, I could submit 0002 separately?

For now, here is version 2 of the patchset, to make MSVC and cfbot happy (it only fixes the compilation issues; no significant changes). I'll try to benchmark the patches in this patchset (both 0001, and 0001+0002) in the upcoming weekend.

Kind regards,

Matthias van de Meent.
Attachment
Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
From
Matthias van de Meent
Date:
On Thu, 17 Jun 2021 at 17:14, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>
> I'll try to benchmark the patches in this patchset (both 0001, and 0001+0002) in the upcoming weekend.

Somewhat delayed, benchmark results are attached. These are based on 7 iterations of the attached benchmark script ('scratch.sql'), with the latest 'UK Price Paid' dataset. Again, the index_test table is an unlogged copy of the land_registry_price_paid_uk table, with one additional trailing always_null column.

Results for 0001 are quite good in the target area of multi-column indexes in which attcacheoff cannot be used (2-4% for insertion workloads, 4-12% for reindex workloads), but they regress slightly for the single unique column insertion test, and are quite a bit worse for both the insert and reindex cases on the attcacheoff-enabled multi-column index (4% and 18% respectively (!)).

With 0001+0002, further improvements are made in the target area (now 4-7% for the various insertion workloads, 5-14% for reindex). The regression in the insert and reindex workloads on attcacheoff-enabled multi-column indexes is still substantial, but slightly less bad (down to a 2% and 15% degradation respectively).

Evidently, this needs improvements in the (likely common) attcacheoff-enabled multi-column case, as I don't think we can reasonably commit a 10+% regression. I'll work on that this weekend.

Kind regards,

Matthias van de Meent

Benchmarks were all performed on WSL2 running Debian 10, on an AMD 5950X, with shared_buffers = 15GB (which should fit the dataset three times), enable_indexscan = off, autovacuum disabled, and parallel workers disabled on the tables, so that the results should be about as stable as it gets.
Attachment
Re: Iterating on IndexTuple attributes and nbtree page-level dynamic prefix truncation
From
Daniel Gustafsson
Date:
> On 24 Jun 2021, at 18:21, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>
> On Thu, 17 Jun 2021 at 17:14, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>>
>> I'll try to benchmark the patches in this patchset (both 0001, and 0001+0002) in the upcoming weekend.
>
> Somewhat delayed, benchmark results are attached.

I'm moving this patch to the next CF to allow for more review of the latest revision and benchmark results. It no longer applies though, so please post a rebased version.

--
Daniel Gustafsson   https://vmware.com/