Thread: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
From
Matthias van de Meent
Date:
Hi, Currently, nbtree code compares each and every column of an index tuple during the binary search on the index page. With large indexes that have many duplicate prefix column values (e.g. an index on (bool, bool, uuid) ) that means a lot of wasted time getting to the right column. The attached patch improves on that by doing per-page dynamic prefix truncation: If we know that on both the right and left side there are index tuples where the first two attributes are equal to the scan key, we skip comparing those attributes at the current index tuple and start with comparing attribute 3, saving two attribute compares. We gain performance whenever comparing prefixing attributes is expensive and when there are many tuples with a shared prefix - in unique indexes this doesn't gain much, but we also don't lose much in this case. This patch was originally suggested at [0], but it was mentioned that they could be pulled out into it's own thread. Earlier, the performance gains were not clearly there for just this patch, but after further benchmarking this patch stands on its own for performance: it sees no obvious degradation of performance, while gaining 0-5% across various normal indexes on the cc-complete sample dataset, with the current worst-case index shape getting a 60%+ improved performance on INSERTs in the tests at [0]. Kind regards, Matthias van de Meent Neon (https://neon.tech) PS. Best served with the downlink right separator/HIKEY optimization (separate patch to be submitted soon), and specialization over at [0]. [0] https://www.postgresql.org/message-id/CAEze2WiqOONRQTUT1p_ZV19nyMA69UU2s0e2dp+jSBM=j8snuw@mail.gmail.com
Attachment
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
From
Pavel Stehule
Date:
Hi
út 31. 10. 2023 v 22:12 odesílatel Matthias van de Meent <boekewurm+postgres@gmail.com> napsal:
Hi,
Currently, nbtree code compares each and every column of an index
tuple during the binary search on the index page. With large indexes
that have many duplicate prefix column values (e.g. an index on (bool,
bool, uuid) ) that means a lot of wasted time getting to the right
column.
The attached patch improves on that by doing per-page dynamic prefix
truncation: If we know that on both the right and left side there are
index tuples where the first two attributes are equal to the scan key,
we skip comparing those attributes at the current index tuple and
start with comparing attribute 3, saving two attribute compares. We
gain performance whenever comparing prefixing attributes is expensive
and when there are many tuples with a shared prefix - in unique
indexes this doesn't gain much, but we also don't lose much in this
case.
This patch was originally suggested at [0], but it was mentioned that
they could be pulled out into it's own thread. Earlier, the
performance gains were not clearly there for just this patch, but
after further benchmarking this patch stands on its own for
performance: it sees no obvious degradation of performance, while
gaining 0-5% across various normal indexes on the cc-complete sample
dataset, with the current worst-case index shape getting a 60%+
improved performance on INSERTs in the tests at [0].
+1
This can be nice functionality. I had a customer with a very slow index scan - the main problem was a long common prefix like prg010203xxxx.
Regards
Pavel
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
PS. Best served with the downlink right separator/HIKEY optimization
(separate patch to be submitted soon), and specialization over at [0].
[0] https://www.postgresql.org/message-id/CAEze2WiqOONRQTUT1p_ZV19nyMA69UU2s0e2dp+jSBM=j8snuw@mail.gmail.com
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
From
Matthias van de Meent
Date:
On Wed, 1 Nov 2023 at 07:47, Pavel Stehule <pavel.stehule@gmail.com> wrote: > > Hi > > út 31. 10. 2023 v 22:12 odesílatel Matthias van de Meent <boekewurm+postgres@gmail.com> napsal: >> This patch was originally suggested at [0], but it was mentioned that >> they could be pulled out into it's own thread. Earlier, the >> performance gains were not clearly there for just this patch, but >> after further benchmarking this patch stands on its own for >> performance: it sees no obvious degradation of performance, while >> gaining 0-5% across various normal indexes on the cc-complete sample >> dataset, with the current worst-case index shape getting a 60%+ >> improved performance on INSERTs in the tests at [0]. > > > +1 Thanks for showing interest. > This can be nice functionality. I had a customer with a very slow index scan - the main problem was a long common prefixlike prg010203xxxx. I'll have to note that this patch doesn't cover cases where e.g. text attributes have large shared prefixes, but are still unique: the dynamic prefix compression in this patch is only implemented at the tuple attribute level; it doesn't implement type aware dynamic prefix compression inside the attributes. So, a unique index on a column of int32 formatted like '%0100i' would not materially benefit from this patch. While would certainly be possible to add some type-level prefix truncation in the framework of this patch, adding that would require significant code churn in btree compare operators, because we'd need an additional return argument to contain a numerical "shared prefix", and that is not something I was planning to implement in this patch. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
From
Pavel Stehule
Date:
st 1. 11. 2023 v 11:32 odesílatel Matthias van de Meent <boekewurm+postgres@gmail.com> napsal:
On Wed, 1 Nov 2023 at 07:47, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>
> Hi
>
> út 31. 10. 2023 v 22:12 odesílatel Matthias van de Meent <boekewurm+postgres@gmail.com> napsal:
>> This patch was originally suggested at [0], but it was mentioned that
>> they could be pulled out into it's own thread. Earlier, the
>> performance gains were not clearly there for just this patch, but
>> after further benchmarking this patch stands on its own for
>> performance: it sees no obvious degradation of performance, while
>> gaining 0-5% across various normal indexes on the cc-complete sample
>> dataset, with the current worst-case index shape getting a 60%+
>> improved performance on INSERTs in the tests at [0].
>
>
> +1
Thanks for showing interest.
> This can be nice functionality. I had a customer with a very slow index scan - the main problem was a long common prefix like prg010203xxxx.
I'll have to note that this patch doesn't cover cases where e.g. text
attributes have large shared prefixes, but are still unique: the
dynamic prefix compression in this patch is only implemented at the
tuple attribute level; it doesn't implement type aware dynamic prefix
compression inside the attributes. So, a unique index on a column of
int32 formatted like '%0100i' would not materially benefit from this
patch.
While would certainly be possible to add some type-level prefix
truncation in the framework of this patch, adding that would require
significant code churn in btree compare operators, because we'd need
an additional return argument to contain a numerical "shared prefix",
and that is not something I was planning to implement in this patch.
Thanks for the explanation.
Pavel
Kind regards,
Matthias van de Meent
Neon (https://neon.tech)
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
From
Dilip Kumar
Date:
On Wed, Nov 1, 2023 at 2:42 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > Hi, > > Currently, nbtree code compares each and every column of an index > tuple during the binary search on the index page. With large indexes > that have many duplicate prefix column values (e.g. an index on (bool, > bool, uuid) ) that means a lot of wasted time getting to the right > column. > > The attached patch improves on that by doing per-page dynamic prefix > truncation: If we know that on both the right and left side there are > index tuples where the first two attributes are equal to the scan key, > we skip comparing those attributes at the current index tuple and > start with comparing attribute 3, saving two attribute compares. We > gain performance whenever comparing prefixing attributes is expensive > and when there are many tuples with a shared prefix - in unique > indexes this doesn't gain much, but we also don't lose much in this > case. > > This patch was originally suggested at [0], but it was mentioned that > they could be pulled out into it's own thread. Earlier, the > performance gains were not clearly there for just this patch, but > after further benchmarking this patch stands on its own for > performance: it sees no obvious degradation of performance, while > gaining 0-5% across various normal indexes on the cc-complete sample > dataset, with the current worst-case index shape getting a 60%+ > improved performance on INSERTs in the tests at [0]. +1 for the idea, I have some initial comments while reading through the patch. 1. Commit message refers to a non-existing reference '(see [0])'. 2. +When we do a binary search on a sorted set (such as a BTree), we know that a +tuple will be smaller than its left neighbour, and larger than its right +neighbour. I think this should be 'larger than left neighbour and smaller than right neighbour' instead of the other way around. 3. +With the above optimizations, dynamic prefix truncation improves the worst +case complexity of indexing from O(tree_height * natts * log(tups_per_page)) +to O(tree_height * (3*natts + log(tups_per_page))) Where do the 3*natts come from? Is it related to setting up the dynamic prefix at each level? 4. + /* + * All tuple attributes are equal to the scan key, only later attributes + * could potentially not equal the scan key. + */ + *compareattr = ntupatts + 1; Can you elaborate on this more? If all tuple attributes are equal to the scan key then what do those 'later attributes' mean? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
From
Matthias van de Meent
Date:
On Fri, 19 Jan 2024 at 05:55, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Wed, Nov 1, 2023 at 2:42 AM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > > > Hi, > > > > Currently, nbtree code compares each and every column of an index > > tuple during the binary search on the index page. With large indexes > > that have many duplicate prefix column values (e.g. an index on (bool, > > bool, uuid) ) that means a lot of wasted time getting to the right > > column. > > > > The attached patch improves on that by doing per-page dynamic prefix > > truncation: If we know that on both the right and left side there are > > index tuples where the first two attributes are equal to the scan key, > > we skip comparing those attributes at the current index tuple and > > start with comparing attribute 3, saving two attribute compares. We > > gain performance whenever comparing prefixing attributes is expensive > > and when there are many tuples with a shared prefix - in unique > > indexes this doesn't gain much, but we also don't lose much in this > > case. > > > > This patch was originally suggested at [0], but it was mentioned that > > they could be pulled out into it's own thread. Earlier, the > > performance gains were not clearly there for just this patch, but > > after further benchmarking this patch stands on its own for > > performance: it sees no obvious degradation of performance, while > > gaining 0-5% across various normal indexes on the cc-complete sample > > dataset, with the current worst-case index shape getting a 60%+ > > improved performance on INSERTs in the tests at [0]. > > +1 for the idea, I have some initial comments while reading through the patch. Thank you for the review. > 1. > Commit message refers to a non-existing reference '(see [0])'. Noted, I'll update that. > 2. > +When we do a binary search on a sorted set (such as a BTree), we know that a > +tuple will be smaller than its left neighbour, and larger than its right > +neighbour. > > I think this should be 'larger than left neighbour and smaller than > right neighbour' instead of the other way around. Noted, will be fixed, too. > 3. > +With the above optimizations, dynamic prefix truncation improves the worst > +case complexity of indexing from O(tree_height * natts * log(tups_per_page)) > +to O(tree_height * (3*natts + log(tups_per_page))) > > Where do the 3*natts come from? Is it related to setting up the > dynamic prefix at each level? Yes: We need to establish prefixes for both a tuple that's ahead of the to-be-compared tuple, and one that's after the to-be-compared tuple. Assuming homogenous random distribution of scan key accesses across the page (not always the case, but IMO a reasonable starting point) this would average to 3 unprefixed compares before you have established both a higher and a lower prefix. > 4. > + /* > + * All tuple attributes are equal to the scan key, only later attributes > + * could potentially not equal the scan key. > + */ > + *compareattr = ntupatts + 1; > > Can you elaborate on this more? If all tuple attributes are equal to > the scan key then what do those 'later attributes' mean? In inner pages, tuples may not have all key attributes, as some may have been truncated away in page splits. So, tuples that have at least the same prefix as this (potentially truncated) tuple will need to be compared starting at the first missing attribute of this tuple, i.e. ntupatts + 1. Kind regards, Matthias van de Meent
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
From
Matthias van de Meent
Date:
On Wed, 24 Jan 2024 at 13:02, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > 1. > > Commit message refers to a non-existing reference '(see [0])'. > > Noted, I'll update that. > > > 2. > > +When we do a binary search on a sorted set (such as a BTree), we know that a > > +tuple will be smaller than its left neighbour, and larger than its right > > +neighbour. > > > > I think this should be 'larger than left neighbour and smaller than > > right neighbour' instead of the other way around. > > Noted, will be fixed, too. Attached is version 15 of this patch, with the above issues fixed. It's also rebased on top of 655dc310 of this morning, so that should keep good for some time again. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Attachment
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
From
Matthias van de Meent
Date:
On Fri, 1 Mar 2024 at 14:48, Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > Attached is version 15 of this patch, with the above issues fixed. > It's also rebased on top of 655dc310 of this morning, so that should > keep good for some time again. Attached is version 16 now. Relevant changes from previous patch versions: - Move from 1-indexed AttributeNumber to 0-indexed ints for prefixes, and use "prefix" as naming scheme, rather than cmpcol. A lack of prefix, previously indicated with a cmpcol value of 1, is now a prefix value of 0. - Adjusted README - Improved the efficiency of the insertion path in some cases where we've previously compared the page's highkey. As always, why we need this: Currently, btrees are quite inefficient when they have many key attributes but low attribute cardinality in the prefix, e.g. an index on ("", "", "", uuid). This is not just inefficient use of disk space with the high repetition of duplicate prefix values in pages, but it is also a computational overhead when we're descending the tree in e.g. _bt_first() or btinsert(): The code we use to search a page currently compares the full key to the full searchkey, for a complexity of O(n_equal_attributes + 1) for every tuple on the page, for O(log(page_ntups) * (n_equal_attributes + 1)) attribute compares every page during descent. This patch fixes that part of the computational complexity by applying dynamic prefix compression, thus reducing the average computational complexity in random index lookups to O(3 * (n_equal_attributes) + log(page_ntups)) per page (assuming at least 4 non-hikey tuples on each page). In practice, this makes indexes with 3+ attributes and prefixes with low selectivity (such as the example above) much more viable computationally, as we have to spend much less effort on comparing known index attributes during descent. Note that this does _not_ reuse prefix bounds across pages - it re-establishes the left- and right prefixes every page during the binary search. See the README modified in the patch for specific implementation details and considerations. This patch synergizes with the highkey optimization used in [0]: When combined, the number of attribute compare function calls could be further reduced to O(2 * (n_equal_atts) + log(page_ntups)), a reduction by n_equal_atts every page, which in certain wide index types could be over 25% of all attribute compare function calls on the page after dynamic prefix truncation. However, both are separately useful and reduce the amount of work done on most pages. Kind regards, Matthias van de Meent Neon (https://neon.tech) [0] https://www.postgresql.org/message-id/flat/CAEze2WijWhCQy_nZVP4Ye5Hwj=YW=3rqv+hbMJGcOHtrYQmyKw@mail.gmail.com
Attachment
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
From
Peter Geoghegan
Date:
On Tue, Aug 6, 2024 at 5:42 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > Attached is version 16 now. I ran this with my old land registry benchmark, used for validating the space utilization impact of nbtree deduplication (among other things). This isn't obviously the best benchmark for this sort of thing, but I seem to recall that you'd used it yourself at some point. To validate work in this area, likely including this patch. So I decided to start there. To be clear, this test involves bulk loading of an unlogged table (the land registry table). The following composite index is created on the table before we insert any rows, so most of the cycles here are in index maintenance including _bt_search descents: CREATE INDEX composite ON land2 USING btree (county COLLATE "C", city COLLATE "C", locality COLLATE "C"); I wasn't able to see much of an improvement with this patch applied. It went from ~00:51.598 to ~00:51.053. That's a little disappointing, given that this is supposed to be a sympathetic case for the patch. Can you suggest something else? (Granted, I understand that this patch has some complicated relationship with other patches of yours, which I don't understand currently.) I'm a bit worried about side-effects for this assertion: @@ -485,7 +489,7 @@ _bt_check_unique(Relation rel, BTInsertState insertstate, Relation heapRel, Assert(insertstate->bounds_valid); Assert(insertstate->low >= P_FIRSTDATAKEY(opaque)); Assert(insertstate->low <= insertstate->stricthigh); - Assert(_bt_compare(rel, itup_key, page, offset) < 0); + Assert(_bt_compare(rel, itup_key, page, offset, &sprefix) < 0); break; } More generally, it's not entirely clear how the code in _bt_check_unique is supposed to work with the patch. Why should it be safe to do what you're doing with the prefix there? It's not like we're doing a binary search here -- it's more like a linear search. > - Move from 1-indexed AttributeNumber to 0-indexed ints for prefixes, > and use "prefix" as naming scheme, rather than cmpcol. A lack of > prefix, previously indicated with a cmpcol value of 1, is now a prefix > value of 0. Found a likely-related bug in the changes you made to amcheck, which I was able to fix myself like so: diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c index c7dc6725a..15be61777 100644 --- a/contrib/amcheck/verify_nbtree.c +++ b/contrib/amcheck/verify_nbtree.c @@ -3187,7 +3187,7 @@ bt_rootdescend(BtreeCheckState *state, IndexTuple itup) insertstate.buf = lbuf; /* Get matching tuple on leaf page */ - offnum = _bt_binsrch_insert(state->rel, &insertstate, 1); + offnum = _bt_binsrch_insert(state->rel, &insertstate, 0); /* Compare first >= matching item on leaf page, if any */ -- Peter Geoghegan
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
From
Dmitry Dolgov
Date:
> On Tue, Aug 13, 2024 at 02:39:10PM GMT, Peter Geoghegan wrote: > On Tue, Aug 6, 2024 at 5:42 PM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > Attached is version 16 now. > > I ran this with my old land registry benchmark, used for validating > the space utilization impact of nbtree deduplication (among other > things). This isn't obviously the best benchmark for this sort of > thing, but I seem to recall that you'd used it yourself at some point. > To validate work in this area, likely including this patch. So I > decided to start there. > > To be clear, this test involves bulk loading of an unlogged table (the > land registry table). The following composite index is created on the > table before we insert any rows, so most of the cycles here are in > index maintenance including _bt_search descents: > > CREATE INDEX composite ON land2 USING btree (county COLLATE "C", city > COLLATE "C", locality COLLATE "C"); > > I wasn't able to see much of an improvement with this patch applied. > It went from ~00:51.598 to ~00:51.053. That's a little disappointing, > given that this is supposed to be a sympathetic case for the patch. > Can you suggest something else? (Granted, I understand that this patch > has some complicated relationship with other patches of yours, which I > don't understand currently.) Under the danger of showing my ignorance, what is the definition of land registry benchmark? I think it would be useful if others could reproduce the results as well, especially if they're somewhat surprising. At the same time I would like to note, that dynamic prefix truncation can be useful not only on realistic data, but in some weird-but-useful cases, e.g. for partitioned B-Tree with an artificial leading key separating partitions. It's a hypothetical case, which can't justify a feature of course, but makes it worth investigating.
Re: btree: implement dynamic prefix truncation (was: Improving btree performance through specializing by key shape, take 2)
From
Peter Geoghegan
Date:
On Wed, Nov 13, 2024 at 3:30 PM Dmitry Dolgov <9erthalion6@gmail.com> wrote: > > On Tue, Aug 13, 2024 at 02:39:10PM GMT, Peter Geoghegan wrote: > > On Tue, Aug 6, 2024 at 5:42 PM Matthias van de Meent > > To be clear, this test involves bulk loading of an unlogged table (the > > land registry table). The following composite index is created on the > > table before we insert any rows, so most of the cycles here are in > > index maintenance including _bt_search descents: > > > > CREATE INDEX composite ON land2 USING btree (county COLLATE "C", city > > COLLATE "C", locality COLLATE "C"); > Under the danger of showing my ignorance, what is the definition of land > registry benchmark? I think it would be useful if others could reproduce > the results as well, especially if they're somewhat surprising. It's a sample dataset that I've found useful from time to time, particularly when testing nbtree features. Usually using a composite index like the one I described. One slightly useful (though far from unique) property of such an index is that it contains data that's low cardinality (sometimes extremely low cardinality) across multiple index columns. With variable-width (text) index columns. That specific combination made the index a decent test of certain issues affecting the nbtsplitloc.c split point choice logic during work on Postgres 12 and 13. I believe that Matthias independently found it useful on a number of other occasions, too. See https://wiki.postgresql.org/wiki/Sample_Databases for instructions on how to set it up for yourself. You could probably come up with a way of generating a similar dataset, without needing to download anything, though. The fact that I found it useful in the past is at least somewhat arbitrary. -- Peter Geoghegan