Thread: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Peter Geoghegan
Date:
As I recently went into on the thread where we've been discussing my nbtree SAOP patch [1], there is good reason to suspect that one of the optimizations added by commit e0b1ee17 is buggy in the presence of an opfamily lacking the full set of cross-type comparisons. The attached test case confirms that these suspicions were correct. Running the tese case against HEAD will lead to an assertion failure (or a wrong answer when assertions are disabled). To recap, the optimization in question (which is not to be confused with the "precheck" optimization from the same commit) is based on the idea that _bt_first must always land the scan ahead of the position that the scan would end on, were the scan direction to change (from forwards to backwards, say). It follows that inequality strategy scan keys that are required in the opposite-to-scan direction *only* must be redundant in the current scan direction (in the sense that _bt_checkkeys needn't bother comparing them at all). Unfortunately, that rationale is at least slightly wrong. Although some version of the same assumption must really hold in the case of required equality strategy scan keys (old comments in _bt_checkkeys and in _bt_first say as much), it isn't really guaranteed in the case of inequalities. In fact, the following sentence appears in old comments above _bt_preprocess_keys, directly contradicting the theory behind the optimization in question: "In general, when inequality keys are present, the initial-positioning code only promises to position before the first possible match, not exactly at the first match, for a forward scan; or after the last match for a backward scan." My test case mostly just demonstrates how to reproduce the scenario described by this sentence. It's probably possible to salvage the optimization, but that will require bookkeeping sufficient to detect these unsafe cases, so that _bt_checkkeys only skips the comparisons when it's truly safe. As far as I know, the only reason that inequalities differ from equalities is this respect is the issue that the test case highlights. (There were also issues with NULLs, but AFAICT Alexander dealt with that aspect of the problem already.) [1] https://postgr.es/m/CAH2-Wz=BuxYEHxpNH0tPvo=+G1WtE1PamRoVU1dEVow1Vy9Y7A@mail.gmail.com -- Peter Geoghegan
Attachment
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Tom Lane
Date:
Peter Geoghegan <pg@bowt.ie> writes: > As I recently went into on the thread where we've been discussing my > nbtree SAOP patch [1], there is good reason to suspect that one of the > optimizations added by commit e0b1ee17 is buggy in the presence of an > opfamily lacking the full set of cross-type comparisons. The attached > test case confirms that these suspicions were correct. Running the > tese case against HEAD will lead to an assertion failure (or a wrong > answer when assertions are disabled). Hmm ... I had not paid any attention to this commit, but the rationale given in the commit message is just flat wrong: Imagine the ordered B-tree scan for the query like this. SELECT * FROM tbl WHERE col > 'a' AND col < 'b' ORDER BY col; The (col > 'a') scan key will be always matched once we find the location to start the scan. The (col < 'b') scan key will match every item on the page as long as it matches the last item on the page. That argument probably holds for the index's first column, but it is completely and obviously wrong for every following column. Nonetheless it looks like we're trying to apply the optimization to every scan key. regards, tom lane
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Peter Geoghegan
Date:
On Tue, Dec 5, 2023 at 4:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Hmm ... I had not paid any attention to this commit, but the rationale > given in the commit message is just flat wrong: > > Imagine the ordered B-tree scan for the query like this. > > SELECT * FROM tbl WHERE col > 'a' AND col < 'b' ORDER BY col; > > The (col > 'a') scan key will be always matched once we find the location to > start the scan. The (col < 'b') scan key will match every item on the page > as long as it matches the last item on the page. > > That argument probably holds for the index's first column, but it is > completely and obviously wrong for every following column. Nonetheless > it looks like we're trying to apply the optimization to every scan key. Just to be clear, you're raising a concern that seems to me to apply to "the other optimization" from the same commit, specifically -- the precheck optimization. Not the one I found a problem in. (They're closely related but distinct optimizations.) I *think* that that part is handled correctly, because non-required scan keys are not affected (by either optimization). I have no specific reason to doubt the proposition that 'b' could only be marked required in situations where it is indeed safe to assume that the col < 'b' condition must also apply to earlier tuples transitively (i.e. this must be true because it was true for the the final tuple on the page during the _bt_readpage precheck). That being said, I wouldn't rule out problems for the precheck optimization in the presence of opfamilies like the one from my test case. I didn't get as far as exploring that side of things, at least. -- Peter Geoghegan
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Peter Geoghegan
Date:
On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote: > "In general, when inequality keys are present, the initial-positioning > code only promises to position before the first possible match, not > exactly at the first match, for a forward scan; or after the last > match for a backward scan." > > My test case mostly just demonstrates how to reproduce the scenario > described by this sentence. I just realized that my test case wasn't quite minimized correctly. It depended on a custom function that was no longer created. Attached is a revised version that uses btint84cmp instead. -- Peter Geoghegan
Attachment
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Alexander Korotkov
Date:
Hi, Peter! On Wed, Dec 6, 2023 at 3:46 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote: > > "In general, when inequality keys are present, the initial-positioning > > code only promises to position before the first possible match, not > > exactly at the first match, for a forward scan; or after the last > > match for a backward scan." > > > > My test case mostly just demonstrates how to reproduce the scenario > > described by this sentence. > > I just realized that my test case wasn't quite minimized correctly. It > depended on a custom function that was no longer created. > > Attached is a revised version that uses btint84cmp instead. Thank you for raising this issue. Preprocessing of btree scan keys is normally removing the redundant scan keys. However, redundant scan keys aren't removed when they have arguments of different types. Please give me a bit of time to figure out how to workaround this. ------ Regards, Alexander Korotkov
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Peter Geoghegan
Date:
On Tue, Dec 5, 2023 at 8:06 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > Thank you for raising this issue. Preprocessing of btree scan keys is > normally removing the redundant scan keys. However, redundant scan > keys aren't removed when they have arguments of different types. > Please give me a bit of time to figure out how to workaround this. Couldn't you condition the use of the optimization on _bt_preprocess_keys being able to use cross-type operators when it checked for redundant or contradictory scan keys? The vast majority of index scans will be able to do that. As I said already, what you're doing here isn't all that different to the way that we rely on required equality strategy scan keys being used to build our initial insertion scan key, that determines where the scan is initially positioned to within _bt_first. Inequalities aren't all that different to equalities. -- Peter Geoghegan
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Robert Haas
Date:
On Tue, Dec 5, 2023 at 8:15 PM Peter Geoghegan <pg@bowt.ie> wrote: > Just to be clear, you're raising a concern that seems to me to apply > to "the other optimization" from the same commit, specifically -- the > precheck optimization. Not the one I found a problem in. (They're > closely related but distinct optimizations.) It isn't very clear from the commit message that this commit is doing two different things, and in fact I'm still unclear on what exactly the other optimization is. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Matthias van de Meent
Date:
On Wed, 6 Dec 2023 at 14:11, Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Dec 5, 2023 at 8:15 PM Peter Geoghegan <pg@bowt.ie> wrote: > > Just to be clear, you're raising a concern that seems to me to apply > > to "the other optimization" from the same commit, specifically -- the > > precheck optimization. Not the one I found a problem in. (They're > > closely related but distinct optimizations.) > > It isn't very clear from the commit message that this commit is doing > two different things, and in fact I'm still unclear on what exactly > the other optimization is. I feel that Peter refered to these two distinct optimizations: 1. When scanning an index in ascending order using scankey a > 1 (so, one that defines a start point of the scan), we don't need to check items for consistency with that scankey once we've found the first value that is consistent with the scankey, as all future values will also be consistent with the scankey (if we assume no concurrent page deletions). 2. When scanning an index in ascending order using scankey a < 10 (one that defines an endpoint of the scan), we can look ahead and check if the last item on the page is consistent. If so, then all other items on the page will also be consistent with that scankey. Kind regards, Matthias van de Meent Neon (https://neon.tech)
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Robert Haas
Date:
On Wed, Dec 6, 2023 at 8:27 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > I feel that Peter refered to these two distinct optimizations: > > 1. When scanning an index in ascending order using scankey a > 1 (so, > one that defines a start point of the scan), we don't need to check > items for consistency with that scankey once we've found the first > value that is consistent with the scankey, as all future values will > also be consistent with the scankey (if we assume no concurrent page > deletions). > > 2. When scanning an index in ascending order using scankey a < 10 (one > that defines an endpoint of the scan), we can look ahead and check if > the last item on the page is consistent. If so, then all other items > on the page will also be consistent with that scankey. Oh, interesting. Thanks. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Peter Geoghegan
Date:
On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > On Wed, 6 Dec 2023 at 14:11, Robert Haas <robertmhaas@gmail.com> wrote: > > It isn't very clear from the commit message that this commit is doing > > two different things, and in fact I'm still unclear on what exactly > > the other optimization is. > > I feel that Peter refered to these two distinct optimizations: Right. > 2. When scanning an index in ascending order using scankey a < 10 (one > that defines an endpoint of the scan), we can look ahead and check if > the last item on the page is consistent. If so, then all other items > on the page will also be consistent with that scankey. Also worth noting that it could be "scankey a = 10". That is, the precheck optimization (i.e. the optimization that's not the target of my test case) can deal with equalities and inequalities just as well (any scan key that's required in the current scan direction is supported). On the other hand the required-in-opposite-direction-only optimization (i.e. the target of my test case) only applies to inequality strategy scan keys. It kinda makes sense to explain both concepts using an example that involves both > and < strategy inequalities, since that makes the symmetry between the two optimizations clearer. -- Peter Geoghegan
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Peter Geoghegan
Date:
On Tue, Dec 5, 2023 at 8:20 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Tue, Dec 5, 2023 at 8:06 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > Thank you for raising this issue. Preprocessing of btree scan keys is > > normally removing the redundant scan keys. However, redundant scan > > keys aren't removed when they have arguments of different types. > > Please give me a bit of time to figure out how to workaround this. > > Couldn't you condition the use of the optimization on > _bt_preprocess_keys being able to use cross-type operators when it > checked for redundant or contradictory scan keys? The vast majority of > index scans will be able to do that. Some quick experimentation shows that my test case works as expected once _bt_preprocess_keys is taught to remember that it has seen a maybe-unsafe case, which it stashes in a special new field from the scan's state for later. As I said, this field can be treated as a condition of applying the required-in-opposite-direction-only optimization in _bt_readpage(). This new field would be analogous to the existing requiredMatchedByPrecheck state used by _bt_readpage() to determine whether it can apply the required-in-same-direction optimization. The new field works for the whole scan instead of just for one page, and it works based on information from "behind the scan" instead of information "just ahead of the scan". But the basic idea is the same. _bt_preprocess_keys is rather complicated. It is perhaps tempting to do this in a targeted way, that specifically limits itself to the exact cases that we know to be unsafe. But it might be okay to just disable the optimization in most or all cases where _bt_compare_scankey_args() returns false. That's likely to be very rare in practice, anyway (who really uses opfamilies like these?). Not really sure where to come down on that. -- Peter Geoghegan
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Peter Geoghegan
Date:
On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > 1. When scanning an index in ascending order using scankey a > 1 (so, > one that defines a start point of the scan), we don't need to check > items for consistency with that scankey once we've found the first > value that is consistent with the scankey, as all future values will > also be consistent with the scankey (if we assume no concurrent page > deletions). BTW, I don't think that page deletion is a concern for these optimizations in the way that it is for the similar idea of "dynamic prefix compression", which works against insertion-type scan keys (used to descend the tree and to do an initial binary search of a leaf page). We already rely on the first call to _bt_readpage (the one that takes place from within _bt_first rather than from _bt_next) passing a page offset number that's exactly at the start of where matches begin -- this is crucial in the case of scans with required equality strategy scan keys (most scans). If we just skipped the _bt_binsrch and passed P_FIRSTDATAKEY(opaque) to _bt_readpage within _bt_first instead, that would break lots of queries. So the _bt_binsrch step within _bt_first isn't just an optimization -- it's crucial. This is nothing new. Recall that _bt_readpage only deals with search-type scan keys, meaning scan keys that use a simple operator (so it uses = operators with the equality strategy, as opposed to using a 3-way ORDER proc/support function 1 that can tell the difference between < and >). In general _bt_readpage doesn't know how to tell the difference between a tuple that's before the start of = matches, and a tuple that's at (or after) the end of any = matches. If it is ever allowed to conflate these two cases, then we'll overlook matching tuples, which is of course wrong (it'll terminate the scan before it even starts). It is up to the caller (really just _bt_first) to never call _bt_readpage in a way that allows this confusion to take place -- which is what makes the _bt_binsrch step crucial. A race condition with page deletion might allow the key space covered by a leaf page to "widen" after we've left its parent, but before we arrive on the leaf page. But the _bt_binsrch step within _bt_first happens *after* we land on and lock that leaf page, in any case. So there is no risk of the scan ever doing anything with concurrently-inserted index tuples. In general we only have to worry about such race conditions when descending the tree -- they're not a problem after the scan has reached the leaf level and established an initial page offset number. (The way that _bt_readpage processes whole pages in one atomic step helps with this sort of thing, too. We can almost pretend that the B-Tree structure is immutable, even though that's obviously not really true at all. We know that we'll always make forward progress through the key space by remembering the next page to visit when processing each page.) My test case broke the required-in-opposite-direction-only optimization by finding a way in which required-in-opposite-direction-only inequalities were not quite the same as required equalities with respect to this business about the precise leaf page offset number that the scan begins at. They make *almost* the same set of guarantees (note in particular that both will be transformed into insertion scan key/3-way ORDER proc scan keys by _bt_first's initial positioning code), but there is at least one special case that applies only to inequalities. I had to play games with weird incomplete opfamilies to actually break the optimization -- that was required to tickle the special case in just the right/wrong way. -- Peter Geoghegan
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Matthias van de Meent
Date:
On Wed, 6 Dec 2023 at 19:55, Peter Geoghegan <pg@bowt.ie> wrote: > > On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent > <boekewurm+postgres@gmail.com> wrote: > > 1. When scanning an index in ascending order using scankey a > 1 (so, > > one that defines a start point of the scan), we don't need to check > > items for consistency with that scankey once we've found the first > > value that is consistent with the scankey, as all future values will > > also be consistent with the scankey (if we assume no concurrent page > > deletions). > > BTW, I don't think that page deletion is a concern for these > optimizations in the way that it is for the similar idea of "dynamic > prefix compression", which works against insertion-type scan keys > (used to descend the tree and to do an initial binary search of a leaf > page). > > We already rely on the first call to _bt_readpage (the one that takes > place from within _bt_first rather than from _bt_next) passing a page > offset number that's exactly at the start of where matches begin -- > this is crucial in the case of scans with required equality strategy > scan keys (most scans). If we just skipped the _bt_binsrch and passed > P_FIRSTDATAKEY(opaque) to _bt_readpage within _bt_first instead, that > would break lots of queries. So the _bt_binsrch step within _bt_first > isn't just an optimization -- it's crucial. This is nothing new. I was thinking more along the lines of page splits+deletions while we're doing _bt_stepright(), but forgot to consider that we first lock the right sibling, and only then release the left sibling for splits, so we should be fine here. Kind regards, Matthias van de Meent
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Peter Geoghegan
Date:
On Wed, Dec 6, 2023 at 11:14 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > I was thinking more along the lines of page splits+deletions while > we're doing _bt_stepright(), but forgot to consider that we first lock > the right sibling, and only then release the left sibling for splits, > so we should be fine here. In general the simplest (and possibly most convincing) arguments for the correctness of optimizations like the ones that Alexander added rely on seeing that the only way that the optimization can be wrong is if some more fundamental and long established thing was also wrong. We could try to prove that the new optimization is correct (or wrong), but it is often more helpful to "prove" that some much more fundamental thing is correct instead, if that provides us with a useful corollary about the new thing also being correct. Take the _bt_readpage precheck optimization, for example. Rather than thinking about the key space and transitive rules, it might be more helpful to focus on what must have been true in earlier Postgres versions, and what we can expect to still hold now. The only way that that optimization could be wrong is if the same old _bt_checkkeys logic that decides when to terminate the scan (by setting continuescan=false) always had some propensity to "change its mind" about ending the scan, at least when it somehow got the opportunity to see tuples after the first tuple that it indicated should end the scan. That's not quite bulletproof, of course (it's not like older Postgres versions actually provided _bt_checkkeys with opportunities to "change its mind" in this sense), but it's a useful starting point IME. It helps to build intuition. -- Peter Geoghegan
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Alexander Korotkov
Date:
On Wed, Dec 6, 2023 at 6:05 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Wed, Dec 6, 2023 at 3:46 AM Peter Geoghegan <pg@bowt.ie> wrote: > > On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote: > > > "In general, when inequality keys are present, the initial-positioning > > > code only promises to position before the first possible match, not > > > exactly at the first match, for a forward scan; or after the last > > > match for a backward scan." > > > > > > My test case mostly just demonstrates how to reproduce the scenario > > > described by this sentence. > > > > I just realized that my test case wasn't quite minimized correctly. It > > depended on a custom function that was no longer created. > > > > Attached is a revised version that uses btint84cmp instead. > > Thank you for raising this issue. Preprocessing of btree scan keys is > normally removing the redundant scan keys. However, redundant scan > keys aren't removed when they have arguments of different types. > Please give me a bit of time to figure out how to workaround this. I dig into the problem. I think this assumption is wrong in my commit. "When the key is required for opposite direction scan, it must be already satisfied by_bt_first() ..." In your example "foo = 90" is satisfied by_bt_first(), but "foo > 99::int8" is not. I think this could be resolved by introducing a separate flag exactly distinguishing scan keys used for _bt_first(). I'm going to post the patch doing this. ------ Regards, Alexander Korotkov
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Alexander Korotkov
Date:
On Fri, Dec 8, 2023 at 8:30 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Wed, Dec 6, 2023 at 6:05 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Wed, Dec 6, 2023 at 3:46 AM Peter Geoghegan <pg@bowt.ie> wrote: > > > On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote: > > > > "In general, when inequality keys are present, the initial-positioning > > > > code only promises to position before the first possible match, not > > > > exactly at the first match, for a forward scan; or after the last > > > > match for a backward scan." > > > > > > > > My test case mostly just demonstrates how to reproduce the scenario > > > > described by this sentence. > > > > > > I just realized that my test case wasn't quite minimized correctly. It > > > depended on a custom function that was no longer created. > > > > > > Attached is a revised version that uses btint84cmp instead. > > > > Thank you for raising this issue. Preprocessing of btree scan keys is > > normally removing the redundant scan keys. However, redundant scan > > keys aren't removed when they have arguments of different types. > > Please give me a bit of time to figure out how to workaround this. > > I dig into the problem. I think this assumption is wrong in my commit. > > "When the key is required for opposite direction scan, it must be > already satisfied by_bt_first() ..." > > In your example "foo = 90" is satisfied by_bt_first(), but "foo > > 99::int8" is not. I think this could be resolved by introducing a > separate flag exactly distinguishing scan keys used for _bt_first(). > I'm going to post the patch doing this. The draft patch is attached. It requires polishing and proper commenting. But I hope the basic idea is clear. Do you think this is the way forward? ------ Regards, Alexander Korotkov
Attachment
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Peter Geoghegan
Date:
On Fri, Dec 8, 2023 at 10:46 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > In your example "foo = 90" is satisfied by_bt_first(), but "foo > > > 99::int8" is not. I think this could be resolved by introducing a > > separate flag exactly distinguishing scan keys used for _bt_first(). > > I'm going to post the patch doing this. > > The draft patch is attached. It requires polishing and proper > commenting. But I hope the basic idea is clear. Do you think this is > the way forward? Does this really need to work at the scan key level, rather than at the whole-scan level? Wouldn't it make more sense to just totally disable it for the whole scan, since we'll barely ever need to do that anyway? My ScalarArrayOpExpr patch will need to disable this optimization, since with that patch in place we don't necessarily go through _bt_first each time the search-type scan keys must change. We might need to check a few tuples from before the _bt_first-wise position of the next set of array values, which is a problem with opposite-direction-only inequalities (it's a little bit like the situation from my test case, actually). That's partly why I'd prefer this to work at the whole-scan level (though I also just don't think that inventing SK_BT_BT_FIRST makes much sense). I think that you should make it clearer that this whole optimization only applies to required *inequalities*, which can be required in the opposite direction *only*. It should be more obvious from looking at the code that the optimization doesn't apply to required equality strategy scan keys (those are always required in *both* scan directions or in neither direction, so unlike the closely related prefix optimization added by the same commit, they just can't use the optimization that we need to fix here). BTW, do we really need to keep around the BTScanOpaqueData.firstPage field? Why can't the call to _bt_readpage from _bt_first (and from _bt_endpoint) just pass "firstPage=true" as a simple argument? Note that the first call to _bt_readpage must take place from _bt_first (or from _bt_endpoint). The first _bt_first call is already kind of special, in a way that is directly related to this issue. I added some comments about that to today's commit c9c0589fda, in fact -- I think it's an important issue in general. -- Peter Geoghegan
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Alexander Korotkov
Date:
Hi, Peter! On Sat, Dec 9, 2023 at 4:29 AM Peter Geoghegan <pg@bowt.ie> wrote: > Does this really need to work at the scan key level, rather than at > the whole-scan level? Wouldn't it make more sense to just totally > disable it for the whole scan, since we'll barely ever need to do that > anyway? > > My ScalarArrayOpExpr patch will need to disable this optimization, > since with that patch in place we don't necessarily go through > _bt_first each time the search-type scan keys must change. We might > need to check a few tuples from before the _bt_first-wise position of > the next set of array values, which is a problem with > opposite-direction-only inequalities (it's a little bit like the > situation from my test case, actually). That's partly why I'd prefer > this to work at the whole-scan level (though I also just don't think > that inventing SK_BT_BT_FIRST makes much sense). > > I think that you should make it clearer that this whole optimization > only applies to required *inequalities*, which can be required in the > opposite direction *only*. It should be more obvious from looking at > the code that the optimization doesn't apply to required equality > strategy scan keys (those are always required in *both* scan > directions or in neither direction, so unlike the closely related > prefix optimization added by the same commit, they just can't use the > optimization that we need to fix here). > > BTW, do we really need to keep around the BTScanOpaqueData.firstPage > field? Why can't the call to _bt_readpage from _bt_first (and from > _bt_endpoint) just pass "firstPage=true" as a simple argument? Note > that the first call to _bt_readpage must take place from _bt_first (or > from _bt_endpoint). The first _bt_first call is already kind of > special, in a way that is directly related to this issue. I added some > comments about that to today's commit c9c0589fda, in fact -- I think > it's an important issue in general. Please, check the attached patchset. The first patch removes the BTScanOpaqueData.firstPage field as you proposed. I think this is a good idea, thank you for the proposal. Regarding the requiredOppositeDir bug. I don't want to lose the generality of the optimization. I could work for different cases, for example. WHERE col1 > val1 AND col1 < val2 WHERE col1 = val1 AND col2 > val2 AND col2 < val3 WHERE col1 = val1 AND col2 = val2 AND col3 > val3 AND col3 < val4 And there could be additional scan keys, which shouldn't be skipped. But that shouldn't mean we shouldn't skip others. See the second patch for my next proposal to fix the problem. Instead of relying on _bt_first(), let's rely on the first matched item on the page. Once it's found, we may skip scan keys required for the opposite direction. What do you think? ------ Regards, Alexander Korotkov
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Peter Geoghegan
Date:
Will you be in Prague this week? If not this might have to wait.
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Alexander Korotkov
Date:
On Mon, Dec 11, 2023 at 5:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > BTW, do we really need to keep around the BTScanOpaqueData.firstPage > > field? Why can't the call to _bt_readpage from _bt_first (and from > > _bt_endpoint) just pass "firstPage=true" as a simple argument? Note > > that the first call to _bt_readpage must take place from _bt_first (or > > from _bt_endpoint). The first _bt_first call is already kind of > > special, in a way that is directly related to this issue. I added some > > comments about that to today's commit c9c0589fda, in fact -- I think > > it's an important issue in general. > > Please, check the attached patchset. Sorry, I forgot the attachment. Here it is. ------ Regards, Alexander Korotkov
Attachment
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Alexander Korotkov
Date:
On Mon, Dec 11, 2023 at 6:16 PM Peter Geoghegan <pg@bowt.ie> wrote: > Will you be in Prague this week? If not this might have to wait. Sorry, I wouldn't be in Prague this week. Due to my current immigration status, I can't travel. I wish you to have a lovely time in Prague. I'm OK to wait, review once you can. I will probably provide a more polished version meanwhile. ------ Regards, Alexander Korotkov
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Alexander Korotkov
Date:
On Tue, Dec 12, 2023 at 3:22 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Mon, Dec 11, 2023 at 6:16 PM Peter Geoghegan <pg@bowt.ie> wrote: > > Will you be in Prague this week? If not this might have to wait. > > Sorry, I wouldn't be in Prague this week. Due to my current > immigration status, I can't travel. > I wish you to have a lovely time in Prague. I'm OK to wait, review > once you can. I will probably provide a more polished version > meanwhile. Please find the revised patchset attached. It comes with revised comments and commit messages. Besides bug fixing the second patch makes optimization easier to understand. Now the flag used for skipping checks of same direction required keys is named continuescanPrechecked and means exactly that *continuescan flag is known to be true for the last item on the page. Any objections to pushing these two patches? ------ Regards, Alexander Korotkov
Attachment
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Pavel Borisov
Date:
Hi, Alexander!
On Mon, 25 Dec 2023 at 02:51, Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Tue, Dec 12, 2023 at 3:22 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Mon, Dec 11, 2023 at 6:16 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Will you be in Prague this week? If not this might have to wait.
>
> Sorry, I wouldn't be in Prague this week. Due to my current
> immigration status, I can't travel.
> I wish you to have a lovely time in Prague. I'm OK to wait, review
> once you can. I will probably provide a more polished version
> meanwhile.
Please find the revised patchset attached. It comes with revised
comments and commit messages. Besides bug fixing the second patch
makes optimization easier to understand. Now the flag used for
skipping checks of same direction required keys is named
continuescanPrechecked and means exactly that *continuescan flag is
known to be true for the last item on the page.
Any objections to pushing these two patches?
I've reviewed both patches:
0001 - is a pure refactoring replacing argument transfer from via struct member to transfer explicitly as a function argument. It's justified by the fact firstPage is localized only to several places. The patch looks simple and good enough.
0002:
continuescanPrechecked is semantically much better than previous requiredMatchedByPrecheck which confused me earlier. Thanks!
From the new comments, it looks a little bit hard to understand who does what. Semantics "if caller told" in comments looks more clear to me. Could you especially give attention to the comments:
"If they wouldn't be matched, then the *continuescan flag would be set for the current item and the last item on the page accordingly."
"If the key is required for the opposite direction scan, we need to know there was already at least one matching item on the page. For those keys."
> Prechecking the value of the continuescan flag for the last item on the
>+ * page (according to the scan direction).
>+ * page (according to the scan direction).
Maybe, in this case, it would be more clear like: "...(for backwards scan it will be the first item on a page)"
Otherwise the patch 0002 looks like a good fix for the bug to be pushed.
Kind regards,
Pavel Borisov
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Alexander Korotkov
Date:
Pavel, On Mon, Dec 25, 2023 at 8:32 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > I've reviewed both patches: > 0001 - is a pure refactoring replacing argument transfer from via struct member to transfer explicitly as a function argument.It's justified by the fact firstPage is localized only to several places. The patch looks simple and good enough. > > 0002: > continuescanPrechecked is semantically much better than previous requiredMatchedByPrecheck which confused me earlier. Thanks! > > From the new comments, it looks a little bit hard to understand who does what. Semantics "if caller told" in comments looksmore clear to me. Could you especially give attention to the comments: > > "If they wouldn't be matched, then the *continuescan flag would be set for the current item and the last item on the pageaccordingly." > "If the key is required for the opposite direction scan, we need to know there was already at least one matching item onthe page. For those keys." > > > Prechecking the value of the continuescan flag for the last item on the > >+ * page (according to the scan direction). > Maybe, in this case, it would be more clear like: "...(for backwards scan it will be the first item on a page)" > > Otherwise the patch 0002 looks like a good fix for the bug to be pushed. Thank you for your review. I've revised comments to meet your suggestions. ------ Regards, Alexander Korotkov
Attachment
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Pavel Borisov
Date:
Alexander,
On Tue, 26 Dec 2023 at 23:35, Alexander Korotkov <aekorotkov@gmail.com> wrote:
Pavel,
On Mon, Dec 25, 2023 at 8:32 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> I've reviewed both patches:
> 0001 - is a pure refactoring replacing argument transfer from via struct member to transfer explicitly as a function argument. It's justified by the fact firstPage is localized only to several places. The patch looks simple and good enough.
>
> 0002:
> continuescanPrechecked is semantically much better than previous requiredMatchedByPrecheck which confused me earlier. Thanks!
>
> From the new comments, it looks a little bit hard to understand who does what. Semantics "if caller told" in comments looks more clear to me. Could you especially give attention to the comments:
>
> "If they wouldn't be matched, then the *continuescan flag would be set for the current item and the last item on the page accordingly."
> "If the key is required for the opposite direction scan, we need to know there was already at least one matching item on the page. For those keys."
>
> > Prechecking the value of the continuescan flag for the last item on the
> >+ * page (according to the scan direction).
> Maybe, in this case, it would be more clear like: "...(for backwards scan it will be the first item on a page)"
>
> Otherwise the patch 0002 looks like a good fix for the bug to be pushed.
Thank you for your review. I've revised comments to meet your suggestions.
Thank you for revised comments! I think they are good enough.
Regards,
Pavel
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Alexander Korotkov
Date:
On Wed, Dec 27, 2023 at 1:18 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > Thank you for revised comments! I think they are good enough. Pushed, thank you! ------ Regards, Alexander Korotkov
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
"Anton A. Melnikov"
Date:
Hi! Maybe _bt_readpage(scan, dir, start, true) needed at this line: https://github.com/postgres/postgres/blob/b4080fa3dcf6c6359e542169e0e81a0662c53ba8/src/backend/access/nbtree/nbtsearch.c#L2501 ? Do we really need to try prechecking the continuescan flag here? And the current "false" in the last arg does not match the previous code before 06b10f80ba and the current comment above. Would be very grateful for clarification. With the best regards! -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Pavel Borisov
Date:
Hi, Anton!
Looks like an oversight when refactoring BTScanOpaqueData.firstPage into using function argument in 06b10f80ba4.
@@ -2487,14 +2486,13 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
104
105 /* remember which buffer we have pinned */
106 so->currPos.buf = buf;
107 - so->firstPage = true;
108
109 _bt_initialize_more_data(so, dir);
110
111 /*
112 * Now load data from the first page of the scan.
113 */
114 - if (!_bt_readpage(scan, dir, start))
115 + if (!_bt_readpage(scan, dir, start, false))
104
105 /* remember which buffer we have pinned */
106 so->currPos.buf = buf;
107 - so->firstPage = true;
108
109 _bt_initialize_more_data(so, dir);
110
111 /*
112 * Now load data from the first page of the scan.
113 */
114 - if (!_bt_readpage(scan, dir, start))
115 + if (!_bt_readpage(scan, dir, start, false))
Attached is a fix.
Thank you!
Regards,
Pavel
On Fri, 22 Mar 2024 at 11:29, Anton A. Melnikov <a.melnikov@postgrespro.ru> wrote:
Hi!
Maybe _bt_readpage(scan, dir, start, true) needed at this line:
https://github.com/postgres/postgres/blob/b4080fa3dcf6c6359e542169e0e81a0662c53ba8/src/backend/access/nbtree/nbtsearch.c#L2501
?
Do we really need to try prechecking the continuescan flag here?
And the current "false" in the last arg does not match the previous code before 06b10f80ba
and the current comment above.
Would be very grateful for clarification.
With the best regards!
--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
"Anton A. Melnikov"
Date:
On 22.03.2024 11:02, Pavel Borisov wrote: > > Attached is a fix. Thanks! -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Maxim Orlov
Date:
I've noticed this patch and had a quick look at it. As far as I understand, this bug
does not lead to an incorrect matching, resulting only in degradation in speed.
Anyway, consider this patch useful, hope it will be committed soon.
--
Best regards,
Maxim Orlov.
Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)
From
Alexander Korotkov
Date:
On Fri, Mar 22, 2024 at 11:58 AM Maxim Orlov <orlovmg@gmail.com> wrote: > I've noticed this patch and had a quick look at it. As far as I understand, this bug > does not lead to an incorrect matching, resulting only in degradation in speed. > Anyway, consider this patch useful, hope it will be committed soon. Pushed. Thanks to Maxim and Pavel. ------ Regards, Alexander Korotkov