Thread: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Peter Geoghegan

Date:

06 December 2023, 00:41:06

As I recently went into on the thread where we've been discussing my
nbtree SAOP patch [1], there is good reason to suspect that one of the
optimizations added by commit e0b1ee17 is buggy in the presence of an
opfamily lacking the full set of cross-type comparisons. The attached
test case confirms that these suspicions were correct. Running the
tese case against HEAD will lead to an assertion failure (or a wrong
answer when assertions are disabled).

To recap, the optimization in question (which is not to be confused
with the "precheck" optimization from the same commit) is based on the
idea that _bt_first must always land the scan ahead of the position
that the scan would end on, were the scan direction to change (from
forwards to backwards, say). It follows that inequality strategy scan
keys that are required in the opposite-to-scan direction *only* must
be redundant in the current scan direction (in the sense that
_bt_checkkeys needn't bother comparing them at all). Unfortunately,
that rationale is at least slightly wrong.

Although some version of the same assumption must really hold in the
case of required equality strategy scan keys (old comments in
_bt_checkkeys and in _bt_first say as much), it isn't really
guaranteed in the case of inequalities. In fact, the following
sentence appears in old comments above _bt_preprocess_keys, directly
contradicting the theory behind the optimization in question:

"In general, when inequality keys are present, the initial-positioning
code only promises to position before the first possible match, not
exactly at the first match, for a forward scan; or after the last
match for a backward scan."

My test case mostly just demonstrates how to reproduce the scenario
described by this sentence.

It's probably possible to salvage the optimization, but that will
require bookkeeping sufficient to detect these unsafe cases, so that
_bt_checkkeys only skips the comparisons when it's truly safe. As far
as I know, the only reason that inequalities differ from equalities is
this respect is the issue that the test case highlights. (There were
also issues with NULLs, but AFAICT Alexander dealt with that aspect of
the problem already.)

[1] https://postgr.es/m/CAH2-Wz=BuxYEHxpNH0tPvo=+G1WtE1PamRoVU1dEVow1Vy9Y7A@mail.gmail.com
-- 
Peter Geoghegan

Attachment

require_opposite_dir_repro.sql

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Tom Lane

Date:

06 December 2023, 00:53:50

Peter Geoghegan <pg@bowt.ie> writes:
> As I recently went into on the thread where we've been discussing my
> nbtree SAOP patch [1], there is good reason to suspect that one of the
> optimizations added by commit e0b1ee17 is buggy in the presence of an
> opfamily lacking the full set of cross-type comparisons. The attached
> test case confirms that these suspicions were correct. Running the
> tese case against HEAD will lead to an assertion failure (or a wrong
> answer when assertions are disabled).

Hmm ... I had not paid any attention to this commit, but the rationale
given in the commit message is just flat wrong:

    Imagine the ordered B-tree scan for the query like this.

    SELECT * FROM tbl WHERE col > 'a' AND col < 'b' ORDER BY col;

    The (col > 'a') scan key will be always matched once we find the location to
    start the scan.  The (col < 'b') scan key will match every item on the page
    as long as it matches the last item on the page.

That argument probably holds for the index's first column, but it is
completely and obviously wrong for every following column.  Nonetheless
it looks like we're trying to apply the optimization to every scan key.

            regards, tom lane

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Peter Geoghegan

Date:

06 December 2023, 01:14:23

On Tue, Dec 5, 2023 at 4:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm ... I had not paid any attention to this commit, but the rationale
> given in the commit message is just flat wrong:
>
>     Imagine the ordered B-tree scan for the query like this.
>
>     SELECT * FROM tbl WHERE col > 'a' AND col < 'b' ORDER BY col;
>
>     The (col > 'a') scan key will be always matched once we find the location to
>     start the scan.  The (col < 'b') scan key will match every item on the page
>     as long as it matches the last item on the page.
>
> That argument probably holds for the index's first column, but it is
> completely and obviously wrong for every following column.  Nonetheless
> it looks like we're trying to apply the optimization to every scan key.

Just to be clear, you're raising a concern that seems to me to apply
to "the other optimization" from the same commit, specifically -- the
precheck optimization. Not the one I found a problem in. (They're
closely related but distinct optimizations.)

I *think* that that part is handled correctly, because non-required
scan keys are not affected (by either optimization). I have no
specific reason to doubt the proposition that 'b' could only be marked
required in situations where it is indeed safe to assume that the col
< 'b' condition must also apply to earlier tuples transitively (i.e.
this must be true because it was true for the the final tuple on the
page during the _bt_readpage precheck).

That being said, I wouldn't rule out problems for the precheck
optimization in the presence of opfamilies like the one from my test
case. I didn't get as far as exploring that side of things, at least.

--
Peter Geoghegan

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Peter Geoghegan

Date:

06 December 2023, 01:45:41

On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote:
> "In general, when inequality keys are present, the initial-positioning
> code only promises to position before the first possible match, not
> exactly at the first match, for a forward scan; or after the last
> match for a backward scan."
>
> My test case mostly just demonstrates how to reproduce the scenario
> described by this sentence.

I just realized that my test case wasn't quite minimized correctly. It
depended on a custom function that was no longer created.

Attached is a revised version that uses btint84cmp instead.

--
Peter Geoghegan

Attachment

require_opposite_dir_repro_v2.sql

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Alexander Korotkov

Date:

06 December 2023, 04:05:52

Hi, Peter!

On Wed, Dec 6, 2023 at 3:46 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > "In general, when inequality keys are present, the initial-positioning
> > code only promises to position before the first possible match, not
> > exactly at the first match, for a forward scan; or after the last
> > match for a backward scan."
> >
> > My test case mostly just demonstrates how to reproduce the scenario
> > described by this sentence.
>
> I just realized that my test case wasn't quite minimized correctly. It
> depended on a custom function that was no longer created.
>
> Attached is a revised version that uses btint84cmp instead.

Thank you for raising this issue.  Preprocessing of btree scan keys is
normally removing the redundant scan keys.  However, redundant scan
keys aren't removed when they have arguments of different types.
Please give me a bit of time to figure out how to workaround this.

------
Regards,
Alexander Korotkov

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Peter Geoghegan

Date:

06 December 2023, 04:20:01

On Tue, Dec 5, 2023 at 8:06 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> Thank you for raising this issue.  Preprocessing of btree scan keys is
> normally removing the redundant scan keys.  However, redundant scan
> keys aren't removed when they have arguments of different types.
> Please give me a bit of time to figure out how to workaround this.

Couldn't you condition the use of the optimization on
_bt_preprocess_keys being able to use cross-type operators when it
checked for redundant or contradictory scan keys? The vast majority of
index scans will be able to do that.

As I said already, what you're doing here isn't all that different to
the way that we rely on required equality strategy scan keys being
used to build our initial insertion scan key, that determines where
the scan is initially positioned to within _bt_first. Inequalities
aren't all that different to equalities.

--
Peter Geoghegan

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Robert Haas

Date:

06 December 2023, 13:11:32

On Tue, Dec 5, 2023 at 8:15 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Just to be clear, you're raising a concern that seems to me to apply
> to "the other optimization" from the same commit, specifically -- the
> precheck optimization. Not the one I found a problem in. (They're
> closely related but distinct optimizations.)

It isn't very clear from the commit message that this commit is doing
two different things, and in fact I'm still unclear on what exactly
the other optimization is.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Matthias van de Meent

Date:

06 December 2023, 13:27:24

On Wed, 6 Dec 2023 at 14:11, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Dec 5, 2023 at 8:15 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Just to be clear, you're raising a concern that seems to me to apply
> > to "the other optimization" from the same commit, specifically -- the
> > precheck optimization. Not the one I found a problem in. (They're
> > closely related but distinct optimizations.)
>
> It isn't very clear from the commit message that this commit is doing
> two different things, and in fact I'm still unclear on what exactly
> the other optimization is.

I feel that Peter refered to these two distinct optimizations:

1. When scanning an index in ascending order using scankey a > 1 (so,
one that defines a start point of the scan), we don't need to check
items for consistency with that scankey once we've found the first
value that is consistent with the scankey, as all future values will
also be consistent with the scankey (if we assume no concurrent page
deletions).

2. When scanning an index in ascending order using scankey a < 10 (one
that defines an endpoint of the scan), we can look ahead and check if
the last item on the page is consistent. If so, then all other items
on the page will also be consistent with that scankey.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Robert Haas

Date:

06 December 2023, 13:32:13

On Wed, Dec 6, 2023 at 8:27 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> I feel that Peter refered to these two distinct optimizations:
>
> 1. When scanning an index in ascending order using scankey a > 1 (so,
> one that defines a start point of the scan), we don't need to check
> items for consistency with that scankey once we've found the first
> value that is consistent with the scankey, as all future values will
> also be consistent with the scankey (if we assume no concurrent page
> deletions).
>
> 2. When scanning an index in ascending order using scankey a < 10 (one
> that defines an endpoint of the scan), we can look ahead and check if
> the last item on the page is consistent. If so, then all other items
> on the page will also be consistent with that scankey.

Oh, interesting. Thanks.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Peter Geoghegan

Date:

06 December 2023, 16:44:16

On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> On Wed, 6 Dec 2023 at 14:11, Robert Haas <robertmhaas@gmail.com> wrote:
> > It isn't very clear from the commit message that this commit is doing
> > two different things, and in fact I'm still unclear on what exactly
> > the other optimization is.
>
> I feel that Peter refered to these two distinct optimizations:

Right.

> 2. When scanning an index in ascending order using scankey a < 10 (one
> that defines an endpoint of the scan), we can look ahead and check if
> the last item on the page is consistent. If so, then all other items
> on the page will also be consistent with that scankey.

Also worth noting that it could be "scankey a = 10". That is, the
precheck optimization (i.e. the optimization that's not the target of
my test case) can deal with equalities and inequalities just as well
(any scan key that's required in the current scan direction is
supported). On the other hand the required-in-opposite-direction-only
optimization (i.e. the target of my test case) only applies to
inequality strategy scan keys.

It kinda makes sense to explain both concepts using an example that
involves both > and < strategy inequalities, since that makes the
symmetry between the two optimizations clearer.

--
Peter Geoghegan

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Peter Geoghegan

Date:

06 December 2023, 17:31:56

On Tue, Dec 5, 2023 at 8:20 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Dec 5, 2023 at 8:06 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > Thank you for raising this issue.  Preprocessing of btree scan keys is
> > normally removing the redundant scan keys.  However, redundant scan
> > keys aren't removed when they have arguments of different types.
> > Please give me a bit of time to figure out how to workaround this.
>
> Couldn't you condition the use of the optimization on
> _bt_preprocess_keys being able to use cross-type operators when it
> checked for redundant or contradictory scan keys? The vast majority of
> index scans will be able to do that.

Some quick experimentation shows that my test case works as expected
once _bt_preprocess_keys is taught to remember that it has seen a
maybe-unsafe case, which it stashes in a special new field from the
scan's state for later. As I said, this field can be treated as a
condition of applying the required-in-opposite-direction-only
optimization in _bt_readpage().

This new field would be analogous to the existing
requiredMatchedByPrecheck state used by _bt_readpage() to determine
whether it can apply the required-in-same-direction optimization. The
new field works for the whole scan instead of just for one page, and
it works based on information from "behind the scan" instead of
information "just ahead of the scan". But the basic idea is the same.

_bt_preprocess_keys is rather complicated. It is perhaps tempting to
do this in a targeted way, that specifically limits itself to the exact
cases that we know to be unsafe. But it might be okay to just disable
the optimization in most or all cases where _bt_compare_scankey_args()
returns false. That's likely to be very rare in practice, anyway (who
really uses opfamilies like these?). Not really sure where to come
down on that.

--
Peter Geoghegan

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Peter Geoghegan

Date:

06 December 2023, 18:54:45

On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> 1. When scanning an index in ascending order using scankey a > 1 (so,
> one that defines a start point of the scan), we don't need to check
> items for consistency with that scankey once we've found the first
> value that is consistent with the scankey, as all future values will
> also be consistent with the scankey (if we assume no concurrent page
> deletions).

BTW, I don't think that page deletion is a concern for these
optimizations in the way that it is for the similar idea of "dynamic
prefix compression", which works against insertion-type scan keys
(used to descend the tree and to do an initial binary search of a leaf
page).

We already rely on the first call to _bt_readpage (the one that takes
place from within _bt_first rather than from _bt_next) passing a page
offset number that's exactly at the start of where matches begin --
this is crucial in the case of scans with required equality strategy
scan keys (most scans). If we just skipped the _bt_binsrch and passed
P_FIRSTDATAKEY(opaque) to _bt_readpage within _bt_first instead, that
would break lots of queries. So the _bt_binsrch step within _bt_first
isn't just an optimization -- it's crucial. This is nothing new.

Recall that _bt_readpage only deals with search-type scan keys,
meaning scan keys that use a simple operator (so it uses = operators
with the equality strategy, as opposed to using a 3-way ORDER
proc/support function 1 that can tell the difference between < and >).
In general _bt_readpage doesn't know how to tell the difference
between a tuple that's before the start of = matches, and a tuple
that's at (or after) the end of any = matches. If it is ever allowed
to conflate these two cases, then we'll overlook matching tuples,
which is of course wrong (it'll terminate the scan before it even
starts). It is up to the caller (really just _bt_first) to never call
_bt_readpage in a way that allows this confusion to take place --
which is what makes the _bt_binsrch step crucial.

A race condition with page deletion might allow the key space covered
by a leaf page to "widen" after we've left its parent, but before we
arrive on the leaf page. But the _bt_binsrch step within _bt_first
happens *after* we land on and lock that leaf page, in any case. So
there is no risk of the scan ever doing anything with
concurrently-inserted index tuples. In general we only have to worry
about such race conditions when descending the tree -- they're not a
problem after the scan has reached the leaf level and established an
initial page offset number. (The way that _bt_readpage processes whole
pages in one atomic step helps with this sort of thing, too. We can
almost pretend that the B-Tree structure is immutable, even though
that's obviously not really true at all. We know that we'll always
make forward progress through the key space by remembering the next
page to visit when processing each page.)

My test case broke the required-in-opposite-direction-only
optimization by finding a way in which
required-in-opposite-direction-only inequalities were not quite the
same as required equalities with respect to this business about the
precise leaf page offset number that the scan begins at. They make
*almost* the same set of guarantees (note in particular that both will
be transformed into insertion scan key/3-way ORDER proc scan keys by
_bt_first's initial positioning code), but there is at least one
special case that applies only to inequalities. I had to play games
with weird incomplete opfamilies to actually break the optimization --
that was required to tickle the special case in just the right/wrong
way.

--
Peter Geoghegan

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Matthias van de Meent

Date:

06 December 2023, 19:14:24

On Wed, 6 Dec 2023 at 19:55, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Dec 6, 2023 at 5:27 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > 1. When scanning an index in ascending order using scankey a > 1 (so,
> > one that defines a start point of the scan), we don't need to check
> > items for consistency with that scankey once we've found the first
> > value that is consistent with the scankey, as all future values will
> > also be consistent with the scankey (if we assume no concurrent page
> > deletions).
>
> BTW, I don't think that page deletion is a concern for these
> optimizations in the way that it is for the similar idea of "dynamic
> prefix compression", which works against insertion-type scan keys
> (used to descend the tree and to do an initial binary search of a leaf
> page).
>
> We already rely on the first call to _bt_readpage (the one that takes
> place from within _bt_first rather than from _bt_next) passing a page
> offset number that's exactly at the start of where matches begin --
> this is crucial in the case of scans with required equality strategy
> scan keys (most scans). If we just skipped the _bt_binsrch and passed
> P_FIRSTDATAKEY(opaque) to _bt_readpage within _bt_first instead, that
> would break lots of queries. So the _bt_binsrch step within _bt_first
> isn't just an optimization -- it's crucial. This is nothing new.

I was thinking more along the lines of page splits+deletions while
we're doing _bt_stepright(), but forgot to consider that we first lock
the right sibling, and only then release the left sibling for splits,
so we should be fine here.

Kind regards,

Matthias van de Meent

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Peter Geoghegan

Date:

06 December 2023, 19:50:16

On Wed, Dec 6, 2023 at 11:14 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> I was thinking more along the lines of page splits+deletions while
> we're doing _bt_stepright(), but forgot to consider that we first lock
> the right sibling, and only then release the left sibling for splits,
> so we should be fine here.

In general the simplest (and possibly most convincing) arguments for
the correctness of optimizations like the ones that Alexander added
rely on seeing that the only way that the optimization can be wrong is
if some more fundamental and long established thing was also wrong. We
could try to prove that the new optimization is correct (or wrong),
but it is often more helpful to "prove" that some much more
fundamental thing is correct instead, if that provides us with a
useful corollary about the new thing also being correct.

Take the _bt_readpage precheck optimization, for example. Rather than
thinking about the key space and transitive rules, it might be more
helpful to focus on what must have been true in earlier Postgres
versions, and what we can expect to still hold now. The only way that
that optimization could be wrong is if the same old _bt_checkkeys
logic that decides when to terminate the scan (by setting
continuescan=false) always had some propensity to "change its mind"
about ending the scan, at least when it somehow got the opportunity to
see tuples after the first tuple that it indicated should end the
scan. That's not quite bulletproof, of course (it's not like older
Postgres versions actually provided _bt_checkkeys with opportunities
to "change its mind" in this sense), but it's a useful starting point
IME. It helps to build intuition.

--
Peter Geoghegan

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Alexander Korotkov

Date:

08 December 2023, 18:30:18

On Wed, Dec 6, 2023 at 6:05 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Wed, Dec 6, 2023 at 3:46 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > "In general, when inequality keys are present, the initial-positioning
> > > code only promises to position before the first possible match, not
> > > exactly at the first match, for a forward scan; or after the last
> > > match for a backward scan."
> > >
> > > My test case mostly just demonstrates how to reproduce the scenario
> > > described by this sentence.
> >
> > I just realized that my test case wasn't quite minimized correctly. It
> > depended on a custom function that was no longer created.
> >
> > Attached is a revised version that uses btint84cmp instead.
>
> Thank you for raising this issue.  Preprocessing of btree scan keys is
> normally removing the redundant scan keys.  However, redundant scan
> keys aren't removed when they have arguments of different types.
> Please give me a bit of time to figure out how to workaround this.

I dig into the problem.  I think this assumption is wrong in my commit.

"When the key is required for opposite direction scan, it must be
already satisfied by_bt_first() ..."

In your example "foo = 90" is satisfied by_bt_first(), but "foo >
99::int8" is not.  I think this could be resolved by introducing a
separate flag exactly distinguishing scan keys used for _bt_first().
I'm going to post the patch doing this.

------
Regards,
Alexander Korotkov

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Alexander Korotkov

Date:

08 December 2023, 18:46:01

On Fri, Dec 8, 2023 at 8:30 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Wed, Dec 6, 2023 at 6:05 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > On Wed, Dec 6, 2023 at 3:46 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > > On Tue, Dec 5, 2023 at 4:41 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > > > "In general, when inequality keys are present, the initial-positioning
> > > > code only promises to position before the first possible match, not
> > > > exactly at the first match, for a forward scan; or after the last
> > > > match for a backward scan."
> > > >
> > > > My test case mostly just demonstrates how to reproduce the scenario
> > > > described by this sentence.
> > >
> > > I just realized that my test case wasn't quite minimized correctly. It
> > > depended on a custom function that was no longer created.
> > >
> > > Attached is a revised version that uses btint84cmp instead.
> >
> > Thank you for raising this issue.  Preprocessing of btree scan keys is
> > normally removing the redundant scan keys.  However, redundant scan
> > keys aren't removed when they have arguments of different types.
> > Please give me a bit of time to figure out how to workaround this.
>
> I dig into the problem.  I think this assumption is wrong in my commit.
>
> "When the key is required for opposite direction scan, it must be
> already satisfied by_bt_first() ..."
>
> In your example "foo = 90" is satisfied by_bt_first(), but "foo >
> 99::int8" is not.  I think this could be resolved by introducing a
> separate flag exactly distinguishing scan keys used for _bt_first().
> I'm going to post the patch doing this.

The draft patch is attached.  It requires polishing and proper
commenting.  But I hope the basic idea is clear.  Do you think this is
the way forward?

------
Regards,
Alexander Korotkov

Attachment

fix_btree_skip_scan_keys_optimization-v1.patch

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Peter Geoghegan

Date:

09 December 2023, 02:29:17

On Fri, Dec 8, 2023 at 10:46 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > In your example "foo = 90" is satisfied by_bt_first(), but "foo >
> > 99::int8" is not.  I think this could be resolved by introducing a
> > separate flag exactly distinguishing scan keys used for _bt_first().
> > I'm going to post the patch doing this.
>
> The draft patch is attached.  It requires polishing and proper
> commenting.  But I hope the basic idea is clear.  Do you think this is
> the way forward?

Does this really need to work at the scan key level, rather than at
the whole-scan level? Wouldn't it make more sense to just totally
disable it for the whole scan, since we'll barely ever need to do that
anyway?

My ScalarArrayOpExpr patch will need to disable this optimization,
since with that patch in place we don't necessarily go through
_bt_first each time the search-type scan keys must change. We might
need to check a few tuples from before the _bt_first-wise position of
the next set of array values, which is a problem with
opposite-direction-only inequalities (it's a little bit like the
situation from my test case, actually). That's partly why I'd prefer
this to work at the whole-scan level (though I also just don't think
that inventing SK_BT_BT_FIRST makes much sense).

I think that you should make it clearer that this whole optimization
only applies to required *inequalities*, which can be required in the
opposite direction *only*. It should be more obvious from looking at
the code that the optimization doesn't apply to required equality
strategy scan keys (those are always required in *both* scan
directions or in neither direction, so unlike the closely related
prefix optimization added by the same commit, they just can't use the
optimization that we need to fix here).

BTW, do we really need to keep around the BTScanOpaqueData.firstPage
field? Why can't the call to _bt_readpage from _bt_first (and from
_bt_endpoint) just pass "firstPage=true" as a simple argument? Note
that the first call to _bt_readpage must take place from _bt_first (or
from _bt_endpoint). The first _bt_first call is already kind of
special, in a way that is directly related to this issue. I added some
comments about that to today's commit c9c0589fda, in fact -- I think
it's an important issue in general.

--
Peter Geoghegan

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Alexander Korotkov

Date:

11 December 2023, 15:56:13

Hi, Peter!

On Sat, Dec 9, 2023 at 4:29 AM Peter Geoghegan <pg@bowt.ie> wrote:
> Does this really need to work at the scan key level, rather than at
> the whole-scan level? Wouldn't it make more sense to just totally
> disable it for the whole scan, since we'll barely ever need to do that
> anyway?
>
> My ScalarArrayOpExpr patch will need to disable this optimization,
> since with that patch in place we don't necessarily go through
> _bt_first each time the search-type scan keys must change. We might
> need to check a few tuples from before the _bt_first-wise position of
> the next set of array values, which is a problem with
> opposite-direction-only inequalities (it's a little bit like the
> situation from my test case, actually). That's partly why I'd prefer
> this to work at the whole-scan level (though I also just don't think
> that inventing SK_BT_BT_FIRST makes much sense).
>
> I think that you should make it clearer that this whole optimization
> only applies to required *inequalities*, which can be required in the
> opposite direction *only*. It should be more obvious from looking at
> the code that the optimization doesn't apply to required equality
> strategy scan keys (those are always required in *both* scan
> directions or in neither direction, so unlike the closely related
> prefix optimization added by the same commit, they just can't use the
> optimization that we need to fix here).
>
> BTW, do we really need to keep around the BTScanOpaqueData.firstPage
> field? Why can't the call to _bt_readpage from _bt_first (and from
> _bt_endpoint) just pass "firstPage=true" as a simple argument? Note
> that the first call to _bt_readpage must take place from _bt_first (or
> from _bt_endpoint). The first _bt_first call is already kind of
> special, in a way that is directly related to this issue. I added some
> comments about that to today's commit c9c0589fda, in fact -- I think
> it's an important issue in general.

Please, check the attached patchset.

The first patch removes the BTScanOpaqueData.firstPage field as you
proposed.  I think this is a good idea, thank you for the proposal.

Regarding the requiredOppositeDir bug.  I don't want to lose the
generality of the optimization.  I could work for different cases, for
example.
WHERE col1 > val1 AND col1 < val2
WHERE col1 = val1 AND col2 > val2 AND col2 < val3
WHERE col1 = val1 AND col2 = val2 AND col3 > val3 AND col3 < val4
And there could be additional scan keys, which shouldn't be skipped.
But that shouldn't mean we shouldn't skip others.

See the second patch for my next proposal to fix the problem.  Instead
of relying on _bt_first(), let's rely on the first matched item on the
page. Once it's found, we may skip scan keys required for the opposite
direction.  What do you think?

------
Regards,
Alexander Korotkov

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Peter Geoghegan

Date:

11 December 2023, 16:16:07

Will you be in Prague this week? If not this might have to wait.

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Alexander Korotkov

Date:

11 December 2023, 18:59:32

On Mon, Dec 11, 2023 at 5:56 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > BTW, do we really need to keep around the BTScanOpaqueData.firstPage
> > field? Why can't the call to _bt_readpage from _bt_first (and from
> > _bt_endpoint) just pass "firstPage=true" as a simple argument? Note
> > that the first call to _bt_readpage must take place from _bt_first (or
> > from _bt_endpoint). The first _bt_first call is already kind of
> > special, in a way that is directly related to this issue. I added some
> > comments about that to today's commit c9c0589fda, in fact -- I think
> > it's an important issue in general.
>
> Please, check the attached patchset.

Sorry, I forgot the attachment.  Here it is.

------
Regards,
Alexander Korotkov

Attachment

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Alexander Korotkov

Date:

12 December 2023, 13:22:16

On Mon, Dec 11, 2023 at 6:16 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Will you be in Prague this week? If not this might have to wait.

Sorry, I wouldn't be in Prague this week.  Due to my current
immigration status, I can't travel.
I wish you to have a lovely time in Prague.  I'm OK to wait, review
once you can.  I will probably provide a more polished version
meanwhile.

------
Regards,
Alexander Korotkov

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Alexander Korotkov

Date:

24 December 2023, 22:50:52

On Tue, Dec 12, 2023 at 3:22 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Mon, Dec 11, 2023 at 6:16 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Will you be in Prague this week? If not this might have to wait.
>
> Sorry, I wouldn't be in Prague this week.  Due to my current
> immigration status, I can't travel.
> I wish you to have a lovely time in Prague.  I'm OK to wait, review
> once you can.  I will probably provide a more polished version
> meanwhile.

Please find the revised patchset attached.  It comes with revised
comments and commit messages.  Besides bug fixing the second patch
makes optimization easier to understand.  Now the flag used for
skipping checks of same direction required keys is named
continuescanPrechecked and means exactly that *continuescan flag is
known to be true for the last item on the page.

Any objections to pushing these two patches?

------
Regards,
Alexander Korotkov

Attachment

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Pavel Borisov

Date:

25 December 2023, 18:32:37

Hi, Alexander!

On Mon, 25 Dec 2023 at 02:51, Alexander Korotkov <aekorotkov@gmail.com> wrote:

On Tue, Dec 12, 2023 at 3:22 PM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Mon, Dec 11, 2023 at 6:16 PM Peter Geoghegan <pg@bowt.ie> wrote:
> > Will you be in Prague this week? If not this might have to wait.
>
> Sorry, I wouldn't be in Prague this week. Due to my current
> immigration status, I can't travel.
> I wish you to have a lovely time in Prague. I'm OK to wait, review
> once you can. I will probably provide a more polished version
> meanwhile.

Please find the revised patchset attached. It comes with revised
comments and commit messages. Besides bug fixing the second patch
makes optimization easier to understand. Now the flag used for
skipping checks of same direction required keys is named
continuescanPrechecked and means exactly that *continuescan flag is
known to be true for the last item on the page.

Any objections to pushing these two patches?

I've reviewed both patches:

0001 - is a pure refactoring replacing argument transfer from via struct member to transfer explicitly as a function argument. It's justified by the fact firstPage is localized only to several places. The patch looks simple and good enough.

0002:

continuescanPrechecked is semantically much better than previous requiredMatchedByPrecheck which confused me earlier. Thanks!

From the new comments, it looks a little bit hard to understand who does what. Semantics "if caller told" in comments looks more clear to me. Could you especially give attention to the comments:

"If they wouldn't be matched, then the *continuescan flag would be set for the current item and the last item on the page accordingly."

"If the key is required for the opposite direction scan, we need to know there was already at least one matching item on the page. For those keys."

> Prechecking the value of the continuescan flag for the last item on the
>+ * page (according to the scan direction).

Maybe, in this case, it would be more clear like: "...(for backwards scan it will be the first item on a page)"

Otherwise the patch 0002 looks like a good fix for the bug to be pushed.

Kind regards,

Pavel Borisov

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Alexander Korotkov

Date:

26 December 2023, 19:35:37

Pavel,

On Mon, Dec 25, 2023 at 8:32 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> I've reviewed both patches:
> 0001 - is a pure refactoring replacing argument transfer from via struct member to transfer explicitly as a function
argument.It's justified by the fact firstPage is localized only to several places. The patch looks simple and good
enough.
>
> 0002:
> continuescanPrechecked is semantically much better than previous requiredMatchedByPrecheck which confused me earlier.
Thanks!
>
> From the new comments, it looks a little bit hard to understand who does what. Semantics "if caller told" in comments
looksmore clear to me. Could you especially give attention to the comments: 
>
> "If they wouldn't be matched, then the *continuescan flag would be set for the current item and the last item on the
pageaccordingly." 
> "If the key is required for the opposite direction scan, we need to know there was already at least one matching item
onthe page.  For those keys." 
>
> > Prechecking the value of the continuescan flag for the last item on the
> >+ * page (according to the scan direction).
> Maybe, in this case, it would be more clear like: "...(for backwards scan it will be the first item on a page)"
>
> Otherwise the patch 0002 looks like a good fix for the bug to be pushed.

Thank you for your review.  I've revised comments to meet your suggestions.

------
Regards,
Alexander Korotkov

Attachment

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Pavel Borisov

Date:

27 December 2023, 11:18:35

Alexander,

On Tue, 26 Dec 2023 at 23:35, Alexander Korotkov <aekorotkov@gmail.com> wrote:

Pavel,

On Mon, Dec 25, 2023 at 8:32 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> I've reviewed both patches:
> 0001 - is a pure refactoring replacing argument transfer from via struct member to transfer explicitly as a function argument. It's justified by the fact firstPage is localized only to several places. The patch looks simple and good enough.
>
> 0002:
> continuescanPrechecked is semantically much better than previous requiredMatchedByPrecheck which confused me earlier. Thanks!
>
> From the new comments, it looks a little bit hard to understand who does what. Semantics "if caller told" in comments looks more clear to me. Could you especially give attention to the comments:
>
> "If they wouldn't be matched, then the *continuescan flag would be set for the current item and the last item on the page accordingly."
> "If the key is required for the opposite direction scan, we need to know there was already at least one matching item on the page. For those keys."
>
> > Prechecking the value of the continuescan flag for the last item on the
> >+ * page (according to the scan direction).
> Maybe, in this case, it would be more clear like: "...(for backwards scan it will be the first item on a page)"
>
> Otherwise the patch 0002 looks like a good fix for the bug to be pushed.

Thank you for your review. I've revised comments to meet your suggestions.

Thank you for revised comments! I think they are good enough.

Regards,

Pavel

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Alexander Korotkov

Date:

27 December 2023, 12:36:18

On Wed, Dec 27, 2023 at 1:18 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> Thank you for revised comments! I think they are good enough.

Pushed, thank you!

------
Regards,
Alexander Korotkov

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

"Anton A. Melnikov"

Date:

22 March 2024, 07:29:38

Hi!

Maybe _bt_readpage(scan, dir, start, true) needed at this line:


https://github.com/postgres/postgres/blob/b4080fa3dcf6c6359e542169e0e81a0662c53ba8/src/backend/access/nbtree/nbtsearch.c#L2501

?

Do we really need to try prechecking the continuescan flag here?

And the current "false" in the last arg does not match the previous code before 06b10f80ba
and the current comment above.

Would be very grateful for clarification.

With the best regards!

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Pavel Borisov

Date:

22 March 2024, 08:02:49

Hi, Anton!

Looks like an oversight when refactoring BTScanOpaqueData.firstPage into using function argument in 06b10f80ba4.

@@ -2487,14 +2486,13 @@ _bt_endpoint(IndexScanDesc scan, ScanDirection dir)
104
105 /* remember which buffer we have pinned */
106 so->currPos.buf = buf;
107 - so->firstPage = true;
108
109 _bt_initialize_more_data(so, dir);
110
111 /*
112 * Now load data from the first page of the scan.
113 */
114 - if (!_bt_readpage(scan, dir, start))
115 + if (!_bt_readpage(scan, dir, start, false))

Attached is a fix.

Thank you!

Regards,

Pavel

On Fri, 22 Mar 2024 at 11:29, Anton A. Melnikov <a.melnikov@postgrespro.ru> wrote:

Hi!

Maybe _bt_readpage(scan, dir, start, true) needed at this line:

https://github.com/postgres/postgres/blob/b4080fa3dcf6c6359e542169e0e81a0662c53ba8/src/backend/access/nbtree/nbtsearch.c#L2501

?

Do we really need to try prechecking the continuescan flag here?

And the current "false" in the last arg does not match the previous code before 06b10f80ba
and the current comment above.

Would be very grateful for clarification.

With the best regards!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

0001-Fix-oversight-in-refactoring-in-06b10f80ba4.patch

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

"Anton A. Melnikov"

Date:

22 March 2024, 08:14:48

On 22.03.2024 11:02, Pavel Borisov wrote:
> 
> Attached is a fix.

Thanks!

-- 
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Maxim Orlov

Date:

22 March 2024, 09:58:28

I've noticed this patch and had a quick look at it. As far as I understand, this bug

does not lead to an incorrect matching, resulting only in degradation in speed.

Anyway, consider this patch useful, hope it will be committed soon.

Best regards,

Maxim Orlov.

Re: Bug in nbtree optimization to skip > operator comparisons (or < comparisons in backwards scans)

From

Alexander Korotkov

Date:

22 March 2024, 13:26:47

On Fri, Mar 22, 2024 at 11:58 AM Maxim Orlov <orlovmg@gmail.com> wrote:
> I've noticed this patch and had a quick look at it.  As far as I understand, this bug
> does not lead to an incorrect matching, resulting only in degradation in speed.
> Anyway, consider this patch useful, hope it will be committed soon.

Pushed.
Thanks to Maxim and Pavel.

------
Regards,
Alexander Korotkov