Re: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Date
Msg-id CAH2-WzmodSE+gpTd1CRGU9ez8ytyyDS+Kns2r9NzgUp1s56kpw@mail.gmail.com
Whole thread Raw
In response to Re: Adding skip scan (including MDAM style range skip scan) to nbtree  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Fri, May 2, 2025 at 2:22 PM Peter Geoghegan <pg@bowt.ie> wrote:
> A slight variant of my fuzzing Python script did in fact go on to
> detect a couple of bugs.
>
> I'm attaching a compressed SQL file with repros for 2 different bugs.
> The first bug was independently detected by some kind of fuzzing
> performed by Mark Dilger, reported elsewhere [1].

Picking up from the email with the big attachment...

Both bugs are from commit 8a510275, "Further optimize nbtree search
scan key comparisons" (not the main skip scan commit). I actually
wrote code very much like the code from these patches that appeared in
certain versions of the skip scan patch series -- it was originally
supposed to be defensive hardening. This so-called hardening wasn't
kept in the final committed version because I incorrectly believed
that it wasn't necessary.

I would like to commit the first patch later today, ahead of shipping
beta1. But the second patch won't make it into beta1.

In practice the bug fixed by the first patch is more likely to affect
users, and (unlike the bug fixed by the second patch), it involves a
hard crash. The first patch prevents us from dereferencing a NULL
pointer (pstate) within _bt_advance_array_keys (unless on an
assert-enabled build, where we get an assertion failure first). It
would also be possible to fix the issue by testing if pstate itself is
not a NULL pointer in the usual defensive style, but I find the
approach taken in the first patch slightly more natural.

The second patch is more complicated, and seems like something that
I'll need to spend more time thinking about before proceeding with
commit. It has subtle behavioral implications, in that it causes the
pstate.forcenonrequired mechanism to influence when and how
_bt_advance_array_keys schedules primitive index scans in a tiny
minority of forward scan cases. I know of only 3 queries where this
happens, 2 of which are from my repro -- it's actually really hard to
find an example of this, even if you go out of your way to.

Allowing pstate.forcenonrequired to affect primscan scheduling like
this is something that I have been avoiding up until now, since that
makes things cleaner -- but I'm starting to think that that goal isn't
important enough to force the second patch to be significantly more
complicated than what I came up with here. It's not like the
behavioral differences represent a clear regression; they're just
slightly different to what we see in cases where
pstate.forcenonrequired/pstate.ikey is forcibly not applied (e.g., by
commenting-out the calls to _bt_set_startikey made by _bt_readpage).

My approach in the second patch is to simply call _bt_start_array_keys
ahead of the finaltup call to _bt_checkkeys when
pstate.forcenonrequired, which has the merit of being relatively
simple (it's likely the simplest possible approach). I'm unwilling to
pay too much of a cost in implementation complexity just to avoid
side-effects in _bt_advance_array_keys/primscan scheduling, but maybe
I'll find that the cost isn't too high.

--
Peter Geoghegan

Attachment

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Assert("vacrel->eager_scan_remaining_successes > 0")
Next
From: Jacob Champion
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER