Re: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Date
Msg-id CAH2-Wzk9gBuPxUPD6KwiDeTvBtBOXk9pQAj0Lxi4Zg8uVLTt=w@mail.gmail.com
Whole thread Raw
In response to Re: Adding skip scan (including MDAM style range skip scan) to nbtree  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: Adding skip scan (including MDAM style range skip scan) to nbtree
List pgsql-hackers
On Tue, Apr 1, 2025 at 10:40 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> The review below was started for v31, then adapted to v32 when that
> arrived. I haven't cross-checked this with v33.

There's been hardly any changes to 0001- in quite a while, so that's fine.

> > Teach nbtree composite index scans to opportunistically skip over
> > irrelevant sections of composite indexes given a query with an omitted
> > prefix column.
>
> AFAIK, this is only the second reference to "composite" indexes, after
> a single mention in a comment in nbtsplitloc.c's _bt_strategy. IIUC,
> this refers to multi-column indexes, which is more frequently and more
> consistently used across the code base.

I don't think it makes much difference, but sure, I'll use the
multi-column index terminology. In both code comments, and in commit
messages.

> > When nbtree is passed input scan keys derived from a
> > query predicate "WHERE b = 5", new nbtree preprocessing steps now output
> > "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys.
>
> [...] new nbtree preprocessing steps now output +the equivalent of+ [...]
>
> > Preprocessing can freely add a skip array before or after any input
> > ScalarArrayOp arrays.
>
> This implies skip arrays won't be generated for WHERE b = 5 (a
> non-SAOP scankey) or WHERE b < 3 (not SAOP either), which is probably
> not intentional, so a rewording would be appreciated.

I don't agree. Yes, that sentence (taken in isolation) does not make
it 100% unambiguous. But, would anybody ever actually be misled? Even
once, ever? The misinterpretation of my words that you're concerned
about is directly contradicted by the whole opening paragraph. Plus,
it just doesn't make any sense. Obviously, you yourself never actually
interpreted it that way. Right?

The paragraph that this sentence appears in is all about the various
ways in which SAOP arrays and skip arrays are the same thing, except
at the very lowest level of the _bt_advance_array_keys code. I think
that that context makes it particularly unlikely that anybody would
ever think that I mean to imply something about the ways in which
non-array keys can be composed alongside skip arrays.

I'm pushing back here because I think that there's a real cost to
using overly defensive language, aimed at some imaginary person. The
problem with catering to such a person is that, overall, the
clarifications do more harm than good. What seems to end up happening
(I speak from experience with writing overly defensive comments) is
that the superfluous clarifications are read (by actual readers) the
wrong way -- they're read as if we must have meant quite a lot more
than what we actually intended.

More generally, I feel that it's a mistake to try to interpret your
words on behalf of the reader. While it's definitely useful to try to
anticipate the ways in which your reader might misunderstand, you
cannot reasonably do the work for them. It's usually (though not
always) best to deal with anticipated points of confusion by subtly
constructing examples that suggest that some plausible wrong
interpretation is in fact wrong, without drawing attention to it.
Coming right out and telling the reader what to not think *is* an
important tool, but it should be reserved for cases where it's truly
necessary.

> // nbtpreprocesskeys.c
>
> > +static bool _bt_s**parray_shrink
>
> I'd like to understand the "shrink" here, as it's not entirely clear to me.
> The functions are exclusively called through dispatch in
> _bt_compare_array_scankey_args, and I'd expected that naming to be
> extended to these functions.

I don't see the problem? As Alena pointed out, we're shrinking the
arrays here (or are likely to), meaning that we're usually going to
eliminate some subset of array elements. It's possible that this will
happen more than once for a given array (since there could be more
than one "contradictory" key on input). An array can only shrink
within _bt_compare_array_scankey_args -- it can never have new array
elements added.

> > + * This qual now becomes "WHERE x = ANY('{every possible x value}') and y = 4"
>
> I understand what you're going for, but a reference that indexed NULLs
> are still handled correctly (rather than removed by the filter) would
> be appreciated, as the indicated substitution would remove NULL
> values. (This doesn't have to be much more than a footnote.)

Why, though? Obviously, '{every possible x value}' isn't a valid array
literal. Doesn't that establish that this isn't a 100% literal
statement of fact?

There are a handful of places where I make a similar statement (the
commit message is another one, as is selfuncs.c). I do make this same
point about NULLs being just another value in selfuncs.c, though only
because it's relevant there. I don't want to talk about NULLs here,
because they just aren't relevant to this high-level overview at the
top of _bt_preprocess_keys. We do talk about the issue of skip arrays
and IS NOT NULL constraints elsewhere in nbtree: we talk about those
issues in _bt_first (shortly after _bt_preprocess_keys is called).

Again, it comes down to what the reader might actually be confused by,
in the real world. Is it really plausible that I could ever commit a
skip scan patch that wholly forgot to deal with NULLs sensibly? Would
you personally ever think that I could make such an obvious blunder in
a committed patch? And if you did, how long would it take you to
figure out that there was no such oversight?

> > + * Also sets *numSkipArrayKeys to # of skip arrays _bt_preprocess_array_keys
> > + * caller must add to the scan keys it'll output.  Caller must add this many
> > + * skip arrays to each of the most significant attributes lacking any keys
>
> I don't think that's a good description; as any value other than 0 or
> 1 would mean more than one skip array per such attribute, which is
> obviously incorrect. I think I'd word it like:
>
> + * Also sets *numSkipArrayKeys to # of skip arrays _bt_preprocess_array_keys
> + * caller must add to the scan keys it'll output.  Caller will have to add
> + * this many skip arrays: one for each of the most significant attributes
> + * lacking any keys that use the = strategy [...]

I agree that it's better your way. Will fix.

> > +#if 0
> > +                /* Could be a redundant input scan key, so can't do this: */
> > +                Assert(inkey->sk_strategy == BTEqualStrategyNumber ||
> > +                       (inkey->sk_flags & SK_SEARCHNULL));
> > +#endif
>
> I think this should be removed?

Why? This is me expressing that I wish I could write this assertion,
but it won't quite work. I cannot rule out rare corner-cases involving
a contradictory pair of input keys, only one of which is a = key (the
other might be some kind of inequality, which makes this would-be
assertion not quite work). (You'll see similar "almost assertions"
from time to time, in different parts of the codebase.)

> > +#ifdef DEBUG_DISABLE_SKIP_SCAN
>
> I noticed this one and only reference to DEBUG_DISABLE_SKIP_SCAN. Are
> you planning on keeping that around, or is this a leftover?

I deliberately left this in place, just in case somebody wants to see
what happens when preprocessing stops generating skip arrays entirely.
Without this, it's not too obvious that it can just be turned off by
forcing _bt_num_array_keys to return early.

> > +        ScanKey        inkey = scan->keyData + i;
> > +
> > +        /*
> > +         * Backfill skip arrays for any wholly omitted attributes prior to
> > +         * attno_inkey
> > +         */
> > +        while (attno_skip < attno_inkey)
>
> I don't understand why we're adding skip keys before looking at the
> contents of this scankey, nor why we're backfilling skip keys based on
> a now old value of attno_inkey. Please
> add some comments on how/why this is the right approach.

_bt_num_array_keys should add exactly the minimum number of skip
arrays that will allow the standard _bt_preprocess_keys logic to mark
every input scan key (copied from scan->keyData[]) as required to
continue the scan on output (when output to so->keyData[]). It just
makes sense to structure the loop in a way that adds skip arrays just
before moving on to some input scan key on some never-before-seen
index column -- that's just how this needs to work.

Very early versions of the patch added skip arrays in cases involving
inequalities that could already be marked required. That approach
could probably work, but has no advantages, and some downsides. Now we
avoid adding skip arrays given a simple case like "WHERE a BETWEEN 1
AND 10". We only want to do it in cases like "WHERE a BETWEEN 1 AND 10
AND b = 42" -- since adding a skip array on "a" is strictly necessary
to be able to mark the = key on "b" as required. In the latter case,
the loop inside _bt_num_array_keys will add a skip array on "a" once
it gets past the final "a" input key (i.e. once it sees that the next
key is the = key on "b"). In the former case, there is no key on "b",
and so we don't add any skip arrays at all (which is the correct
behavior).

BTW, this _bt_num_array_keys code was primarily tested by writing lots
of complicated cases that tickled every edge-case I could think of. I
wrote tests that relied on my nbtree scan instrumentation patch, which
can print the details of both input and output keys -- I didn't rely
on testing any runtime behavior for this (that wouldn't have worked as
well). This includes any key markings (markings such as
SK_BT_REQFWD/SK_BT_REQBKWD), which is what I expected to see on every
scan key output (barring a couple of special cases, such as the
RowCompare case). Again, that is all that the "where do we add skip
arrays?" logic in _bt_num_array_keys is concerned with.

> > prev_numSkipArrayKeys, *numSkipArrayKeys
>
> I'm a bit confused why we primarily operate on *numSkipArrayKeys,
> rather than a local variable that we store in *numSkipArrayKeys once
> we know we can generate skip keys. I.e., I'd expected something more
> in line with the following snippet (incomplete code blocks):
> I think this is easier for the compiler to push the store operation
> out of the loop (assuming it's optimizable at all; but even if it
> isn't it removes the load of *numSkipArrayKeys from the hot path).

What if I just had a local copy of numSkipArrayKeys, and copied back
into caller's arg when the function returns? We'll still need a
prev_numSkipArrayKeys under this scheme, but we won't have to set the
caller's pointer until right at the end anymore (which, I agree, seems
like it might be worth avoiding).

> I think the increment/decrement callbacks for skipsupport should
> explicitly check (by e.g. Assert) for NULL (or alternatively: same
> value) returns on overflow, and the API definition should make this
> explicit.

But the API definition *does* specifically address the opclass
author's responsibilities around NULLs? It specifically says that it's
not up to the opclass author to think about them at all.

> The current system is quite easy to build a leaking
> implementation for. Sure, there are other easy ways to break this, but
> I think it's an easy API modification that makes things just that bit
> safer.

How can I do that? The callbacks themselves (the callbacks for
functions that are called as the scan progresses) don't use the fmgr
interface.

They're simple C function pointers (rather like sort support
callbacks), and do not have any args related to NULLs. They accept a
raw Datum, which can never be a NULL. The convention followed by
similar functions that are passed a Datum that might just be a NULL is
for the function to also accept a separate "bool isnull" argument.
(Just not having such a separate bool arg is another existing
convention, and the one that I've followed here.)

> A renewed review for 0002+ will arrive at a later point.

Thanks for the review!

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Alexander Lakhin
Date:
Subject: Re: Improving tracking/processing of buildfarm test failures
Next
From: Sami Imseih
Date:
Subject: Re: RFC: Logging plan of the running query