Re: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Msg-id: CAH2-WznbeZA+59gJHzRz1GnpDRqK7G=n83ribXs4iTx=QSfeuA@mail.gmail.com
In response to: Re: Adding skip scan (including MDAM style range skip scan) to nbtree (Matthias van de Meent <boekewurm+postgres@gmail.com>)
List: pgsql-hackers

On Tue, Apr 1, 2025 at 4:16 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> While I agree that there is such a cost, I don't think that this is
> too far fetched. They are not just added when we have SAOP scankeys,
> and I think the non-SAOP case is one of the most important areas where
> this patch improves performance. Yes, this patch improves performance
> for queries with only SAOP on non-first keys, but I've seen more
> non-SAOP queries where this patch would improve performance than
> similar queries but with SAOP.

That's all likely to be true. I just don't think that the commit
message for the big commit (the part that you took issue with) said
anything that suggests otherwise.

To recap, the sentence in question says "Preprocessing can freely add
a skip array before or after any input ScalarArrayOp arrays". This is
mostly just making a high level point about the design itself -- so I
just don't get what you mean.

The "everything is an array" design is what allows skip arrays to work
with a qual like "WHERE b IN (1, 2, 3)", as you say. It's also what
allows things like "WHERE a IN (100, 500) AND c = 55" to work
efficiently, without introducing any special cases -- it works both
ways. And, a pair of skip arrays can also be composed together, in
just the same way as a pair of SAOP arrays. This all works in the same
way; _bt_advance_array_keys concepts like array roll over continue to
apply, with essentially zero changes to the high level design,
relative to Postgres 17. That's the core idea that the paragraph in
question conveys.

Recall that _bt_advance_array_keys likes to think of a simple scalar =
key as a degenerate single-value array. They are the same thing, for
the purposes of rolling over the scan's arrays. We need to use a 3-way
ORDER proc for scalar scan keys for this reason.
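
To make that concrete, here's a toy standalone sketch of the
odometer-style rollover. Everything in it is made up for illustration --
it isn't the actual nbtree code, and a real skip array generates its
elements procedurally rather than from a fixed list -- but it shows why
scalar = keys, SAOP arrays, and skip arrays can all be advanced by one
and the same rollover logic:

#include <stdbool.h>
#include <stdio.h>

typedef struct ArrayKey
{
    int     nelems;         /* 1 for a simple scalar "=" key */
    int     cur;            /* current element, 0-based */
    int     elems[8];       /* element values (toy stand-ins for datums) */
} ArrayKey;

/*
 * Advance the keys to their next set of element values, rolling over like
 * an odometer: the rightmost array is advanced first; when it wraps, the
 * array to its left is advanced, and so on.  A scalar "=" key is just a
 * 1-element array here, so it always wraps immediately -- which is what
 * makes scalar keys, SAOP arrays, and skip arrays interchangeable as far
 * as rollover is concerned.  Returns false once every array has wrapped.
 */
static bool
advance_array_keys_toy(ArrayKey *keys, int nkeys)
{
    for (int i = nkeys - 1; i >= 0; i--)
    {
        if (++keys[i].cur < keys[i].nelems)
            return true;    /* no further roll over needed */
        keys[i].cur = 0;    /* wrapped; roll over into the key to its left */
    }
    return false;           /* scan has run out of array keys */
}

int
main(void)
{
    /*
     * WHERE a IN (100, 500) AND c = 55, with column b handled by a skip
     * array -- shown here as an ordinary array holding the values the
     * scan happens to skip to (say 1 and 2).
     */
    ArrayKey    keys[] = {
        {2, 0, {100, 500}},     /* a: SAOP array */
        {2, 0, {1, 2}},         /* b: skip array (toy stand-in) */
        {1, 0, {55}},           /* c: scalar "=" key, degenerate array */
    };

    do
    {
        printf("a = %d, b = %d, c = %d\n",
               keys[0].elems[keys[0].cur],
               keys[1].elems[keys[1].cur],
               keys[2].elems[keys[2].cur]);
    } while (advance_array_keys_toy(keys, 3));

    return 0;
}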

> It's mostly as an observation (and not problem per se) that "compare"
> (which sounds like a pure function that doesn't modify anything, e.g.
> _bt_compare) is used to dispatch to "shrink" (which does sound like
> it'd modify something).

It sounds like it's modifying something because (as you must know) it
does just that. This has been the case since the Postgres 17 SAOP
patch, of course (only the use of the term "shrink" in a helper
function is new here).

I don't want to rename _bt_compare_scankey_args now (that name is well
over 20 years old). That's what it would take to make this
consistent in the way you expect. I just don't think it matters very
much.

> My comment is not about your potential future actions, but rather what
> any future developer or committer working on this code might think and
> worry about when reading this. = ANY {} constructs *always* have NOT
> NULL behaviour, just like any other operator clause that isn't "IS
> NULL", so clarifying that this is only similar -and does not behave
> the same in important edge cases- is IMO important.

> Not specifically for you, but for any other developer trying to get a correct
> understanding of how this works and why it is correct.

How many times does it have to be clarified, though? Do I have to put
something about it everywhere I give a brief high-level description of
how skip arrays work, where it's natural to compare them to a
traditional SAOP that generates all possible matching elements?
Explaining the concepts in question is hard enough, without having to
always list all of the ways that my analogy isn't the full and literal
truth of the matter. It's already extremely obvious that it must be
far from a literal account of what happens.

> It is my understanding that those are mostly historical artifacts of
> the code base, and not systems in active development. Their rarity
> makes it difficult to put numbers on, but IIRC at least one such case
> was recently removed for bitrot and apparent lack of use in years.

It's effectively a comment (nobody is expected to ever uncomment it by
removing the "#if 0"). Sometimes, comments become obsolete. It's a
trade-off.

> > What if I just had a local copy of numSkipArrayKeys, and copied back
> > into caller's arg when the function returns? We'll still need a
> > prev_numSkipArrayKeys under this scheme, but we won't have to set the
> > caller's pointer until right at the end anymore (which, I agree, seems
> > like it might be worth avoiding).
>
> That's a nice alternative too, indeed.

I'll do it that way in the committed patch. That's probably not going
to happen until Friday morning EST, to give me another full day to
work some more on the docs.

I don't see much point in posting another version of the patchset to the list.
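
For anyone following along, here's a minimal standalone sketch of that
local-copy pattern. The function and variable names (and the made-up cap)
are placeholders, not the patch's actual preprocessing logic:

#include <stdbool.h>
#include <stdio.h>

#define MAX_SKIP_ARRAYS 2   /* arbitrary cap, purely for illustration */

/*
 * Work on a local copy of the caller-supplied counter and only copy it
 * back into the caller's argument right before returning, so the caller
 * never observes a partially updated value.  A prev_ copy is still kept
 * so that a single step can be backed out.
 */
static bool
plan_skip_arrays(const bool *attr_has_eq_key, int nattrs,
                 int *numSkipArrayKeys)
{
    int     num_skip = *numSkipArrayKeys;       /* local working copy */

    for (int attno = 0; attno < nattrs; attno++)
    {
        int     prev_num_skip = num_skip;       /* to back out one step */

        if (!attr_has_eq_key[attno])
            num_skip++;                         /* would add a skip array */

        if (num_skip > MAX_SKIP_ARRAYS)
        {
            num_skip = prev_num_skip;           /* back out the last one */
            break;
        }
    }

    *numSkipArrayKeys = num_skip;   /* caller's copy is only set here */
    return num_skip > 0;
}

int
main(void)
{
    /* index columns (a, b, c, d): "=" keys on a and d only */
    bool    has_eq_key[] = {true, false, false, true};
    int     numSkipArrayKeys = 0;

    if (plan_skip_arrays(has_eq_key, 4, &numSkipArrayKeys))
        printf("would add %d skip array(s)\n", numSkipArrayKeys);
    return 0;
}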

> > But the API definition *does* specifically address the opclass
> > author's responsibilities around NULLs? It specifically says that it's
> > not up to the opclass author to think about them at all.
>
> Yes. What I'm suggesting is to make the contract enforceable to a
> degree. If it was defined to "return (Datum) 0 on overflow", then that
> could be Assert()ed; and code that does leak memory could get detected
> by static analysis tools in the function scope because the allocation
> didn't get returned, but with this definition returning an allocation
> is never detected because that's not part of the contract.

No B-Tree support function is allowed to leak memory. The same goes for
all operators. This will be far from the only time that we expect
opclass authors to get that right. This mostly works just fine,
probably because an opclass that leaked memory like this would visibly
break quite quickly.

> You could Assert(inc_sk_argument == (Datum) 0) in the oflow branch?
> I'm certain that (Datum) 0 is an invalid representation of a pointer,
> so we know that no allocated value is returned (be it new or
> pre-existing).

I just don't see what the point would be. Nothing would stop a
decrement/increment callback that needs to allocate memory from
returning 0 and then leaking memory anyway.
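
To spell out what's being debated, here's a rough standalone sketch. The
Datum typedef, the callback signature, and all the names are stand-ins
rather than the actual skip support API; the suggested Assert() is shown
in the overflow branch, along with why it can only ever be a partial
guard:

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uintptr_t Datum;    /* stand-in for PostgreSQL's Datum */

/* hypothetical shape of an opclass increment callback */
typedef Datum (*skip_increment_fn) (Datum existing, bool *overflow);

/* Toy callback: treat the Datum as a small integer, overflowing at 3 */
static Datum
toy_increment(Datum existing, bool *overflow)
{
    if (existing >= 3)
    {
        *overflow = true;
        return (Datum) 0;   /* the proposed "return (Datum) 0 on overflow" */
    }
    *overflow = false;
    return existing + 1;
}

static Datum
increment_array_element(Datum existing, skip_increment_fn increment,
                        bool *overflow)
{
    Datum   inc_sk_argument = increment(existing, overflow);

    if (*overflow)
    {
        /*
         * The suggested enforcement: in the overflow branch, assert that
         * the callback did not hand back a (possibly allocated) value.
         * As noted above, a callback could still allocate and leak
         * before returning 0, so this is only a partial guard.
         */
        assert(inc_sk_argument == (Datum) 0);
        return (Datum) 0;
    }

    return inc_sk_argument;
}

int
main(void)
{
    bool    overflow = false;
    Datum   next = increment_array_element((Datum) 3, toy_increment,
                                            &overflow);

    printf("overflow: %d, next: %lu\n", (int) overflow,
           (unsigned long) next);
    return 0;
}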

--
Peter Geoghegan


