Re: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Adding skip scan (including MDAM style range skip scan) to nbtree |
Date | |
Msg-id | CAH2-WznbeZA+59gJHzRz1GnpDRqK7G=n83ribXs4iTx=QSfeuA@mail.gmail.com Whole thread Raw |
In response to | Re: Adding skip scan (including MDAM style range skip scan) to nbtree (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
List | pgsql-hackers |
On Tue, Apr 1, 2025 at 4:16 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > While I agree that there is such a cost, I don't think that this is > too far fetched. They are not just added when we have SAOP scankeys, > and I think the non-SAOP case is one of the most important areas where > this patch improves performance. Yes, this patch improves performance > for queries with only SAOP on non-first keys, but I've seen more > non-SAOP queries where this patch would improve performance than > similar queries but with SAOP. That's all likely to be true. I just don't think that the commit message for the big commit (the part that you took issue with) said anything that suggests otherwise. To recap, the sentence in question says "Preprocessing can freely add a skip array before or after any input ScalarArrayOp arrays". This is mostly just making a high level point about the design itself -- so I just don't get what you mean. The "everything is an array" design is what allows skip arrays to work with a qual like "WHERE b IN (1, 2, 3)", as you say. It's also what allows things like "WHERE a IN (100, 500) AND c = 55" to work efficiently, without introducing any special cases -- it works both ways. And, a pair of skip arrays can also be composed together, in just the same way as a pair of SAOP arrays. This all works in the same way; _bt_advance_array_keys concepts like array roll over continue to apply, with essentially zero changes to the high level design, relative to Postgres 17. That's the core idea that the paragraph in question conveys. Recall that _bt_advance_array_keys likes to think of simple scalar = keys as a degenerate single-value array. They are the same thing, for the purposes of rolling over the scan's arrays. We need to use a 3-way ORDER proc for scalar scan keys for this reason. > It's mostly as an observation (and not problem per se) that "compare" > (which sounds like a pure function that doens't modify anything, e.g. > _bt_compare) is used to dispatch to "shrink" (which does sound like > it'd modify something). It sounds like it's modifying something because (as you must know) it does just that. This has been the case since the Postgres 17 SAOP patch, of course (only the use of the term "shrink" in a helper function is new here). I don't want to rename _bt_compare_scankey_args now (that name is well over 20 years old). That would be what it would take to make this consistent in the way you expect. I just don't think it matters very much. > My comment is not about your potential future actions, but rather what > any future developer or committer working on this code might think and > worry about when reading this. = ANY {} constructs *always* have NOT > NULL behaviour, just like any other operator clause that isn't "IS > NULL", so clarifying that this is only similar -and does not behave > the same in important edge cases- is IMO important. > Not specifically for you, but for any other developer trying to get a correct > understanding of how this works and why it is correct. How many times does it have to be clarified, though? Do I have to put something about it anywhere I give a brief high-level description of how skip arrays work, where it's natural to compare them to a traditional SAOP that generates all possible matching elements? Explaining the concepts in question is hard enough, without having to always list all of the ways that my analogy isn't the full and literal truth of the matter. It's already extremely obvious that it must be far from a literal account of what happens. > It is my understanding that those are mostly historical artifacts of > the code base, and not systems in active development. Their rarety > makes it difficult to put numbers on, but IIRC at least one such case > was recently removed for bitrot and apparent lack of use in years. It's effectively a comment (nobody is expected to ever uncomment it by removing the "#ifdef 0"). Sometimes, comments become obsolete. It's a trade-off. > > What if I just had a local copy of numSkipArrayKeys, and copied back > > into caller's arg when the function returns? We'll still need a > > prev_numSkipArrayKeys under this scheme, but we won't have to set the > > caller's pointer until right at the end anymore (which, I agree, seems > > like it might be worth avoiding). > > That's a nice alternative too, indeed. I'll do it that way in the commited patch. That's probably not going to happen until Friday morning EST, to give me another full day to work some more on the docs. I don't see much point in posting another version of the patchset to the list. > > But the API definition *does* specifically address the opclass > > author's responsibilities around NULLs? It specifically says that it's > > not up to the opclass author to think about them at all. > > Yes. What I'm suggesting is to make the contract enforceable to a > degree. If it was defined to "return (Datum) 0 on overflow", then that > could be Assert()ed; and code that does leak memory could get detected > by static analysis tools in the function scope because the allocation > didn't get returned, but with this definition returning an allocation > is never detected because that's not part of the contract. All B-Tree support functions aren't allowed to leak memory. Same with all operators. This will be far from the only time that we expect opclass authors to get that right. This mostly works just fine, probably because an opclass that leaked memory like this would visibly break quite quickly. > You could Assert(inc_sk_argument == (Datum) 0) in the oflow branch? > I'm certain that (Datum) 0 is an invalid representation of a pointer, > so we know that no allocated value is returned (be it new or > pre-existing). I just don't see what the point would be. Nothing would stop a decrement/increment callback that needs to allocate memory from returning 0 and then leaking memory anyway. -- Peter Geoghegan
pgsql-hackers by date: