Re: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Adding skip scan (including MDAM style range skip scan) to nbtree |
Date | |
Msg-id | CAH2-Wzk9gBuPxUPD6KwiDeTvBtBOXk9pQAj0Lxi4Zg8uVLTt=w@mail.gmail.com Whole thread Raw |
In response to | Re: Adding skip scan (including MDAM style range skip scan) to nbtree (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses |
Re: Adding skip scan (including MDAM style range skip scan) to nbtree
|
List | pgsql-hackers |
On Tue, Apr 1, 2025 at 10:40 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > The review below was started for v31, then adapted to v32 when that > arrived. I haven't cross-checked this with v33. There's been hardly any changes to 0001- in quite a while, so that's fine. > > Teach nbtree composite index scans to opportunistically skip over > > irrelevant sections of composite indexes given a query with an omitted > > prefix column. > > AFAIK, this is only the second reference to "composite" indexes, after > a single mention in a comment in nbtsplitloc.c's _bt_strategy. IIUC, > this refers to multi-column indexes, which is more frequently and more > consistently used across the code base. I don't think it makes much difference, but sure, I'll use the multi-column index terminology. In both code comments, and in commit messages. > > When nbtree is passed input scan keys derived from a > > query predicate "WHERE b = 5", new nbtree preprocessing steps now output > > "WHERE a = ANY(<every possible 'a' value>) AND b = 5" scan keys. > > [...] new nbtree preprocessing steps now output +the equivalent of+ [...] > > > Preprocessing can freely add a skip array before or after any input > > ScalarArrayOp arrays. > > This implies skip arrays won't be generated for WHERE b = 5 (a > non-SAOP scankey) or WHERE b < 3 (not SAOP either), which is probably > not intentional, so a rewording would be appreciated. I don't agree. Yes, that sentence (taken in isolation) does not make it 100% unambiguous. But, would anybody ever actually be misled? Even once, ever? The misinterpretation of my words that you're concerned about is directly contradicted by the whole opening paragraph. Plus, it just doesn't make any sense. Obviously, you yourself never actually interpreted it that way. Right? The paragraph that this sentence appears in is all about the various ways in which SAOP arrays and skip arrays are the same thing, except at the very lowest level of the _bt_advance_array_keys code. I think that that context makes it particularly unlikely that anybody would ever think that I mean to imply something about the ways in which non-array keys can be composed alongside skip arrays. I'm pushing back here because I think that there's a real cost to using overly defensive language, aimed at some imaginary person. The problem with catering to such a person is that, overall, the clarifications do more harm than good. What seems to end up happening (I speak from experience with writing overly defensive comments) is that the superfluous clarifications are read (by actual readers) the wrong way -- they're read as if we must have meant quite a lot more than what we actually intended. More generally, I feel that it's a mistake to try to interpret your words on behalf of the reader. While it's definitely useful to try to anticipate the ways in which your reader might misunderstand, you cannot reasonably do the work for them. It's usually (though not always) best to deal with anticipated points of confusion by subtly constructing examples that suggest that some plausible wrong interpretation is in fact wrong, without drawing attention to it. Coming right out and telling the reader what to not think *is* an important tool, but it should be reserved for cases where it's truly necessary. > // nbtpreprocesskeys.c > > > +static bool _bt_s**parray_shrink > > I'd like to understand the "shrink" here, as it's not entirely clear to me. > The functions are exclusively called through dispatch in > _bt_compare_array_scankey_args, and I'd expected that naming to be > extended to these functions. I don't see the problem? As Alena pointed out, we're shrinking the arrays here (or are likely to), meaning that we're usually going to eliminate some subset of array elements. It's possible that this will happen more than once for a given array (since there could be more than one "contradictory" key on input). An array can only shrink within _bt_compare_array_scankey_args -- it can never have new array elements added. > > + * This qual now becomes "WHERE x = ANY('{every possible x value}') and y = 4" > > I understand what you're going for, but a reference that indexed NULLs > are still handled correctly (rather than removed by the filter) would > be appreciated, as the indicated substitution would remove NULL > values. (This doesn't have to be much more than a footnote.) Why, though? Obviously, '{every possible x value}' isn't a valid array literal. Doesn't that establish that this isn't a 100% literal statement of fact? There are a handful of places where I make a similar statement (the commit message is another one, as is selfuncs.c). I do make this same point about NULLs being just another value in selfuncs.c, though only because it's relevant there. I don't want to talk about NULLs here, because they just aren't relevant to this high-level overview at the top of _bt_preprocess_keys. We do talk about the issue of skip arrays and IS NOT NULL constraints elsewhere in nbtree: we talk about those issues in _bt_first (shortly after _bt_preprocess_keys is called). Again, it comes down to what the reader might actually be confused by, in the real world. Is it really plausible that I could ever commit a skip scan patch that wholly forgot to deal with NULLs sensibly? Would you personally ever think that I could make such an obvious blunder in a committed patch? And if you did, how long would it take you to figure out that there was no such oversight? > > + * Also sets *numSkipArrayKeys to # of skip arrays _bt_preprocess_array_keys > > + * caller must add to the scan keys it'll output. Caller must add this many > > + * skip arrays to each of the most significant attributes lacking any keys > > I don't think that's a good description; as any value other than 0 or > 1 would mean more than one skip array per such attribute, which is > obviously incorrect. I think I'd word it like: > > + * Also sets *numSkipArrayKeys to # of skip arrays _bt_preprocess_array_keys > + * caller must add to the scan keys it'll output. Caller will have to add > + * this many skip arrays: one for each of the most significant attributes > + * lacking any keys that use the = strategy [...] I agree that it's better your way. Will fix. > > +#if 0 > > + /* Could be a redundant input scan key, so can't do this: */ > > + Assert(inkey->sk_strategy == BTEqualStrategyNumber || > > + (inkey->sk_flags & SK_SEARCHNULL)); > > +#endif > > I think this should be removed? Why? This is me expressing that I wish I could write this assertion, but it won't quite work. I cannot rule out rare corner-cases involving a contradictory pair of input keys, only one of which is a = key (the other might be some kind of inequality, which makes this would-be assertion not quite work). (You'll see similar "almost assertions" from time to time, in different parts of the codebase.) > > +#ifdef DEBUG_DISABLE_SKIP_SCAN > > I noticed this one and only reference to DEBUG_DISABLE_SKIP_SCAN. Are > you planning on keeping that around, or is this a leftover? I deliberately left this in place, just in case somebody wants to see what happens when preprocessing stops generating skip arrays entirely. Without this, it's not too obvious that it can just be turned off by forcing _bt_num_array_keys to return early. > > + ScanKey inkey = scan->keyData + i; > > + > > + /* > > + * Backfill skip arrays for any wholly omitted attributes prior to > > + * attno_inkey > > + */ > > + while (attno_skip < attno_inkey) > > I don't understand why we're adding skip keys before looking at the > contents of this scankey, nor why we're backfilling skip keys based on > a now old value of attno_inkey. Please > add some comments on how/why this is the right approach. _bt_num_array_keys should add exactly the minimum number of skip arrays that will allow the standard _bt_preprocess_keys logic to mark every input scan key (copied from scan->keyData[]) as required to continue the scan on output (when output to so->keyData[]). It just makes sense to structure the loop in a way that adds skip arrays just before moving on to some input scan key on some never-before-seen index column -- that's just how this needs to work. Very early versions of the patch added skip arrays in cases involving inequalities that could already be marked required. That approach could probably work, but has no advantages, and some downsides. Now we avoid adding skip arrays given a simple case like "WHERE a BETWEEN 1 AND 10". We only want to do it in cases like "WHERE a BETWEEN 1 AND 10 AND b = 42" -- since adding a skip array on "a" is strictly necessary to be able to mark the = key on "b" as required. In the latter case, the loop inside _bt_num_array_keys will add a skip array on "a" once it gets past the final "a" input key (i.e. once it sees that the next key is the = key on "b"). In the former case, there is no key on "b", and so we don't add any skip arrays at all (which is the correct behavior). BTW, this _bt_num_array_keys code was primarily tested by writing lots of complicated cases that tickled every edge-case I could think of. I wrote tests that relied on my nbtree scan instrumentation patch, which can print the details of both input and output keys -- I didn't rely on testing any runtime behavior for this (that wouldn't have worked as well). This includes any key markings (markings such as SK_BT_REQFWD/SK_BT_REQBKWD), which is what I expected to see on every scan key output (barring a couple of special cases, such as the RowCompare case). Again, that is all that the "where do we add skip arrays?" logic in _bt_num_array_keys is concerned with. > > prev_numSkipArrayKeys, *numSkipArrayKeys > > I'm a bit confused why we primarily operate on *numSkipArrayKeys, > rather than a local variable that we store in *numSkipArrayKeys once > we know we can generate skip keys. I.e., I'd expected something more > in line with the following snippet (incomplete code blocks): > I think this is easier for the compiler to push the store operation > out of the loop (assuming it's optimizable at all; but even if it > isn't it removes the load of *numSkipArrayKeys from the hot path). What if I just had a local copy of numSkipArrayKeys, and copied back into caller's arg when the function returns? We'll still need a prev_numSkipArrayKeys under this scheme, but we won't have to set the caller's pointer until right at the end anymore (which, I agree, seems like it might be worth avoiding). > I think the increment/decrement callbacks for skipsupport should > explicitly check (by e.g. Assert) for NULL (or alternatively: same > value) returns on overflow, and the API definition should make this > explicit. But the API definition *does* specifically address the opclass author's responsibilities around NULLs? It specifically says that it's not up to the opclass author to think about them at all. > The current system is quite easy to build a leaking > implementation for. Sure, there are other easy ways to break this, but > I think it's an easy API modification that makes things just that bit > safer. How can I do that? The callbacks themselves (the callbacks for functions that are called as the scan progresses) don't use the fmgr interface. They're simple C function pointers (rather like sort support callbacks), and do not have any args related to NULLs. They accept a raw Datum, which can never be a NULL. The convention followed by similar functions that are passed a Datum that might just be a NULL is for the function to also accept a separate "bool isnull" argument. (Just not having such a separate bool arg is another existing convention, and the one that I've followed here.) > A renewed review for 0002+ will arrive at a later point. Thanks for the review! -- Peter Geoghegan
pgsql-hackers by date: