Re: [PROPOSAL] Covering + unique indexes. - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: [PROPOSAL] Covering + unique indexes. |
Date | |
Msg-id | CAM3SWZQmxk1HnV8KgcODoZyLVgfaVEpB84Ci1YyMBPNYux0Wmw@mail.gmail.com Whole thread Raw |
In response to | Re: [PROPOSAL] Covering + unique indexes. (Peter Geoghegan <pg@heroku.com>) |
List | pgsql-hackers |
On Mon, Mar 14, 2016 at 8:43 PM, Peter Geoghegan <pg@heroku.com> wrote: > Does this work with amcheck? Maybe it works with bt_index_check(), but > not bt_index_parent_check()? I think that you need to make sure that > _bt_compare() knows about this, too. That's because it isn't good > enough to let a truncated internal IndexTuple compare equal to a > scankey when non-truncated attributes are equal. I think you need to > have an imaginary "minus infinity" attribute past the first > non-truncated attribute (i.e. "minus infinity value" for the first > *truncated* attribute). That way, the downlinks will always be lower > bounds when the non-"included"/truncated attributes are involved. This > seems necessary. No? Actually, I now think I got this slightly wrong. What's at issue is this (from nbtree README): """ Lehman and Yao assume that the key range for a subtree S is described by Ki < v <= Ki+1 where Ki and Ki+1 are the adjacent keys in the parent page. This does not work for nonunique keys (for example, if we have enough equal keys to spread across several leaf pages, there *must* be some equal bounding keys in the first level up). Therefore we assume Ki <= v <= Ki+1 instead. A search that finds exact equality to a bounding key in an upper tree level must descend to the left of that key to ensure it finds any equal keys in the preceding page. """ Today, nbtree needs to check the page to the left of an equal internal page downlink child anyway. That isn't hard-coded into _bt_compare(), though. If it was, it would be a "positive infinity" attribute, not "negative infinity" as I incorrectly said. This is because the equal IndexTuples might easily not begin exactly at the beginning of the downlink's child page (often, we should begin in the left page instead, by following the previous downlink in the parent instead -- just in case). Any kind of "infinity" attribute probably isn't necessary for your patch today, since, as referenced in the README extract above, our slightly non-standard L&Y does this in _bt_binsrch(): /* * At this point we have high == low, but be careful: they could point * past the last slot on the page. * * On a leaf page, we always return the first key >= scan key (resp. > * scan key), which could be the last slot + 1. */ if (P_ISLEAF(opaque)) return low; However, I think it's still a good idea to have a special integer in the IndexTuple explicitly indicating the attribute at which the "suffix" is truncated, even if the "suffix truncation" happens at a consistent point based on an index in your patch. That will change in the future, and we should be prepared. Even though I was partially mistaken, clearly it still wasn't okay to even try to compare non-existent attributes in internal pages, since that segfaulted. So a (mostly imaginary) "positive infinity" attribute can still exist, initially just to make _bt_compare() not crash. This attribute number (stored in "itup->t_tid.ip_posid") effectively tells the binary search code to look at the child to the left of the compared downlink (not the downlink child itself), even though that's already going to happen per the code above. So, thinking about it once more (uh, sorry), _bt_compare() has to "indicate equality"/return 0, *despite* being *logically* a "positive infinity" comparison from a higher level, in order to let the code above to handle it instead, so it isn't handled more than once. Also, not sure if less common "nextkey == true" case needs some further consideration (lets forget that detail for time being, though). Phew! So, as I said, _bt_binsrch() and/or _bt_compare() can be fixed to make sure that the scan arrives on the correct leaf page (the first leaf page that an matching IndexTuple could be on). What then, though? What about leaf pages, that *do* have the extra attributes ("INCLUDING" attributes) represented in their tuples, and *don't* "return OffsetNumberPrev(low)" at the end of _bt_binsrch() (they do the P_LEAF() thing quoted above)? Are they safe? Remember: * For nextkey=false (cmpval=1), the loop invariant is: all slots before * 'low' are < scan key, all slots at or after 'high' are >= scan key. I think this means that you need to be very careful about leaf pages, too. Speculative insertion (used by UPSERT) may not even have the extra attributes, during the precheck that starts from within check_exclusion_or_unique_constraint() -- it needs to start from the very start at the leaf level, without regard for the particular details of the non-constrained extra columns. I see that you take the number of attributes a new way, so that ultimately _bt_compare()'s "keysz" argument only includes "full" attributes for cases like the UPSERT one. That seems okay. However, what about the case where a tuple is inserted? We need to do unique enforcement on the one hand, but need to start at the right place for insertion on the other hand. Is it okay that _bt_findinsertloc() is passed "indnkeyatts" rather than "natts" now? Now, it looks okay that _bt_check_unique() has also been directly fixed to do the same, but I don't think that _bt_findinsertloc() should have received this treatment. Otherwise, the ordering on leaf pages is subtly broken, which I don't think is okay. I don't see tests for NULL values, which are a little special. Does _bt_isequal() really not require additional checks? It looks correct at a quick glance, but you should definitely have tests for this. Lots of them. Testing ------- Maybe you should add this to your tools used for testing, just customized a bit to use your new feature: https://github.com/petergeoghegan/jjanes_upsert . This stress-testing tool could be very interesting -- it was extremely valuable during the development of UPSERT (code is a bit rough, though -- I don't really know Perl). It should really stress the implementation, and show bugs. I suggest hacking amcheck to only check the leaf level, and make sure it alone is sane/in full order, correcting the patch as needed. This is a small step, and so possibly a good next step. Once we know the keyspace is at least sane at the leaf level (which I think it isn't right now, due to the _bt_findinsertloc() issue), we can build on it. After all, the keyspace at the leaf level should be the same with or without this patch -- right? We can fix that *in isolation*, gaining confidence with amcheck (hacked to just care about leaf pages), which is relatively straightforward. Afterwards, we move on to the more complicated matter of internal pages. Also, I attach a file with a selection of SQL queries that use pageinspect. I wrote these to get a quick sense of the keyspace of a B-Tree index, and a few other interesting things -- seeing how the keyspace looks by looking at internal page highkeys (the last query) is probably particularly valuable for testing a patch like this, or a more general suffix truncation patch. You might have written queries like this yourself already, and if not you can start with these. I was actually thinking of putting them on Github or something myself, but they're a bit rough at the moment. Getting a "quick sense of the keyspace" with the last query lets you see when it has suddenly become unbalanced during development -- it's a good thing to keep an eye on. That's all I have for today. I still haven't actually built the patch myself -- this feedback is all based on reading the code. So hope I missed nothing obvious due to that. -- Peter Geoghegan
Attachment
pgsql-hackers by date: