Re: Making all nbtree entries unique by having heap TIDs participate in comparisons - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
Date:
Msg-id: CAH2-Wz=pTfuTwcYcgVz2nhaynVxQBww=BdPyinxQGYORBMCbEw@mail.gmail.com
In response to: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses: Re: Making all nbtree entries unique by having heap TIDs participate in comparisons
List: pgsql-hackers
On Sat, Mar 16, 2019 at 1:33 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> AFAICS, there is a copy of every page's high key in its immediate
> parent. Either in the downlink of the right sibling, or as the high key
> of the parent page itself. Cross-checking those would catch any
> corruption in high keys.

I agree that it's always true that the high key is also in the parent, and that we could cross-check that within the child. Actually, I should probably figure out a way of arranging for the Bloom filter used within bt_relocate_from_root() (which has been around since PG v11) to include the key itself when fingerprinting. That would probably necessitate that we don't truncate "negative infinity" items (it was actually that way about 18 years ago), just for the benefit of verification. This is almost the same thing as what Graefe argues for (he doesn't think that you need a low key on the leaf level, since you can cross a single level there). I wonder if truncating the negative infinity item in internal pages to zero attributes is actually worth it, since a low key might be useful for a number of reasons.

> Note that this would potentially catch some corruption that the
> descend-from-root check would not. If you have a mismatch between the high key
> of a leaf page and its parent or grandparent, all the current tuples
> might pass the rootdescend check. But a tuple might get inserted into the
> wrong location later.

I also agree with this. However, the rootdescend check will always work better than this in some cases that you can at least imagine, for as long as there are negative infinity items to worry about. (And, even if we decided not to truncate in order to support easy verification, there is still a good argument to be made for involving the authoritative _bt_search() code at some point.)

> > Maybe you could actually do something with the high key from leaf page
> > 5 to detect the stray value "20" in leaf page 6, but again, we don't
> > do anything like that right now.
>
> Hmm, yeah, to check for stray values, you could follow the left-link,
> get the high key of the left sibling, and compare against that.

Graefe argues for retaining a low key, even in leaf pages (the left page's old high key becomes the left page's low key during a split, and the left page's new high key becomes the new right page's low key at the same time). It's useful for what he calls "write-optimized B-Trees", and maybe even for optional compression. As I said earlier, I guess you can just go left on the leaf level if you need to, and then you have all you need. But I'd need to think about it some more.

Point taken; rootdescend isn't enough to make verification exactly perfect. But it brings verification close to perfect, because you're going to get the right answers to queries for as long as it passes (I think). There could be a problem affecting a future insertion that a high-key cross-check could detect, but that rootdescend can't. But you'd have to be extraordinarily unlucky to have that happen for any amount of time. Unlucky even by my own extremely paranoid standard.

--
Peter Geoghegan
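To make the invariant behind Heikki's suggested cross-check concrete, here is a toy, self-contained sketch in C. This is not amcheck code; the struct names, integer keys, and simplified parent layout are all invented for illustration. It only demonstrates the equality being checked: a child's high key should reappear in its parent, either as the separator key of the right sibling's downlink or, for the rightmost child, as the parent's own high key.

```c
/*
 * Toy sketch of the high-key cross-check discussed above -- NOT amcheck
 * code.  Assumes integer keys and a simplified parent layout: one
 * separator key per downlink (sep[0] notionally being "negative
 * infinity"), plus the parent's own high key.
 */
#include <stdbool.h>
#include <stdio.h>
#include <limits.h>

#define NCHILDREN 3

typedef struct ToyParent
{
    int sep[NCHILDREN];     /* separator key for each downlink */
    int highkey;            /* parent's own high key */
} ToyParent;

typedef struct ToyChild
{
    int highkey;            /* child's high key */
} ToyChild;

static bool
check_child_highkeys(const ToyParent *parent, const ToyChild *children)
{
    for (int i = 0; i < NCHILDREN; i++)
    {
        /* expected: right sibling's separator, or parent's high key */
        int expected = (i + 1 < NCHILDREN) ? parent->sep[i + 1]
                                           : parent->highkey;

        if (children[i].highkey != expected)
        {
            fprintf(stderr, "child %d: high key %d != expected %d\n",
                    i, children[i].highkey, expected);
            return false;
        }
    }
    return true;
}

int
main(void)
{
    ToyParent parent = { { INT_MIN, 10, 20 }, 30 };
    ToyChild  children[NCHILDREN] = { { 10 }, { 20 }, { 30 } };

    printf("cross-check %s\n",
           check_child_highkeys(&parent, children) ? "passed" : "failed");
    return 0;
}
```

A real check would of course walk IndexTuples on actual pages rather than fixed arrays, but the comparison it performs would be the same.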