Re: BUG #17245: Index corruption involving deduplicated entries - Mailing list pgsql-bugs
From | Peter Geoghegan |
---|---|
Subject | Re: BUG #17245: Index corruption involving deduplicated entries |
Date | |
Msg-id | CAH2-WzmM+NjF3FfzfPau8ZmDN=qx9E+=1LAUdOB6ce89DX=3RQ@mail.gmail.com |
In response to | Re: BUG #17245: Index corruption involving deduplicated entries (Andres Freund <andres@anarazel.de>) |
Responses |
Re: BUG #17245: Index corruption involving deduplicated entries
Re: BUG #17245: Index corruption involving deduplicated entries |
List | pgsql-bugs |
On Sat, Oct 30, 2021 at 3:18 PM Andres Freund <andres@anarazel.de> wrote:
> Hm. I wonder if it's not actually good to do something like it in 14, given
> that we know of a path to have corrupted indexes out there.

My concern is that people might not be too happy about errors that occur due to corruption in tuples that are only vaguely related to their own incoming inserted tuple.

> > Separately, we should add an assertion that catches cases where a TID
> > in the index points to an LP_REDIRECT line pointer, which does not
> > point to a heap tuple with storage.
>
> Is it actually guaranteed that that's impossible to happen and wasn't ever
> possible? It's not obvious to me that a LP_REDIRECT pointing to an LP_DEAD
> tuple would be a real problem.

As far as I know an LP_REDIRECT item can never point to an LP_DEAD item. As far as I know it has worked that way since HOT first went in. There certainly seem to be far fewer problems with a rule that says it can never happen, and so I think we should introduce that rule (or clear things up, if you prefer).

ISTM that this is all about TID stability for indexes. The only reason we have LP_DEAD items in heap pages is to have something that reliably tells index scans that the heap tuple they're looking for is logically dead (and so we can't recycle TIDs except during VACUUM). Similarly, the only reason we have LP_REDIRECT items is so that index scans have lightweight stable forwarding information from the heap page that contains the HOT chain.

The advantage of LP_REDIRECT items is that they allow pruning to avoid keeping around the original heap tuple (the tuple that was not a heap-only tuple) after pruning. Also, pruning doesn't have to "merge line pointers" such that the new first member of a HOT chain has the same page offset number as the original first member had -- imagine how messy that would have to be.

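To make the proposed assertion concrete, here is a minimal sketch of it as a toy Python model, not actual heapam code. Only the four line pointer state names mirror PostgreSQL; the `ItemId` class, the `check_page` function, and the dict-of-offsets page representation are hypothetical, and the model ignores everything about the tuples themselves (including whether a redirect target is heap-only).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class LpState(Enum):
    LP_UNUSED = 0
    LP_NORMAL = 1    # line pointer with tuple storage
    LP_REDIRECT = 2  # forwarding entry left behind by HOT pruning
    LP_DEAD = 3      # stub kept until VACUUM removes index entries


@dataclass
class ItemId:
    state: LpState
    redirect_to: Optional[int] = None  # offset number; only used by LP_REDIRECT


def check_page(items: Dict[int, ItemId], index_tids: List[int]) -> List[str]:
    """Return a description of every TID-stability violation found.

    Assumed rules (taken from the discussion, not from the source tree):
    an index TID must never point to LP_UNUSED, and an LP_REDIRECT item
    must point to an item that still has tuple storage.
    """
    problems = []
    for off in index_tids:
        item = items.get(off)
        if item is None or item.state is LpState.LP_UNUSED:
            problems.append(f"index TID {off} points to LP_UNUSED")
        elif item.state is LpState.LP_REDIRECT:
            target = items.get(item.redirect_to)
            if target is None or target.state is not LpState.LP_NORMAL:
                found = "nothing" if target is None else target.state.name
                problems.append(
                    f"LP_REDIRECT {off} -> {item.redirect_to} points to {found}"
                )
    return problems


# A pruned HOT chain (1 redirects to 3) plus a dead chain awaiting VACUUM (2).
page = {
    1: ItemId(LpState.LP_REDIRECT, redirect_to=3),
    2: ItemId(LpState.LP_DEAD),
    3: ItemId(LpState.LP_NORMAL),
}
clean = check_page(page, index_tids=[1, 2])

# The disputed state: the redirect target has become LP_DEAD.
page[3] = ItemId(LpState.LP_DEAD)
broken = check_page(page, index_tids=[1, 2])
```

In this model the first call reports nothing, while the second reports the redirect at offset 1, which is the condition the assertion is meant to catch.
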
VACUUM only ever deletes index tuples from indexes when their pointed-to TIDs were found to be LP_DEAD stub line pointers in the first heap pass. Such an LP_DEAD item represents a (former) whole HOT chain, which could just be a "single tuple degenerate HOT chain", that never actually had any heap-only tuples (say because all UPDATEs for the table will modify indexed columns).

If an LP_REDIRECT item points to an LP_DEAD item, then what is VACUUM supposed to do about it when it comes time to vacuum indexes? Which TID is it supposed to delete from indexes? The LP_DEAD item, the LP_REDIRECT item that points to the LP_DEAD item, or both?

While it's okay if the link from a tuple header in a HOT chain points to an LP_UNUSED item (that just indicates that the chain is "broken" at that point), it's not okay if a link from an LP_REDIRECT line pointer points to an LP_UNUSED item -- that's virtually the same condition as having a TID from an index point directly to an LP_UNUSED item, which is of course always wrong. We can do no "last tuple's xmax == current tuple's xmin" validation during chain traversal when the "last tuple" was actually just an LP_REDIRECT item. And so we need another rule for LP_REDIRECT items, to compensate. That rule seems to be: HOT chains cannot be allowed to "break" between the LP_REDIRECT item and the first tuple with storage.

The only way it could be okay for an LP_REDIRECT item to point to an LP_DEAD item would be if you knew for sure that the LP_REDIRECT item would actually become LP_DEAD at the same time as the LP_DEAD item (so both get removed from indexes) -- which is a contradiction in terms. Why wouldn't pruning just mark the LP_REDIRECT item LP_DEAD instead, while making the would-be LP_DEAD item skip straight to being an LP_UNUSED item? That approach is strictly better. It just makes sense, which leads me to believe that we must have always done it that way.

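The asymmetry between the two kinds of link can be sketched as a toy chain walk in Python. This is an illustration under stated assumptions, not the real traversal code: `Item`, `walk_hot_chain`, and the field names `redirect_to` and `next_off` are hypothetical, visibility checks are left out entirely, and a page is just a dict keyed by offset number.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

UNUSED, NORMAL, REDIRECT, DEAD = "LP_UNUSED", "LP_NORMAL", "LP_REDIRECT", "LP_DEAD"


@dataclass
class Item:
    state: str
    redirect_to: Optional[int] = None  # LP_REDIRECT only: forwarded offset
    xmin: Optional[int] = None         # LP_NORMAL only: inserting XID
    xmax: Optional[int] = None         # LP_NORMAL only: updating XID, if any
    next_off: Optional[int] = None     # LP_NORMAL only: next chain member


def walk_hot_chain(page: Dict[int, Item], root: int) -> List[int]:
    """Return offsets of the tuples with storage reachable from an index TID."""
    item = page.get(root)
    if item is None or item.state == UNUSED:
        raise ValueError("index TID points to an LP_UNUSED item")
    if item.state == DEAD:
        return []  # whole chain is dead; the index tuple awaits VACUUM

    off: Optional[int] = root
    prior_xmax = None
    if item.state == REDIRECT:
        # No tuple header here, so there is no xmax to validate against.
        # The compensating rule: the target must have storage.
        off = item.redirect_to
        target = page.get(off)
        if target is None or target.state != NORMAL:
            raise ValueError("LP_REDIRECT does not point to a tuple with storage")

    members = []
    while off is not None:
        item = page.get(off)
        if item is None or item.state != NORMAL:
            break  # link from a tuple header: the chain is merely broken
        if prior_xmax is not None and item.xmin != prior_xmax:
            break  # slot was reused by an unrelated tuple: also just a break
        members.append(off)
        if item.xmax is None:
            break  # latest version reached
        prior_xmax = item.xmax
        off = item.next_off
    return members


page = {
    1: Item(REDIRECT, redirect_to=3),
    3: Item(NORMAL, xmin=100, xmax=101, next_off=4),
    4: Item(NORMAL, xmin=101),
}
chain = walk_hot_chain(page, 1)
```

A stale link from a tuple header simply ends the walk, because the xmin/xmax comparison can detect it. A stale link from an LP_REDIRECT item has nothing to compare, so the sketch treats it as an error rather than as an ordinary break.
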
It would be nice to be able to say for sure that we have a simple rule: "marking a heap-only tuple LP_DEAD is always not just unnecessary, but wrong, because LP_DEAD items in heap pages are supposed to have a 1:1 mapping with dead index tuples".

--
Peter Geoghegan