Re: BUG #17245: Index corruption involving deduplicated entries - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17245: Index corruption involving deduplicated entries
Msg-id CAH2-WzmM+NjF3FfzfPau8ZmDN=qx9E+=1LAUdOB6ce89DX=3RQ@mail.gmail.com
In response to Re: BUG #17245: Index corruption involving deduplicated entries  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #17245: Index corruption involving deduplicated entries
Re: BUG #17245: Index corruption involving deduplicated entries
List pgsql-bugs
On Sat, Oct 30, 2021 at 3:18 PM Andres Freund <andres@anarazel.de> wrote:
> Hm. I wonder if it's not actually good to do something like it in 14, given
> that we know of a path to have corrupted indexes out there.

My concern is that people might not be too happy about errors that
occur due to corruption in tuples that are only vaguely related to
their own incoming inserted tuple.

> > Separately, we should add an assertion that catches cases where a TID
> > in the index points to an LP_REDIRECT line pointer, which does not
> > point to a heap tuple with storage.
>
> Is it actually guaranteed that that's impossible to happen and wasn't ever
> possible? It's not obvious to me that a LP_REDIRECT pointing to an LP_DEAD
> tuple would be a real problem.

As far as I know, an LP_REDIRECT item can never point to an LP_DEAD
item, and it has worked that way since HOT first went in. There
certainly seem to be far fewer problems with a rule that says it can
never happen, so I think we should formally introduce that rule (or
clarify the existing one, if you prefer to see it that way).
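To make the proposed rule concrete, here is a minimal checker written as a toy model in Python rather than PostgreSQL's C. It assumes a page is a dict from offset number to a hypothetical `LinePointer` record. The state names mirror the `LP_*` flags, but `LpState`, `LinePointer` and `redirect_violations` are illustrative inventions, not PostgreSQL APIs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class LpState(Enum):
    # Toy stand-ins named after PostgreSQL's LP_* line pointer flags.
    UNUSED = "LP_UNUSED"
    NORMAL = "LP_NORMAL"      # line pointer has tuple storage
    REDIRECT = "LP_REDIRECT"
    DEAD = "LP_DEAD"


@dataclass
class LinePointer:
    state: LpState
    redirect_to: Optional[int] = None  # only meaningful for REDIRECT


def redirect_violations(page: Dict[int, LinePointer]) -> List[int]:
    """Offsets of LP_REDIRECT items whose target is not a tuple with storage.

    Under the proposed rule every redirect must land on an LP_NORMAL item;
    a target that is LP_DEAD, LP_UNUSED or missing counts as a violation.
    """
    bad = []
    for offnum, lp in sorted(page.items()):
        if lp.state is LpState.REDIRECT:
            target = page.get(lp.redirect_to)
            if target is None or target.state is not LpState.NORMAL:
                bad.append(offnum)
    return bad
```

For example, a page where offset 1 redirects to an LP_DEAD item at offset 2 yields `[1]`. A check of this shape is roughly what an assertion in the real code would encode.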

ISTM that this is all about TID stability for indexes. The only reason
we have LP_DEAD items in heap pages is to have something that reliably
tells index scans that the heap tuple they're looking for is logically
dead (and so we can't recycle TIDs except during VACUUM). Similarly,
the only reason we have LP_REDIRECT items is so that index scans have
lightweight, stable forwarding information within the heap page that
contains the HOT chain. The advantage of LP_REDIRECT items is that
they allow pruning to free the original heap tuple (the root of the
chain, which was not a heap-only tuple) rather than keep it around. Also,
pruning doesn't have to "merge line pointers" such that the new first
member of a HOT chain has the same page offset number as the original
first member had -- imagine how messy that would have to be.
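The TID-stability point can be sketched as the first step of an index fetch: the index keeps pointing at the root offset, and the line pointer there either holds the tuple, forwards to the first heap-only tuple, or reports that the whole chain is dead. This is a simplified Python model under stated assumptions; `chain_start` and the record types are hypothetical names, and the one-hop redirect follows the rule described in the message rather than quoting PostgreSQL's code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class LpState(Enum):
    # Toy stand-ins named after PostgreSQL's LP_* line pointer flags.
    UNUSED = "LP_UNUSED"
    NORMAL = "LP_NORMAL"
    REDIRECT = "LP_REDIRECT"
    DEAD = "LP_DEAD"


@dataclass
class LinePointer:
    state: LpState
    redirect_to: Optional[int] = None  # only meaningful for REDIRECT


def chain_start(page: Dict[int, LinePointer], offnum: int) -> Optional[int]:
    """Map an index TID's offset to the first HOT chain member with storage.

    Returns None for LP_DEAD: the whole chain is logically dead, so the
    index tuple pointing here can be treated as dead too.
    """
    lp = page.get(offnum)
    if lp is None or lp.state is LpState.UNUSED:
        # An index TID must never reach an unused slot.
        raise ValueError(f"index TID points to unused offset {offnum}")
    if lp.state is LpState.DEAD:
        return None
    if lp.state is LpState.REDIRECT:
        # One hop only: a redirect is assumed to point at a heap-only
        # tuple, never at another redirect.
        return lp.redirect_to
    return offnum
```

Before pruning, `chain_start` on the root offset returns the root itself; after pruning replaces the root tuple with a redirect to offset 2, the same index TID resolves to 2 without any index change.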

VACUUM only ever deletes index tuples from indexes when their
pointed-to TIDs were found to be LP_DEAD stub line pointers in the
first heap pass. Such an LP_DEAD item represents a (former) whole HOT
chain, which could be just a degenerate single-tuple HOT chain that
never had any heap-only tuples (say, because every UPDATE to the table
modifies an indexed column). If an LP_REDIRECT item points
to an LP_DEAD item, then what is VACUUM supposed to do about it when
it comes time to vacuum indexes? Which TID is it supposed to delete
from indexes? The LP_DEAD item, the LP_REDIRECT item that points to
the LP_DEAD item, or both?
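The question can be made concrete with a toy model of VACUUM's three steps: collect LP_DEAD offsets in the first heap pass, delete exactly those TIDs from the index, then set the stubs LP_UNUSED. This is a simplified Python sketch; `first_heap_pass`, `bulk_delete` and `mark_unused` are hypothetical names, and the index is reduced to a list of heap offsets on a single page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Set


class LpState(Enum):
    # Toy stand-ins named after PostgreSQL's LP_* line pointer flags.
    UNUSED = "LP_UNUSED"
    NORMAL = "LP_NORMAL"
    REDIRECT = "LP_REDIRECT"
    DEAD = "LP_DEAD"


@dataclass
class LinePointer:
    state: LpState
    redirect_to: Optional[int] = None  # only meaningful for REDIRECT


def first_heap_pass(page: Dict[int, LinePointer]) -> Set[int]:
    """Collect LP_DEAD stub offsets: the only TIDs VACUUM will delete."""
    return {off for off, lp in page.items() if lp.state is LpState.DEAD}


def bulk_delete(index_tids: List[int], dead: Set[int]) -> List[int]:
    """Index vacuuming: drop index entries whose TID is an LP_DEAD stub."""
    return [tid for tid in index_tids if tid not in dead]


def mark_unused(page: Dict[int, LinePointer], dead: Set[int]) -> None:
    """Second heap pass: stubs become LP_UNUSED once no index entry uses them."""
    for off in dead:
        page[off] = LinePointer(LpState.UNUSED)
```

On a well-formed page (offset 1 redirects to a live tuple at 2, offset 3 is LP_DEAD), the index entry for 3 is removed and the one for 1 survives. If instead offset 1 redirected to an LP_DEAD item at 2, no index entry points at 2, so the entry for 1 survives while offset 2 becomes LP_UNUSED: the redirect now dangles, which is exactly the ambiguity described above.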

While it's okay if the link from a tuple header in a HOT chain points
to an LP_UNUSED item (that just indicates that the chain is "broken"
at that point), it's not okay if a link from an LP_REDIRECT line
pointer points to an LP_UNUSED item -- that's virtually the same
condition as having a TID from an index point directly to an LP_UNUSED
item, which is of course always wrong. We cannot do the usual "last
tuple's xmax == current tuple's xmin" validation during chain traversal
when the "last tuple" was actually just an LP_REDIRECT item, so we need
another rule for LP_REDIRECT items to compensate. That rule seems to
be: HOT chains cannot be allowed to "break" between the LP_REDIRECT
item and the first tuple with storage.
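The asymmetry between the two kinds of link can be sketched as a chain walk. Tuple-to-tuple hops are validated by comparing the previous xmax with the next xmin, and a mismatch or an LP_UNUSED slot simply ends the chain. The hop out of a redirect has nothing to validate against, so an unexpected target there is treated as corruption. This is a toy Python model; `walk_hot_chain`, `HeapTuple` and the `next_off` field (a stand-in for `t_ctid` that uses None to mark the chain's end) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class LpState(Enum):
    # Toy stand-ins named after PostgreSQL's LP_* line pointer flags.
    UNUSED = "LP_UNUSED"
    NORMAL = "LP_NORMAL"
    REDIRECT = "LP_REDIRECT"
    DEAD = "LP_DEAD"


@dataclass
class HeapTuple:
    xmin: int
    xmax: Optional[int]       # None: tuple was never updated
    next_off: Optional[int]   # None: end of chain (stand-in for t_ctid)


@dataclass
class LinePointer:
    state: LpState
    redirect_to: Optional[int] = None
    tup: Optional[HeapTuple] = None


def walk_hot_chain(page: Dict[int, LinePointer], root: int) -> List[int]:
    """Return the offsets of HOT chain members reachable from an index TID."""
    lp = page.get(root)
    off = root
    if lp is not None and lp.state is LpState.REDIRECT:
        off = lp.redirect_to
        lp = page.get(off)
        # No xmax/xmin check is possible across a redirect, so its target
        # must be a tuple with storage; anything else is corruption.
        if lp is None or lp.state is not LpState.NORMAL:
            raise ValueError(f"redirect at {root} points to non-tuple offset {off}")
    elif lp is None or lp.state is LpState.UNUSED:
        raise ValueError(f"index TID points to unused offset {root}")
    elif lp.state is LpState.DEAD:
        return []  # the whole chain is dead

    visited: List[int] = []
    prev_xmax: Optional[int] = None
    while True:
        tup = lp.tup
        if prev_xmax is not None and tup.xmin != prev_xmax:
            break  # slot reused by an unrelated tuple: chain broken here
        visited.append(off)
        if tup.next_off is None or tup.xmax is None:
            break
        prev_xmax = tup.xmax
        off = tup.next_off
        lp = page.get(off)
        if lp is None or lp.state is not LpState.NORMAL:
            break  # tuple-to-tuple link into LP_UNUSED: a broken chain, not an error
    return visited
```

With a redirect at offset 1 pointing to a tuple at 2 (xmin 10, xmax 11) whose successor at 3 has xmin 11, the walk returns `[2, 3]`. If offset 3 is LP_UNUSED or holds an unrelated tuple, the walk quietly stops at `[2]`. If offset 2 itself is LP_UNUSED, the walk raises.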

The only way it could be okay for an LP_REDIRECT item to point to an
LP_DEAD item would be if you knew for sure that the LP_REDIRECT item
would actually become LP_DEAD at the same time as the LP_DEAD item (so
both get removed from indexes) -- which is a contradiction in terms.
Why wouldn't pruning just mark the LP_REDIRECT item LP_DEAD instead,
while making the would-be LP_DEAD item skip straight to being an
LP_UNUSED item? That approach is strictly better. It just makes sense,
which leads me to believe that we must have always done it that way.
It would be nice to be able to say for sure that we have a simple
rule: "marking a heap-only tuple LP_DEAD is never merely unnecessary;
it is always wrong, because LP_DEAD items in heap pages are supposed
to have a 1:1 mapping with dead index tuples".
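The pruning approach argued for here can be sketched as follows. When the whole chain is dead, only the root becomes an LP_DEAD stub, because the root is the one TID that indexes reference, and the heap-only members go straight to LP_UNUSED. When the chain is still partly live, the root redirects to the first surviving tuple. This is a toy Python model; `prune_chain` is a hypothetical name, and it assumes the root line pointer still holds a tuple and that dead members form a prefix of the chain.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Set


class LpState(Enum):
    # Toy stand-ins named after PostgreSQL's LP_* line pointer flags.
    UNUSED = "LP_UNUSED"
    NORMAL = "LP_NORMAL"
    REDIRECT = "LP_REDIRECT"
    DEAD = "LP_DEAD"


@dataclass
class LinePointer:
    state: LpState
    redirect_to: Optional[int] = None  # only meaningful for REDIRECT


def prune_chain(page: Dict[int, LinePointer], root: int,
                heap_only: List[int], dead: Set[int]) -> None:
    """Prune one HOT chain in place.

    Assumes page[root] still holds a tuple and that the dead members
    form a prefix of the chain [root, *heap_only].
    """
    chain = [root] + heap_only
    survivors = [off for off in chain if off not in dead]
    if not survivors:
        # Whole chain dead: one LP_DEAD stub at the indexed TID only;
        # heap-only members skip straight to LP_UNUSED.
        page[root] = LinePointer(LpState.DEAD)
        for off in heap_only:
            page[off] = LinePointer(LpState.UNUSED)
    elif survivors[0] != root:
        # Root tuple dead but chain still live: redirect to the first
        # survivor (a tuple with storage) and free the dead members.
        page[root] = LinePointer(LpState.REDIRECT, redirect_to=survivors[0])
        for off in heap_only:
            if off in dead:
                page[off] = LinePointer(LpState.UNUSED)
    # Otherwise the root tuple is still live and nothing changes.
```

In both branches no heap-only tuple ever becomes LP_DEAD, and every redirect targets an LP_NORMAL item, so each LP_DEAD stub corresponds to exactly one indexed TID.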

-- 
Peter Geoghegan
