On Sat, Nov 13, 2021 at 7:05 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> I've got curious if modifying the Alexander's test case could reveal
> something interesting, and sprinkled it with savepoints and rollbacks.
> Almost immediately a new problem has manifested itself, although the
> crash has nothing to do with the disconnected tuples as far as I can
> tell -- still probably worth mentioning. In this case vacuum invoked
> lazy_scan_prune, and during the first scan one of the chains had a
> HEAPTUPLE_DEAD at the third position. The processing flow fell through
> to heap_prune_record_prunable and crashed on an assert with an
> InvalidTransactionId:
Is this just with the bugfix applied? I think that it is. Looks like a
minor bug to me.
I think that I need to consistently "break" in the DEAD case, to avoid
ending up here. In other words, it should not literally be
"reinterpreted" as RECENTLY_DEAD by falling through in the switch
statement (though the concept of reinterpreting certain DEAD tuples as
RECENTLY_DEAD still seems perfectly sound).
Here's why the assertion (invalid xmax/update xid cannot be used in
heap_prune_record_prunable() call) fails:
DEAD means that you might not have a valid update XID -- aborted
update is what we expect. But RECENTLY_DEAD means that there must have
been a deleter xact, and that the xact must have committed (can't have
been that the inserter aborted). This is a consequence of the fact
that the tuple is at least still visible to somebody (or could be),
unlike in the DEAD case. And so xmax must be a valid XID, and so the
existing RECENTLY_DEAD case handling can legitimately always expect
that. But I cannot (and should not) allow a call to
heap_prune_record_prunable() with a DEAD-to-HTSV tuple, even when I
"reinterpret" it as RECENTLY_DEAD in order to make a clean
determination of the tuple to delete up until for the entire HOT
chain.
--
Peter Geoghegan