Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id CAH2-WzmeBU-WgDY_X=EAGkLf4bw8gPRB62HRvt++JHU3TsERgA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Dmitry Dolgov <9erthalion6@gmail.com>)
List pgsql-bugs
On Sat, Nov 13, 2021 at 7:05 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> I've got curious if modifying the Alexander's test case could reveal
> something interesting, and sprinkled it with savepoints and rollbacks.
> Almost immediately a new problem has manifested itself, although the
> crash has nothing to do with the disconnected tuples as far as I can
> tell -- still probably worth mentioning. In this case vacuum invoked
> lazy_scan_prune, and during the first scan one of the chains had a
> HEAPTUPLE_DEAD at the third position. The processing flow fell through
> to heap_prune_record_prunable and crashed on an assert with an
> InvalidTransactionId:

Is this just with the bugfix applied? I think that it is. Looks like a
minor bug to me.

I think that I need to consistently "break" in the DEAD case, to avoid
ending up here. In other words, it should not literally be
"reinterpreted" as RECENTLY_DEAD by falling through in the switch
statement (though the concept of reinterpreting certain DEAD tuples as
RECENTLY_DEAD still seems perfectly sound).

Here's why the assertion (invalid xmax/update xid cannot be used in
heap_prune_record_prunable() call) fails:

DEAD means that you might not have a valid update XID -- aborted
update is what we expect. But RECENTLY_DEAD means that there must have
been a deleter xact, and that the xact must have committed (can't have
been that the inserter aborted). This is a consequence of the fact
that the tuple is at least still visible to somebody (or could be),
unlike in the DEAD case. And so xmax must be a valid XID, and so the
existing RECENTLY_DEAD case handling can legitimately always expect
that. But I cannot (and should not) allow a call to
heap_prune_record_prunable() with a DEAD-to-HTSV tuple, even when I
"reinterpret" it as RECENTLY_DEAD in order to make a clean
determination of the tuple to delete up until for the entire HOT
chain.

-- 
Peter Geoghegan



pgsql-bugs by date:

Previous
From: Dmitry Dolgov
Date:
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Next
From: Thomas Munro
Date:
Subject: Re: BUG #17284: Assert failed in SerialAdd() when the summarize_serial mode is engaged