Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id CAH2-WzmNk6V6tqzuuabxoxM8HJRaWU6h12toaS-bqYcLiht16A@mail.gmail.com
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-bugs
On Thu, Nov 11, 2021 at 9:46 PM Peter Geoghegan <pg@bowt.ie> wrote:
> I wonder if we're approaching this business with "RECENTLY_DEAD can be
> upgraded to DEAD" in entirely the wrong way. Why not just not do that
> at all anymore, on the off chance that it's unsafe? Why even take a
> small chance? Our decision has to work at the level of the whole
> entire HOT chain, and it seems to me that we should make that as
> simple as possible.

Attached revision does it that way.

It also addresses the separate issue of DEAD vs RECENTLY_DEAD
disconnected tuples -- that was the other unresolved question. This
revision takes a harder line on the state of disconnected heap-only
tuples. Andres said that he doesn't know for sure that disconnected
heap-only tuples cannot be DELETE/INSERT_IN_PROGRESS -- "I'm not
actually sure the Assert is unreachable. I can imagine cases where
we'd see e.g. DELETE/INSERT_IN_PROGRESS due to a concurrent
subtransaction abort or such". But I don't see how that's possible. In
fact, I don't even see how it's possible for these items to be
RECENTLY_DEAD -- I think that they must always be DEAD (or we're in
big trouble anyway).

These are not just any heap-only tuples. They're heap-only tuples that
cannot possibly be accessed from a HOT chain. And so it's just
physically impossible for them to be returned by index scans -- this
is a certainty. How could they not be DEAD, in every possible sense?
How could they not come from an aborted transaction, specifically?

Naturally, I also went through the exercise of trying to find a
counterexample, where pruning doesn't see a disconnected tuple as DEAD
in its HTSV. I could not get the assertion to fail with Alexander's
test case, nor with make check-world. If the assertion did fail, then
I imagine that that would have to be due to a problem elsewhere -- it
couldn't be a problem with the "disconnected heap-only tuples must
already be DEAD to HTSV" assumption itself. That is one of the few
things about this patch that *isn't* complicated.
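
To make the "disconnected heap-only tuples must already be DEAD" check concrete, here is a minimal toy sketch of the idea. It is not PostgreSQL's pruning code. The names `HTSV`, `HeapTuple`, `find_disconnected`, and `non_dead_disconnected` are hypothetical, and the model assumes a page is just a map from item offset to tuple, with an explicit successor link standing in for the HOT chain. It ignores xmin/xmax matching, redirects, and everything else that real pruning has to handle.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Optional, Set


class HTSV(Enum):
    """Toy stand-in for the visibility verdicts named in this thread."""
    DEAD = auto()
    RECENTLY_DEAD = auto()
    LIVE = auto()
    INSERT_IN_PROGRESS = auto()
    DELETE_IN_PROGRESS = auto()


@dataclass
class HeapTuple:
    offnum: int                         # item offset on the toy page
    heap_only: bool                     # True if no index entry points here
    status: HTSV                        # verdict pruning would see
    next_offnum: Optional[int] = None   # successor in the HOT chain, if any


def find_disconnected(page: Dict[int, HeapTuple]) -> List[int]:
    """Return offsets of heap-only tuples unreachable from any chain root."""
    reachable: Set[int] = set()
    for tup in page.values():
        if tup.heap_only:
            continue  # only non-heap-only tuples can be the start of a chain
        off: Optional[int] = tup.offnum
        # Follow successor links; stop on a dangling link or a repeat visit.
        while off is not None and off in page and off not in reachable:
            reachable.add(off)
            off = page[off].next_offnum
    return sorted(
        off for off, tup in page.items()
        if tup.heap_only and off not in reachable
    )


def non_dead_disconnected(page: Dict[int, HeapTuple]) -> List[int]:
    """Offsets that would violate the assumption (expected to be empty)."""
    return [
        off for off in find_disconnected(page)
        if page[off].status is not HTSV.DEAD
    ]


# Assumed scenario: item 2 was left behind by an aborted update, and the
# chain rooted at item 1 now links to item 3 instead.
example_page = {
    1: HeapTuple(1, heap_only=False, status=HTSV.RECENTLY_DEAD, next_offnum=3),
    2: HeapTuple(2, heap_only=True, status=HTSV.DEAD),
    3: HeapTuple(3, heap_only=True, status=HTSV.LIVE),
}
```

In this model, item 2 is the only disconnected tuple, and `non_dead_disconnected(example_page)` is empty. A non-empty result would correspond to the assertion failing, which under the reasoning above would point to a problem elsewhere.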

-- 
Peter Geoghegan
