Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Date
Msg-id CAH2-Wzm2MMHm+cqZtibp9kyC4hibq8nKz6t0Ag=ooegRUaLhMw@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
On Wed, Nov 10, 2021 at 5:43 PM Andres Freund <andres@anarazel.de> wrote:
> > Actually, I was more worried about versions before Postgres 14 here --
> > versions that don't have that new DELETE_IN_PROGRESS pruning behavior
> > you mentioned.
>
> I think we're actually saying the same thing. I.e. that we, so far, don't have
> a concrete reason to worry about pre 14 versions.

Mostly, yes. I think that it's not a good idea to backpatch a fix
beyond 14 at this time.

All I was saying beyond that was: If there was a problem before 14,
then it would probably be a problem with the
"HEAPTUPLE_INSERT_IN_PROGRESS becomes DEAD" case. And, it's probably
also more likely that it would be "leaking DEAD tuples", since that is
much less obvious than the bug we know about, in 14. In general
problems with "incorrectly broken" HOT chains are very subtle, and
less well understood than other problems.

> Wouldn't it immediately corrected when called from vacuum due to the DEAD
> check? Afaict this is a one-way ratched, so the next prune is guaranteed to
> get it?

I believe that's true, but I still worry just a little about extreme
cases. I'm not concerned enough to advocate backpatch beyond 14 (not
even close!), at least for now. And so my concern is pretty abstract
and theoretical at this point.

It'll be nice to obviously not have the theoretical problem on 14,
which needs a fix for the non-theoretical problem anyway -- AFAICT
there is no extra risk anyway.

> But why would it be unreachable forever? The next hot prune will see
> !HeapTupleHeaderIsHotUpdated() and process it via the "If the tuple is DEAD
> and doesn't chain to anything else" block?

I agree.

> > You probably noticed that my patch does *not* have the same
> > "!HeapTupleHeaderIsHotUpdated()" check for these
> > unreachable-via-HOT-chain heap-only tuples. The mere fact that they're
> > unreachable is a good enough reason to consider them DEAD (and so mark
> > them LP_UNUSED). We do check heap_prune_satisfies_vacuum() for these
> > tuples too, but that is theoretically unnecessary. This feels pretty
> > water tight to me -- contradictory interpretations of what heap-only
> > tuple is and is in what HOT chain (if any) now seem impossible.
>
> Hm. I guess you *have* to actually process them regardless of
> !HeapTupleHeaderIsHotUpdated() with the new approach, because otherwise we'd
> potentially only process only the tail item of a disconnected chain in each
> heap_hot_prune() call?

That's true.

> Although that might be unreachable due to the HTSV
> calls likely breaking the chain in all disconnected cases (but perhaps not
> with multixacts, there could be remaining lockers).

Also true. But I like that we don't really have to worry about it with
the fix in place.

If we cannot reach a heap-only tuple via a valid HOT chain, it's
already effectively DEAD, anyway -- all bets are off. We could even
add a "can't happen" ERROR to catch the case where HTSV disagrees.
Unsure if that would be worth it, but it's certainly worth an
assertion -- which I've already added.

What do you think of the idea of a "can't happen" ERROR here?

-- 
Peter Geoghegan



pgsql-bugs by date:

Previous
From: Andres Freund
Date:
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Next
From: Andres Freund
Date:
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum