Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Msg-id 391181.1643987721@sss.pgh.pa.us
In response to Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
Andres Freund <andres@anarazel.de> writes:
> On 2022-02-03 15:54:28 -0500, Tom Lane wrote:
>> I'm writing release notes and wondering what I can tell users about
>> how to detect or recover from this bug.  Is a REINDEX sufficient,
>> or is the presence of the bogus redirect item going to cause
>> persistent problems?

> Good questions.

> It's hard to answer whether there's any danger after a REINDEX. Afaics the
> build scan would just pick the "lower offset" version of the root
> pointer. Which should be fine.

> It's possible there could be trouble down the line, e.g. heap pruning doing
> something weird once starting in a corrupted state, that then leads REINDEX to
> do something bogus. The simple cases look OK, because a second visit/action by
> heap_prune_chain for one tid from two different root pointers would see
> ->marked[offnum] as true. It gets more complicated once multiple intermediary
> row versions are involved, because the intermediary row versions won't be in
> ->marked if an entire chain is pruned. But afaict that should still end up
> looking like a hot chain ending in an aborted tuple or such.

OK, I'll just recommend REINDEX.

> Except that it's not trivial to get right, I could see it being worthwhile to
> add verification of hot chains to amcheck, and backpatch that to 14.

I'd have thought that'd be a fundamental component of a heap check
module, so +1 for adding it.  Dunno about the back-patch part though.
It seems like a new feature.

            regards, tom lane