Re: BUG #17245: Index corruption involving deduplicated entries - Mailing list pgsql-bugs

From Peter Geoghegan
Subject Re: BUG #17245: Index corruption involving deduplicated entries
Date
Msg-id CAH2-Wzk-4_raTzawWGaiqNvkpwDXxv3y1AQhQyUeHfkU=tFCeA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17245: Index corruption involving deduplicated entries  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #17245: Index corruption involving deduplicated entries  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
On Sat, Oct 30, 2021 at 1:42 PM Andres Freund <andres@anarazel.de> wrote:
> I think it probably is worth adding an error check someplace that verifies
> that problems of this kind will be detected with, uh, less effort.

The draft fix is something I wanted to get out quickly to signal that
the bug is definitely going to be fixed soon. I'll need to carefully
think about this some more on Monday, to make sure that it becomes
much harder to break in roughly the same way once again.

> I think it'd also be good to add a test that specifically verifies that
> parallel vacuum doesn't have a bug around "parallel worthy" and not "parallel
> worthy" indexes. It's too easy a mistake to make, and because visible
> corruption is delayed, it's likely that we won't detect such cases.

I agree.

> ISTM that at least a basic version of this is worth doing as a check throwing
> an ERROR, rather than an assertion. It's hard to believe this'd be a
> significant portion of the cost of heap_index_delete_tuples(), and I think it
> would help catch problems a lot earlier.

I like the idea of doing that, but only on master. I think that we
could do this as an ERROR, provided we avoid adding a new inner loop
to heap_index_delete_tuples() --- we can definitely afford that. And
we can even have a nice, detailed error message that blames a
particular tuple from a specific index page, pointing to a specific
heap TID.

Separately, we should add an assertion that catches cases where a TID
in the index points to an LP_REDIRECT line pointer, which does not
point to a heap tuple with storage. That check has much less practical
value, and necessitates adding a new inner loop, which is why an
assert seems good enough to me. (The patch I've posted doesn't have
any of this LP_REDIRECT business, but my v2 revision will.)

-- 
Peter Geoghegan



pgsql-bugs by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: BUG #17255: Server crashes in index_delete_sort_cmp() due to race condition with vacuum
Next
From: Andres Freund
Date:
Subject: Re: BUG #17245: Index corruption involving deduplicated entries