Re: new heapcheck contrib module - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: new heapcheck contrib module
Msg-id CAH2-WznUe3q2yF-w=wtuDWzm4KQAusKdSeRE3T8_+tcQS-CHdQ@mail.gmail.com
In response to Re: new heapcheck contrib module  (Mark Dilger <mark.dilger@enterprisedb.com>)
List pgsql-hackers
On Mon, Apr 20, 2020 at 12:40 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> Ok, I'll work in that direction and repost when I have something along those lines.

Great, thanks!

It also occurs to me that the B-Tree checks that amcheck already has
have one remaining blind spot: while the heapallindexed verification
option has the ability to detect the absence of an index tuple that
the dummy CREATE INDEX that we perform under the hood says should be
in the index, it cannot do the opposite. It cannot detect the presence
of a malformed tuple that shouldn't be there at all, unless the index
tuple itself is corrupt. That means it could miss an inconsistent page
image where a few tuples have been VACUUMed away, but still appear in
the index.
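
To make the blind spot concrete, here is a toy model of the one-directional
check. It is only a sketch: TIDs are modeled as (block, offset) pairs, the
function name is made up for illustration, and a plain set stands in for
whatever fingerprinting structure the real code uses.

```python
# Toy model, not amcheck code: a TID is a (block, offset) pair and
# missing_from_index() is a hypothetical name chosen for this example.

def missing_from_index(heap_tids, index_tids):
    """Forward check only: report heap TIDs that have no index entry."""
    indexed = set(index_tids)  # a plain set stands in for the fingerprint
    return sorted(tid for tid in heap_tids if tid not in indexed)


heap_tids = [(0, 1), (0, 2), (1, 1)]
# (1, 7) is gone from the heap, but its index entry survived.
index_tids = [(0, 1), (0, 2), (1, 1), (1, 7)]

# The forward check has nothing to report, even though (1, 7) is dangling.
print(missing_from_index(heap_tids, index_tids))
```

Every heap TID finds its index entry, so the check passes; the leftover
(1, 7) entry is never looked at from the index side.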

In order to do that, we'd have to have something a bit like the
validate_index() heap scan that CREATE INDEX CONCURRENTLY uses. We'd
have to get a list of heap TIDs that any index tuple might be pointing
to, and then make sure that there were no TIDs in the index that were
not in that list -- tuples that were pointing to nothing in the heap
at all. This could use the index_bulk_delete() interface. This is the
kind of verification option that I might work on for debugging
purposes, but not the kind of thing I could really recommend to
ordinary users outside of exceptional cases. This is the kind of thing
that argues for more or less providing all of the verification
functionality we have through both high level and low level
interfaces. A check like this isn't likely to be all that valuable most
of the time, and users shouldn't have to figure that out for themselves
the hard way. (BTW, I think that this could be implemented in an
index-AM-agnostic way, so perhaps you can consider adding it too, if
you have time.)
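
A rough sketch of the reverse check described above, again as a toy model
rather than PostgreSQL code. All names are hypothetical. The per-TID callback
is only loosely modeled on the style of a bulk-delete callback (return true
for entries of interest); a real verification pass would report such entries
rather than delete them.

```python
# Toy model, not PostgreSQL code: every name here is hypothetical.
from typing import Callable, Iterable, List, Set, Tuple

Tid = Tuple[int, int]  # (block number, offset number)


def scan_index(index_tids: Iterable[Tid],
               callback: Callable[[Tid], bool]) -> List[Tid]:
    """Visit every index TID once; collect those the callback flags."""
    return [tid for tid in index_tids if callback(tid)]


def dangling_index_tids(heap_tids: Iterable[Tid],
                        index_tids: Iterable[Tid]) -> List[Tid]:
    """Reverse check: index TIDs that point to nothing in the heap."""
    # Step 1: list the heap TIDs that any index tuple might point to.
    reachable: Set[Tid] = set(heap_tids)
    # Step 2: flag every index TID that is not in that list.
    return scan_index(index_tids, lambda tid: tid not in reachable)


heap = [(0, 1), (0, 2), (1, 1)]
index = [(0, 1), (0, 2), (1, 1), (1, 7)]
print(dangling_index_tids(heap, index))
```

Nothing in the sketch depends on the index being a B-Tree: it only needs a
way to visit every TID stored in the index.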

One last thing for now: take a look at amcheck's
bt_tuple_present_callback() function. It has comments about HOT chain
corruption that you may find interesting. Note that this check played
a role in the "freeze the dead" corruption bug [1] -- it detected that
our initial fix for that was broken. It seems like it would be a good
idea to go back through the reproducers we've seen for some of the
more memorable corruption bugs, and actually make sure that your tool
detects them where that isn't clear. History doesn't repeat itself,
but it often rhymes.

[1] https://postgr.es/m/CAH2-Wznm4rCrhFAiwKPWTpEw2bXDtgROZK7jWWGucXeH3D1fmA@mail.gmail.com
-- 
Peter Geoghegan


