Re: new heapcheck contrib module - Mailing list pgsql-hackers

From Robert Haas
Subject Re: new heapcheck contrib module
Date
Msg-id CA+TgmobahNSY7AAGpgoTgNkTtOaQm6i45GWH2FYWkOyY6oXiOg@mail.gmail.com
In response to Re: new heapcheck contrib module  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: new heapcheck contrib module  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Mon, Aug 3, 2020 at 1:16 PM Peter Geoghegan <pg@bowt.ie> wrote:
> If you really wanted to do this,
> you'd have to describe a practical scenario under which it made sense
> to soldier on, where we'd definitely be able to count the number of
> problems in a meaningful way, without much risk of either massively
> overcounting or undercounting inconsistencies.

I completely agree. You have to have a careful plan to make this sort
of thing work - you want to skip checking the things that are
dependent on the part already determined to be bad, without skipping
everything. You need a strategy for where and how to restart checking,
first bypassing whatever needs to be skipped.
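
As a rough illustration of "skip what depends on the bad part, without
skipping everything", here is a minimal sketch in Python. Nothing in it
comes from the patch under discussion: the check names, the dependency
lists, and the run_checks() helper are all hypothetical, and it assumes
the checks are listed with prerequisites first.

```python
from typing import Callable, Dict, List, Tuple

# name -> (names of prerequisite checks, function returning True if the check passes)
CheckTable = Dict[str, Tuple[List[str], Callable[[], bool]]]


def run_checks(checks: CheckTable) -> Dict[str, str]:
    """Run each check once, in the order given.

    A check is skipped when any prerequisite did not pass, so one bad
    structure is reported once instead of cascading into its dependents.
    Checks with no failed prerequisite still run.
    """
    status: Dict[str, str] = {}
    for name, (deps, fn) in checks.items():
        if any(status.get(dep) != "ok" for dep in deps):
            status[name] = "skipped"
        else:
            status[name] = "ok" if fn() else "bad"
    return status


# Hypothetical example: the page header is bad, so checks that rely on it
# are skipped, but the unrelated check on the next page still runs.
example = run_checks({
    "page_header": ([], lambda: False),
    "line_pointers": (["page_header"], lambda: True),
    "tuple_order": (["line_pointers"], lambda: True),
    "next_page_header": ([], lambda: True),
})
```

Here the single problem is counted once ("page_header" is "bad"), its
dependents are marked "skipped" rather than reported as further
problems, and checking resumes at the next independent item.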

> Consider how the search in verify_nbtree.c actually works at a high
> level. If you thoroughly corrupted one B-Tree leaf page (let's say you
> replaced it with an all-zero page image), all pages to the right of
> the page would be fundamentally inaccessible to the left-to-right
> level search that is coordinated within
> bt_check_level_from_leftmost(). And yet, most real index scans can
> still be expected to work. How do you know to skip past that one
> corrupt leaf page (by going back to the parent to get the next sibling
> leaf page) during index verification? That's what it would take to do
> this in the general case, I guess.

In that particular example, you would want the function that verifies
that page to return some indicator. If it finds that two keys in the
page are out-of-order, it tells the caller that it can still follow
the right-link. But if it finds that the whole page is garbage, then
it tells the caller that it doesn't have a valid right-link and the
caller's got to do something else, like give up on the rest of the
checks or (better) try to recover a pointer to the next page from the
parent.
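
To make that concrete, here is a minimal sketch in Python of a page
check that returns such an indicator, plus a level walk that acts on it.
This is not how verify_nbtree.c is written. The page model (a dict with
"keys" and "right", or None for an unreadable page), the PageStatus
names, and the single parent_children list standing in for the parent's
downlinks are all assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class PageStatus(Enum):
    OK = "ok"
    DAMAGED_LINK_USABLE = "damaged, right-link usable"
    GARBAGE = "garbage, no valid right-link"


def verify_page(page: Optional[dict]) -> PageStatus:
    """Classify one page. None models a page that is entirely garbage."""
    if page is None:
        return PageStatus.GARBAGE
    if page["keys"] != sorted(page["keys"]):
        return PageStatus.DAMAGED_LINK_USABLE  # keys out of order, link intact
    return PageStatus.OK


def check_level(pages: Dict[int, Optional[dict]], leftmost: int,
                parent_children: List[int]) -> List[Tuple[int, PageStatus]]:
    """Walk one level left to right and report every problem page found."""
    problems: List[Tuple[int, PageStatus]] = []
    seen = set()
    current: Optional[int] = leftmost
    while current is not None and current not in seen:
        seen.add(current)  # guards against right-link cycles
        page = pages.get(current)
        status = verify_page(page)
        if status is not PageStatus.OK:
            problems.append((current, status))
        if status is PageStatus.GARBAGE:
            # No trustworthy right-link: recover the next sibling from the parent.
            if current in parent_children:
                idx = parent_children.index(current)
                more = idx + 1 < len(parent_children)
                current = parent_children[idx + 1] if more else None
            else:
                current = None  # nothing to recover from; give up on this level
        else:
            current = page["right"]
    return problems


# Hypothetical level of four leaf pages: page 2 is garbage, page 3 has
# out-of-order keys but an intact right-link.
level = {
    1: {"keys": [1, 2], "right": 2},
    2: None,
    3: {"keys": [9, 7], "right": 4},
    4: {"keys": [10, 11], "right": None},
}
found = check_level(level, leftmost=1, parent_children=[1, 2, 3, 4])
```

In this example the walk reports page 2 as garbage, uses the parent's
list to continue at page 3, reports page 3 as damaged, and still
reaches page 4 through page 3's right-link.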

> More fundamentally, I wonder how
> many inconsistencies one should imagine that this index has, before we
> even get into talking about the implementation.

I think we should try not to imagine anything in particular. Just to
be clear, I am not trying to knock what you have; I know it was a lot
of work to create and it's a huge improvement over having nothing. But
in my mind, a perfect tool would do just what a human being would do
if investigating manually: assume initially that you know nothing -
the index might be totally fine, mildly corrupted in a very localized
way, completely hosed, or anything in between. And it would
systematically try to track that down by traversing the usable
pointers that it has until it runs out of things to do. It does not
seem impossible to build a tool that would allow us to take a big
index and overwrite a random subset of pages with garbage data and
have the tool tell us about all the bad pages that are still reachable
from the root by any path. If you really wanted to go crazy with it,
you could even try to find the bad pages that are not reachable from
the root, by doing a pass after the fact over all the pages that you
didn't otherwise reach. It would be a lot of work to build something
like that and maybe not the best use of time, but if I got to wave
tools into existence using my magic wand, I think that would be the
gold standard.
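
A minimal sketch of that idea in Python, under heavy simplifying
assumptions: each page is modeled as either None (garbage) or a plain
list of the block numbers it points to (downlinks and sibling links
alike), and scan_index() is a hypothetical name, not an existing
function. It follows every usable pointer from the root, records the bad
pages it reaches, and then makes a second pass over the pages it never
reached.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def scan_index(pages: Dict[int, Optional[List[int]]], root: int,
               n_pages: int) -> Tuple[List[int], List[int]]:
    """Return (bad pages reachable from the root, bad pages not reachable).

    A pointer to a block that is missing from `pages` is treated the same
    as a pointer to a garbage page.
    """
    reached = set()
    bad_reachable: List[int] = []
    queue = deque([root])
    while queue:
        blk = queue.popleft()
        if blk in reached:
            continue
        reached.add(blk)
        page = pages.get(blk)
        if page is None:
            # Garbage page: report it, but its outgoing pointers are unusable.
            bad_reachable.append(blk)
            continue
        for ptr in page:
            if ptr not in reached:
                queue.append(ptr)

    # Second pass: pages no usable pointer led to.
    bad_unreached = [blk for blk in range(n_pages)
                     if blk not in reached and pages.get(blk) is None]
    return bad_reachable, bad_unreached


# Hypothetical 7-page index. Page 2 is a garbage internal page, so its
# downlinks are lost; leaf 5 is still reached through leaf 4's sibling
# link, while garbage leaf 6 is not reachable by any surviving path.
index = {
    0: [1, 2],   # root
    1: [3, 4],   # internal page
    2: None,     # internal page overwritten with garbage
    3: [4],      # leaf with right-link to 4
    4: [5],      # leaf with right-link to 5
    5: [],       # leaf whose right-link was lost
    6: None,     # leaf overwritten with garbage
}
reachable_bad, unreachable_bad = scan_index(index, root=0, n_pages=7)
```

With this input the first pass reports page 2 and the second pass
reports page 6. A real tool would also have to decide how far to trust
pointers read from a partly damaged page, which this sketch ignores.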

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


