Re: new heapcheck contrib module - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: new heapcheck contrib module
Msg-id CAH2-Wznp9-dfeXUA8Gz3A2Ua3MqqkGry3_RKFrDi0G5xq602rQ@mail.gmail.com
In response to Re: new heapcheck contrib module  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Mon, Apr 20, 2020 at 12:42 PM Andres Freund <andres@anarazel.de> wrote:
> This is something we really really really need. I'm very excited to see
> progress!

+1

My experience with amcheck was that the requirement that we document
and verify pretty much every invariant (the details of which differ
slightly based on the B-Tree version in use) has had intangible
benefits. It helped me come up with a simpler, better design in the
first place. Also, many of the benchmarks that I run double as stress
tests of the feature itself, which saves quite a lot of testing work
in the long run.

> I wonder if a mode where heapcheck optionally would only check
> non-frozen (perhaps also non-all-visible) regions of a table would be a
> good idea? Would make it a lot more viable to run this regularly on
> bigger databases. Even if there's a window to not check some data
> (because it's frozen before the next heapcheck run).

That's a great idea. It could also make it practical to use the
rootdescend verification option to verify indexes selectively -- if
you don't have too many blocks to check on average, the overhead is
tolerable. This is the kind of thing that naturally belongs in the
higher-level interface that I sketched already.
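
To make the block-selection idea concrete, here is a minimal sketch in
plain C. It is a toy model, not heapcheck or visibility map code: the
flag values, the SkipMode enum, and both function names are
hypothetical, and the per-block flags are assumed to have been read
into an array already.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block flags, loosely modeled on the all-visible and
 * all-frozen bits that a visibility map tracks for each heap block. */
#define BLK_ALL_VISIBLE 0x01
#define BLK_ALL_FROZEN  0x02

/* Hypothetical knob for how aggressively a check run skips blocks. */
typedef enum
{
    SKIP_NONE,          /* check every block */
    SKIP_ALL_FROZEN,    /* skip blocks flagged all-frozen */
    SKIP_ALL_VISIBLE    /* skip blocks flagged all-visible */
} SkipMode;

/* Decide whether one block should be checked under the given mode. */
static bool
block_needs_check(uint8_t flags, SkipMode mode)
{
    switch (mode)
    {
        case SKIP_ALL_FROZEN:
            return (flags & BLK_ALL_FROZEN) == 0;
        case SKIP_ALL_VISIBLE:
            return (flags & BLK_ALL_VISIBLE) == 0;
        case SKIP_NONE:
        default:
            return true;
    }
}

/* Write the block numbers that need checking into out[] (which must
 * have room for nblocks entries) and return how many were selected. */
static size_t
select_blocks(const uint8_t *vm_flags, size_t nblocks,
              SkipMode mode, uint32_t *out)
{
    size_t n = 0;

    for (size_t blk = 0; blk < nblocks; blk++)
    {
        if (block_needs_check(vm_flags[blk], mode))
            out[n++] = (uint32_t) blk;
    }
    return n;
}
```

A higher-level interface could hand the selected block list to both
the heap checker and a selective index check, so the cost of a run
scales with the amount of recently modified data rather than with
table size.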

> We also had a *lot* of bugs that we'd have found a lot earlier, possibly
> even during development, if we had a way to easily perform these checks.

I can think of a case where it was quite unclear what the invariants
for the heap even were, at least temporarily. And this was in the
context of fixing a bug that was really quite nasty. Formally defining
the invariants in one place, and taking a position on exactly what
"correct" looks like, seems like a very valuable exercise, even if the
tool never catches a single bug.

> I have a hard time believing this is going to be really
> reliable. E.g. the alignment requirements will vary between platforms,
> leading to different layouts. In particular, MAXALIGN differs between
> platforms.
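
To illustrate the concern quoted above, here is a small sketch in
plain C of how the maximum alignment changes where data lands. The
helper names are hypothetical, and the layout (equal-length tuples
packed from offset 0) is a deliberate simplification rather than the
real heap page layout. The rounding is the usual
round-up-to-a-power-of-two idiom.

```c
#include <assert.h>
#include <stddef.h>

/* Round len up to the next multiple of alignval.
 * alignval must be a power of two (4 and 8 are typical values). */
static size_t
align_up(size_t len, size_t alignval)
{
    return (len + (alignval - 1)) & ~(alignval - 1);
}

/* Offset of the n-th (0-based) tuple when equal-length tuples are
 * packed back to back from offset 0, each padded to alignval. */
static size_t
nth_tuple_offset(size_t tuple_len, size_t n, size_t alignval)
{
    return n * align_up(tuple_len, alignval);
}
```

With 26-byte tuples, the fourth tuple starts at byte 84 under 4-byte
maximum alignment but at byte 96 under 8-byte maximum alignment, so a
test that overwrites a hard-coded byte offset hits different data
depending on the platform.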

Over on another thread, I suggested that Mark might want to have a
corruption test framework that exposes some of the bufpage.c routines.
The idea is that you can destructively manipulate a page using the
logical page interface. Something that works one level below the
access method, but one level above the raw page image. It probably
wouldn't test everything that Mark wants to test, but it would test
some things in a way that seems maintainable to me.
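
A rough sketch of what I mean, in self-contained C. To be clear, this
is a toy slotted page and not the bufpage.c API: every type and
function name here is hypothetical. The point is only that the test
corrupts a line pointer through a logical interface and never computes
a raw byte offset into the page image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 256
#define TOY_MAX_ITEMS 8

/* Hypothetical line pointer: where an item lives and how long it is. */
typedef struct
{
    uint16_t off;
    uint16_t len;
} ToyItemId;

/* Hypothetical page: item data is allocated downward from the end. */
typedef struct
{
    uint16_t  upper;    /* offset of the lowest allocated item byte */
    uint16_t  nitems;   /* number of line pointers in use */
    ToyItemId items[TOY_MAX_ITEMS];
    uint8_t   data[TOY_PAGE_SIZE];
} ToyPage;

static void
toy_page_init(ToyPage *p)
{
    memset(p, 0, sizeof(*p));
    p->upper = TOY_PAGE_SIZE;
}

/* Add an item; returns its slot number, or -1 if it does not fit. */
static int
toy_page_add_item(ToyPage *p, const void *item, uint16_t len)
{
    if (p->nitems >= TOY_MAX_ITEMS || len == 0 || len > p->upper)
        return -1;
    p->upper -= len;
    memcpy(p->data + p->upper, item, len);
    p->items[p->nitems].off = p->upper;
    p->items[p->nitems].len = len;
    return p->nitems++;
}

/* Destructive test hook: overwrite a line pointer's length in place. */
static void
toy_corrupt_item_length(ToyPage *p, int slot, uint16_t bogus_len)
{
    p->items[slot].len = bogus_len;
}

/* Verify the invariants: every item is non-empty, starts at or above
 * upper, and ends within the page. */
static bool
toy_page_check(const ToyPage *p)
{
    if (p->nitems > TOY_MAX_ITEMS || p->upper > TOY_PAGE_SIZE)
        return false;
    for (int i = 0; i < p->nitems; i++)
    {
        uint32_t off = p->items[i].off;
        uint32_t end = off + p->items[i].len;

        if (p->items[i].len == 0 || off < p->upper || end > TOY_PAGE_SIZE)
            return false;
    }
    return true;
}
```

Because the corruption is expressed in terms of slots rather than
bytes, the same test should behave the same way regardless of
alignment or page layout details on a given platform.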

-- 
Peter Geoghegan
