Thread: amcheck prototype

amcheck prototype

From
Peter Geoghegan
Date:
Attached is a revision of what I previously called btreecheck, which
is now renamed to amcheck.

This is not 9.5 material - I already have 3 bigger patches in the
queue, 2 of which are large, complex, and the subject of major
controversy, and one of which has details that need to be worked out,
which is currently consuming a lot of reviewer time. There seems to be little
point in trying to get amcheck into shape for 9.5. The goals for this
as a real patch need to be worked out in greater detail. At some point
we'll need to have a discussion around both stress-testing (as a way
of finding bugs) and allowing users to verify indexes on production
systems when corruption is suspected. Since, as far as I know, no one
else has so much as applied and compiled my ON CONFLICT UPDATE patch,
it would be pretty senseless of me to add another patch to the queue.
Reviewers are clearly more overburdened than ever.

Anyway, this revision adds the ability to check invariants across
pages (that a page's right-link comports with the target page's last
item, since when targeting a particular page there is no locally
available "next" item to check the last item against, other than the
page highkey).  This check is performed even by the user-callable SQL
functions that only acquire an AccessShareLock (bt_index_verify() and
bt_page_verify()). As before, it also exhaustively tests certain other
related invariants previously described [1], without really
considering their plausibility as either bugs in the B-Tree code, or
things likely to be violated in the event of organic data corruption.
In other words, I could probably stand to be considerably more
selective in what I'm testing, but in order to do that I'd need to
make up my mind about my exact goals for this tool.
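
To make the cross-page check described above concrete, here is a minimal
sketch in Python. It is a toy model under stated assumptions, not code
from the patch: Page and check_page are hypothetical names, keys are plain
integers, and locking is not modelled at all. It only illustrates why the
last item on a target page has to be compared against something beyond
the page itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for a B-Tree leaf page (not nbtree's layout)."""
    items: List[int]                  # keys in on-page order
    high_key: Optional[int] = None    # upper bound; None on the rightmost page
    right_link: Optional[int] = None  # block number of right sibling, if any


def check_page(pages: Dict[int, Page], blkno: int) -> List[str]:
    """Return descriptions of invariant violations found for one target page."""
    target = pages[blkno]
    problems = []

    # Local invariant: items appear in non-decreasing order.
    for left, right in zip(target.items, target.items[1:]):
        if left > right:
            problems.append(
                f"block {blkno}: item {left} precedes smaller item {right}")

    # Local invariant: no item exceeds the page high key.
    if target.high_key is not None:
        for item in target.items:
            if item > target.high_key:
                problems.append(
                    f"block {blkno}: item {item} exceeds high key "
                    f"{target.high_key}")

    # Cross-page invariant: the last item on the target page must not sort
    # after the first item on the page that the right-link points to.
    if target.right_link is not None and target.items:
        sibling = pages[target.right_link]
        if sibling.items and target.items[-1] > sibling.items[0]:
            problems.append(
                f"block {blkno}: last item {target.items[-1]} exceeds first "
                f"item {sibling.items[0]} on right sibling "
                f"{target.right_link}")

    return problems


# Block 0 passes both local checks, yet its last item sorts after the
# first item of its right sibling, so only the cross-page check complains.
pages = {
    0: Page(items=[1, 3, 5], high_key=5, right_link=1),
    1: Page(items=[4, 8]),
}
for blkno in sorted(pages):
    for problem in check_page(pages, blkno):
        print(problem)
```

In this toy example the damage is invisible to any check confined to
block 0, which is the motivation for following the right-link.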

amcheck is something that I thought might find bugs in approach #1 to
value locking [2] (for the ON CONFLICT UPDATE patch). However,
extensive stress testing while constantly using the tool has not
revealed any bugs. That doesn't mean that they're not there, of
course, and it doesn't really alter our understanding of approach #1,
but it's worth mentioning.

Anyway, this is presented here in the hope that it will be useful for
testing other patches, and perhaps even in testing for corruption on
production systems (with appropriate precautions taken - this is still
a prototype patch - but it's also still the only thing of its kind). I
post this with the expectation that it won't make it into contrib
until PostgreSQL 9.6, or whatever we end up calling it. It might be
that someone has some feedback that allows me to build a better
temporary prototype (certainly, some testing tools were maintained out
of git for a while in the past, such as the precursor to isolation
tester), but I don't expect even that. If no one wants to do anything
with this patch in the foreseeable future (probably the current
cycle), there may still be some value in dumping my progress here. As
I said, I tend to think that its biggest problem right now is that
it's just too scattergun, but that's probably appropriate for an
early iteration.

In general, I think we could prevent a lot of bugs by performing
targeted stress-testing with custom tools. Ideally, this tool would go
on to provide a way of doing so for several different areas of the code.

[1] http://www.postgresql.org/message-id/CAM3SWZRtV+xmRWLWq6c-x7czvwavFdwFi4St1zz4dDgFH4yN4g@mail.gmail.com
[2] https://wiki.postgresql.org/wiki/Value_locking#.231._Heavyweight_page_locking_.28Peter_Geoghegan.29
--
Peter Geoghegan

Attachment

Re: amcheck prototype

From
Peter Geoghegan
Date:
On Wed, Nov 19, 2014 at 2:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Attached is a revision of what I previously called btreecheck, which
> is now renamed to amcheck.

Whoops. I left in modifications to pg_config_manual.h to build with
Valgrind support. Here is a version without that.

--
Peter Geoghegan

Attachment

Re: amcheck prototype

From
Peter Geoghegan
Date:
On Wed, Nov 19, 2014 at 2:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Attached is a revision of what I previously called btreecheck, which
> is now renamed to amcheck.

This never really went anywhere, because as a project I don't think
that it has very crisp goals. My sense is that it could be developed
in a new direction, with the goal of finding bugs in the master
branch. This seems like something that could be possible without a
large additional effort; committing the tool itself can come later.

Right now, the code that is actually tested by the tool isn't
particularly likely to have bugs. I used a slightly revised version to
constantly verify B-Trees as the regression tests were run. That didn't
catch anything, but since the tool doesn't consult the heap at all I'm
not surprised. Also, I didn't incorporate any testing of recovery with
that stress test.

I wrote amcheck with the assumption that it is useful to have a tool
that verifies several nbtree invariants, a couple of which are fairly
elaborate. amcheck *is* probably useful for detecting corruption due
to hardware failure and so on today, but that is another problem
entirely.

-- 
Peter Geoghegan