From: Peter Geoghegan
Subject: Re: [HACKERS] A design for amcheck heapam verification
Msg-id: CAH2-WzmVKiwcNrhYFH9CTLLcmQTMH_xjW=AvxfDKAftmY47QKw@mail.gmail.com
In response to: Re: [HACKERS] A design for amcheck heapam verification (Andrey Borodin <x4mmm@yandex-team.ru>)
List: pgsql-hackers

On Thu, Jan 11, 2018 at 2:14 AM, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> I like the heapam verification functionality and use it right now. So,
> I'm planning to provide a review for this patch, probably this week.

Great!

> Seems like the new check works 4 orders of magnitude faster than
> bt_index_parent_check() and still finds my specific error that
> bt_index_check() missed.
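
The comparison here is between the existing structural check and the
new heap verification, which are invoked roughly as follows (a minimal
sketch; 'some_index' stands in for a real index name, and the
heapallindexed argument is the option that this patch adds):

    -- Existing check of parent/child invariants; takes stronger
    -- locks, and is slower:
    SELECT bt_index_parent_check('some_index');

    -- With the patch, bt_index_check() can additionally verify that
    -- every heap tuple that should have an index entry actually has one:
    SELECT bt_index_check('some_index', heapallindexed => true);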
 
> From this output I see that there is corruption, but I cannot tell:
> 1. What is the scale of the corruption?
> 2. Are these corruptions related or not?

I don't know the answer to either question, and I don't think that
anyone else could provide much more certainty than that, at least when
it comes to the general case. I think it's important to remember why
that is.

When amcheck raises an error, that really should be a rare,
exceptional event. When I ran amcheck on Heroku's platform, that was
what we found: in every case, the cause turned out to be some specific
software bug (it turns out that Amazon's EBS has been very reliable in
recent years, at least when it comes to avoiding silent data
corruption). In general, the nature of those problems was very
difficult to predict.

The PostgreSQL project strives to provide a database system that never
loses data, and I think that we generally do very well there. It's
probably also true that (for example) Yandex has some very good DBAs,
who take every reasonable step to prevent data loss (validating
hardware, providing substantial redundancy at the storage level, and
so on). We trust the system, and you trust your own operational
procedures, and for the most part everything runs well, because you
(almost) think of everything.

I think that running amcheck at scale is interesting because its very
general approach to validation gives us an opportunity to learn *what
we were wrong about*. Sometimes the reasons will be simple, and
sometimes they'll be complicated, but they'll always be something that we
tried to account for in some way, and just didn't think of, despite
our best efforts. I know that torn pages can happen, which is a kind
of corruption -- that's why crash recovery replays FPIs. If I knew
what problems amcheck might find, then I probably would have already
found a way to prevent them from happening in the first place - there
are limits to what we can predict. (Google "Ludic fallacy" for more
information on this general idea.)

I try to be humble about these things. Very complicated systems can
have very complicated problems that stay around for a long time
without being discovered. Just ask Intel. While it might be true that
some people will use amcheck as the first line of defense, I think
that it makes much more sense as the last line of defense. So, to
repeat myself -- I just don't know.

> I think an interface to list all, or the top N, errors could be useful.

I think that it might be useful if you could specify a limit on how
many errors you'll accept before giving up, though that's probably
less useful than you expect. Once amcheck detects even a single
problem, all bets are off; any prediction that I might try to give you
now isn't worth much. Theoretically, amcheck should *never* find any
problem, which is actually what happens in the vast majority of
real-world cases. When it does find a problem, there should be some
new lesson to be learned. If there isn't some new insight, then
somebody somewhere is doing a bad job.
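
In the meantime, something close to "list all errors" can be
approximated from the outside, by trapping the error for each index and
moving on to the next one (a minimal sketch, not part of the patch;
since amcheck stops at the first problem it finds within an index, this
reports at most one error per index):

    -- Check every B-Tree index, logging a warning for each one that
    -- fails verification instead of aborting on the first error:
    DO $$
    DECLARE
      r record;
    BEGIN
      FOR r IN
        SELECT c.oid::regclass AS idx
        FROM pg_index i
        JOIN pg_class c ON c.oid = i.indexrelid
        JOIN pg_am am ON am.oid = c.relam
        WHERE am.amname = 'btree'
      LOOP
        BEGIN
          PERFORM bt_index_check(r.idx);
        EXCEPTION WHEN OTHERS THEN
          RAISE WARNING 'amcheck failed for %: %', r.idx, SQLERRM;
        END;
      END LOOP;
    END
    $$;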

-- 
Peter Geoghegan

