Re: BUG #17212: pg_amcheck fails on checking temporary relations - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: BUG #17212: pg_amcheck fails on checking temporary relations
Date
Msg-id CAH2-Wzkwb+1GSONMrXByHhgkzD-N59A_kFcvsoA_+M9cNNHm5g@mail.gmail.com
Whole thread Raw
In response to Re: BUG #17212: pg_amcheck fails on checking temporary relations  (Mark Dilger <mark.dilger@enterprisedb.com>)
Responses Re: BUG #17212: pg_amcheck fails on checking temporary relations  (Mark Dilger <mark.dilger@enterprisedb.com>)
Re: BUG #17212: pg_amcheck fails on checking temporary relations  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Oct 4, 2021 at 3:36 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> >> I believe this is a bug in amcheck's btree checking functions.  Peter, can you take a look?
> >
> > Why do you say that?
>
> Because REINDEX CONCURRENTLY and the bt_index_parent_check() function seem to have lock upgrade hazards that are
unrelatedto pg_amcheck. 

The problem with that argument is that the bt_index_parent_check()
function isn't doing anything particularly special, apart from
dropping the lock. That has been its behavior for many years now.

> On to pg_amcheck's behavior....
>
> I see no evidence in the OP's complaint that pg_amcheck is misbehaving.  It launches a worker to check each relation,
printsfor the user's benefit any errors those checks raise, and finally returns 0 if they all pass and 2 otherwise.
Sincenot all relations could be checked, 2 is returned.  Returning 0 would be misleading, as it implies everything was
checkedand passed, and it can't honestly say that.  The return value 2 does not mean that anything failed.  It means
thatnot all checks passed.  When a 2 is returned, the user is expected to read the output and decide what, if anything,
theywant to do about it.  In this case, the user might decide to wait until the reindex finishes and check again, or
theymight decide they don't care. 
>
> It is true that pg_amcheck is calling bt_index_parent_check() on an invalid index, but so what?  If it chose not to
doso, it would still need to print a message about the index being unavailable for checking, and it would still have to
return2. 

Why would it have to print such a message? You seem to be presenting
this as if there is some authoritative, precise, relevant definition
of "the relations that pg_amcheck sees". But to me the relevant
details look arbitrary at best.

> It can't return 0, and it is unhelpful to leave the user in the dark about the fact that not all indexes are in the
rightstate for checking. 

Why is that unhelpful? More to the point, *why* would this alternative
behavior constitute "leaving the user in the dark"?

What about the case where the pg_class entry isn't visible to our MVCC
snapshot? Why is "skipping" such a relation not just as unhelpful?

> I think this bug report is really a feature request.  The OP appears to want an option to toggle on/off the printing
ofsuch information, perhaps with not printing it as the default. 

And I don't understand why you think that clearly-accidental
implementation details (really just bugs) should be treated as
axiomatic truths about how pg_amcheck must work. Should we now "fix"
pg_dump so that it matches pg_amcheck?

All of the underlying errors are cases that were clearly intended to
catch user error -- every single one. But apparently pg_amcheck is
incapable of error, by definition. Like HAL 9000.

--
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Mark Dilger
Date:
Subject: Re: BUG #17212: pg_amcheck fails on checking temporary relations
Next
From: Mark Dilger
Date:
Subject: Re: BUG #17212: pg_amcheck fails on checking temporary relations