Re: Online verification of checksums - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: Online verification of checksums
Date
Msg-id alpine.DEB.2.21.1809261703520.22248@lancre
In response to Re: Online verification of checksums  (Michael Banck <michael.banck@credativ.de>)
List pgsql-hackers
>> The patch is missing a documentation update.
>
> I've added that now. I think the only change needed was removing the
> "server needs to be offline" part?

Yes, and also checking that the described behavior corresponds to the new 
version.

>> There are debatable changes of behavior:
>>
>>     if (errno == ENOENT) return / continue...
>>
>> For instance, a file disappearing is ok online, but not so if offline. On
>> the other hand, the probability that a file suddenly disappears while the
>> server is offline looks remote, so reporting such issues does not seem
>> useful.
>>
>> However I'm more wary with other continues/skips added. ISTM that skipping
>> a block because of a read error, or because it is new, or some other
>> reasons, is not the same thing, so should be counted & reported
>> differently?
>
> I think that would complicate things further without a lot of benefit.
>
> After all, we are interested in checksum failures, not necessarily read
> failures etc. so exiting on them (and skip checking possibly large parts
> of PGDATA) looks undesirable to me.

Hmmm.

I'm really saying that it is debatable, so here is some fuel for the 
debate:

If I run the check command and it cannot do its job, that is a problem 
as bad as a failing checksum. The only safe assumption about a block that 
cannot be read is that its checksum is bad... So ISTM that for some of the 
"skipped" errors there should be an appropriate report (exit code, final 
output) that something is amiss.
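
To make the suggestion concrete, here is a minimal sketch of what I have in 
mind. All names (BlockOutcome, ScanReport, record_block, scan_exit_code, 
print_report) and the exit code values are hypothetical and not taken from 
the patch; the point is only that each kind of skip gets its own counter, 
and that unreadable blocks affect the exit status while new blocks do not.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical outcome categories for one block; not taken from the patch. */
typedef enum BlockOutcome
{
    BLOCK_OK,            /* read fine, checksum matches */
    BLOCK_BAD_CHECKSUM,  /* read fine, checksum mismatch */
    BLOCK_READ_ERROR,    /* could not be read, so nothing was verified */
    BLOCK_NEW            /* new (uninitialized) page, nothing to verify */
} BlockOutcome;

/* Hypothetical per-run tallies, one counter per category. */
typedef struct ScanReport
{
    long blocks_ok;
    long blocks_bad_checksum;
    long blocks_read_error;
    long blocks_new;
} ScanReport;

/* Count a block under its own category instead of silently skipping it. */
void
record_block(ScanReport *r, BlockOutcome outcome)
{
    assert(r != NULL);
    switch (outcome)
    {
        case BLOCK_OK:
            r->blocks_ok++;
            break;
        case BLOCK_BAD_CHECKSUM:
            r->blocks_bad_checksum++;
            break;
        case BLOCK_READ_ERROR:
            r->blocks_read_error++;
            break;
        case BLOCK_NEW:
            r->blocks_new++;
            break;
    }
}

/*
 * Assumed exit code convention (illustrative only):
 *   0 = every block that needed checking was verified
 *   1 = at least one checksum failure
 *   2 = no checksum failure, but some blocks could not be read
 * Skipped new blocks are reported but do not change the exit code.
 */
int
scan_exit_code(const ScanReport *r)
{
    assert(r != NULL);
    if (r->blocks_bad_checksum > 0)
        return 1;
    if (r->blocks_read_error > 0)
        return 2;
    return 0;
}

/* Final output: one line per category, so nothing is hidden. */
void
print_report(const ScanReport *r, FILE *out)
{
    assert(r != NULL);
    fprintf(out, "Blocks verified:  %ld\n",
            r->blocks_ok + r->blocks_bad_checksum);
    fprintf(out, "Bad checksums:    %ld\n", r->blocks_bad_checksum);
    fprintf(out, "Unreadable:       %ld\n", r->blocks_read_error);
    fprintf(out, "New (skipped):    %ld\n", r->blocks_new);
}
```

The scan loop would call record_block() wherever it currently does a bare 
"continue", then end with print_report() and exit(scan_exit_code(&report)), 
so the run still covers all of PGDATA but does not pretend all went well.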

-- 
Fabien.

