Re: Online verification of checksums - Mailing list pgsql-hackers

From Michael Banck
Subject Re: Online verification of checksums
Msg-id 1537974927.3800.41.camel@credativ.de
In response to Re: Online verification of checksums  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
Hi,

On Wednesday, 26.09.2018, at 10:54 -0400, Stephen Frost wrote:
> * Michael Banck (michael.banck@credativ.de) wrote:
> > On Wednesday, 26.09.2018, at 13:23 +0200, Fabien COELHO wrote:
> > > There are debatable changes of behavior:
> > > 
> > >     if (errno == ENOENT) return / continue...
> > > 
> > > For instance, a file disappearing is ok online, but not so if offline. On 
> > > the other hand, the probability that a file suddenly disappears while the 
> > > server is offline looks remote, so reporting such issues does not seem 
> > > useful.
> > > 
> > > However I'm more wary with other continues/skips added. ISTM that skipping 
> > > a block because of a read error, or because it is new, or some other 
> > > reasons, is not the same thing, so should be counted & reported 
> > > differently?
> > 
> > I think that would complicate things further without a lot of benefit.
> > 
> > After all, we are interested in checksum failures, not necessarily read
> > failures etc. so exiting on them (and skip checking possibly large parts
> > of PGDATA) looks undesirable to me.
> > 
> > So I have made no changes in this part so far; what do others think
> > about this?
> 
> I certainly don't see a lot of point in doing much more than what was
> discussed previously for 'new' blocks (counting them as skipped and
> moving on).
> 
> An actual read() error (that is, a failure on a read() call such as
> getting back EIO), on the other hand, is something which I'd probably
> report back to the user immediately and then move on, and perhaps
> report again at the end.
> 
> Note that a short read isn't an error and falls under the 'new' blocks
> discussion above.

So I've added ENOENT checks when opening or statting files; any other
error there (EIO, for instance) is still reported.

The current code in master exits on any read that does not return
BLCKSZ bytes, which I've changed to a skip. On its own, that change
would mean read failures (return code < 0) are no longer caught, so I
have added an explicit check for them which emits an error message and
returns.
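
To illustrate the control flow described above, here is a minimal
standalone sketch. It is not the code from the attached patch: the names
scan_file_sketch, ScanStats and ScanResult are made up for illustration,
BLCKSZ is assumed to be the default 8192, and the actual checksum
verification is left out.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define BLCKSZ 8192				/* assumed: PostgreSQL's default block size */

/* Hypothetical result codes and counters, for illustration only. */
typedef enum
{
	SCAN_OK,					/* file was scanned to EOF */
	SCAN_SKIPPED_MISSING,		/* file vanished (ENOENT), fine when online */
	SCAN_ERROR					/* open() or read() failed, was reported */
} ScanResult;

typedef struct
{
	long		blocks_read;	/* full blocks that would be verified */
	long		blocks_skipped; /* short reads, skipped instead of exiting */
} ScanStats;

static ScanResult
scan_file_sketch(const char *path, ScanStats *stats)
{
	char		buf[BLCKSZ];
	int			fd;

	assert(path != NULL && stats != NULL);

	fd = open(path, O_RDONLY);
	if (fd < 0)
	{
		/* A file disappearing under a running server is not an error. */
		if (errno == ENOENT)
			return SCAN_SKIPPED_MISSING;
		/* Anything else (EIO, EACCES, ...) is still reported. */
		fprintf(stderr, "could not open file \"%s\": %s\n",
				path, strerror(errno));
		return SCAN_ERROR;
	}

	for (;;)
	{
		ssize_t		r = read(fd, buf, BLCKSZ);

		if (r == 0)
			break;				/* end of file */

		if (r < 0)
		{
			/* A real read failure: emit an error message and return. */
			fprintf(stderr, "could not read file \"%s\": %s\n",
					path, strerror(errno));
			close(fd);
			return SCAN_ERROR;
		}

		if (r != BLCKSZ)
		{
			/* Short read: count it as skipped and move on. */
			stats->blocks_skipped++;
			continue;
		}

		/* Checksum verification of buf would happen here. */
		stats->blocks_read++;
	}

	close(fd);
	return SCAN_OK;
}
```

The point of the sketch is only the ordering of the checks: r < 0 has to
be tested before r != BLCKSZ, otherwise a failed read() would be counted
as a skipped block.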

New version 5 attached.


Michael

-- 
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to
the following terms: https://www.credativ.de/datenschutz
