Re: Online verification of checksums - Mailing list pgsql-hackers

From Michael Banck
Subject Re: Online verification of checksums
Msg-id 1549366193.796.9.camel@credativ.de
In response to Re: Online verification of checksums  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Online verification of checksums
List pgsql-hackers
Hi,

On Tuesday, 05.02.2019, at 11:30 +0100, Tomas Vondra wrote:
> On 2/5/19 8:01 AM, Andres Freund wrote:
> > On 2019-02-05 06:57:06 +0100, Fabien COELHO wrote:
> > > > > > I'm wondering (possibly again) about the existing early exit if one block
> > > > > > cannot be read on retry: the command should count this as a kind of bad
> > > > > > block, proceed on checking other files, and obviously fail in the end, but
> > > > > > having checked everything else and generated a report. I do not think that
> > > > > > this condition warrants a full stop. ISTM that under rare race conditions
> > > > > > (eg, an unlucky concurrent "drop database" or "drop table") this could
> > > > > > happen when online, although I could not trigger one despite heavy testing,
> > > > > > so I'm possibly mistaken.
> > > > > 
> > > > > This seems like a defensible judgement call either way.
> > > > 
> > > > Right now we have a few tests that explicitly check that
> > > > pg_verify_checksums fails on broken data ("foo" in the file).  Those
> > > > would then just get skipped AFAICT, which I think is the worse
> > > > behaviour, but if everybody thinks that should be the way to go, we can
> > > > drop/adjust those tests and make pg_verify_checksums skip them.
> > > > 
> > > > Thoughts?
> > > 
> > > My point is that it should fail as it does, only not immediately (early
> > > exit), but after having checked everything else. This means avoiding calling
> > > "exit(1)" here and there (lseek, fopen...), but taking note that something
> > > bad happened, and calling exit only at the end.
> > 
> > I can see both as being valuable (one gives you a more complete picture,
> > the other a quicker answer in scripts). For me that's the point where
> > it's the prerogative of the author to make that choice.

Personally, I would prefer to keep it as simple as possible for now and
get this patch committed; in my opinion the behaviour is already like
this (early exit on corrupt files) so I don't think the online
verification patch should change this.

If we see complaints about this, then I'd be happy to change it
afterwards.

> Why not make this configurable, using a command-line option?

I like this even less: this tool is about verifying checksums, so
adding options on what to do when it encounters broken pages looks out
of scope to me.  Unless we want to say it should generally abort on the
first issue (i.e. on wrong checksums as well).
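
To make the two behaviours under discussion concrete, here is a minimal
sketch in C. It is not the pg_verify_checksums code: FileStatus,
ScanReport and scan_files() are hypothetical names invented for
illustration, and the per-file outcomes are passed in as an array instead
of being read from disk. It assumes, as the thread implies, that a wrong
checksum is counted without aborting, and that only an unreadable file
triggers the early exit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-file outcome; not a type from pg_verify_checksums. */
typedef enum
{
    FILE_OK,
    FILE_BAD_CHECKSUM,
    FILE_UNREADABLE
} FileStatus;

/* Hypothetical summary of one scan. */
typedef struct
{
    size_t files_checked;
    size_t bad_checksums;
    size_t unreadable;
} ScanReport;

/*
 * Walk all per-file outcomes and fill in the report.
 *
 * With fail_fast set, stop at the first unreadable file (the early-exit
 * behaviour). Otherwise note the failure and keep going, so the report
 * covers every file. Wrong checksums are counted in both modes.
 *
 * Returns the exit status a tool would use: 0 if clean, 1 otherwise.
 */
int
scan_files(const FileStatus *results, size_t nfiles, bool fail_fast,
           ScanReport *report)
{
    report->files_checked = 0;
    report->bad_checksums = 0;
    report->unreadable = 0;

    for (size_t i = 0; i < nfiles; i++)
    {
        report->files_checked++;

        switch (results[i])
        {
            case FILE_OK:
                break;
            case FILE_BAD_CHECKSUM:
                report->bad_checksums++;
                break;
            case FILE_UNREADABLE:
                report->unreadable++;
                if (fail_fast)
                    return 1;   /* early exit: remaining files unchecked */
                break;
        }
    }

    return (report->bad_checksums > 0 || report->unreadable > 0) ? 1 : 0;
}
```

Both modes end with a non-zero status once anything is wrong; the only
difference is whether files_checked covers the whole input or stops at the
first unreadable file. Aborting on wrong checksums as well would mean
adding the same fail_fast check to the FILE_BAD_CHECKSUM case.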


Michael

-- 
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
VAT ID number: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Management: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the
following provisions: https://www.credativ.de/datenschutz

