Re: Online verification of checksums - Mailing list pgsql-hackers

From Michael Banck
Subject Re: Online verification of checksums
Date
Msg-id 1553031556.9697.59.camel@credativ.de
Whole thread Raw
In response to Re: Online verification of checksums  (Andres Freund <andres@anarazel.de>)
Responses Re: Online verification of checksums
List pgsql-hackers
Hi,

Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund:
> On 2019-03-20 03:27:55 +0800, Stephen Frost wrote:
> > On Tue, Mar 19, 2019 at 23:59 Andres Freund <andres@anarazel.de> wrote:
> > > On 2019-03-19 16:52:08 +0100, Michael Banck wrote:
> > > > Am Dienstag, den 19.03.2019, 11:22 -0400 schrieb Robert Haas:
> > > > > It's torn pages that I am concerned about - the server is writing and
> > > > > we are reading, and we get a mix of old and new content.  We have been
> > > > > quite diligent about protecting ourselves from such risks elsewhere,
> > > > > and checksum verification should not be held to any lesser standard.
> > > > 
> > > > If we see a checksum failure on an otherwise correctly read block in
> > > > online mode, we retry the block on the theory that we might have read a
> > > > torn page.  If the checksum verification still fails, we compare its LSN
> > > > to the LSN of the current checkpoint and don't mind if its newer.  This
> > > > way, a torn page should not cause a false positive either way I
> > > > think?.
> > > 
> > > False positives, no. But there's plenty potential for false
> > > negatives. In plenty clusters a large fraction of the pages is going to
> > > be touched in most checkpoints.
> > 
> > 
> > How is it a false negative?  The page was in the middle of being
> > written,
> 
> You don't actually know that. It could just be random gunk in the LSN,
> and this type of logic just ignores such failures as long as the random
> gunk is above the system's LSN.

Right, I think this needs to be taken into account. For pg_basebackup,
that'd be an additional check for GetRedoRecPtr() or something 
in the below check:

[...]

> Well, I don't know what to tell you. But:
> 
>                 /*
>                  * Only check pages which have not been modified since the
>                  * start of the base backup. Otherwise, they might have been
>                  * written only halfway and the checksum would not be valid.
>                  * However, replaying WAL would reinstate the correct page in
>                  * this case. We also skip completely new pages, since they
>                  * don't have a checksum yet.
>                  */
>                 if (!PageIsNew(page) && PageGetLSN(page) < startptr)
>                 {
> 
> doesn't consider plenty scenarios, as pointed out above.  It'd be one
> thing if the concerns I point out above were actually commented upon and
> weighed not substantial enough (not that I know how). But...
> 

> > Do you have any example cases where the code in pg_basebackup has resulted
> > in either a false positive or a false negative?  Any case which can be
> > shown to result in either?
> 
> CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, 1000000) g(i);
> SELECT pg_relation_size('corruptme');
> postgres[22890][1]=# SELECT current_setting('data_directory') || '/' || pg_relation_filepath('corruptme');
> ┌─────────────────────────────────────┐
> │              ?column?               │
> ├─────────────────────────────────────┤
> │ /srv/dev/pgdev-dev/base/13390/16384 │
> └─────────────────────────────────────┘
> (1 row)
> dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1 conv=notrunc
> 
> Try a basebackup and see how many times it'll detect the corrupt
> data. In the vast majority of cases you're going to see checksum
> failures when reading the data for normal operation, but not when using
> basebackup (or this new tool).

Right, see above.

> At the very very least this would need to do
> 
> a) checks that the page is all zeroes if PageIsNew() (like
>    PageIsVerified() does for the backend). That avoids missing cases
>    where corruption just zeroed out the header, but not the whole page.

We can't run pg_checksum_page() on those afterwards though as it would
fire an assertion:

|pg_checksums: [...]/../src/include/storage/checksum_impl.h:194:
|pg_checksum_page: Assertion `!(((PageHeader) (&cpage->phdr))->pd_upper
|== 0)' failed.

But we should count it as a checksum error and generate an appropriate
error message in that case.

> b) Check that pd_lsn is between startlsn and the insertion pointer. That
>    avoids accepting just about all random data.

However, for pg_checksums being a stand-alone application it can't just
access the insertion pointer, can it? We could maybe set a threshold
from the last checkpoint after which we consider the pd_lsn bogus. But
what's a good threshold here?
 
And/or we could port the other sanity checks from PageIsVerified:

|                if ((p->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
|                       p->pd_lower <= p->pd_upper &&
|                       p->pd_upper <= p->pd_special &&
|                       p->pd_special <= BLCKSZ &&
|                       p->pd_special == MAXALIGN(p->pd_special))
|                       header_sane = true

That should catch large-scale random corruption like you showed above. 



Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Online verification of checksums
Next
From: Andres Freund
Date:
Subject: Re: Online verification of checksums