Re: Online verification of checksums - Mailing list pgsql-hackers

From:           Michael Banck
Subject:        Re: Online verification of checksums
Msg-id:         1553031556.9697.59.camel@credativ.de
In response to: Re: Online verification of checksums (Andres Freund <andres@anarazel.de>)
Responses:      Re: Online verification of checksums
List:           pgsql-hackers
Hi,

On Tuesday, 2019-03-19 at 13:00 -0700, Andres Freund wrote:
> On 2019-03-20 03:27:55 +0800, Stephen Frost wrote:
> > On Tue, Mar 19, 2019 at 23:59 Andres Freund <andres@anarazel.de> wrote:
> > > On 2019-03-19 16:52:08 +0100, Michael Banck wrote:
> > > > On Tuesday, 2019-03-19 at 11:22 -0400, Robert Haas wrote:
> > > > > It's torn pages that I am concerned about - the server is writing
> > > > > and we are reading, and we get a mix of old and new content. We
> > > > > have been quite diligent about protecting ourselves from such
> > > > > risks elsewhere, and checksum verification should not be held to
> > > > > any lesser standard.
> > > >
> > > > If we see a checksum failure on an otherwise correctly read block in
> > > > online mode, we retry the block on the theory that we might have
> > > > read a torn page. If the checksum verification still fails, we
> > > > compare its LSN to the LSN of the current checkpoint and don't mind
> > > > if it is newer. This way, a torn page should not cause a false
> > > > positive either way, I think.
> > >
> > > False positives, no. But there's plenty of potential for false
> > > negatives. In plenty of clusters a large fraction of the pages is
> > > going to be touched in most checkpoints.
> >
> > How is it a false negative? The page was in the middle of being
> > written,
>
> You don't actually know that. It could just be random gunk in the LSN,
> and this type of logic just ignores such failures as long as the random
> gunk is above the system's LSN.

Right, I think this needs to be taken into account. For pg_basebackup,
that would be an additional check for GetRedoRecPtr() or something in the
check below:

[...]

> Well, I don't know what to tell you. But:
>
> 		/*
> 		 * Only check pages which have not been modified since the
> 		 * start of the base backup. Otherwise, they might have been
> 		 * written only halfway and the checksum would not be valid.
> 		 * However, replaying WAL would reinstate the correct page in
> 		 * this case.
> 		 * We also skip completely new pages, since they
> 		 * don't have a checksum yet.
> 		 */
> 		if (!PageIsNew(page) && PageGetLSN(page) < startptr)
> 		{
>
> doesn't consider plenty of scenarios, as pointed out above. It'd be one
> thing if the concerns I point out above were actually commented upon and
> weighed as not substantial enough (not that I know how). But...
>
> > Do you have any example cases where the code in pg_basebackup has
> > resulted in either a false positive or a false negative? Any case
> > which can be shown to result in either?
>
> CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, 1000000) g(i);
> SELECT pg_relation_size('corruptme');
> postgres[22890][1]=# SELECT current_setting('data_directory') || '/' || pg_relation_filepath('corruptme');
> ┌─────────────────────────────────────┐
> │              ?column?               │
> ├─────────────────────────────────────┤
> │ /srv/dev/pgdev-dev/base/13390/16384 │
> └─────────────────────────────────────┘
> (1 row)
> dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1 conv=notrunc
>
> Try a basebackup and see how many times it'll detect the corrupt
> data. In the vast majority of cases you're going to see checksum
> failures when reading the data for normal operation, but not when using
> basebackup (or this new tool).

Right, see above.

> At the very very least this would need to do
>
> a) checks that the page is all zeroes if PageIsNew() (like
>    PageIsVerified() does for the backend). That avoids missing cases
>    where corruption just zeroed out the header, but not the whole page.

We can't run pg_checksum_page() on those afterwards though, as it would
fire an assertion:

|pg_checksums: [...]/../src/include/storage/checksum_impl.h:194:
|pg_checksum_page: Assertion `!(((PageHeader) (&cpage->phdr))->pd_upper
|== 0)' failed.

But we should count it as a checksum error and generate an appropriate
error message in that case.

> b) Check that pd_lsn is between startlsn and the insertion pointer.
> That avoids accepting just about all random data.

However, as pg_checksums is a stand-alone application, it can't just
access the insertion pointer, can it? We could maybe set a threshold
relative to the last checkpoint, after which we consider the pd_lsn
bogus. But what's a good threshold here?

And/or we could port the other sanity checks from PageIsVerified():

|	if ((p->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
|	    p->pd_lower <= p->pd_upper &&
|	    p->pd_upper <= p->pd_special &&
|	    p->pd_special <= BLCKSZ &&
|	    p->pd_special == MAXALIGN(p->pd_special))
|		header_sane = true;

That should catch large-scale random corruption like you showed above.


Michael

-- 
Michael Banck
Project Manager / Senior Consultant
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.banck@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Managing Directors: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is governed by:
https://www.credativ.de/datenschutz
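As a stand-alone sketch of the two checks discussed in this message, the all-zeroes test for PageIsNew() pages from point a) plus the header sanity checks quoted from PageIsVerified(), the logic could look roughly as follows. Note this is illustrative, not PostgreSQL code: `MiniPageHeader` is a simplified stand-in for `PageHeaderData` (the real definition is in src/include/storage/bufpage.h), and the function names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192
#define PD_VALID_FLAG_BITS 0x0007   /* matches bufpage.h */
#define MAXALIGN(LEN) (((uintptr_t) (LEN) + 7) & ~(uintptr_t) 7)

/*
 * Simplified stand-in for PageHeaderData; only the fields the sanity
 * checks below look at.
 */
typedef struct
{
    uint32_t    pd_lsn_hi;          /* high bits of page LSN */
    uint32_t    pd_lsn_lo;          /* low bits of page LSN */
    uint16_t    pd_checksum;
    uint16_t    pd_flags;
    uint16_t    pd_lower;           /* offset to start of free space */
    uint16_t    pd_upper;           /* offset to end of free space */
    uint16_t    pd_special;         /* offset to start of special space */
    uint16_t    pd_pagesize_version;
} MiniPageHeader;

/*
 * Point a): a "new" page (pd_upper == 0) must be entirely zero;
 * anything else suggests corruption zeroed the header but not the
 * whole page.
 */
static bool
page_is_all_zeroes(const unsigned char *page)
{
    for (size_t i = 0; i < BLCKSZ; i++)
        if (page[i] != 0)
            return false;
    return true;
}

/*
 * The header sanity checks quoted above from PageIsVerified().
 */
static bool
page_header_is_sane(const MiniPageHeader *p)
{
    return (p->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
           p->pd_lower <= p->pd_upper &&
           p->pd_upper <= p->pd_special &&
           p->pd_special <= BLCKSZ &&
           p->pd_special == MAXALIGN(p->pd_special);
}
```

A block of /dev/urandom passes these header checks only with very low probability (the pd_flags test alone rules out all but 8 of 65536 possible values), which is why porting them should catch the dd corruption case shown above.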