Re: Online verification of checksums - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Online verification of checksums
Date
Msg-id 26184.1586203155@sss.pgh.pa.us
Whole thread Raw
In response to Re: Online verification of checksums  (Michael Banck <michael.banck@credativ.de>)
Responses Re: Online verification of checksums  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Michael Banck <michael.banck@credativ.de> writes:
> [ 0001-Fix-checksum-verification-in-base-backups-for-random_V3.patch ]

I noticed that the cfbot wasn't testing this because of a minor merge
conflict.  I rebased it over that, and also readjusted things a little bit
to avoid unnecessarily reindenting existing code, in hopes of making the
patch easier to review.  Doing that reveals that the patch actually
removes a chunk of code, namely a special case for EOF.  Was that
intentional, or a result of a faulty merge earlier?  It certainly isn't
mentioned in your proposed commit message.

Another thing that's bothering me is that the patch compares page LSN
against GetInsertRecPtr(); but that function says

 * NOTE: The value *actually* returned is the position of the last full
 * xlog page. It lags behind the real insert position by at most 1 page.
 * For that, we don't need to scan through WAL insertion locks, and an
 * approximation is enough for the current usage of this function.

I'm not convinced that an approximation is good enough here.  It seems
like a page that's just now been updated could have an LSN beyond the
current XLOG page start, potentially leading to a false checksum
complaint.  Maybe we could address that by adding one xlog page to
the GetInsertRecPtr result?  Kind of a hack, but ...

            regards, tom lane

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 5d94b9c..c7ff9a8 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -2028,15 +2028,47 @@ sendFile(const char *readfilename, const char *tarfilename,
                 page = buf + BLCKSZ * i;

                 /*
-                 * Only check pages which have not been modified since the
-                 * start of the base backup. Otherwise, they might have been
-                 * written only halfway and the checksum would not be valid.
-                 * However, replaying WAL would reinstate the correct page in
-                 * this case. We also skip completely new pages, since they
-                 * don't have a checksum yet.
+                 * We skip completely new pages after checking they are
+                 * all-zero, since they don't have a checksum yet.
                  */
-                if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+                if (PageIsNew(page))
                 {
+                    if (!PageIsZero(page))
+                    {
+                        /*
+                         * pd_upper is zero, but the page is not all zero.  We
+                         * cannot run pg_checksum_page() on the page as it
+                         * would throw an assertion failure.  Consider this a
+                         * checksum failure.
+                         */
+                        checksum_failures++;
+
+                        if (checksum_failures <= 5)
+                            ereport(WARNING,
+                                    (errmsg("checksum verification failed in "
+                                            "file \"%s\", block %d: pd_upper "
+                                            "is zero but page is not all-zero",
+                                            readfilename, blkno)));
+                        if (checksum_failures == 5)
+                            ereport(WARNING,
+                                    (errmsg("further checksum verification "
+                                            "failures in file \"%s\" will not "
+                                            "be reported", readfilename)));
+                    }
+                }
+                else if (PageGetLSN(page) < startptr ||
+                         PageGetLSN(page) > GetInsertRecPtr())
+                {
+                    /*
+                     * Only check pages which have not been modified since the
+                     * start of the base backup. Otherwise, they might have
+                     * been written only halfway and the checksum would not be
+                     * valid. However, replaying WAL would reinstate the
+                     * correct page in this case. If the page LSN is larger
+                     * than the current insert pointer then we assume a bogus
+                     * LSN due to random page header corruption and do verify
+                     * the checksum.
+                     */
                     checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
                     phdr = (PageHeader) page;
                     if (phdr->pd_checksum != checksum)
@@ -2064,20 +2096,6 @@ sendFile(const char *readfilename, const char *tarfilename,

                             if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
                             {
-                                /*
-                                 * If we hit end-of-file, a concurrent
-                                 * truncation must have occurred, so break out
-                                 * of this loop just as if the initial fread()
-                                 * returned 0. We'll drop through to the same
-                                 * code that handles that case. (We must fix
-                                 * up cnt first, though.)
-                                 */
-                                if (feof(fp))
-                                {
-                                    cnt = BLCKSZ * i;
-                                    break;
-                                }
-
                                 ereport(ERROR,
                                         (errcode_for_file_access(),
                                          errmsg("could not reread block %d of file \"%s\": %m",
diff --git a/src/backend/storage/page/bufpage.c b/src/backend/storage/page/bufpage.c
index d708117..2dc8322 100644
--- a/src/backend/storage/page/bufpage.c
+++ b/src/backend/storage/page/bufpage.c
@@ -82,11 +82,8 @@ bool
 PageIsVerified(Page page, BlockNumber blkno)
 {
     PageHeader    p = (PageHeader) page;
-    size_t       *pagebytes;
-    int            i;
     bool        checksum_failure = false;
     bool        header_sane = false;
-    bool        all_zeroes = false;
     uint16        checksum = 0;

     /*
@@ -120,18 +117,7 @@ PageIsVerified(Page page, BlockNumber blkno)
     }

     /* Check all-zeroes case */
-    all_zeroes = true;
-    pagebytes = (size_t *) page;
-    for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++)
-    {
-        if (pagebytes[i] != 0)
-        {
-            all_zeroes = false;
-            break;
-        }
-    }
-
-    if (all_zeroes)
+    if (PageIsZero(page))
         return true;

     /*
@@ -154,6 +140,25 @@ PageIsVerified(Page page, BlockNumber blkno)
     return false;
 }

+/*
+ * PageIsZero
+ *        Check that the page consists only of zero bytes.
+ *
+ */
+bool
+PageIsZero(Page page)
+{
+    int            i;
+    size_t       *pagebytes = (size_t *) page;
+
+    for (i = 0; i < (BLCKSZ / sizeof(size_t)); i++)
+    {
+        if (pagebytes[i] != 0)
+            return false;
+    }
+
+    return true;
+}

 /*
  *    PageAddItemExtended
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index 6338176..598453e 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -6,7 +6,7 @@ use File::Basename qw(basename dirname);
 use File::Path qw(rmtree);
 use PostgresNode;
 use TestLib;
-use Test::More tests => 109;
+use Test::More tests => 112;

 program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
@@ -497,21 +497,37 @@ my $file_corrupt2 = $node->safe_psql('postgres',
 my $pageheader_size = 24;
 my $block_size = $node->safe_psql('postgres', 'SHOW block_size;');

-# induce corruption
+# induce corruption in the pageheader by writing random data into it
 system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
 open $file, '+<', "$pgdata/$file_corrupt1";
-seek($file, $pageheader_size, 0);
-syswrite($file, "\0\0\0\0\0\0\0\0\0");
+my $random_data = join '', map { ("a".."z")[rand 26] } 1 .. $pageheader_size;
+syswrite($file, $random_data);
+close $file;
+system_or_bail 'pg_ctl', '-D', $pgdata, 'start';
+
+$node->command_checks_all(
+    [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt1" ],
+    1,
+    [qr{^$}],
+    [qr/^WARNING.*checksum verification failed/s],
+    "pg_basebackup reports checksum mismatch for random pageheader data");
+rmtree("$tempdir/backup_corrupt1");
+
+# zero out the pageheader completely
+open $file, '+<', "$pgdata/$file_corrupt1";
+system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
+my $zero_data = "\0"x$pageheader_size;
+syswrite($file, $zero_data);
 close $file;
 system_or_bail 'pg_ctl', '-D', $pgdata, 'start';

 $node->command_checks_all(
-    [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt" ],
+    [ 'pg_basebackup', '-D', "$tempdir/backup_corrupt1a" ],
     1,
     [qr{^$}],
     [qr/^WARNING.*checksum verification failed/s],
-    'pg_basebackup reports checksum mismatch');
-rmtree("$tempdir/backup_corrupt");
+    "pg_basebackup reports checksum mismatch for zeroed pageheader");
+rmtree("$tempdir/backup_corrupt1a");

 # induce further corruption in 5 more blocks
 system_or_bail 'pg_ctl', '-D', $pgdata, 'stop';
diff --git a/src/include/storage/bufpage.h b/src/include/storage/bufpage.h
index 3f88683..a1fcb21 100644
--- a/src/include/storage/bufpage.h
+++ b/src/include/storage/bufpage.h
@@ -419,17 +419,18 @@ do { \
                         ((is_heap) ? PAI_IS_HEAP : 0))

 /*
- * Check that BLCKSZ is a multiple of sizeof(size_t).  In PageIsVerified(),
- * it is much faster to check if a page is full of zeroes using the native
- * word size.  Note that this assertion is kept within a header to make
- * sure that StaticAssertDecl() works across various combinations of
- * platforms and compilers.
+ * Check that BLCKSZ is a multiple of sizeof(size_t).  In PageIsZero(), it is
+ * much faster to check if a page is full of zeroes using the native word size.
+ * Note that this assertion is kept within a header to make sure that
+ * StaticAssertDecl() works across various combinations of platforms and
+ * compilers.
  */
 StaticAssertDecl(BLCKSZ == ((BLCKSZ / sizeof(size_t)) * sizeof(size_t)),
                  "BLCKSZ has to be a multiple of sizeof(size_t)");

 extern void PageInit(Page page, Size pageSize, Size specialSize);
 extern bool PageIsVerified(Page page, BlockNumber blkno);
+extern bool PageIsZero(Page page);
 extern OffsetNumber PageAddItemExtended(Page page, Item item, Size size,
                                         OffsetNumber offsetNumber, int flags);
 extern Page PageGetTempPage(Page page);

pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: [PATCH] Incremental sort (was: PoC: Partial sort)
Next
From: Andrew Dunstan
Date:
Subject: Re: backup manifests and contemporaneous buildfarm failures