Re: Make mesage at end-of-recovery less scary. - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: Make mesage at end-of-recovery less scary.
Date
Msg-id 20220304.094359.2159611925830858692.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: Make mesage at end-of-recovery less scary.  (Ashutosh Sharma <ashu.coek88@gmail.com>)
Responses Re: Make mesage at end-of-recovery less scary.
List pgsql-hackers
At Thu, 3 Mar 2022 15:39:44 +0530, Ashutosh Sharma <ashu.coek88@gmail.com> wrote in 
> The new changes made in the patch look good. Thanks to the recent
> changes to speed WAL insertion that have helped us catch this bug.

Thanks for the quick checking.

> One small comment:
> 
>         record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
> -       total_len = record->xl_tot_len;
> 
> Do you think we need to change the position of the comments written
> for above code that says:

Yeah, I didn't do that since it is about header verification.  But as
you pointed, the result still doesn't look perfect.

On second thought the two seems repeating the same things.  Thus I
merged the two comments together.  In this verion 16 it looks like
this.

>    /*
>     * Validate the record header.
>     *
>     * Even though we use an XLogRecord pointer here, the whole record header
>     * might not fit on this page.  If the whole record header is on this page,
>     * validate it immediately.  Even otherwise xl_tot_len must be on this page
>     * (it is the first field of MAXALIGNed records), but we still cannot
>     * access any further fields until we've verified that we got the whole
>     * header, so do just a basic sanity check on record length, and validate
>     * the rest of the header after reading it from the next page.  The length
>     * check is necessary here to ensure that we enter the "Need to reassemble
>     * record" code path below; otherwise we might fail to apply
>     * ValidXLogRecordHeader at all.
>     */
>    record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
>
>    if (targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 00d848df6bb8b9966dfbd39c98a388fda42a3e3c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Fri, 28 Feb 2020 15:52:58 +0900
Subject: [PATCH v16] Make End-Of-Recovery error less scary

When recovery in any type ends, we see a bit scary error message like
"invalid record length" that suggests something serious is
happening. Actually if recovery meets a record with length = 0, that
usually means it finished applying all available WAL records.

Make this message less scary as "reached end of WAL". Instead, raise
the error level for other kind of WAL failure to WARNING.
---
 src/backend/access/transam/xlogreader.c   | 144 +++++++++++++++++-----
 src/backend/access/transam/xlogrecovery.c |  92 ++++++++++----
 src/backend/replication/walreceiver.c     |   7 +-
 src/bin/pg_waldump/pg_waldump.c           |  13 +-
 src/include/access/xlogreader.h           |   1 +
 src/test/recovery/t/011_crash_recovery.pl | 106 ++++++++++++++++
 6 files changed, 305 insertions(+), 58 deletions(-)

diff --git a/src/backend/access/transam/xlogreader.c b/src/backend/access/transam/xlogreader.c
index 35029cf97d..bd0f211a23 100644
--- a/src/backend/access/transam/xlogreader.c
+++ b/src/backend/access/transam/xlogreader.c
@@ -42,6 +42,8 @@ static bool allocate_recordbuf(XLogReaderState *state, uint32 reclength);
 static int    ReadPageInternal(XLogReaderState *state, XLogRecPtr pageptr,
                              int reqLen);
 static void XLogReaderInvalReadState(XLogReaderState *state);
+static bool ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr,
+                                  XLogRecord *record);
 static bool ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
                                   XLogRecPtr PrevRecPtr, XLogRecord *record, bool randAccess);
 static bool ValidXLogRecord(XLogReaderState *state, XLogRecord *record,
@@ -121,6 +123,7 @@ XLogReaderAllocate(int wal_segment_size, const char *waldir,
         pfree(state);
         return NULL;
     }
+    state->EndOfWAL = false;
     state->errormsg_buf[0] = '\0';
 
     /*
@@ -292,6 +295,7 @@ XLogReadRecord(XLogReaderState *state, char **errormsg)
     /* reset error state */
     *errormsg = NULL;
     state->errormsg_buf[0] = '\0';
+    state->EndOfWAL = false;
 
     ResetDecoder(state);
     state->abortedRecPtr = InvalidXLogRecPtr;
@@ -371,25 +375,21 @@ restart:
     Assert(pageHeaderSize <= readOff);
 
     /*
-     * Read the record length.
+     * Validate the record header.
      *
-     * NB: Even though we use an XLogRecord pointer here, the whole record
-     * header might not fit on this page. xl_tot_len is the first field of the
-     * struct, so it must be on this page (the records are MAXALIGNed), but we
-     * cannot access any other fields until we've verified that we got the
-     * whole header.
-     */
-    record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
-    total_len = record->xl_tot_len;
-
-    /*
-     * If the whole record header is on this page, validate it immediately.
-     * Otherwise do just a basic sanity check on xl_tot_len, and validate the
-     * rest of the header after reading it from the next page.  The xl_tot_len
+     * Even though we use an XLogRecord pointer here, the whole record header
+     * might not fit on this page.  If the whole record header is on this page,
+     * validate it immediately.  Even otherwise xl_tot_len must be on this page
+     * (it is the first field of MAXALIGNed records), but we still cannot
+     * access any further fields until we've verified that we got the whole
+     * header, so do just a basic sanity check on record length, and validate
+     * the rest of the header after reading it from the next page.  The length
      * check is necessary here to ensure that we enter the "Need to reassemble
      * record" code path below; otherwise we might fail to apply
      * ValidXLogRecordHeader at all.
      */
+    record = (XLogRecord *) (state->readBuf + RecPtr % XLOG_BLCKSZ);
+
     if (targetRecOff <= XLOG_BLCKSZ - SizeOfXLogRecord)
     {
         if (!ValidXLogRecordHeader(state, RecPtr, state->ReadRecPtr, record,
@@ -399,18 +399,13 @@ restart:
     }
     else
     {
-        /* XXX: more validation should be done here */
-        if (total_len < SizeOfXLogRecord)
-        {
-            report_invalid_record(state,
-                                  "invalid record length at %X/%X: wanted %u, got %u",
-                                  LSN_FORMAT_ARGS(RecPtr),
-                                  (uint32) SizeOfXLogRecord, total_len);
+        if (!ValidXLogRecordLength(state, RecPtr, record))
             goto err;
-        }
+
         gotheader = false;
     }
 
+    total_len = record->xl_tot_len;
     len = XLOG_BLCKSZ - RecPtr % XLOG_BLCKSZ;
     if (total_len > len)
     {
@@ -588,6 +583,15 @@ err:
          */
         state->abortedRecPtr = RecPtr;
         state->missingContrecPtr = targetPagePtr;
+
+        /*
+         * If the message is not set yet, that means we failed to load the
+         * page for the record.  Otherwise do not hide the existing message.
+         */
+        if (state->errormsg_buf[0] == '\0')
+            report_invalid_record(state,
+                                  "missing contrecord at %X/%X",
+                                  LSN_FORMAT_ARGS(RecPtr));
     }
 
     /*
@@ -719,6 +723,60 @@ XLogReaderInvalReadState(XLogReaderState *state)
     state->readLen = 0;
 }
 
+/*
+ * Validate record length of an XLOG record header.
+ *
+ * This is substantially a part of ValidXLogRecordHeader.  But XLogReadRecord
+ * needs this separate from the function in case of a partial record header.
+ */
+static bool
+ValidXLogRecordLength(XLogReaderState *state, XLogRecPtr RecPtr,
+                      XLogRecord *record)
+{
+    if (record->xl_tot_len == 0)
+    {
+        char       *p;
+        char       *pe;
+
+        /*
+         * We are almost sure reaching the end of WAL, make sure that the
+         * whole page after the record is filled with zeroes.
+         */
+        p = (char *) record;
+        pe = p + XLOG_BLCKSZ - (RecPtr & (XLOG_BLCKSZ - 1));
+
+        while (*p == 0 && p < pe)
+            p++;
+
+        if (p == pe)
+        {
+            /*
+             * The page after the record is completely zeroed. That suggests
+             * we don't have a record after this point. We don't bother
+             * checking the pages after since they are not zeroed in the case
+             * of recycled segments.
+             */
+            report_invalid_record(state, "empty record at %X/%X",
+                                  LSN_FORMAT_ARGS(RecPtr));
+
+            /* notify end-of-wal to callers */
+            state->EndOfWAL = true;
+            return false;
+        }
+    }
+
+    if (record->xl_tot_len < SizeOfXLogRecord)
+    {
+        report_invalid_record(state,
+                              "invalid record length at %X/%X: wanted %u, got %u",
+                              LSN_FORMAT_ARGS(RecPtr),
+                              (uint32) SizeOfXLogRecord, record->xl_tot_len);
+        return false;
+    }
+
+    return true;
+}
+
 /*
  * Validate an XLOG record header.
  *
@@ -730,14 +788,9 @@ ValidXLogRecordHeader(XLogReaderState *state, XLogRecPtr RecPtr,
                       XLogRecPtr PrevRecPtr, XLogRecord *record,
                       bool randAccess)
 {
-    if (record->xl_tot_len < SizeOfXLogRecord)
-    {
-        report_invalid_record(state,
-                              "invalid record length at %X/%X: wanted %u, got %u",
-                              LSN_FORMAT_ARGS(RecPtr),
-                              (uint32) SizeOfXLogRecord, record->xl_tot_len);
+    if (!ValidXLogRecordLength(state, RecPtr, record))
         return false;
-    }
+
     if (record->xl_rmid > RM_MAX_ID)
     {
         report_invalid_record(state,
@@ -836,6 +889,31 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
 
     XLogSegNoOffsetToRecPtr(segno, offset, state->segcxt.ws_segsize, recaddr);
 
+    StaticAssertStmt(XLOG_PAGE_MAGIC != 0, "XLOG_PAGE_MAGIC is zero");
+
+    if (hdr->xlp_magic == 0)
+    {
+        /* Regard an empty page as End-Of-WAL */
+        int            i;
+
+        for (i = 0; i < XLOG_BLCKSZ && phdr[i] == 0; i++);
+        if (i == XLOG_BLCKSZ)
+        {
+            char        fname[MAXFNAMELEN];
+
+            XLogFileName(fname, state->seg.ws_tli, segno,
+                         state->segcxt.ws_segsize);
+
+            report_invalid_record(state,
+                                  "empty page in log segment %s, offset %u",
+                                  fname,
+                                  offset);
+            state->EndOfWAL = true;
+            return false;
+        }
+
+        /* The same condition will be caught as invalid magic number */
+    }
     if (hdr->xlp_magic != XLOG_PAGE_MAGIC)
     {
         char        fname[MAXFNAMELEN];
@@ -921,6 +999,14 @@ XLogReaderValidatePageHeader(XLogReaderState *state, XLogRecPtr recptr,
                               LSN_FORMAT_ARGS(hdr->xlp_pageaddr),
                               fname,
                               offset);
+
+        /*
+         * If the page address is less than expected we assume it is an unused
+         * page in a recycled segment.
+         */
+        if (hdr->xlp_pageaddr < recaddr)
+            state->EndOfWAL = true;
+
         return false;
     }
 
diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index f9f212680b..750056acaf 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -1592,7 +1592,7 @@ PerformWalRecovery(void)
         /* just have to read next record after CheckPoint */
         Assert(xlogreader->ReadRecPtr == CheckPointLoc);
         replayTLI = CheckPointTLI;
-        record = ReadRecord(xlogreader, LOG, false, replayTLI);
+        record = ReadRecord(xlogreader, WARNING, false, replayTLI);
     }
 
     if (record != NULL)
@@ -1706,7 +1706,7 @@ PerformWalRecovery(void)
             }
 
             /* Else, try to fetch the next WAL record */
-            record = ReadRecord(xlogreader, LOG, false, replayTLI);
+            record = ReadRecord(xlogreader, WARNING, false, replayTLI);
         } while (record != NULL);
 
         /*
@@ -1765,13 +1765,20 @@ PerformWalRecovery(void)
 
         InRedo = false;
     }
-    else
+    else if (xlogreader->EndOfWAL)
     {
         /* there are no WAL records following the checkpoint */
         ereport(LOG,
                 (errmsg("redo is not required")));
 
     }
+    else
+    {
+        /* broken record found */
+        ereport(WARNING,
+                (errmsg("redo is skipped"),
+                 errhint("This suggests WAL file corruption. You might need to check the database.")));
+    }
 
     /*
      * This check is intentionally after the above log messages that indicate
@@ -2939,6 +2946,7 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
     for (;;)
     {
         char       *errormsg;
+        XLogRecPtr    ErrRecPtr = InvalidXLogRecPtr;
 
         record = XLogReadRecord(xlogreader, &errormsg);
         if (record == NULL)
@@ -2954,6 +2962,18 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
             {
                 abortedRecPtr = xlogreader->abortedRecPtr;
                 missingContrecPtr = xlogreader->missingContrecPtr;
+                ErrRecPtr = abortedRecPtr;
+            }
+            else
+            {
+                /*
+                 * EndRecPtr is the LSN we tried to read but failed. In the
+                 * case of decoding error, it is at the end of the failed
+                 * record but we don't have a means for now to know EndRecPtr
+                 * is pointing to which of the beginning or ending of the
+                 * failed record.
+                 */
+                ErrRecPtr = xlogreader->EndRecPtr;
             }
 
             if (readFile >= 0)
@@ -2963,13 +2983,16 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
             }
 
             /*
-             * We only end up here without a message when XLogPageRead()
-             * failed - in that case we already logged something. In
-             * StandbyMode that only happens if we have been triggered, so we
-             * shouldn't loop anymore in that case.
+             * We only end up here without a message when XLogPageRead() failed
+             * in that case we already logged something, or just met end-of-WAL
+             * conditions. In StandbyMode that only happens if we have been
+             * triggered, so we shouldn't loop anymore in that case. When
+             * EndOfWAL is true, we don't emit that error if any immediately
+             * and instead will show it as a part of a decent end-of-wal
+             * message later.
              */
-            if (errormsg)
-                ereport(emode_for_corrupt_record(emode, xlogreader->EndRecPtr),
+            if (!xlogreader->EndOfWAL && errormsg)
+                ereport(emode_for_corrupt_record(emode, ErrRecPtr),
                         (errmsg_internal("%s", errormsg) /* already translated */ ));
         }
 
@@ -3000,11 +3023,14 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
             /* Great, got a record */
             return record;
         }
-        else
+
+        Assert(ErrRecPtr != InvalidXLogRecPtr);
+
+        /* No valid record available from this source */
+        lastSourceFailed = true;
+
+        if (!fetching_ckpt)
         {
-            /* No valid record available from this source */
-            lastSourceFailed = true;
-
             /*
              * If archive recovery was requested, but we were still doing
              * crash recovery, switch to archive recovery and retry using the
@@ -3017,11 +3043,16 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
              * we'd have no idea how far we'd have to replay to reach
              * consistency.  So err on the safe side and give up.
              */
-            if (!InArchiveRecovery && ArchiveRecoveryRequested &&
-                !fetching_ckpt)
+            if (!InArchiveRecovery && ArchiveRecoveryRequested)
             {
+                /*
+                 * We don't report this as LOG, since we don't stop recovery
+                 * here
+                 */
                 ereport(DEBUG1,
-                        (errmsg_internal("reached end of WAL in pg_wal, entering archive recovery")));
+                        (errmsg_internal("reached end of WAL at %X/%X on timeline %u in pg_wal during crash recovery,
enteringarchive recovery",
 
+                                         LSN_FORMAT_ARGS(ErrRecPtr),
+                                         replayTLI)));
                 InArchiveRecovery = true;
                 if (StandbyModeRequested)
                     StandbyMode = true;
@@ -3042,12 +3073,24 @@ ReadRecord(XLogReaderState *xlogreader, int emode,
                 continue;
             }
 
-            /* In standby mode, loop back to retry. Otherwise, give up. */
-            if (StandbyMode && !CheckForStandbyTrigger())
-                continue;
-            else
-                return NULL;
+            /*
+             * recovery ended.
+             *
+             * Emit a decent message if we met end-of-WAL. Otherwise we should
+             * have already emitted an error message.
+             */
+            if (xlogreader->EndOfWAL)
+                ereport(LOG,
+                        (errmsg("reached end of WAL at %X/%X on timeline %u",
+                                LSN_FORMAT_ARGS(ErrRecPtr), replayTLI),
+                         (errormsg ? errdetail_internal("%s", errormsg) : 0)));
         }
+
+        /* In standby mode, loop back to retry. Otherwise, give up. */
+        if (StandbyMode && !CheckForStandbyTrigger())
+            continue;
+        else
+            return NULL;
     }
 }
 
@@ -3129,12 +3172,16 @@ retry:
                                          private->replayTLI,
                                          xlogreader->EndRecPtr))
         {
+            Assert(!StandbyMode || CheckForStandbyTrigger());
+
             if (readFile >= 0)
                 close(readFile);
             readFile = -1;
             readLen = 0;
             readSource = XLOG_FROM_ANY;
 
+            /* promotion exit is not end-of-WAL */
+            xlogreader->EndOfWAL = !StandbyMode;
             return -1;
         }
     }
@@ -3767,7 +3814,8 @@ emode_for_corrupt_record(int emode, XLogRecPtr RecPtr)
 {
     static XLogRecPtr lastComplaint = 0;
 
-    if (readSource == XLOG_FROM_PG_WAL && emode == LOG)
+    /* use currentSource as readSource is reset at failure */
+    if (currentSource == XLOG_FROM_PG_WAL && emode <= WARNING)
     {
         if (RecPtr == lastComplaint)
             emode = DEBUG1;
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index ceaff097b9..4f117ea4da 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -472,10 +472,9 @@ WalReceiverMain(void)
                         else if (len < 0)
                         {
                             ereport(LOG,
-                                    (errmsg("replication terminated by primary server"),
-                                     errdetail("End of WAL reached on timeline %u at %X/%X.",
-                                               startpointTLI,
-                                               LSN_FORMAT_ARGS(LogstreamResult.Write))));
+                                    (errmsg("replication terminated by primary server at %X/%X on timeline %u.",
+                                            LSN_FORMAT_ARGS(LogstreamResult.Write),
+                                            startpointTLI)));
                             endofwal = true;
                             break;
                         }
diff --git a/src/bin/pg_waldump/pg_waldump.c b/src/bin/pg_waldump/pg_waldump.c
index 2340dc247b..215abe95dc 100644
--- a/src/bin/pg_waldump/pg_waldump.c
+++ b/src/bin/pg_waldump/pg_waldump.c
@@ -1173,9 +1173,16 @@ main(int argc, char **argv)
         exit(0);
 
     if (errormsg)
-        fatal_error("error in WAL record at %X/%X: %s",
-                    LSN_FORMAT_ARGS(xlogreader_state->ReadRecPtr),
-                    errormsg);
+    {
+        if (xlogreader_state->EndOfWAL)
+            pg_log_info("end of WAL at %X/%X: %s",
+                        LSN_FORMAT_ARGS(xlogreader_state->EndRecPtr),
+                        errormsg);
+        else
+            fatal_error("error in WAL record at %X/%X: %s",
+                        LSN_FORMAT_ARGS(xlogreader_state->EndRecPtr),
+                        errormsg);
+    }
 
     XLogReaderFree(xlogreader_state);
 
diff --git a/src/include/access/xlogreader.h b/src/include/access/xlogreader.h
index 477f0efe26..7b314ef10e 100644
--- a/src/include/access/xlogreader.h
+++ b/src/include/access/xlogreader.h
@@ -174,6 +174,7 @@ struct XLogReaderState
      */
     XLogRecPtr    ReadRecPtr;        /* start of last record read */
     XLogRecPtr    EndRecPtr;        /* end+1 of last record read */
+    bool        EndOfWAL;        /* was the last attempt EOW? */
 
     /*
      * Set at the end of recovery: the start point of a partial record at the
diff --git a/src/test/recovery/t/011_crash_recovery.pl b/src/test/recovery/t/011_crash_recovery.pl
index 14154d1ce0..01033334d6 100644
--- a/src/test/recovery/t/011_crash_recovery.pl
+++ b/src/test/recovery/t/011_crash_recovery.pl
@@ -10,7 +10,9 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 use Config;
+use IPC::Run;
 
+my $reached_eow_pat = "reached end of WAL at ";
 my $node = PostgreSQL::Test::Cluster->new('primary');
 $node->init(allows_streaming => 1);
 $node->start;
@@ -48,7 +50,15 @@ is($node->safe_psql('postgres', qq[SELECT pg_xact_status('$xid');]),
 
 # Crash and restart the postmaster
 $node->stop('immediate');
+my $logstart = get_log_size($node);
 $node->start;
+my $max_attempts = 360;
+while ($max_attempts-- >= 0)
+{
+    last if (find_in_log($node, $reached_eow_pat, $logstart));
+    sleep 0.5;
+}
+ok ($max_attempts >= 0, "end-of-wal is logged");
 
 # Make sure we really got a new xid
 cmp_ok($node->safe_psql('postgres', 'SELECT pg_current_xact_id()'),
@@ -61,4 +71,100 @@ is($node->safe_psql('postgres', qq[SELECT pg_xact_status('$xid');]),
 $stdin .= "\\q\n";
 $tx->finish;    # wait for psql to quit gracefully
 
+my $segsize = $node->safe_psql('postgres',
+       qq[SELECT setting FROM pg_settings WHERE name = 'wal_segment_size';]);
+
+# make sure no records afterwards go to the next segment
+$node->safe_psql('postgres', qq[
+                 SELECT pg_switch_wal();
+                 CHECKPOINT;
+                 CREATE TABLE t();
+]);
+$node->stop('immediate');
+
+# identify REDO WAL file
+my $cmd = "pg_controldata -D " . $node->data_dir();
+$cmd = ['pg_controldata', '-D', $node->data_dir()];
+$stdout = '';
+$stderr = '';
+IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+ok($stdout =~ /^Latest checkpoint's REDO WAL file:[ \t] *(.+)$/m,
+   "checkpoint file is identified");
+my $chkptfile = $1;
+
+# identify the last record
+my $walfile = $node->data_dir() . "/pg_wal/$chkptfile";
+$cmd = ['pg_waldump', $walfile];
+$stdout = '';
+$stderr = '';
+my $lastrec;
+IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+foreach my $l (split(/\r?\n/, $stdout))
+{
+    $lastrec = $l;
+}
+ok(defined $lastrec, "last WAL record is extracted");
+ok($stderr =~ /end of WAL at ([0-9A-F\/]+): .* at \g1/,
+   "pg_waldump emits the correct ending message");
+
+# read the last record LSN excluding leading zeroes
+ok ($lastrec =~ /, lsn: 0\/0*([1-9A-F][0-9A-F]+),/,
+    "LSN of the last record identified");
+my $lastlsn = $1;
+
+# corrupt the last record
+my $offset = hex($lastlsn) % $segsize;
+open(my $segf, '+<', $walfile) or die "failed to open $walfile\n";
+seek($segf, $offset, 0);  # halfway break the last record
+print $segf "\0\0\0\0";
+close($segf);
+
+# pg_waldump complains about the corrupted record
+$stdout = '';
+$stderr = '';
+IPC::Run::run $cmd, '>', \$stdout, '2>', \$stderr;
+ok($stderr =~ /fatal: error in WAL record at 0\/$lastlsn: .* at 0\/$lastlsn/,
+   "pg_waldump emits the correct error message");
+
+# also server complains
+$logstart = get_log_size($node);
+$node->start;
+$max_attempts = 360;
+while ($max_attempts-- >= 0)
+{
+    last if (find_in_log($node, "WARNING:  invalid record length at 0/$lastlsn: wanted [0-9]+, got 0",
+                         $logstart));
+    sleep 0.5;
+}
+ok($max_attempts >= 0, "header error is logged at $lastlsn");
+
+# no end-of-wal message should be seen this time
+ok(!find_in_log($node, $reached_eow_pat, $logstart),
+   "false log message is not emitted");
+
+$node->stop('immediate');
+
 done_testing();
+
+#### helper routines
+# return the size of logfile of $node in bytes
+sub get_log_size
+{
+    my ($node) = @_;
+
+    return (stat $node->logfile)[7];
+}
+
+# find $pat in logfile of $node after $off-th byte
+sub find_in_log
+{
+    my ($node, $pat, $off) = @_;
+
+    $off = 0 unless defined $off;
+    my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile);
+    return 0 if (length($log) <= $off);
+
+    $log = substr($log, $off);
+
+    return $log =~ m/$pat/;
+}
-- 
2.27.0


pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Next
From: Michael Paquier
Date:
Subject: Re: pg_stop_backup() v2 incorrectly marked as proretset