Re: corruption of WAL page header is never reported - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: corruption of WAL page header is never reported
Msg-id 20210719.151441.1342311546952131179.horikyota.ntt@gmail.com
In response to corruption of WAL page header is never reported  (Yugo NAGATA <nagata@sraoss.co.jp>)
Responses Re: corruption of WAL page header is never reported  (Yugo NAGATA <nagata@sraoss.co.jp>)
List pgsql-hackers
Hello.

At Sun, 18 Jul 2021 04:55:05 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in 
> Hello,
> 
> I found that any corruption of WAL page header found during recovery is never
> reported in log messages. If the WAL page header is broken, it is detected in
> XLogReaderValidatePageHeader called from XLogPageRead, but the error messages
> are always reset and never reported.

Good catch!  Currently, recovery stops without showing any reason when
it is stopped by page-header errors.

> I attached a patch to fix it in this way.

However, the patch is a kind of roof-over-a-roof, that is, a redundant
extra layer.  What we should do is simply omit the check in
XLogPageRead unless we are in standby mode.
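To illustrate the idea, here is a minimal, self-contained sketch of why gating the early check on standby mode lets the message survive. It is not the PostgreSQL code: MockReader, mock_validate_page_header, mock_recover_one_page and the message text are all hypothetical stand-ins, and the sketch assumes only what the comment in xlog.c says, namely that the header is validated a second time after the early check.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the reader state; only the error buffer is modeled. */
typedef struct MockReader
{
    char        errormsg_buf[128];
} MockReader;

/* Mock validator: on a bad header, record a reason and report failure. */
static bool
mock_validate_page_header(MockReader *reader, bool header_ok)
{
    if (!header_ok)
    {
        snprintf(reader->errormsg_buf, sizeof(reader->errormsg_buf),
                 "invalid magic number in page header");
        return false;
    }
    return true;
}

/*
 * Returns the message that would be left for the log ("" if none).
 * check_only_in_standby = false models the current behavior,
 * check_only_in_standby = true models the gated check.
 */
static const char *
mock_recover_one_page(MockReader *reader, bool standby_mode,
                      bool header_ok, bool check_only_in_standby)
{
    reader->errormsg_buf[0] = '\0';

    /* Early check in the page-read step, meant to allow a quiet retry. */
    if ((!check_only_in_standby || standby_mode) &&
        !mock_validate_page_header(reader, header_ok))
    {
        /* The message is thrown away here, so nothing reaches the log. */
        reader->errormsg_buf[0] = '\0';
        return reader->errormsg_buf;
    }

    /* Second validation; whatever it records is kept for reporting. */
    (void) mock_validate_page_header(reader, header_ok);
    return reader->errormsg_buf;
}
```

With check_only_in_standby set to false, a bad header outside standby mode yields an empty message, which corresponds to recovery stopping without a reason. With it set to true, the same case returns the recorded reason, while standby mode still discards it so that the read can be retried.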

> Or, if we wouldn't like to report an error for each check and also what we want
> to check here is just about old recycled WAL instead of header corruption itself, 
> I wonder if we could check just xlp_pageaddr instead of calling
> XLogReaderValidatePageHeader.

I'm not sure. But as described in the commit message, the commit
intended to save a common case, and there's no obvious reason either
to restrict the check to the page address or not to. So it uses the
established checking function.
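As a rough illustration of the difference between the two options, the sketch below compares an address-only check with a fuller header check. Everything in it is hypothetical: MockPageHeader, MOCK_PAGE_MAGIC and both functions are simplified stand-ins, and the real header and XLogReaderValidatePageHeader cover more fields than shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical magic value, not the real one. */
#define MOCK_PAGE_MAGIC 0xABCD

/* Simplified stand-in for a WAL page header: only two fields are modeled. */
typedef struct MockPageHeader
{
    uint16_t    magic;      /* identifies the page format */
    uint64_t    pageaddr;   /* address the page claims to belong to */
} MockPageHeader;

/* Address-only check: enough to notice an old, recycled page. */
static bool
mock_pageaddr_only_ok(const MockPageHeader *hdr, uint64_t expected_addr)
{
    return hdr->pageaddr == expected_addr;
}

/* Fuller check: also rejects a header whose other fields are damaged. */
static bool
mock_full_header_ok(const MockPageHeader *hdr, uint64_t expected_addr)
{
    return hdr->magic == MOCK_PAGE_MAGIC && hdr->pageaddr == expected_addr;
}
```

In this model a recycled page (valid magic, stale address) fails both checks, while a page with the right address but a damaged magic passes the address-only check and fails the fuller one.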

I was tempted to adjust the comment just above by adding "while in
standby mode", but "so that we can retry immediately" already suggests
that, so I left the comment unchanged in the attached patch.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From 30033d810bcc784da55600792484603e1c46b3d7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 19 Jul 2021 14:49:34 +0900
Subject: [PATCH v1] Don't forget message of page-header errors while not in
 standby mode

Commit 0668719801 intended to forget page-header errors only while in
standby mode, but actually they are always forgotten.  As a result,
the message at the end of crash recovery lacks the reason for the
stop.  Fix that by doing the additional check only while in standby
mode.
---
 src/backend/access/transam/xlog.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2ee9515139..79513fb8b5 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12317,7 +12317,8 @@ retry:
      * Validating the page header is cheap enough that doing it twice
      * shouldn't be a big deal from a performance point of view.
      */
-    if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
+    if (StandbyMode &&
+        !XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
     {
         /* reset any error XLogReaderValidatePageHeader() might have set */
         xlogreader->errormsg_buf[0] = '\0';
-- 
2.27.0

