Re: corruption of WAL page header is never reported - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: corruption of WAL page header is never reported
Date
Msg-id 20210719.174707.905942361719166634.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: corruption of WAL page header is never reported  (Yugo NAGATA <nagata@sraoss.co.jp>)
Responses Re: corruption of WAL page header is never reported  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: corruption of WAL page header is never reported  (Yugo NAGATA <nagata@sraoss.co.jp>)
List pgsql-hackers
At Mon, 19 Jul 2021 16:00:39 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in 
> Your patch doesn't fix the issue that the error message is never reported in
> standby mode. When a WAL page header is broken, the standby would silently repeat
> retrying forever.

Ok, I see your point and agree to that.

> I think we have to let users know the corruption of WAL page header even in
> standby mode, not? A corruption of WAL record header is always reported,
> by the way. (See that XLogReadRecord is calling ValidXLogRecordHeader.)

Howeer, I'm still on the opinion that we don't need to check that
while in standby mode.

How about the attached?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
From fe23fbf51569eb7a305bd9e619e918aec0b87bf7 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 19 Jul 2021 14:49:34 +0900
Subject: [PATCH] Don't hide xlog page-header errors while recovery

The commit 0668719801 intended to handle the case of standby mode but
it inadvertently catches also while not in standby mode.  Change the
condition to catch only while in non-standby mode.

Addition to that, it is reasonable to show the error message even in
standby mode. Show the error message if any.

Author: Yugo NAGATA <nagata@sraoss.co.jp>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
---
 src/backend/access/transam/xlog.c | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 2ee9515139..221e7d1f07 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -12317,8 +12317,17 @@ retry:
      * Validating the page header is cheap enough that doing it twice
      * shouldn't be a big deal from a performance point of view.
      */
-    if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
+    if (StandbyMode &&
+        !XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
     {
+        /*
+         * in this case we consume this error right now then retry immediately,
+         * the message is already translated
+         */
+        if (xlogreader->errormsg_buf[0])
+            ereport(emode_for_corrupt_record(emode, EndRecPtr),
+                    (errmsg_internal("%s", xlogreader->errormsg_buf)));
+
         /* reset any error XLogReaderValidatePageHeader() might have set */
         xlogreader->errormsg_buf[0] = '\0';
         goto next_record_is_invalid;
-- 
2.27.0


pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Update maintenance_work_mem/autovacuum_work_mem to reflect the 1GB limitation with VACUUM
Next
From: Amit Kapila
Date:
Subject: Re: Skipping logical replication transactions on subscriber side