Re: corruption of WAL page header is never reported - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: corruption of WAL page header is never reported |
Date | |
Msg-id | 20210719.151441.1342311546952131179.horikyota.ntt@gmail.com Whole thread Raw |
In response to | corruption of WAL page header is never reported (Yugo NAGATA <nagata@sraoss.co.jp>) |
Responses |
Re: corruption of WAL page header is never reported
|
List | pgsql-hackers |
Hello. At Sun, 18 Jul 2021 04:55:05 +0900, Yugo NAGATA <nagata@sraoss.co.jp> wrote in > Hello, > > I found that any corruption of WAL page header found during recovery is never > reported in log messages. If wal page header is broken, it is detected in > XLogReaderValidatePageHeader called from XLogPageRead, but the error messages > are always reset and never reported. Good catch! Currently recovery stops showing no reason if it is stopped by page-header errors. > I attached a patch to fix it in this way. However, it is a kind of a roof-over-a-roof. What we should do is just omitting the check in XLogPageRead while in standby mode. > Or, if we wouldn't like to report an error for each check and also what we want > to check here is just about old recycled WAL instead of header corruption itself, > I wander that we could check just xlp_pageaddr instead of calling > XLogReaderValidatePageHeader. I'm not sure. But as described in the commit message, the commit intended to save a common case and there's no obvious reason to (and not to) restrict the check only to page address. So it uses the established checking function. I was tempted to adjust the comment just above by adding "while in standby mode", but "so that we can retry immediately" is suggesting that so I didn't do that in the attached. regards. -- Kyotaro Horiguchi NTT Open Source Software Center From 30033d810bcc784da55600792484603e1c46b3d7 Mon Sep 17 00:00:00 2001 From: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Date: Mon, 19 Jul 2021 14:49:34 +0900 Subject: [PATCH v1] Don't forget message of hage-header errors while not in standby mode The commit 0668719801 intended to omit page-header errors only while in standby mode but actually it is always forgotten. As the result the message of the end of a crash recovery lacks the reason for the stop. Fix that by doing the additional check only while in standby mode. --- src/backend/access/transam/xlog.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 2ee9515139..79513fb8b5 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -12317,7 +12317,8 @@ retry: * Validating the page header is cheap enough that doing it twice * shouldn't be a big deal from a performance point of view. */ - if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf)) + if (StandbyMode && + !XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf)) { /* reset any error XLogReaderValidatePageHeader() might have set */ xlogreader->errormsg_buf[0] = '\0'; -- 2.27.0
pgsql-hackers by date: