Re: corruption of WAL page header is never reported - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: corruption of WAL page header is never reported
Date
Msg-id 20210902.131716.1067877126401208860.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: corruption of WAL page header is never reported  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: corruption of WAL page header is never reported
Re: corruption of WAL page header is never reported
List pgsql-hackers
At Thu, 2 Sep 2021 12:19:25 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2021/07/19 18:52, Yugo NAGATA wrote:
> > Well, I think my first patch was not wrong. The difference with the
> > latest
> > patch is just whether we perform the additional check when we are not
> > in
> > standby mode or not. The behavior is basically the same although which
> > function
> > detects and prints the page-header error in cases of crash recovery is
> > different.
> 
> Yes, so which patch do you think is better? I like your version
> because there seems no reason why XLogPageRead() should skip
> XLogReaderValidatePageHeader() when not in standby mode.

Did you read the comment just above?

xlog.c:12523
>     * Check the page header immediately, so that we can retry immediately if
>     * it's not valid. This may seem unnecessary, because XLogReadRecord()
>     * validates the page header anyway, and would propagate the failure up to
>     * ReadRecord(), which would retry. However, there's a corner case with
>     * continuation records, if a record is split across two pages such that

So when not in standby mode, the same check is performed by xlogreader
which has the responsibility to validate the binary data read by
XLogPageRead. The page-header validation is a compromise to save a
specific case.

> Also I'm tempted to move ereport() and reset of errmsg_buf to
> under next_record_is_invalid as follows. That is, in standby mode
> whenever we find an invalid record and retry reading WAL page
> in XLogPageRead(), we report the error message and reset it.
> For now in XLogPageRead(), only XLogReaderValidatePageHeader()
> sets errmsg_buf, but in the future other code or function doing that
> may be added. For that case, the following change seems more elegant.
> Thought?

I don't think it is good choice to conflate read-failure and header
validation failure from the view of modularity.

>      * shouldn't be a big deal from a performance point of view.
>       */
>      if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
> -    {
> -        /* reset any error XLogReaderValidatePageHeader() might have set */
> -        xlogreader->errormsg_buf[0] = '\0';
>          goto next_record_is_invalid;
> -    }
>        return readLen;
>  @@ -12517,7 +12513,17 @@ next_record_is_invalid:
>        /* In standby-mode, keep trying */
>      if (StandbyMode)
> +    {
> +        if (xlogreader->errormsg_buf[0])
> +        {
> +            ereport(emode_for_corrupt_record(emode, EndRecPtr),
> + (errmsg_internal("%s", xlogreader->errormsg_buf)));
> +
> + /* reset any error XLogReaderValidatePageHeader() might have set */
> +            xlogreader->errormsg_buf[0] = '\0';
> +        }
>          goto retry;
> +    }
>      else
>          return -1;
>  }

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: row filtering for logical replication
Next
From: Peter Smith
Date:
Subject: Re: row filtering for logical replication