Re: corruption of WAL page header is never reported - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: corruption of WAL page header is never reported
Date
Msg-id 20210913.110004.2070315095617522330.horikyota.ntt@gmail.com
Whole thread Raw
In response to Re: corruption of WAL page header is never reported  (Fujii Masao <masao.fujii@oss.nttdata.com>)
Responses Re: corruption of WAL page header is never reported
List pgsql-hackers
At Fri, 10 Sep 2021 10:38:39 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in 
> 
> 
> On 2021/09/07 2:02, Fujii Masao wrote:
> > Even if we do this while NOT in standby mode, ISTM that this function
> > doesn't
> > return with a valid errmsg_buf because it's reset. So probably the
> > comment
> > should be updated as follows?
> > -------------------------
> > We don't do this while not in standby mode because we don't need to
> > retry
> > immediately if the page header is not valid. Instead, XLogReadRecord()
> > is
> > responsible to check the page header.
> > -------------------------
> 
> I updated the comment as above. Patch attached.
> 
> -     * it's not valid. This may seem unnecessary, because XLogReadRecord()
> + * it's not valid. This may seem unnecessary, because
> ReadPageInternal()
>       * validates the page header anyway, and would propagate the failure up to
> 
> I also applied this change because ReadPageInternal() not
> XLogReadRecord()
> calls XLogReaderValidatePageHeader().

Yeah, good catch.


+     * Note that we don't do this while not in standby mode because we don't
+     * need to retry immediately if the page header is not valid. Instead,
+     * ReadPageInternal() is responsible for validating the page header.

The point here is "retry this page, not this record", so "we don't need
to retry immediately" looks a bit ambiguous. So how about something
like this?

Note that we don't do this while not in standby mode because we don't
need to avoid retrying this entire record even if the page header is
not valid. Instead, ReadPageInternal() is responsible for validating
the page header in that case.

Everything else looks fine to me.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: walsender timeout on logical replication set
Next
From: Amit Kapila
Date:
Subject: Re: Added missing invalidations for all tables publication