Re: Infinite loop in XLogPageRead() on standby - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: Infinite loop in XLogPageRead() on standby
Date
Msg-id CANwKhkM_DzL2cMtDXN-qTuMSyx9Pyo4tBADFPJTVN65jkPoGUQ@mail.gmail.com
In response to Re: Infinite loop in XLogPageRead() on standby  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: Infinite loop in XLogPageRead() on standby
List pgsql-hackers
On Wed, 13 Mar 2024 at 04:56, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> At Mon, 11 Mar 2024 16:43:32 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
> > Oh, I once saw the fix work, but seems not to be working after some
> > point. The new issue was a corruption of received WAL records on the
> > first standby, and it may be related to the setting.
>
> I identified the cause of the second issue. When I tried to replay the
> issue, the second standby accidentally received the old timeline's
> last page-spanning record till the end while the first standby was
> promoting (but it had not been read by recovery). In addition to that,
> on the second standby, there's a time window where the timeline
> increased but the first segment of the new timeline is not available
> yet. In this case, the second standby successfully reads the
> page-spanning record in the old timeline even after the second standby
> noticed that the timeline ID has been increased, thanks to the
> robustness of XLogFileReadAnyTLI().
>
> I think the primary change to XLogPageRead that I suggested is correct
> (assuming the use of wal_segment_size instead of the
> constant). However, XLogFileReadAnyTLI() still has a chance to read
> the segment from the old timeline after the second standby notices a
> timeline switch, leading to the second issue. The second issue was
> fixed by preventing XLogFileReadAnyTLI from reading segments from
> older timelines than those suggested by the latest timeline
> history. (In other words, disabling the "AnyTLI" part).
>
> I recall that there was a discussion for commit 4bd0ad9e44, about the
> objective of allowing reading segments from older timelines than the
> timeline history suggests. In my faint memory, we concluded to
> postpone making the decision to remove the feature due to uncertainty
> about the objective. If there's no clear reason to continue using
> XLogFileReadAnyTLI(), I suggest we stop its use and instead adopt
> XLogFileReadOnTLHistory(), which reads segments that align precisely
> with the timeline history.


This sounds very similar to the problem described in [1]. And I think
both will be resolved by that change.

[1] https://postgr.es/m/CANwKhkMN3QwAcvuDZHb6wsvLRtkweBiYso-KLFykkQVWuQLcOw%40mail.gmail.com
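
To make the distinction in the quoted text concrete, here is a minimal sketch, not the actual PostgreSQL code. The HistoryEntry struct and both function names are hypothetical, and the timeline history is reduced to a newest-first array of LSN ranges. It contrasts an "any TLI" lookup, which accepts every timeline that began at or before the requested position, with a lookup that follows the history exactly.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

/* Hypothetical, simplified history entry: tli covers [begin, end). */
typedef struct
{
    TimeLineID  tli;
    XLogRecPtr  begin;
    XLogRecPtr  end;        /* 0 means open-ended (the newest timeline) */
} HistoryEntry;

/*
 * "Any TLI" style (assumed simplification): collect every timeline that
 * had already started at lsn, newest first. A reader that tries these in
 * order can fall back to an older timeline's WAL when the newest
 * timeline's segment is not available yet.
 */
size_t
candidate_tlis_any(const HistoryEntry *h, size_t n, XLogRecPtr lsn,
                   TimeLineID *out)
{
    size_t      k = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (h[i].begin <= lsn)
            out[k++] = h[i].tli;
    }
    return k;
}

/*
 * History-exact style (assumed simplification): return only the timeline
 * whose range contains lsn, or 0 if the history does not cover it.
 */
TimeLineID
tli_on_history(const HistoryEntry *h, size_t n, XLogRecPtr lsn)
{
    for (size_t i = 0; i < n; i++)
    {
        if (lsn >= h[i].begin && (h[i].end == 0 || lsn < h[i].end))
            return h[i].tli;
    }
    return 0;
}
```

With a history in which timeline 2 forks from timeline 1 at 0x3000000, a position just past the fork yields two candidates (2, then 1) under the first function but only timeline 2 under the second. That fallback to timeline 1 is the window in which the old timeline's page-spanning record can still be read.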
