At Mon, 17 Jul 2023 15:20:30 +0300, Aleksander Alekseev <aleksander@timescale.com> wrote in
> Thanks for working on this, it bugged me for a while. I noticed that
> cfbot is not happy with the patch so I rebased it.
> postgresql:pg_waldump test suite didn't pass after the rebase. I fixed
> it too. Other than that the patch LGTM so I'm not changing its status
> from "Ready for Committer".
Thanks for the rebasing.
> It looks like the patch was moved between the commitfests since
> 2020... If there is anything that may help merging it into PG17 please
> let me know.
This might be just too-much or there might be some doubt in this..
This change basically makes a zero-length record be considered as the
normal end of WAL.
The most controvorsial point I think in the design is the criteria for
an error condition. The assumption is that the WAL is sound if all
bytes following a complete record, up to the next page boundary, are
zeroed out. This is slightly narrower than the original criteria,
merely checking the next record is zero-length. Naturally, there
might be instances where that page has been blown out due to device
failure or some other reasons. Despite this, I believe it is
preferable rather than always issuing a warning (in the LOG level,
though) about a potential WAL corruption.
I've adjusted the condition for muting repeated log messages at the
same LSN, changing it from ==LOG to <=WARNING. This is simply a
consequence of following the change of "real" warnings from LOG to
WARNING. I believe this is acceptable even without considering
aforementioned change, as any single retriable (<ERROR) error at an
LSN should be sufficient to alert users about potential issues.
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center