On Fri, Feb 02, 2018 at 12:21:49AM +0000, Simon Riggs wrote:
> Yes, it would be about 99% of the time.
When it comes to recovery, I don't think that 99% is a guarantee
sufficient. (Wondering about the maths behind such a number as well.)
> But you have it backwards - we are not assuming that case. That is the
> only case that has risk - the one where an old WAL record starts at
> exactly the place the latest one stops. Otherwise the rest of the WAL
> record will certainly fail the CRC check, since it will effectively
> have random data in it, as you say.
Your patch assumes that a single WAL segment recycling is fine to
escape based on the file name, but you need to think beyond that. It
seems to me that your assumption is wrong if the tail of a segment gets
reused after more cycles than a single one, which could happen when
doing recovery from an archive, where segments used could have junk in
them. So you actually *increase* the odds of problems if a segment is
forcibly switched and archived, then reused in recovery after being
fetched from an archive.
Since 9.4, the tail of WAL segments is filled with zeros so this brings
more confidence, but I think that we cannot exchange the existing
reliability with performance. So like others have expressed on
this thread, the approach taken does not sound good to me, because we
increase the risks of junk WAL records being used during recovery, just
for the sake of performance. An approach that could be better is to
replace XLogRecord->xl_prev by a new field in XLogPageHeaderData which
tracks XLogRecPtr for the previous page, say xlp_prevpageaddr, and use
that for all sanity checks that xlogreader.c is doing.
If I can count with my fingers correctly, SizeOfXLogLongPHD is 40 bytes
long, and SizeOfXLogShortPHD is 24 bytes, so even with an increase of 8
bytes for the page header you gain space as one record is usually much
likely less than a full xlog page if you remove xl_prev.
--
Michael