On Sat, Jan 13, 2018 at 03:40:01PM +0000, Simon Riggs wrote: > The new two byte value is protected by CRC. The 2 byte value repeats > every 32768 WAL files. Any bit error in that value that made it appear > to be a current value would need to have a rare set of circumstances.
If you use the two lower bytes of the segment number, then this gets repeated every 64k segments, no? In terms of space this represents 500GB worth of WAL segments with a default segment size. Hence the more PostgreSQL scales, the more there is a risk of collision, and I am pretty sure that there are already deployments around allocating hundreds of gigs worth of space for the WAL partition. There are no problems of this class if using the 8-byte field xl_prev. It seems to me that we don't want to take any risks here.
IMHO we're missing a point here. When xl_prev is changed to a 2-byte value (as the patch suggests), the risk only comes when an old WAL file is recycled for some future WAL file and the old and the future WAL file's segment numbers share the same low order 16-bits. Now as the patch stands, we take precaution to never reuse a WAL file with conflicting low order bits.
This particular piece of the patch deals with that:
For example, old WAL file 000000010000000000000001 will NEVER be reused as 000000010000000000010001. So even if there are any intact old WAL records in the recycled file, they will be detected correctly during replay since the SegID stored in the WAL record will not match the SegID as derived from the WAL file segment number. The replay code does this for every WAL record it sees.
There were some concerns about bit-flipping, which may inadvertently SegID stored in the carried over WAL record so that it now matches with the current WAL files' SegID, but TBH I don't see why we can't trust CRC to detect that. Because if we can't, then there would be other problems during WAL replay.