On Fri, 2013-04-05 at 10:34 +0200, Florian Pflug wrote:
> Maybe we could scan forward to check whether a corrupted WAL record is
> followed by one or more valid ones with sensible LSNs. If it is,
> chances are high that we haven't actually hit the end of the WAL. In
> that case, we could either log a warning, or (better, probably) abort
> crash recovery.
+1.
> Corruption of fields which we require to scan past the record would
> cause false negatives, i.e. we would not trigger an error even though
> we do abort recovery mid-way through. There's a risk of false positives too,
> but they require quite specific orderings of writes and thus seem
> rather unlikely. (AFAICS, the OS would have to write some parts of
> record N followed by the whole of record N+1 and then crash to cause a
> false positive).
Does the xlp_pageaddr help solve this?
Also, we'd need to be a little careful when written-but-not-flushed WAL
data makes it to disk, which could cause a false positive and may be a
fairly common case.
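
To make the quoted scan-forward idea concrete, here is a minimal sketch over a toy log format. Everything in it is an assumption for illustration: the record layout (length, LSN, CRC, payload), the helper names, and the rule that a record's LSN equals its byte offset. It is not PostgreSQL's XLogRecord format or its recovery code.

```python
import struct
import zlib

# Hypothetical toy header, NOT PostgreSQL's XLogRecord:
#   total_len (uint32) | lsn (uint64) | crc32 of payload (uint32)
HEADER = struct.Struct("<IQI")


def build_wal(payloads):
    """Concatenate toy records; each record's LSN is its byte offset."""
    buf = bytearray()
    for p in payloads:
        buf += HEADER.pack(HEADER.size + len(p), len(buf), zlib.crc32(p)) + p
    return buf


def read_record(buf, offset):
    """Return (total_len, is_valid), or None if no complete header fits."""
    if offset + HEADER.size > len(buf):
        return None
    total_len, lsn, crc = HEADER.unpack_from(buf, offset)
    end = offset + total_len
    if total_len < HEADER.size or end > len(buf):
        return (total_len, False)
    payload = buf[offset + HEADER.size:end]
    # "Sensible LSN" here means the stored LSN matches the record's position.
    return (total_len, lsn == offset and zlib.crc32(payload) == crc)


def classify_stop(buf, bad_offset, lookahead=1):
    """Guess what an invalid record at bad_offset means.

    Skips the bad record using its own length field, then requires
    `lookahead` consecutive valid records before suspecting corruption.
    """
    rec = read_record(buf, bad_offset)
    if rec is None or rec[0] < HEADER.size:
        return "end_of_wal"  # cannot even skip past the bad record
    offset = bad_offset + rec[0]
    for _ in range(lookahead):
        nxt = read_record(buf, offset)
        if nxt is None or not nxt[1]:
            return "end_of_wal"
        offset += nxt[0]
    return "suspected_corruption"


wal = build_wal([b"a" * 10, b"b" * 10, b"c" * 10])  # records at 0, 26, 52

damaged = bytearray(wal)
damaged[26 + HEADER.size] ^= 0xFF      # flip a payload byte in record 2
print(classify_stop(damaged, 26))      # suspected_corruption

bad_len = bytearray(wal)
struct.pack_into("<I", bad_len, 26, 27)  # corrupt record 2's length field
print(classify_stop(bad_len, 26))      # end_of_wal (a false negative)
```

The second case is the false negative described above: once the length field is damaged, the scan lands mid-record and finds nothing valid. The first case is also exactly what the disk would look like if record N were only partly written and an unflushed record N+1 happened to reach disk whole, so this check alone cannot tell that apart from real corruption.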
Regards,
	Jeff Davis