Re: corrupt pages detected by enabling checksums - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: corrupt pages detected by enabling checksums
Date
Msg-id CA+U5nMLCQcfzjZ1vXG6tEma7VCK7XLcz+rb-c9W=2jjLW_mEQg@mail.gmail.com
Whole thread Raw
In response to Re: corrupt pages detected by enabling checksums  (Jim Nasby <jim@nasby.net>)
Responses Re: corrupt pages detected by enabling checksums  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 9 May 2013 20:28, Jim Nasby <jim@nasby.net> wrote:

>> Unfortunately, it seems that doing any kind of validation to determine
>> that we have a valid end-of-the-WAL inherently requires some kind of
>> separate durable write somewhere. It would be a tiny amount of data (an
>> LSN and maybe some extra crosscheck information), so I could imagine
>> that would be just fine given the right hardware; but if we just write
>> to disk that would be pretty bad. Ideas welcome.

Not so sure.

If the WAL record length is intact, and it probably is, then we can
test whether the next WAL record is valid also.

If the current WAL record is corrupt and the next WAL record is
corrupt, then we have a problem.

If the current WAL record is corrupt and the next WAL record is in
every way valid, we can potentially continue. But we need to keep
track of accumulated errors to avoid getting into a worse situation.
Obviously, we would need to treat the next WAL record with complete
scepticism, but I have seen cases where only a single WAL record was
corrupt.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Re: [GENERAL] pg_upgrade fails, "mismatch of relation OID" - 9.1.9 to 9.2.4
Next
From: Bruce Momjian
Date:
Subject: Re: Re: [GENERAL] pg_upgrade fails, "mismatch of relation OID" - 9.1.9 to 9.2.4