Re: corrupt pages detected by enabling checksums - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: corrupt pages detected by enabling checksums
Date
Msg-id CA+U5nMK0rhw-zmcEUftaF8nir+xhXn-BMiWO47jyA++EtRTXBA@mail.gmail.com
Whole thread Raw
In response to Re: corrupt pages detected by enabling checksums  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 10 May 2013 18:23, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
>> A single WAL record can be over 24kB.
>
> <pedantic>
> Actually, WAL records can run to megabytes.  Consider for example a
> commit record for a transaction that dropped thousands of tables ---
> there'll be info about each such table in the commit record, to cue
> replay to remove those files.
> </pedantic>
>
>> If you replayed the following record but not this record you would
>> have an inconsistent database. ...
>> Or it could be an index insert for that tuple would would result in a
>> physically inconsistent database with index pointers that point to
>> incorrect tuples. Index scans would return tuples that didn't match
>> the index  or would miss tuples that should be returned.
>
> Skipping actions such as index page splits would lead to even more fun.
>
> Even in simple cases such as successive inserts and deletions in the
> same heap page, failing to replay some of the actions is going to be
> disastrous.  The *best case* scenario for that is that WAL replay
> PANICs when it notices that the action it's trying to replay is
> inconsistent with the current state of the page, eg it's trying to
> insert at a TID that already exists.

Agreed

> IMO we can't proceed past a broken WAL record.  The actually useful
> suggestion upthread was that we try to notice whether there seem
> to be valid WAL records past the broken one, so that we could warn
> the DBA that some commits might have been lost.  I don't think we
> can do much in the way of automatic data recovery, but we could give
> the DBA a chance to do forensics rather than blindly starting up (and
> promptly overwriting all the evidence).

The usual answer is switchover to the standby, hoping that the WAL was
not corrupted before it got sent there also.

If all servers go down people will want a "carry on regardless" option
as well, since the fault can be investigated on a standby. Even with
all of the above caveats,  lying on our backs with our feet in the air
'cos we lost a few blocks will not impress anyone. Doing that will
likely be a medium-high risk thing, but that will still be better than
the certainty of a down server. It would need to be a manually iniated
option.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: corrupt pages detected by enabling checksums
Next
From: Jeff Janes
Date:
Subject: Re: corrupt pages detected by enabling checksums