
Re: corrupt pages detected by enabling checksums - Mailing list pgsql-hackers

From	Simon Riggs
Subject	Re: corrupt pages detected by enabling checksums
Date	May 10, 2013 06:44:35
Msg-id	CA+U5nMKOw0WB7r9XQecFToRmnERFQ+FbnaXYRPOo=gfPeyX31Q@mail.gmail.com
In response to	Re: corrupt pages detected by enabling checksums (Greg Stark <stark@mit.edu>)
Responses	Re: corrupt pages detected by enabling checksums
List	pgsql-hackers


On 9 May 2013 23:13, Greg Stark <stark@mit.edu> wrote:
> On Thu, May 9, 2013 at 10:45 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 9 May 2013 22:39, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Simon Riggs <simon@2ndQuadrant.com> writes:
>>>> If the current WAL record is corrupt and the next WAL record is in
>>>> every way valid, we can potentially continue.
>>>
>>> That seems like a seriously bad idea.
>>
>> I agree. But if you knew that were true, is stopping a better idea?
>
> Having one corrupt record followed by a valid record is not an
> abnormal situation. It could easily be the correct end of WAL.

I disagree; that *is* an abnormal situation and would not be the
"correct end-of-WAL".

Each WAL record contains a "prev" pointer to the preceding WAL record.
So for the next record to be valid, its prev pointer would need to be
exactly correct.
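
As a rough illustration of that check (a sketch only, not the actual
recovery code; the record layout and all names here are made up for
the example, and the checksum is plain zlib CRC-32 rather than the
one WAL uses):

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass
class Record:
    lsn: int        # hypothetical: position of this record in the log
    prev: int       # hypothetical: position of the preceding record
    payload: bytes
    crc: int        # checksum over lsn, prev and payload


def checksum(lsn, prev, payload):
    # The checksum covers the prev pointer too, so it cannot be altered silently.
    return zlib.crc32(struct.pack("<QQ", lsn, prev) + payload)


def make_record(lsn, prev, payload):
    return Record(lsn, prev, payload, checksum(lsn, prev, payload))


def is_valid_successor(rec, expected_prev):
    """Accept rec only if its checksum holds AND it points back at expected_prev."""
    if checksum(rec.lsn, rec.prev, rec.payload) != rec.crc:
        return False
    return rec.prev == expected_prev
```

With this, a record at position 200 that was written right after the
record at position 100 passes `is_valid_successor(rec, 100)`, while an
internally consistent but unrelated record sitting at the same
position (one whose prev is, say, 40) is rejected.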

> However it is possible to reduce the window. Every time the
> transaction log is synced a different file can be updated with a
> known minimum transaction log recovery point. Even if it's not synced
> consistently on every transaction commit or wal sync it would serve as
> a low water mark. Recovering to that point is not sufficient but is
> necessary for a consistent recovery. That file could be synced lazily,
> say, every 10s or something like that and would guarantee that any wal
> corruption would be caught except for the last 10s of wal traffic for
> example.

I think it would be easy enough to have the WALwriter update the
minRecoveryPoint once per cycle, after it has flushed WAL.
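
A minimal sketch of the idea, assuming a simplified writer loop (the
class, its fields and the helper function are hypothetical and do not
correspond to the real WAL writer code):

```python
class WalWriterSketch:
    """Toy model: the low water mark only ever advances after a flush."""

    def __init__(self):
        self.written_upto = 0        # WAL generated so far
        self.flushed_upto = 0        # WAL known to be on disk
        self.min_recovery_point = 0  # low water mark recovery must reach

    def append(self, nbytes):
        self.written_upto += nbytes

    def cycle(self):
        # 1. Flush first (stands in for the real write + fsync).
        self.flushed_upto = self.written_upto
        # 2. Only then advance the mark, so it never points past durable WAL.
        self.min_recovery_point = max(self.min_recovery_point, self.flushed_upto)


def recovery_ended_too_early(last_valid_lsn, min_recovery_point):
    """True if replay stopped before the recorded low water mark."""
    return last_valid_lsn < min_recovery_point
```

If replay hits an invalid record before reaching the mark, that
cannot be the real end of WAL, so the corruption is detected. Anything
written since the last cycle is still outside the protected window.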

Given the importance of pg_control and its small size, it seems like
it would be a good idea to take a backup copy of it at every checkpoint
to make sure we have that data safe. pg_resetxlog should also keep a
copy every time it is run.
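
Taking such a copy safely could look roughly like this. It is a sketch
under assumptions: the function name and the backup file name are
invented for the example, and this is not how the server handles
pg_control today.

```python
import os
import shutil
import tempfile


def backup_control_file(src, dst):
    """Copy src to dst so that dst is always a complete old or new copy."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # Write to a temporary file in the same directory so the rename is atomic.
    fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".ctl-backup-")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())   # make the copy durable before exposing it
        os.replace(tmp, dst)         # atomically replace any previous backup
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

A call such as `backup_control_file("global/pg_control",
"global/pg_control.backup")` at the end of each checkpoint would then
never leave a half-written backup behind. A stricter version would
also fsync the containing directory after the rename.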

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
