Re: corrupt pages detected by enabling checksums - Mailing list pgsql-hackers

From Jim Nasby
Subject Re: corrupt pages detected by enabling checksums
Date
Msg-id 518FFE23.8070102@nasby.net
In response to Re: corrupt pages detected by enabling checksums  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On 5/9/13 5:18 PM, Jeff Davis wrote:
> On Thu, 2013-05-09 at 14:28 -0500, Jim Nasby wrote:
>> What about moving some critical data from the beginning of the WAL
>> record to the end? That would make it easier to detect that we don't
>> have a complete record. It wouldn't necessarily replace the CRC
>> though, so maybe that's not good enough.
>>
>> Actually, what if we actually *duplicated* some of the same WAL header
>> info at the end of the record? Given a reasonable amount of data that
>> would damn-near ensure that a torn record was detected, because the
>> odds of having the exact same sequence of random bytes would be so
>> low. Potentially even just duplicating the LSN would suffice.
>
> I think both of these ideas have some false positives and false
> negatives.
>
> If the corruption happens at the record boundary, and wipes out the
> special information at the end of the record, then you might think it
> was not fully flushed, and we're in the same position as today.
>
> If the WAL record is large, and somehow the beginning and the end get
> written to disk but not the middle, then it will look like corruption;
> but really the WAL was just not completely flushed. This seems pretty
> unlikely, but not impossible.
>
> That being said, I like the idea of introducing some extra checks if a
> perfect solution is not possible.
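
For concreteness, here is a rough sketch of the duplicated-LSN check discussed above. The record layout (8-byte LSN, 4-byte length, payload, trailing 8-byte LSN copy) and the names pack_record/looks_complete are made up for illustration; this is not PostgreSQL's actual WAL record format.

```python
import struct

# Hypothetical layout: 8-byte LSN | 4-byte payload length | payload | 8-byte LSN copy
HEAD = struct.Struct("<QI")
TAIL = struct.Struct("<Q")


def pack_record(lsn, payload):
    """Build a toy record with the LSN duplicated at the end."""
    return HEAD.pack(lsn, len(payload)) + payload + TAIL.pack(lsn)


def looks_complete(buf):
    """Return True if the trailing LSN copy matches the LSN in the header."""
    if len(buf) < HEAD.size + TAIL.size:
        return False
    lsn, length = HEAD.unpack_from(buf, 0)
    end = HEAD.size + length
    if len(buf) < end + TAIL.size:
        return False
    (tail_lsn,) = TAIL.unpack_from(buf, end)
    return tail_lsn == lsn


rec = pack_record(0x1234, b"x" * 32)
torn = rec[:-TAIL.size] + bytes(TAIL.size)            # tail never reached disk
hole = rec[:HEAD.size] + bytes(32) + rec[-TAIL.size:]  # head and tail written, middle not
```

Here looks_complete(torn) is False, which is the case the trailer is meant to catch. looks_complete(hole) is True even though the middle is missing, so a CRC failure on that record would be read as corruption rather than as an incomplete flush. That is the second case Jeff describes.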

Yeah, I don't think a perfect solution is possible, short of attempting to tie directly into the filesystem (i.e., on a
journaling FS have some way to essentially treat the FS journal as WAL).
 

One additional step we might be able to take would be to scan forward looking for a record that would tell us when an
fsync must have occurred (heck, maybe we should add an fsync WAL record...). If we find a corrupt WAL record followed by
an fsync we know that we've now lost data. That closes some of the holes. Actually, that might handle all the holes...
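
A minimal sketch of that scan-forward check, assuming a toy record format and a hypothetical fsync marker record (KIND_FSYNC). None of these names or layouts exist in PostgreSQL today. Because a corrupt record's length field can't be trusted, the scan probes every later offset for a valid marker instead of following record boundaries.

```python
import struct
import zlib

# Toy layout (hypothetical): 4-byte payload length | 1-byte kind | payload | 4-byte CRC32
KIND_DATA = 0
KIND_FSYNC = 1  # hypothetical "an fsync completed up to here" marker

HEADER = struct.Struct("<IB")
TRAILER = struct.Struct("<I")


def make_record(kind, payload=b""):
    crc = zlib.crc32(bytes([kind]) + payload)
    return HEADER.pack(len(payload), kind) + payload + TRAILER.pack(crc)


def read_record(buf, off):
    """Return (kind, next_offset, crc_ok), or None if no full record fits at off."""
    if off + HEADER.size > len(buf):
        return None
    length, kind = HEADER.unpack_from(buf, off)
    end = off + HEADER.size + length + TRAILER.size
    if end > len(buf):
        return None
    payload = bytes(buf[off + HEADER.size:end - TRAILER.size])
    (crc,) = TRAILER.unpack_from(buf, end - TRAILER.size)
    return kind, end, crc == zlib.crc32(bytes([kind]) + payload)


def classify(buf):
    """Return 'clean', 'torn-tail', or 'data-loss' for a toy WAL buffer."""
    off = 0
    while off < len(buf):
        rec = read_record(buf, off)
        if rec is not None and rec[2]:
            off = rec[1]  # valid record, keep replaying
            continue
        # Bad or truncated record at `off`: probe every later offset for a
        # valid fsync marker, since the bad record's length can't be trusted.
        for probe in range(off + 1, len(buf)):
            later = read_record(buf, probe)
            if later is not None and later[2] and later[0] == KIND_FSYNC:
                return "data-loss"  # an fsync happened after the bad record
        return "torn-tail"  # nothing proves the bad record was ever flushed
    return "clean"


wal = make_record(KIND_DATA, b"aaaa") + make_record(KIND_DATA, b"bbbb") + make_record(KIND_FSYNC)
damaged = bytearray(wal)
damaged[19] ^= 0xFF  # flip a payload byte in the second record
```

With this, classify(wal) reports a clean log, classify(bytes(damaged)) reports data loss because a valid fsync marker follows the bad record, and a log that simply ends partway through a record is reported as a torn tail.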
 

>> On the separate write idea, if that could be controlled by a GUC I
>> think it'd be worth doing. Anyone that needs to worry about this
>> corner case probably has hardware that would support that.
>
> It sounds pretty easy to do that naively. I'm just worried that the
> performance will be so bad for so many users that it's not a very
> reasonable choice.
>
> Today, it would probably make more sense to just use sync rep. If the
> master's WAL is corrupt, and it starts up too early, then that should be
> obvious when you try to reconnect streaming replication. I haven't tried
> it, but I'm assuming that it gives a useful error message.

I wonder if there are DW environments that are too large to keep a SR copy but would be able to afford the double-write
overhead.

BTW, isn't performance what killed the double-buffer idea?
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net


