Re: Block-level CRC checks - Mailing list pgsql-hackers

From Decibel!
Subject Re: Block-level CRC checks
Date
Msg-id 2220D58D-DD72-4F17-8DF2-E2ACD25CA774@decibel.org
In response to Re: Block-level CRC checks  (pgsql@mohawksoft.com)
Responses Re: Block-level CRC checks  (pgsql@mohawksoft.com)
List pgsql-hackers
On Sep 30, 2008, at 2:17 PM, pgsql@mohawksoft.com wrote:
>> A customer of ours has been having trouble with corrupted data for some
>> time.  Of course, we've almost always blamed hardware (and we've seen
>> RAID controllers have their firmware upgraded, among other actions), but
>> the useful thing to know is when corruption has happened, and where.
>
> That is an important statement: the goal is to know when it happens, not
> necessarily to be able to recover the block or to know where in the block
> it is corrupt. Is that correct?

Oh, correcting the corruption would be AWESOME beyond belief! But at  
this point I'd settle for just knowing it had happened.

>> So we've been tasked with adding CRCs to data files.
>
> CRC or checksum? If the objective is merely general "detection" there
> should be some latitude in choosing the methodology for performance.

See above. Perhaps the best win would be a case where you could  
choose which method you wanted. We generally have extra CPU on the  
servers, so we could afford to burn some cycles with more complex  
algorithms.
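
As a rough illustration of what "choose which method you wanted" could look like, here is a small Python sketch. The registry, the method names, and page_checksum are assumptions made up for this example, not an existing PostgreSQL setting or API; the truncated SHA-256 entry only stands in for "a more complex algorithm that burns more cycles".

```python
import hashlib
import zlib

def crc32(page: bytes) -> int:
    # Standard CRC-32 as implemented by zlib.
    return zlib.crc32(page) & 0xFFFFFFFF

def adler32(page: bytes) -> int:
    # Cheaper than CRC-32, but a weaker error detector.
    return zlib.adler32(page) & 0xFFFFFFFF

def sha256_trunc(page: bytes) -> int:
    # Costs more CPU; keep only the first 4 bytes of the digest
    # so that every method yields a 32-bit value.
    return int.from_bytes(hashlib.sha256(page).digest()[:4], "big")

# Hypothetical registry of selectable methods.
CHECKSUM_METHODS = {
    "crc32": crc32,
    "adler32": adler32,
    "sha256": sha256_trunc,
}

def page_checksum(page: bytes, method: str = "crc32") -> int:
    """Compute a 32-bit check value for a page with the chosen method."""
    try:
        return CHECKSUM_METHODS[method](page)
    except KeyError:
        raise ValueError(f"unknown checksum method: {method!r}") from None
```

Whichever method wrote a block would also have to be recorded somewhere so the reader knows how to verify it; the sketch leaves that out.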

>> The idea is that these CRCs are going to be checked just after reading
>> files from disk, and calculated just before writing them.  They are
>> just a protection against the storage layer going mad; they are not
>> intended to protect against faulty RAM, CPU or kernel.
>
> It will actually find faults in all of it. If the CPU can't add and/or a
> RAM location lost a bit, this will blow up just as easily as a bad block.
> It may cause "false identification" of an error, but it will keep a bad
> system from hiding.

Well, very likely not, since the intention is to only compute the CRC  
when we write the block out, at least for now. In the future I would  
like to be able to detect when a CPU or memory goes bonkers and poops  
on something, because that's actually happened to us as well.
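
To illustrate why computing the CRC only at write time misses memory faults, here is a minimal Python sketch. The helper names (flush_page, read_page_ok, flip_bit) are made up for this example and are not PostgreSQL functions; zlib.crc32 stands in for whatever CRC32 the real patch would use.

```python
import zlib

def flush_page(page: bytes):
    """Hypothetical write path: compute the CRC just before the page leaves memory."""
    return page, zlib.crc32(page)

def read_page_ok(stored: bytes, stored_crc: int) -> bool:
    """Hypothetical read path: recompute the CRC and compare with the stored one."""
    return zlib.crc32(stored) == stored_crc

def flip_bit(page: bytes, bit: int) -> bytes:
    """Return a copy of the page with a single bit inverted."""
    buf = bytearray(page)
    buf[bit // 8] ^= 1 << (bit % 8)
    return bytes(buf)

good = bytes(8192)  # an all-zero 8 kB page, just for the demonstration

# Case 1: the storage layer flips a bit after the write. The stored CRC
# no longer matches the page contents, so the read-time check fails.
stored, crc = flush_page(good)
storage_fault_detected = not read_page_ok(flip_bit(stored, 100), crc)

# Case 2: RAM flips a bit before the write. The CRC is computed over the
# already-damaged page, so the read-time check passes and the fault hides.
stored, crc = flush_page(flip_bit(good, 100))
memory_fault_detected = not read_page_ok(stored, crc)
```

In this sketch the first case is caught and the second is not, which is the limitation described above: anything that damages the page before the CRC is calculated gets checksummed as if it were valid data.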

>> The implementation I'm envisioning requires the use of a new relation
>> fork to store the per-block CRCs.  Initially I'm aiming at a CRC32 sum
>> for each block.  FlushBuffer would calculate the checksum and store it
>> in the CRC fork; ReadBuffer_common would read the page, calculate the
>> checksum, and compare it to the one stored in the CRC fork.
>
> Hell, all that is needed is a long or a short checksum value in the block.
> I mean, if you just want a sanity test, it doesn't take much. Using a
> second relation creates confusion. If there is a CRC discrepancy between
> two different blocks, who's wrong? You need a third "control" to know. If
> the block knows its CRC or checksum and that is in error, the block is
> bad.

I believe the idea was to make this as non-invasive as possible. And
it would be really nice if this could be enabled without a dump/reload
(maybe the upgrade stuff would make this possible?).
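
For what it's worth, here is a rough Python sketch of how per-block CRC32 values might be laid out in a separate fork so that the data pages themselves stay untouched. It assumes the default 8192-byte block size and that CRC fork pages carry no header of their own, which a real implementation likely would not get away with; crc_slot, store_crc, and check_crc are hypothetical names.

```python
import struct
import zlib

BLCKSZ = 8192                               # default PostgreSQL block size
CRC_SIZE = 4                                # one CRC32 per data block
CRCS_PER_FORK_BLOCK = BLCKSZ // CRC_SIZE    # 2048 CRCs fit in one fork block

def crc_slot(block_no: int):
    """Map a data block number to (fork block, byte offset within it)."""
    fork_block, index = divmod(block_no, CRCS_PER_FORK_BLOCK)
    return fork_block, index * CRC_SIZE

def store_crc(fork: bytearray, block_no: int, page: bytes) -> None:
    """Write path: record the page's CRC in the fork; the page is not modified."""
    fork_block, offset = crc_slot(block_no)
    pos = fork_block * BLCKSZ + offset
    if len(fork) < pos + CRC_SIZE:
        # Grow the fork in whole blocks, zero-filled.
        fork.extend(bytes((fork_block + 1) * BLCKSZ - len(fork)))
    struct.pack_into("<I", fork, pos, zlib.crc32(page))

def check_crc(fork: bytearray, block_no: int, page: bytes) -> bool:
    """Read path: compare the page's CRC with the value stored in the fork."""
    fork_block, offset = crc_slot(block_no)
    pos = fork_block * BLCKSZ + offset
    if len(fork) < pos + CRC_SIZE:
        raise LookupError(f"no CRC stored for block {block_no}")
    (stored,) = struct.unpack_from("<I", fork, pos)
    return stored == zlib.crc32(page)
```

Under these assumptions the fork costs 4 bytes per 8192-byte block, roughly 0.05% extra space, and existing data pages need no rewrite. The downside raised in the quoted text still applies: on a mismatch, the sketch cannot tell whether the data block or the fork block is the damaged one.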
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828


