Re: Enabling Checksums - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: Enabling Checksums |
Date | |
Msg-id | 1363713488.2369.55.camel@jdavis-laptop |
In response to | Re: Enabling Checksums (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Enabling Checksums; Re: Enabling Checksums |
List | pgsql-hackers |
On Sat, 2013-03-16 at 20:41 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On 15 March 2013 13:08, Andres Freund <andres@2ndquadrant.com> wrote:
> >> I commented on this before, I personally think this property makes fletcher a
> >> not so good fit for this. Its not uncommon for parts of a block being all-zero
> >> and many disk corruptions actually change whole runs of bytes.

[ referring to Ants's comment that the existing algorithm doesn't
distinguish between 0x00 and 0xFF ]

> Meh. I don't think that argument holds a lot of water. The point of
> having checksums is not so much to notice corruption as to be able to
> point the finger at flaky hardware. If we have an 8K page with only
> 1K of data in it, and we fail to notice that the hardware dropped a lot
> of bits in the other 7K, we're not doing our job; and that's not really
> something to write off, because it would be a lot better if we complain
> *before* the hardware manages to corrupt something valuable.

I will move back to verifying the page hole, as well.

There are a few approaches:

1. Verify that the page hole is zero before write and after read.
2. Include it in the calculation (if we think there are some corner
   cases where the hole might not be all zero).
3. Zero the page hole before write, and verify that it's zero on read.
   This can be done during the memcpy at no performance penalty in
   PageSetChecksumOnCopy(), but that won't work for
   PageSetChecksumInplace().

With option #2 or #3, we might also verify that the hole is all-zero if
asserts are enabled.

> So I think we'd be best off to pick an algorithm whose failure modes
> don't line up so nicely with probable hardware failure modes. It's
> worth noting that one of the reasons that CRCs are so popular is
> precisely that they were designed to detect burst errors with high
> probability.

Another option is to use a different modulus.
The page http://en.wikipedia.org/wiki/Fletcher%27s_checksum suggests
that a prime number can be a good modulus for Fletcher-32. Perhaps we
could use 251 instead of 255? That would make it less likely to miss a
common form of hardware failure, although it would also reduce the
number of possible checksums slightly (about 4% fewer than 2^16).

I'm leaning toward this option now, or a CRC of some kind if the
performance is reasonable.

Regards,
	Jeff Davis
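The page-hole approaches listed in the message can be illustrated with a small sketch. This is not the patch's code: `SketchPage`, `sketch_hole_is_zero()` and `sketch_copy_zeroing_hole()` are hypothetical names, and the layout (a hole spanning the bytes from `lower` up to but not including `upper`) is an assumption made for illustration. The check corresponds to approach 1 and to the read side of approach 3; the copy routine shows how the hole could be zeroed as part of a copy that has to happen anyway.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 8192   /* assumed 8K page, as in the thread */

/* Hypothetical stand-in for a page; the hole is bytes [lower, upper).
 * Callers must ensure lower <= upper <= SKETCH_PAGE_SIZE. */
typedef struct SketchPage
{
    uint16_t      lower;
    uint16_t      upper;
    unsigned char bytes[SKETCH_PAGE_SIZE];
} SketchPage;

/* Approach 1, and the read side of approach 3: is every hole byte zero? */
bool
sketch_hole_is_zero(const SketchPage *page)
{
    for (size_t i = page->lower; i < page->upper; i++)
    {
        if (page->bytes[i] != 0)
            return false;
    }
    return true;
}

/* Approach 3, copy path: copy the data on either side of the hole and
 * write zeros into the destination's hole instead of copying it. */
void
sketch_copy_zeroing_hole(SketchPage *dst, const SketchPage *src)
{
    dst->lower = src->lower;
    dst->upper = src->upper;
    memcpy(dst->bytes, src->bytes, src->lower);
    memset(dst->bytes + src->lower, 0, (size_t) (src->upper - src->lower));
    memcpy(dst->bytes + src->upper, src->bytes + src->upper,
           (size_t) (SKETCH_PAGE_SIZE - src->upper));
}
```

In this sketch the copy touches each destination byte exactly once, which is the intuition behind doing the zeroing "during the memcpy". An in-place variant has no such copy to piggyback on, so it would need a separate pass over the hole.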
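To make the modulus argument concrete, here is a minimal sketch of a Fletcher-style checksum with two 8-bit running sums and a configurable modulus. It is an illustration only, not the algorithm in the patch: `sketch_fletcher16()` is a hypothetical name, and reducing after every byte is chosen for clarity rather than speed.

```c
#include <stddef.h>
#include <stdint.h>

/* Fletcher-style checksum: two running sums, each reduced by `modulus`.
 * `modulus` must be in the range 1..256 so that each sum fits in 8 bits. */
uint16_t
sketch_fletcher16(const unsigned char *data, size_t len, uint32_t modulus)
{
    uint32_t sum1 = 0;
    uint32_t sum2 = 0;

    for (size_t i = 0; i < len; i++)
    {
        sum1 = (sum1 + data[i]) % modulus;  /* sum of the bytes */
        sum2 = (sum2 + sum1) % modulus;     /* sum of the running sums */
    }
    return (uint16_t) ((sum2 << 8) | sum1);
}
```

With modulus 255, a byte of 0xFF contributes 255 mod 255 = 0 to the sums, exactly like 0x00, so a run of zero bytes turning into 0xFF leaves the checksum unchanged. With modulus 251, 0xFF contributes 4, so the same corruption changes the checksum. The blind spot does not disappear; it moves to byte pairs that differ by 251, such as 0x00 and 0xFB, which seems a less likely hardware failure pattern. The cost is the one noted in the message: 251 * 251 = 63001 possible checksum values instead of 65536, roughly 3.9% fewer.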