Re: Enabling Checksums - Mailing list pgsql-hackers
| From | Simon Riggs |
|---|---|
| Subject | Re: Enabling Checksums |
| Date | |
| Msg-id | CA+U5nMLwTzbL=rCF5UMatjA9529StyOosL+c3KcSAde6bW_GRQ@mail.gmail.com |
| In response to | Re: Enabling Checksums (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: Enabling Checksums (Greg Smith <greg@2ndQuadrant.com>) |
| List | pgsql-hackers |
On 17 March 2013 00:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 15 March 2013 13:08, Andres Freund <andres@2ndquadrant.com> wrote:
>>> I commented on this before, I personally think this property makes fletcher a
>>> not so good fit for this. Its not uncommon for parts of a block being all-zero
>>> and many disk corruptions actually change whole runs of bytes.
>
>> I think you're right to pick up on this point, and Ants has done a
>> great job of explaining the issue more clearly.
>
>> My perspective, after some thought, is that this doesn't matter to the
>> overall effectiveness of this feature.
>
>> PG blocks do have large runs of 0x00 in them, though that is in the
>> hole in the centre of the block. If we don't detect problems there,
>> its not such a big deal. Most other data we store doesn't consist of
>> large runs of 0x00 or 0xFF as data. Most data is more complex than
>> that, so any runs of 0s or 1s written to the block will be detected.
>
> Meh.  I don't think that argument holds a lot of water.  The point of
> having checksums is not so much to notice corruption as to be able to
> point the finger at flaky hardware.  If we have an 8K page with only
> 1K of data in it, and we fail to notice that the hardware dropped a lot
> of bits in the other 7K, we're not doing our job; and that's not really
> something to write off, because it would be a lot better if we complain
> *before* the hardware manages to corrupt something valuable.
>
> So I think we'd be best off to pick an algorithm whose failure modes
> don't line up so nicely with probable hardware failure modes.  It's
> worth noting that one of the reasons that CRCs are so popular is
> precisely that they were designed to detect burst errors with high
> probability.

I think that's a reasonable refutation of my argument, so I will
relent, especially since nobody's +1'd me.
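[Editor's note: the weakness being discussed can be seen in a toy sketch. This is not PostgreSQL's actual checksum code; it is a minimal Fletcher-16 with the classic modulus of 255, where 0xFF ≡ 0x00 (mod 255), so a run of zero bytes flipped to 0xFF produces a checksum collision that a CRC catches.]

```python
import zlib

def fletcher16(data: bytes) -> int:
    # Classic Fletcher-16: two running sums, both reduced modulo 255.
    sum1 = sum2 = 0
    for b in data:
        sum1 = (sum1 + b) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1

# A page fragment with a run of zero bytes (like the hole in a PG block),
# and the same fragment after corruption flipped the whole run to 0xFF.
page = b"tuple data" + b"\x00" * 32 + b"more tuple data"
corrupt = b"tuple data" + b"\xff" * 32 + b"more tuple data"

# 0xFF == 255 == 0 (mod 255), so Fletcher cannot tell the two apart...
assert fletcher16(page) == fletcher16(corrupt)
# ...while a CRC, designed to detect burst errors, distinguishes them.
assert zlib.crc32(page) != zlib.crc32(corrupt)
```

This is exactly Andres's scenario above: corruption that rewrites a whole run of bytes in an all-zero region is invisible to this Fletcher variant, which is the failure-mode overlap Tom objects to.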
>> What I think we could do here is to allow people to set their checksum
>> algorithm with a plugin.
>
> Please, no.  What happens when their plugin goes missing?  Or they
> install the wrong one on their multi-terabyte database?  This feature is
> already on the hairy edge of being impossible to manage; we do *not*
> need to add still more complication.

Agreed. (And thanks for saying please!)

So I'm now moving towards commit using a CRC algorithm. I'll put in a
feature to allow the algorithm to be selected at initdb time, though
that is mainly a convenience to allow us to more easily do further
testing on speedups and on whether there are any platform-specific
regressions there.

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services