Re: Enabling Checksums - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Enabling Checksums
Date
Msg-id CA+U5nMLwTzbL=rCF5UMatjA9529StyOosL+c3KcSAde6bW_GRQ@mail.gmail.com
In response to Re: Enabling Checksums  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Enabling Checksums  (Greg Smith <greg@2ndQuadrant.com>)
List pgsql-hackers
On 17 March 2013 00:41, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 15 March 2013 13:08, Andres Freund <andres@2ndquadrant.com> wrote:
>>> I commented on this before, I personally think this property makes fletcher a
>>> not so good fit for this. It's not uncommon for parts of a block to be all-zero
>>> and many disk corruptions actually change whole runs of bytes.
>
>> I think you're right to pick up on this point, and Ants has done a
>> great job of explaining the issue more clearly.
>
>> My perspective, after some thought, is that this doesn't matter to the
>> overall effectiveness of this feature.
>
>> PG blocks do have large runs of 0x00 in them, though that is in the
>> hole in the centre of the block. If we don't detect problems there,
>> it's not such a big deal. Most other data we store doesn't consist of
>> large runs of 0x00 or 0xFF as data. Most data is more complex than
>> that, so any runs of 0s or 1s written to the block will be detected.
>
> Meh.  I don't think that argument holds a lot of water.  The point of
> having checksums is not so much to notice corruption as to be able to
> point the finger at flaky hardware.  If we have an 8K page with only
> 1K of data in it, and we fail to notice that the hardware dropped a lot
> of bits in the other 7K, we're not doing our job; and that's not really
> something to write off, because it would be a lot better if we complain
> *before* the hardware manages to corrupt something valuable.
>
> So I think we'd be best off to pick an algorithm whose failure modes
> don't line up so nicely with probable hardware failure modes.  It's
> worth noting that one of the reasons that CRCs are so popular is
> precisely that they were designed to detect burst errors with high
> probability.

I think that's a reasonable refutation of my argument, so I will
relent, especially since nobody's +1'd me.
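
For anyone following along, here is a minimal sketch of the failure mode under discussion. It assumes the textbook Fletcher-16 (sums taken mod 255), not the variant in the patch, and the page layout (1K of made-up data followed by zeros) is purely illustrative. Because 0xFF is congruent to 0x00 mod 255, a run of zero bytes flipped to 0xFF leaves both Fletcher sums unchanged, whereas a CRC-32 (here zlib's) still notices.

```python
import zlib

def fletcher16(data: bytes) -> int:
    """Textbook Fletcher-16 with modulus 255 (an assumption, not the patch's code)."""
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1

PAGE_SIZE = 8192  # 8K page, as in the example above

# Hypothetical page: 1K of arbitrary non-trivial data, the remaining 7K all zero.
page = bytearray(PAGE_SIZE)
page[:1024] = bytes((i * 37 + 11) % 251 for i in range(1024))

# Simulated hardware fault: a 512-byte run inside the zeroed area becomes 0xFF.
corrupted = bytearray(page)
corrupted[4096:4096 + 512] = b"\xff" * 512

fletcher_detects = fletcher16(page) != fletcher16(corrupted)
crc_detects = zlib.crc32(page) != zlib.crc32(corrupted)

print("Fletcher-16 detects corruption:", fletcher_detects)
print("CRC-32 detects corruption:     ", crc_detects)
```

With this particular corruption the Fletcher-16 checksum is unchanged while the CRC-32 differs, which is the kind of alignment between algorithm weakness and plausible hardware failure that Tom is objecting to.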


>> What I think we could do here is to allow people to set their checksum
>> algorithm with a plugin.
>
> Please, no.  What happens when their plugin goes missing?  Or they
> install the wrong one on their multi-terabyte database?  This feature is
> already on the hairy edge of being impossible to manage; we do *not*
> need to add still more complication.

Agreed. (And thanks for saying please!)

So I'm now moving towards commit using a CRC algorithm. I'll put in a
feature to allow the algorithm to be selected at initdb time, though
that is mainly a convenience to let us more easily do further testing
on speedups and on whether there are any platform-specific regressions
there.
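
As a rough sketch of what "selected at initdb time" could look like, as opposed to a plugin: the choice is limited to a fixed set of built-in algorithms and recorded once in cluster-wide metadata, so there is nothing external to go missing. Every name below (BUILTIN_ALGORITHMS, ClusterControl, init_cluster, verify_page) is hypothetical and does not correspond to any existing PostgreSQL structure or initdb option.

```python
import zlib
from dataclasses import dataclass

# Hypothetical fixed set of algorithms compiled into the server (no plugins).
BUILTIN_ALGORITHMS = {
    "crc32": zlib.crc32,
    "adler32": zlib.adler32,
}

@dataclass(frozen=True)
class ClusterControl:
    """Hypothetical cluster-wide metadata, written once at init time."""
    checksum_algorithm: str

def init_cluster(algorithm: str = "crc32") -> ClusterControl:
    # Reject anything that is not built in, so the choice can never go missing.
    if algorithm not in BUILTIN_ALGORITHMS:
        raise ValueError(f"unknown checksum algorithm: {algorithm!r}")
    return ClusterControl(checksum_algorithm=algorithm)

def page_checksum(control: ClusterControl, page: bytes) -> int:
    # Always use the algorithm recorded for this cluster.
    return BUILTIN_ALGORITHMS[control.checksum_algorithm](page)

def verify_page(control: ClusterControl, page: bytes, stored: int) -> bool:
    return page_checksum(control, page) == stored

control = init_cluster("crc32")
page = b"some page contents" + bytes(64)
stored = page_checksum(control, page)
```

Since the algorithm name is frozen in the control record, every later read verifies pages with the same function that wrote them, and swapping algorithms for testing only requires initialising a fresh cluster.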

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


