Re: Enabling Checksums - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Enabling Checksums
Msg-id CA+U5nMJezD73T7YRon=k1Gq1drbnuRMXNWAQJ=hxaOwOb8_Kpw@mail.gmail.com
In response to Re: Enabling Checksums  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On 8 March 2013 03:31, Bruce Momjian <bruce@momjian.us> wrote:

> I also see the checksum patch is taking a beating.  I wanted to step
> back and ask what percentage of known corruptions cases will this
> checksum patch detect?  What percentage of these corruptions would
> filesystem checksums have detected?
>
> Also, don't all modern storage drives have built-in checksums, and
> report problems to the system administrator?  Does smartctl help report
> storage corruption?
>
> Let me take a guess at answering this --- we have several layers in a
> database server:
>
>         1 storage
>         2 storage controller
>         3 file system
>         4 RAM
>         5 CPU
>
> My guess is that storage checksums only cover layer 1, while our patch
> covers layers 1-3, and probably not 4-5 because we only compute the
> checksum on write.
>
> If that is correct, the open question is what percentage of corruption
> happens in layers 1-3?

Yes, the checksums patch is taking a beating, and so it should. If we
find a reason to reject it, we should.

CPU and RAM error checking are pretty standard now. Storage isn't
necessarily the same. The figures we had from the Google paper early
in development showed it was worth checksumming storage, but not
memory. I did originally argue for memory also, but there was
insufficient evidence of utility.

At the moment, we only reject blocks if the header is damaged. That
covers basic sanity checks on about 10 bytes near the start of every
block. Since some errors would still slip through those checks, let's
say that effectively covers just 8 bytes of the block. Checksums cover
the whole block and detect most errors (>99.999%), which means we
detect errors across all 8192 bytes of the block. That makes checksums
approximately 1000 times better at spotting corruption than not using
them. Or to put it another way: if you don't use checksums, by the
time you see a single corrupt block header you will on average have
lost about 500 blocks (~4MB) of user data. That doesn't sound too bad,
but if your database has been giving wrong answers during the period
those blocks went bad, you could be looking at a significant number of
bad reads and writes, since updates would spread corruption to other
rows and incorrect data would be returned over a long period.
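For anyone who wants to check that back-of-envelope arithmetic, here is a
rough sketch in Python. It is not the patch's checksum algorithm; the
8-covered-bytes figure and the 16-bit checksum width are assumptions taken
from the discussion above.

```python
# Back-of-envelope check of the figures above, under assumptions:
# - header sanity checks only catch a random corruption that happens
#   to land in ~8 "covered" bytes near the start of the block
# - a 16-bit page checksum misses a random corruption with
#   probability roughly 1 in 2^16

BLOCK_SIZE = 8192       # bytes per page
HEADER_COVERED = 8      # bytes effectively covered by sanity checks

# Chance a random single-spot corruption hits the sanity-checked bytes
p_header_hit = HEADER_COVERED / BLOCK_SIZE        # 1 in 1024

# A 16-bit checksum detects all but ~1 in 65536 random corruptions
p_checksum_detect = 1 - 1 / 2**16                 # ~99.998%

# Relative improvement: whole block covered vs. ~8 header bytes
improvement = BLOCK_SIZE // HEADER_COVERED        # 1024, i.e. ~1000x

# Expected number of corrupt blocks before one finally hits the header
expected_blocks = 1 / p_header_hit                # ~1024 blocks
megabytes_lost = expected_blocks * BLOCK_SIZE / 1024**2

print(improvement, p_checksum_detect, megabytes_lost)
```

The numbers come out on the same order as the figures quoted above: a
roughly 1000x improvement in detection, and megabytes of silently corrupt
pages accumulating before a header check would fire on its own.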

I agree with Robert's comments. This isn't a brilliant design; it's a
brilliant stop-gap until we get a better design. However, that is a
whole chunk of work away, requiring pg_upgrade to handle on-disk page
rewrites, plus some as-yet-undecided redesign of the way hint bits
work. It's a long way off.

There are performance wrinkles also, no question. For some
applications, not losing data is worth the hit.

Given that the patch offers users the choice, I think it's acceptable
to look towards committing it.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


