Re: [HACKERS] Checksums by default? - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: [HACKERS] Checksums by default?
Date
Msg-id 20170124014736.GJ18360@tamriel.snowman.net
In response to Re: [HACKERS] Checksums by default?  (Peter Geoghegan <pg@heroku.com>)
Responses Re: [HACKERS] Checksums by default?  (Peter Geoghegan <pg@heroku.com>)
Re: [HACKERS] Checksums by default?  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
List pgsql-hackers
* Peter Geoghegan (pg@heroku.com) wrote:
> On Mon, Jan 23, 2017 at 5:26 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > Not sure how this part of that sentence was missed:
> >
> > -----
> > ... even though they were enabled as soon as the feature became
> > available.
> > -----
> >
> > Which would seem to me to say "the code's been running for a long time
> > on a *lot* of systems without throwing a false positive or surfacing a
> > bug."
>
> I think you've both understood what I said correctly. Note that I
> remain neutral on the question of whether or not checksums should be
> enabled by default.
>
> Perhaps I've missed the point entirely, but, I have to ask: How could
> there ever be false positives? With checksums, false positives are
> simply not allowed. Therefore, there cannot be a false positive,
> unless we define checksums as a mechanism that should only find
> problems that originate somewhere at or below the filesystem. We
> clearly have not done that, so ISTM that checksums could legitimately
> find bugs in the checksum code. I am not being facetious.

I'm not sure I'm following your question here.  A false positive would
be a case where the checksum code throws an error on a page whose
checksum is correct, or where verification fails even though nothing on
the page has actually changed since we wrote it out.

As for the purpose of checksums, it's exactly to identify cases where
the page has been changed since we wrote it out, due to corruption in
the kernel, filesystem, storage system, etc.  As we only check them when
we read in a page and calculate them when we go to write the page out,
they aren't helpful for shared_buffers corruption, generally speaking.
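
To make the write-out/read-in split concrete, here's a minimal sketch.
Everything in it is an assumption for illustration: the helper names
(write_page, read_page, flip_bit) are hypothetical, the 4-byte prefix
layout is made up, and CRC-32 is only a stand-in, not the algorithm
PostgreSQL actually uses.

```python
import zlib

CHECKSUM_BYTES = 4  # illustrative layout only, not the real page header


def page_checksum(payload: bytes) -> int:
    # CRC-32 is a stand-in for the real page checksum algorithm.
    return zlib.crc32(payload) & 0xFFFFFFFF


def write_page(payload: bytes) -> bytes:
    """Calculate the checksum at write-out time and store it with the page."""
    return page_checksum(payload).to_bytes(CHECKSUM_BYTES, "little") + payload


def read_page(on_disk: bytes) -> bytes:
    """Verify the stored checksum at read-in time; raise if it doesn't match."""
    stored = int.from_bytes(on_disk[:CHECKSUM_BYTES], "little")
    payload = on_disk[CHECKSUM_BYTES:]
    if page_checksum(payload) != stored:
        raise ValueError("page verification failed")
    return payload


def flip_bit(data: bytes, index: int) -> bytes:
    """Simulate corruption by flipping one bit in the byte at `index`."""
    damaged = bytearray(data)
    damaged[index] ^= 0x01
    return bytes(damaged)
```

A bit flipped in the output of write_page() (i.e., after write-out) makes
read_page() raise.  A bit flipped in the payload *before* write_page()
is called gets checksummed along with everything else and reads back
without complaint, which is the shared_buffers case above.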

It might be interesting to consider checking them in 'clean' pages in
shared_buffers in a background process, as that, presumably, *would*
detect shared buffers corruption.
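
Very roughly, and purely as a sketch of the idea rather than a proposed
design, such a scan might look like the following.  The Buffer class and
scan_clean_buffers() are hypothetical names, CRC-32 again stands in for
the real checksum, and locking and any in-place changes that don't mark
a page dirty are ignored entirely.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Buffer:
    payload: bytearray
    checksum: int        # checksum recorded when the page was last read or written
    dirty: bool = False  # set when the page is legitimately modified in memory


def page_checksum(payload) -> int:
    # CRC-32 is a stand-in for the real page checksum algorithm.
    return zlib.crc32(bytes(payload)) & 0xFFFFFFFF


def scan_clean_buffers(pool):
    """Return indexes of clean buffers whose contents no longer match their checksum."""
    suspect = []
    for i, buf in enumerate(pool):
        if buf.dirty:
            continue  # dirty pages are expected to differ until written out
        if page_checksum(buf.payload) != buf.checksum:
            suspect.append(i)
    return suspect
```

Only clean pages are checked because a dirty page's checksum isn't
recalculated until we go to write it out.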

Thanks!

Stephen
