Re: [HACKERS] Checksums by default? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: [HACKERS] Checksums by default?
Date
Msg-id CAM3SWZQuySEU6VaTgZV8sDmu4ZLvnFA6_anR8VwGsuzC+7y91w@mail.gmail.com
In response to Re: [HACKERS] Checksums by default?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Checksums by default?  (Peter Geoghegan <pg@heroku.com>)
List pgsql-hackers
On Wed, Jan 25, 2017 at 12:23 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Also, I think that one of the big problems with the way checksums work
> is that you don't find problems with your archived data until it's too
> late.  Suppose that in February bits get flipped in a block.  You
> don't access the data until July[1].  Well, it's nice to have the
> system tell you that the data is corrupted, but what are you going to
> do about it?  By that point, all of your backups are probably
> corrupted.  So it's basically:
>
> ERROR: you're screwed
>
> It's nice to know that (maybe?) but without a recovery strategy a
> whole lot of people who get that message are going to immediately
> start asking "How do I ignore the fact that I'm screwed and try to
> read the data anyway?".

That's also how I tend to think about it.

I understand that my experience with storage devices is unusually
narrow compared to everyone else here. That's why I remain neutral on
the high-level question of whether or not we ought to enable checksums
by default. I'll ask other hackers to answer what may seem like a very
naive question, while bearing what I just said in mind. The question
is: Have you ever actually seen a checksum failure in production? And,
if so, how helpful was it?

I myself have not, despite the fact that Heroku uses checksums
wherever possible, and has the technical means to detect problems like
this across the entire fleet of customer databases. Not even once.
This is not what I would have expected myself several years ago.

-- 
Peter Geoghegan


