Re: [HACKERS] Checksums by default? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Checksums by default?
Date
Msg-id CA+TgmoacvAUTv5TTEHOgQ_XfU1WxMa2dnDLPVTNgMgwg7B9Jzw@mail.gmail.com
In response to Re: [HACKERS] Checksums by default?  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Responses Re: [HACKERS] Checksums by default?  (Peter Geoghegan <pg@heroku.com>)
Re: [HACKERS] Checksums by default?  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Wed, Jan 25, 2017 at 12:02 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> I'm not completely grokking your second paragraph, but I would think that an
> average user would love to get a heads-up that their hardware is failing.

Sure.  If the database runs fast enough with checksums enabled,
there's basically no reason to have them turned off.  The issue is
when it doesn't.

Also, it's not as if there are no other ways of checking whether your
disks are failing.  SMART, for example, is supposed to tell you about
incipient hardware failures before PostgreSQL ever sees a bit flip.
Surely an average user would love to get a heads-up that their
hardware is failing even when that hardware is not being used to power
PostgreSQL, yet many people don't bother to configure SMART (or
similar proprietary systems provided by individual vendors).

Trying to force those people to use checksums is just masterminding;
they've made their own decision that it's not worth bothering with.
When something goes wrong, WE still care about distinguishing hardware
failure from PostgreSQL failure.  Our pride is on the line.  But the
customer often doesn't.  The DBA isn't the same person as the
operating system guy, and the operating system guy isn't going to
listen to the DBA even if the DBA complains of checksum failures.  Or
the customer has 100 things on the same piece of hardware and
PostgreSQL is the only one that failed; or alternatively they all
failed around the same time; either way the culprit is obvious.  Or
the remedy is to restore from backup[1] whether the problem is
hardware or software and regardless of whose software is to blame.  Or
their storage cost a million dollars and is a year old and they simply
won't believe that it's failing.  Or their storage cost a hundred
dollars and is 8 years old and they're looking for an excuse to
replace it whether it's responsible for the problem du jour or not.

I think it's great that we have a checksum feature and I think it's
great for people who want to use it and are willing to pay the cost of
it to turn it on.  I don't accept the argument that all of our users,
or even most of them, fall into that category.  I also think it's
disappointing that there's such a vigorous argument for changing the
default when so little follow-on development has gone into this
feature.  If we had put any real effort into making this easier to
turn on and off, for example, the default value would be less
important, because people could change it more easily.  But nobody's
making that effort.  I suggest that the people who think this is a
super-high-value feature should be willing to put some real work into
improving it instead of trying to force it down everybody's throat
as-is.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] Alternatively, sometimes the remedy is to wish they had a usable
backup while frantically running pg_resetxlog.


