Re: [HACKERS] Checksums by default? - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: [HACKERS] Checksums by default?
Msg-id 20170125233024.GW9812@tamriel.snowman.net
In response to Re: [HACKERS] Checksums by default?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Checksums by default?  (Peter Geoghegan <pg@heroku.com>)
Re: [HACKERS] Checksums by default?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Robert,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Jan 25, 2017 at 2:23 PM, Stephen Frost <sfrost@snowman.net> wrote:
> > Yet, our default is to have them disabled and *really* hard to enable.
>
> First of all, that could be fixed by further development.

I'm certainly all for doing so, but I don't agree that it necessarily is
required before we flip the default.  That said, if the way to get
checksums enabled by default is providing a relatively easy way to turn
them off, then that's something which I'll do what I can to help work
towards.  In other words, I'm not going to continue to argue, given the
various opinions of the group, that we should just flip it tomorrow.

I hope to discuss it further after we have the ability to turn it off
easily.

> Second, really hard to enable is a relative term.  I accept that
> enabling checksums is not a pleasant process.  Right now, you'd have
> to do a dump/restore, or use logical replication to replicate the data
> to a new cluster and then switch over.  On the other hand, if
> checksums are really a critical feature, how are people getting to the
> point where they've got a mission-critical production system and only
> then discovering that they want to enable checksums?

I truly do wish everyone would come talk to me before building out a
database.  Perhaps that's been your experience, in which case, I envy
you, but I tend to get a reaction more along the lines of "wait, what do
you mean I had to pass some option to initdb to enable checksums?!?!".
The fact that we've got a WAL implementation, clearly understand fsync
requirements and why full page writes make sense, and have WAL CRCs
which can't be disabled tends to lead people to think we really know
what we're doing and that we care a lot about their data.

> > I agree that it's unfortunate that we haven't put more effort into
> > fixing that- I'm all for it, but it's disappointing to see that people
> > are not in favor of changing the default as I believe it would both help
> > our users and encourage more development of the feature.
>
> I think it would help some users and hurt others.  I do agree that it
> would encourage more development of the feature -- almost of
> necessity.  In particular, I bet it would spur development of an
> efficient way of turning checksums off -- but I'd rather see us
> approach it from the other direction: let's develop an efficient way
> of turning the feature on and off FIRST.  Deciding that the feature
> has to be on for everyone because turning it on later is too hard for
> the people who later decide they want it is letting the tail wag the
> dog.

As I have said, I don't believe it has to be on for everyone.

> Also, I think that one of the big problems with the way checksums work
> is that you don't find problems with your archived data until it's too
> late.  Suppose that in February bits get flipped in a block.  You
> don't access the data until July[1].  Well, it's nice to have the
> system tell you that the data is corrupted, but what are you going to
> do about it?  By that point, all of your backups are probably
> corrupted.  So it's basically:

If your backup system is checking the checksums when backing up PG,
which I think every backup system *should* be doing, then guess what?
You've got a backup which you can go back to immediately, possibly with
the ability to restore all of the data from WAL.  That won't always be
the case, naturally, but it's a much better position than simply having
a system which continues to degrade until you've actually reached the
"you're screwed" level because PG will no longer read a page or perhaps
can't even start up, *and* you no longer have any backups.

As it is, there are backup solutions which *do* verify page checksums
when backing up PG.  This is no longer, thankfully, some hypothetical thing,
but something which really exists and will hopefully keep users from
losing data.
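To make the backup-time check concrete, here is a minimal Python sketch of copying a relation file page by page while verifying each page's stored checksum. It assumes 8kB pages, a little-endian page header with `pd_checksum` at byte offset 8 and `pd_upper` at offset 14, and a single-segment file so the block number is simply the page index. The checksum function is a stand-in built on CRC-32 (`zlib.crc32`), not PostgreSQL's FNV-1a-derived `pg_checksum_page()`, and the function names are hypothetical.

```python
import struct
import zlib

BLCKSZ = 8192  # assumed PostgreSQL block size (the default)


def stand_in_page_checksum(page: bytes, blkno: int) -> int:
    """Stand-in page checksum. NOT PostgreSQL's FNV-1a-derived pg_checksum_page().

    It only mimics the overall shape: the stored pd_checksum field (bytes 8-9)
    is zeroed before hashing, the block number is mixed in, and the result
    is folded into a non-zero 16-bit value.
    """
    buf = bytearray(page)
    buf[8:10] = b"\x00\x00"
    value = zlib.crc32(bytes(buf)) ^ blkno
    return (value % 65535) + 1


def copy_and_verify(src_path: str, dst_path: str) -> list:
    """Copy a single-segment relation file page by page.

    Returns the block numbers whose stored checksum does not match.
    """
    bad_blocks = []
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        blkno = 0
        while True:
            page = src.read(BLCKSZ)
            if not page:
                break
            dst.write(page)
            # A short trailing read is copied but not verified.
            if len(page) == BLCKSZ:
                # Assumes a little-endian page header layout.
                (stored,) = struct.unpack_from("<H", page, 8)     # pd_checksum
                (pd_upper,) = struct.unpack_from("<H", page, 14)  # pd_upper
                # pd_upper == 0 marks a new, never-initialized page: nothing to verify.
                if pd_upper != 0 and stored != stand_in_page_checksum(page, blkno):
                    bad_blocks.append(blkno)
            blkno += 1
    return bad_blocks
```

A backup tool built this way would refuse to mark the new backup as good, and so keep the previous one, whenever the returned list is non-empty. A real tool also has to cope with pages being written concurrently during the copy and with relations split across multiple segment files.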

> It's nice to know that (maybe?) but without a recovery strategy a
> whole lot of people who get that message are going to immediately
> start asking "How do I ignore the fact that I'm screwed and try to
> read the data anyway?".

And we have options for that.

> And then you wonder what the point of having
> the feature turned on is, especially if it's costly.  It's almost an
> attractive nuisance at that point - nobody wants to be the user that
> turns off checksums because they sound good on paper, but when you
> actually have a problem an awful lot of people are NOT going to want
> to try to restore from backup and maybe lose recent transactions.
> They're going to want to ignore the checksum failures.  That's kind of
> awful.

Presently, last I checked at least, the database system doesn't fall
over and die if a single page fails checksum verification.  I agree entirely that
we want the system to fail gracefully (unless the user instructs us
otherwise, perhaps because they have a redundant system that they can
flip to immediately).

> Peter's comments upthread get at this: "We need to invest in
> corruption detection/verification tools that are run on an as-needed
> basis."  Exactly.  If we could verify that our data is good before
> throwing away our old backups, that'd be good.

Ok.  We have that.  I admit that it's not in PG proper yet, and I can't
turn back the clock to make that happen, but we *do* have the ability
to verify that our backups of checksum-enabled databases are free of
checksum errors before we throw away the old backup.

> If we could verify
> that our indexes were structurally sane, that would be superior to
> anything checksums can ever give us because it catches not only
> storage failures but also software failures within PostgreSQL itself
> and user malfeasance above the PostgreSQL layer (e.g. redefining the
> supposedly-immutable function to give different answers) and damage
> inflicted inadvertently by environmental changes (e.g. upgrading glibc
> and having strcoll() change its mind).

We have a patch for that and I would love to see it in core already.  If
it's possible to make that run against a backup, then I'd be happy to
see about integrating it into a backup solution.

> If we could verify that every
> XID and MXID in the heap points to a clog or multixact record that
> still exists, that'd catch more than just bit flips.

Interesting idea...  That might not be too hard to implement on
systems with enough memory to pull all of clog in and then check as the
pages go by.  I can't promise anything, but I'm definitely going to
discuss the idea with David.
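The clog cross-check could look roughly like this hypothetical Python sketch: hold the commit log in memory (2 status bits per transaction, 4 transactions per byte) and, as heap tuples go by, flag any normal xmin/xmax that falls outside the range of XIDs whose status is still retained. It deliberately ignores XID wraparound (real XID comparisons are modulo 2^32) and tuple infomask bits (freezing, multixact xmax), and it models clog as one byte array starting at XID 0; `ClogSnapshot` and `find_dangling_xids` are made-up names.

```python
from dataclasses import dataclass

FIRST_NORMAL_XID = 3      # 0 = invalid, 1 = bootstrap, 2 = frozen
CLOG_XACTS_PER_BYTE = 4   # 2 status bits per transaction
CLOG_BITS_PER_XACT = 2


@dataclass
class ClogSnapshot:
    """Hypothetical in-memory copy of clog, simplified to one array from XID 0."""
    data: bytes       # 2-bit statuses: 0 in progress, 1 committed, 2 aborted, 3 subcommitted
    oldest_xid: int   # oldest XID whose status is still retained
    next_xid: int     # first XID not yet assigned

    def status(self, xid: int):
        """Return the 2-bit status, or None if no clog record exists for xid."""
        if not (self.oldest_xid <= xid < self.next_xid):
            return None
        index = xid // CLOG_XACTS_PER_BYTE
        if index >= len(self.data):
            return None
        shift = (xid % CLOG_XACTS_PER_BYTE) * CLOG_BITS_PER_XACT
        return (self.data[index] >> shift) & 0x03


def find_dangling_xids(tuples, clog: ClogSnapshot):
    """Flag xmin/xmax values of (ctid, xmin, xmax) tuples with no clog record.

    Ignores wraparound and treats xmax as a plain XID (no MultiXactId handling).
    """
    problems = []
    for ctid, xmin, xmax in tuples:
        for field, xid in (("xmin", xmin), ("xmax", xmax)):
            if xid < FIRST_NORMAL_XID:
                continue  # special XIDs never need a clog lookup
            if clog.status(xid) is None:
                problems.append((ctid, field, xid))
    return problems
```

Each heap page is scanned once while the in-memory clog answers lookups in constant time, which is why having enough memory to hold all of clog is what makes the approach cheap.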

> I'm not trying to downplay the usefulness of checksums *in a certain
> context*.  It's a good feature, and I'm glad we have it.  But I think
> you're somewhat inflating the utility of it while discounting the very
> real costs.

The costs for checksums don't bother me any more than the costs for WAL
or WAL CRCs or full page writes.  They may not be required on every
system, but they're certainly required on more than 'zero' entirely
reasonable systems which people deploy in their production environments.
I'd rather walk into an engagement where the user is saying "yeah, we
enabled checksums and it caught this corruption issue" than have to
break the bad news, which I've had to do over and over, that their
existing system doesn't have checksums enabled.  This isn't hypothetical,
it's what I run into regularly with entirely reasonable and skilled
engineers who have been deploying PG.

Thanks!

Stephen
