Re: Enabling Checksums - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Enabling Checksums
Msg-id 1352663799.3113.45.camel@jdavis-laptop
In response to Re: Enabling Checksums  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Enabling Checksums
Re: Enabling Checksums
List pgsql-hackers
On Fri, 2012-11-09 at 09:57 -0800, Josh Berkus wrote:
> Huh?  Why would a GUC not make sense?  How else would you make sure that
> checksums were on when you started the system?

If we stored the information in pg_control, you could check with
pg_controldata. We could have a separate utility, pg_checksums, that can
alter the state and/or do an offline verification. And initdb would take
an option that would start everything out fully protected with
checksums.

The problem with a GUC is that checksums aren't really something you can
change by just changing the variable and restarting, unless you are only
using checksums opportunistically (only write checksums when a page is
dirtied and only verify a checksum if the header indicates that it's
present).
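
To make the opportunistic mode concrete, here is a minimal Python sketch built on assumptions. It uses a hypothetical 8 KB page layout in which byte 0 holds a "has checksum" flag and bytes 1-2 hold the checksum, and Fletcher-16 stands in for whatever algorithm the patch actually uses. None of these names come from the PostgreSQL source. A checksum is written only when a dirty page is flushed, and it is verified only when the header claims one is present.

```python
PAGE_SIZE = 8192
FLAG_HAS_CHECKSUM = 0x01  # hypothetical header bit in byte 0


def fletcher16(data: bytes) -> int:
    """Plain Fletcher-16, used here only as a stand-in checksum."""
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1


def page_checksum(page: bytes) -> int:
    # Cover the flags byte and the payload, skipping the 2-byte checksum field.
    return fletcher16(page[:1] + page[3:])


def write_page(page: bytearray) -> None:
    """Called when a dirty page is flushed: set the flag, then stamp the checksum."""
    page[0] |= FLAG_HAS_CHECKSUM
    page[1:3] = page_checksum(page).to_bytes(2, "big")


def read_ok(page: bytes) -> bool:
    """Opportunistic verification: only check pages whose header claims a checksum."""
    if not page[0] & FLAG_HAS_CHECKSUM:
        return True  # never written with a checksum, so nothing to verify
    return int.from_bytes(page[1:3], "big") == page_checksum(page)
```

Under this scheme, a page that is never dirtied after checksums are turned on is never verified. That is why this mode alone cannot reach a fully protected state.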

There are also usability issues. If someone has a fully-protected
instance, and turns the GUC off, and starts the server, they'll lose the
"fully-protected" status on the first write, and have to re-read all the
data to get back to fully protected. That just doesn't seem right to me.
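
One way to picture the pg_control-based proposal is as a three-state machine. The sketch below makes that assumption. The class and method names are hypothetical, and only the states (Off, Enabling, fully protected) come from this discussion. The page_written hook models the GUC problem above: a single write without a checksum demotes a fully protected cluster, and only another complete pass can promote it again.

```python
from enum import Enum


class ChecksumState(Enum):
    OFF = "off"            # no checksums written or verified
    ENABLING = "enabling"  # some pages checksummed, others not yet
    ON = "on"              # every page known to carry a valid checksum


class ControlFile:
    """Hypothetical stand-in for a checksum-state field kept in pg_control."""

    def __init__(self, initdb_with_checksums: bool = False):
        # An initdb-time option starts the cluster fully protected.
        self.state = ChecksumState.ON if initdb_with_checksums else ChecksumState.OFF

    def enable(self) -> None:
        if self.state is ChecksumState.OFF:
            self.state = ChecksumState.ENABLING

    def disable(self) -> None:
        self.state = ChecksumState.OFF

    def offline_verify_complete(self, all_pages_checksummed: bool) -> None:
        # Only a full pass over the data (e.g. by a pg_checksums-style tool)
        # may promote the cluster to ON.
        if self.state is ChecksumState.ENABLING and all_pages_checksummed:
            self.state = ChecksumState.ON

    def page_written(self, with_checksum: bool) -> None:
        # One unchecksummed write while ON (say, because a GUC was switched
        # off) drops the cluster back to ENABLING.
        if self.state is ChecksumState.ON and not with_checksum:
            self.state = ChecksumState.ENABLING
```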

> Well, large databases would tend to be stuck permanently in "Enabling",
> because the user would never vacuum old cold partitions in order to
> checksum them.  So we need to be prepared for this to be the end state
> for a lot of databases.

That may be true, but if that's the case, it's more like a 3-bit
checksum than a 16-bit checksum, because of the page-header corruption
problem. I don't know of any way to give those users more than that,
and that won't be good enough for users who enable checksums at initdb
time.

> In fact, we'd need three settings for the checksum GUC:
> 
> OFF -- don't checksum anything, equal to state (1) above
> 
> WRITES -- checksum pages which are being written anyway, but ignore
> tables which aren't touched.  Permanent "Enabling" state.
> 
> ALL -- checksum everything you can.  Particularly, autovacuum would
> checksum any table which was not already checksummed at the next vacuum
> of that table.  Goal is to get to state 3 above.

That's slightly more eager, but it's basically the same as the WRITES
state. In order to get to the fully-protected state, you still need to
somehow make sure that all of the old data is checksummed.

And the "fully protected" state is important in my opinion, because
otherwise we aren't protected against corrupt page headers that say they
have no checksum (even when it really should have a checksum).
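
The difference between the Enabling and fully protected states comes down to what the read path may assume about the header flag. Here is a hypothetical Python sketch. The function and parameter names are illustrative only, and the stored and computed checksums are passed in rather than read from a real page.

```python
from enum import Enum


class ChecksumState(Enum):
    OFF = "off"
    ENABLING = "enabling"
    ON = "on"


def page_passes(state: ChecksumState, header_has_checksum: bool,
                stored: int, computed: int) -> bool:
    """Decide whether a page read from disk should be accepted (hypothetical)."""
    if state is ChecksumState.OFF:
        return True  # nothing is verified
    if state is ChecksumState.ENABLING:
        # Unchecksummed pages are legitimate here, so the header flag must be
        # trusted; a corrupt header that clears the flag slips through.
        return not header_has_checksum or stored == computed
    # ON: every page must carry a valid checksum, so a cleared flag is
    # itself evidence of corruption.
    return header_has_checksum and stored == computed
```

For a page whose header flag was cleared by corruption, page_passes returns True in the Enabling state and False in the fully protected state.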

> > Does it make sense to store this information in pg_control? That doesn't
> > require adding any new file, and it has the benefit that it's already
> > checksummed. It's available during recovery and can be made available
> > pretty easily in the places where we write data.
> > 
> > And the next question is what commands to add to change state. Ideas:
> > 
> >    CHECKSUMS ENABLE; -- set state to "Enabling"
> >    CHECKSUMS DISABLE; -- set state to "Off"
> 
> Don't like this, please make it a GUC.

I'd like to hear whether you have ideas about how to resolve the problems
with a GUC that I mentioned above. If not, then what about using a
utility, perhaps called pg_checksums? That way we wouldn't need new syntax.
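
An offline verification pass of the kind a pg_checksums utility might perform could look roughly like the following. This is only a sketch. The page layout (flag in byte 0, checksum in bytes 1-2) and the Fletcher-16 checksum are assumptions for illustration, not PostgreSQL's on-disk format, and the function names are hypothetical. The pass exists to prove full protection, so a page with no checksum at all counts as a failure.

```python
import io
from typing import BinaryIO, List

PAGE_SIZE = 8192
FLAG_HAS_CHECKSUM = 0x01  # hypothetical: byte 0 flags, bytes 1-2 checksum


def fletcher16(data: bytes) -> int:
    s1 = s2 = 0
    for b in data:
        s1 = (s1 + b) % 255
        s2 = (s2 + s1) % 255
    return (s2 << 8) | s1


def page_checksum(page: bytes) -> int:
    return fletcher16(page[:1] + page[3:])


def stamp(page: bytearray) -> None:
    """Mark a page as checksummed and store its checksum."""
    page[0] |= FLAG_HAS_CHECKSUM
    page[1:3] = page_checksum(page).to_bytes(2, "big")


def verify_relation_file(f: BinaryIO) -> List[int]:
    """Return block numbers that fail verification (missing or wrong checksum)."""
    bad = []
    blkno = 0
    while True:
        page = f.read(PAGE_SIZE)
        if not page:
            break
        if len(page) != PAGE_SIZE:
            bad.append(blkno)  # truncated final page
            break
        has_checksum = page[0] & FLAG_HAS_CHECKSUM
        if not has_checksum or int.from_bytes(page[1:3], "big") != page_checksum(page):
            bad.append(blkno)
        blkno += 1
    return bad


if __name__ == "__main__":
    pages = [bytearray(PAGE_SIZE) for _ in range(3)]
    for p in pages:
        stamp(p)
    pages[1][500] ^= 0x01  # flip one bit in block 1
    print(verify_relation_file(io.BytesIO(b"".join(pages))))  # prints [1]
```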

> As there's no such thing as system-wide vacuum, we're going to have to
> track whether a table is "fully checksummed" in the system catalogs.

It seems like this is going down the road of per-table checksums. I'm
not opposed to that, but that has a low chance of making 9.3.

Let's try to do something simpler now that leaves open the possibility
of more flexibility later. I'm inclined to agree with Robert that the
first patch should probably be an initdb-time option. Then, we can allow
a lazy mode (like your WRITES state) and an eager offline check with a
pg_checksums utility. Then we can work towards per-table checksums,
control via VACUUM, protecting the SLRU, treating zero pages as invalid,
protecting temp files (which can be a GUC), replication integration,
etc.

> Hmmm, better to have a 2nd GUC:
> 
> checksum_fail_action = WARNING | ERROR
> 
> ... since some people want the write or read to fail, and others just
> want to see it in the logs.

Checksums don't introduce new failure modes on writes, only on reads.

And for reads, I think we have a problem doing anything less than an
ERROR. If we allow the read to succeed, we either risk a crash (or
silently corrupting other buffers in shared memory), or we have to put a
zero page in its place. But we already have the zero_damaged_pages
option, which I think is better because reading corrupt data is only
useful for data recovery efforts.
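
Here is a Python sketch of the read path argued for above. On a checksum failure the read raises an error, and the only alternative is to substitute an all-zero page in the spirit of zero_damaged_pages. ChecksumFailure, read_buffer and the log text are hypothetical names chosen for illustration. zero_damaged_pages is a real setting, but this is not how the server implements it.

```python
import logging

PAGE_SIZE = 8192


class ChecksumFailure(Exception):
    """Raised when a page read from disk fails verification (hypothetical)."""


def read_buffer(blkno: int, page: bytes, checksum_ok: bool,
                zero_damaged_pages: bool = False) -> bytes:
    if checksum_ok:
        return page
    if zero_damaged_pages:
        # Recovery-only escape hatch: hand back an empty page, never bad data.
        logging.warning("invalid page in block %d; zeroing out page", blkno)
        return bytes(PAGE_SIZE)
    # Default: refuse to let corrupt bytes reach shared buffers.
    raise ChecksumFailure(f"invalid page in block {blkno}")
```

Note that no mode returns the corrupt bytes themselves, so a WARNING-only setting has no natural place in this design.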

> So, thinking about it, state (3) is never the state of an entire
> installation; it's always the state of individual tables.

That contradicts the idea of using a GUC then. It would make more sense
to have extra syntax or extra VACUUM modes to accomplish that per-table.
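
If the state really belongs to individual tables, the tracking might look like the following hypothetical sketch. A per-relation catalog flag is set by a VACUUM mode that checksums every page, and cluster-wide full protection is derived from all the flags. The Relation type, vacuum_checksum and the flag are illustrative names, not existing catalog columns or VACUUM options. The sketch also reflects the point quoted earlier: a single cold partition that is never vacuumed keeps the whole cluster short of full protection.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Relation:
    name: str
    page_has_checksum: List[bool]
    fully_checksummed: bool = False  # hypothetical per-table catalog flag


def vacuum_checksum(rel: Relation) -> None:
    """Hypothetical VACUUM mode: checksum every page, then set the flag."""
    rel.page_has_checksum = [True] * len(rel.page_has_checksum)
    rel.fully_checksummed = True


def cluster_fully_protected(rels: List[Relation]) -> bool:
    # The cluster-wide state is derived rather than stored: it holds only
    # when every relation has been completely checksummed.
    return all(r.fully_checksummed for r in rels)
```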

Unfortunately, I'm worried that the per-table approach will not be
completed by 9.3. Do you see something about my proposal that makes it
harder to get where we want to go in the future?

If we do ultimately get per-table checksums, then I agree that a flag in
pg_control may be a bit of a wart, but it's easy enough to remove later.

Regards,
	Jeff Davis



