Re: Page Checksums - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Page Checksums
Msg-id 201112211629.45491.andres@anarazel.de
In response to Re: Page Checksums  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
On Wednesday, December 21, 2011 04:21:53 PM Kevin Grittner wrote:
> Greg Smith <greg@2ndQuadrant.com> wrote:
> >>  Some people think I border on the paranoid on this issue.
> > 
> > Those people are also out to get you, just like the hardware.
> 
> Hah!  I *knew* it!
> 
> >> Are you arguing that autovacuum should be disabled after crash
> >> recovery?  I guess if you are arguing that a database VACUUM
> >> might destroy recoverable data when hardware starts to fail, I
> >> can't argue.
> > 
> > A CRC failure suggests to me a significantly higher possibility
> > of hardware likely to lead to more corruption than a normal crash
> > does though.
> 
> Yeah, the discussion has me coming around to the point of view
> advocated by Andres: that it should be treated the same as corrupt
> pages detected through other means.  But that can only be done if
> you eliminate false positives from hint-bit-only updates.  Without
> some way to handle that, I guess that means the idea is dead.
> 
> Also, I'm not sure that our shop would want to dedicate any space
> per page for this, since we're comparing between databases to ensure
> that values actually match, row by row, during idle time.  A CRC or
> checksum is a lot weaker than that.  I can see where it would be
> very valuable where more rigorous methods aren't in use; but it
> would really be just extra overhead with little or no benefit for
> most of our database clusters.
Comparing between databases will not detect failures in all of the data, 
because you surely will not exercise all indexes. With index-only scans the 
likelihood of unnoticed heap corruption also increases.
For example, I have seen disk-level corruption silently damage a unique index 
so that it no longer covered all the data, which led to rather big problems.
Not everyone can run regular dump+restore tests to protect against such 
scenarios...

Andres

