
From Greg Smith
Subject Re: Page Checksums
Msg-id 4EF1843C.6090101@2ndQuadrant.com
In response to Re: Page Checksums  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
On 12/19/2011 06:14 PM, Kevin Grittner wrote:
>> But if you need all that infrastructure just to get the feature
>> launched, that's a bit hard to stomach.
>>      
>
> Triggering a vacuum or some hypothetical "scrubbing" feature?
>    

What you were suggesting doesn't require triggering just a vacuum 
though--it requires triggering some number of vacuums, for all impacted 
relations.  You said yourself that "all tables if there's no way to 
rule any of them out" was a possibility.  I'm just pointing out that 
scheduling that level of work is a logistics headache, and it would be 
reasonable for people to expect some help with that were it to become a 
necessary thing falling out of the implementation.

> Some people think I border on the paranoid on this issue.

Those people are also out to get you, just like the hardware.

> Are you arguing that autovacuum should be disabled after crash
> recovery?  I guess if you are arguing that a database VACUUM might
> destroy recoverable data when hardware starts to fail, I can't
> argue.

A CRC failure suggests to me a significantly higher likelihood of failing 
hardware--and therefore of further corruption--than a normal crash does, though.

>> The main way I expect to validate this sort of thing is with an as
>> yet unwritten function to grab information about a data block from
>> a standby server for this purpose, something like this:
>>
>> Master:  Computed CRC A, Stored CRC B; error raised because A!=B
>> Standby:  Computed CRC C, Stored CRC D
>>
>> If C==D && A==C, the corruption is probably overwritten bits of
>> the CRC B.
>>      
>
> Are you arguing we need *that* infrastructure to get the feature
> launched?
>    

No; just pointing out the things I'd eventually expect people to want, 
because they help answer questions about what to do when CRC failures 
occur.  The most reasonable answer to "what should I do about suspected 
corruption on a page?" in most of the production situations I worry 
about is "see if it's recoverable from the standby".  I see this as 
being similar to how RAID-1 works:  if you find garbage on one drive, 
and you can get a clean copy of the block from the other one, use that 
to recover the missing data.  If you don't have that capability, you're 
stuck with no clear path forward when a CRC failure happens, as you 
noted downthread.
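
To spell out that decision rule in code form, here's a rough sketch of 
the classification I have in mind--purely illustrative, with made-up 
names that don't exist in any patch:

#include <stdint.h>

typedef enum
{
    CRC_STORED_CRC_DAMAGED,    /* page data matches standby; stored CRC B is the bad part */
    CRC_PAGE_DATA_DAMAGED,     /* standby copy is self-consistent; recover the page from it */
    CRC_UNRESOLVED             /* standby copy fails its own check too; manual recovery */
} crc_failure_class;

/*
 * Classify a master-side CRC failure by comparing against the standby's
 * copy of the same block.  A/B are the master's computed/stored CRCs
 * (we only get here because A != B); C/D are the standby's.
 */
crc_failure_class
classify_crc_failure(uint32_t A, uint32_t B, uint32_t C, uint32_t D)
{
    if (C == D && A == C)
        return CRC_STORED_CRC_DAMAGED;   /* only the stored CRC B was clobbered */
    if (C == D)
        return CRC_PAGE_DATA_DAMAGED;    /* master data disagrees with a clean standby page */
    return CRC_UNRESOLVED;
}

The first branch is the automatic-fix case; the last one is where you're 
back to manual recovery.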

This obviously gets troublesome if you've recently written a page out, 
so there's some concern about whether you are checking against the 
correct version of the page, based on how far the standby's replay has 
progressed.  I see that as being a case that's also possible to recover from 
though, because then the page you're trying to validate on the master is 
likely sitting in the recent WAL stream.  This is already the sort of 
thing companies doing database recovery work (of which we are one) deal 
with, and I doubt any proposal will cover every possible situation.  In 
some cases there may be no better answer than "show all the known 
versions and ask the user to sort it out".  The method I suggested would 
sometimes kick out an automatic fix.
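
For the lag case, the extra check amounts to knowing whether the standby 
has replayed past the last change to the page before trusting the 
comparison at all--again just an illustrative sketch, with a stand-in 
type rather than the real LSN machinery:

#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;    /* stand-in for a WAL position; not the real XLogRecPtr */

/*
 * The standby's copy is only meaningful to compare against once its replay
 * position has passed the last WAL record that touched this page.  If it
 * hasn't, the two copies legitimately differ, and the master's version of
 * the page is probably still reconstructible from the recent WAL stream.
 */
bool
standby_copy_comparable(lsn_t page_last_lsn, lsn_t standby_replay_lsn)
{
    return standby_replay_lsn >= page_last_lsn;
}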

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us


