On Tue, 2009-12-01 at 10:55 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Tue, 2009-12-01 at 16:40 +0200, Heikki Linnakangas wrote:
> >> It's not hard to imagine that when a hardware glitch happens
> >> causing corruption, it also causes the system to crash. Recalculating
> >> the CRCs after crash would mask the corruption.
>
> > They are already masked from us, so continuing to mask those errors
> > would not put us in a worse position.
>
> No, it would just destroy a large part of the argument for why this
> is worth doing. "We detect disk errors ... except for ones that happen
> during a database crash." "Say what?"

I know what I said sounds ridiculous; I'm just trying to keep my mind
open about the tradeoffs. The way to detect 100% of corruptions is to
WAL-log 100% of writes to blocks, and we know that hurts performance -
'twas me that said it in the original discussion. I'm trying to explore
whether we can detect <100% of other errors at some intermediate
percentage of WAL-logging. If we decide that there isn't an intermediate
position worth taking, I'm happy, as long as it was a fact-based decision.

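To make the masking concern concrete, here is a minimal sketch. It is
not PostgreSQL code: Python's zlib.crc32 stands in for whatever page CRC
we would use, and write_page, verify and flip_bit are hypothetical
helpers invented for the illustration.

```python
import zlib

PAGE_SIZE = 8192  # assumed block size, for illustration only


def write_page(data: bytes) -> tuple[bytes, int]:
    """Return the page together with the CRC computed at write time."""
    return data, zlib.crc32(data)


def verify(data: bytes, stored_crc: int) -> bool:
    """Check the page contents against a stored CRC."""
    return zlib.crc32(data) == stored_crc


def flip_bit(data: bytes, bit: int) -> bytes:
    """Simulate a hardware glitch by flipping one bit in the page."""
    corrupted = bytearray(data)
    corrupted[bit // 8] ^= 1 << (bit % 8)
    return bytes(corrupted)


page, crc = write_page(bytes(PAGE_SIZE))
damaged = flip_bit(page, 12345)

# Checked against the CRC stored at write time, the glitch is caught.
detected = not verify(damaged, crc)

# If crash recovery recomputes the CRC from whatever is on disk,
# the damaged page verifies cleanly and the corruption is masked.
recomputed_crc = zlib.crc32(damaged)
masked = verify(damaged, recomputed_crc)
```

Both detected and masked come out true here: the same damaged page fails
against the original CRC and passes against the recomputed one.
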
> The fundamental problem with this is the same as it's been all along:
> the tradeoff between implementation work expended, performance overhead
> added, and net number of real problems detected (with a suitably large
> demerit for actually *introducing* problems) just doesn't look
> attractive. You can make various compromises that improve one or two of
> these factors at the cost of making the others worse, but at the end of
> the day I've still not seen a combination that seems worth doing.

I agree. But I also believe there are people who care enough about
this to absorb a performance hit, and the new features in 8.5 will bring
in a new crop of people who care about those things very much.

-- Simon Riggs www.2ndQuadrant.com