Re: Block-level CRC checks - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Block-level CRC checks
Msg-id 1259623701.13774.10218.camel@ebony
In response to Re: Block-level CRC checks  (Aidan Van Dyk <aidan@highrise.ca>)
List pgsql-hackers
On Mon, 2009-11-30 at 16:49 -0500, Aidan Van Dyk wrote:
> * Simon Riggs <simon@2ndQuadrant.com> [091130 16:28]:
> > 
> > You've written that as if you are spotting a problem. It sounds to me
> > that this is exactly the situation we would like to detect and this is a
> > perfect way of doing that.
> > 
> > What do you see as the purpose here apart from spotting corruptions?
> > 
> > Do we think error rates are so low we can recover the corruption by
> > doing something clever with the CRC? I envisage most corruptions as
> > being unrecoverable except from backup/WAL/replicated servers. 
> > 
> > It's been a long day, so perhaps I've misunderstood.
> 
> No, I believe the torn-page problem is exactly the thing that made the
> checksum talks stall out last time...  The torn page isn't currently a
> problem on only-hint-bit-dirty writes, because if you get
> half-old/half-new, the only change is the hint bits - no big loss, the
> data is still the same.
> 
> But, with a form of check-sums, when you read it next time, is it
> corrupt?  According to the check-sum, yes, but in reality, the *data* is
> still valid, just that the check sum is/isn't correctly matching the
> half-changed hint bits...
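
The scenario described above can be reproduced in a small sketch. This is
not PostgreSQL's page layout or checksum algorithm: the 4-byte checksum
slot at the start of the page, the hint-bit offset, and the use of
zlib.crc32 are all assumptions chosen only to illustrate the false
positive.

```python
import zlib

BLOCK_SIZE = 8192
CHECKSUM_BYTES = 4    # hypothetical: checksum kept in the first 4 bytes
HINT_OFFSET = 6000    # hypothetical hint-bit byte, in the second half of the page
HINT_MASK = 0x01


def page_checksum(page: bytes) -> int:
    """CRC-32 over everything after the checksum slot."""
    return zlib.crc32(page[CHECKSUM_BYTES:])


def stamp(page: bytearray) -> None:
    """Store the current checksum in the page's checksum slot."""
    page[:CHECKSUM_BYTES] = page_checksum(bytes(page)).to_bytes(CHECKSUM_BYTES, "little")


def verify(page: bytes) -> bool:
    """True if the stored checksum matches the page contents."""
    stored = int.from_bytes(page[:CHECKSUM_BYTES], "little")
    return stored == page_checksum(page)


def without_hint(page: bytes) -> bytes:
    """Page body with the hint bit cleared, i.e. the data that matters."""
    body = bytearray(page[CHECKSUM_BYTES:])
    body[HINT_OFFSET - CHECKSUM_BYTES] &= ~HINT_MASK & 0xFF
    return bytes(body)


# Page as it currently sits on disk.
old = bytearray(BLOCK_SIZE)
old[100:110] = b"user data!"
stamp(old)

# Same page in memory after only a hint bit was set, checksum refreshed.
new = bytearray(old)
new[HINT_OFFSET] |= HINT_MASK
stamp(new)

# Torn write: the first half (with the old checksum) stays old,
# the second half (with the new hint bit) reaches disk.
half = BLOCK_SIZE // 2
torn = bytes(old[:half]) + bytes(new[half:])

print("old page verifies: ", verify(bytes(old)))
print("new page verifies: ", verify(bytes(new)))
print("torn page verifies:", verify(torn))
print("torn data intact:  ", without_hint(torn) == without_hint(bytes(old)))
```

In this sketch the torn page fails verification even though every byte
other than the hint bit is unchanged, which is the false positive being
discussed.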

A good argument, but we're missing a sense of proportion.

There are at most 240 hint bits in an 8192-byte block (65,536 bits). So
less than 0.5% of the block is where a single-bit error would not corrupt
data, and effectively 0% where an error of two or more bits would not
corrupt data. Put another way, more than 99.5% of possible errors would
cause data loss, so I would at least like the option of being told about
them.
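
The proportions can be checked with a few lines of arithmetic. This is a
rough sketch that assumes bit errors land uniformly at random anywhere in
the block, which real hardware faults may not do; the 240 and 8192 figures
are the ones quoted in this message.

```python
from math import comb

BLOCK_BYTES = 8192   # block size quoted in the message
HINT_BITS = 240      # upper bound on hint bits quoted in the message

total_bits = BLOCK_BYTES * 8

# Single-bit error: harmless only if it lands on a hint bit.
single_harmless = HINT_BITS / total_bits

# Double-bit error: harmless only if both flipped bits are hint bits.
double_harmless = comb(HINT_BITS, 2) / comb(total_bits, 2)

print(f"total bits per block:       {total_bits}")
print(f"harmless single-bit errors: {single_harmless:.4%}")
print(f"harmless double-bit errors: {double_harmless:.6%}")
```

Under that assumption the single-bit figure comes out at about 0.37%,
comfortably under 0.5%, and the double-bit figure is a small fraction of
a hundredth of a percent.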

The other perspective is that these errors are unlikely to be caused by
cosmic rays and other quantum effects; they are more likely to be caused
by hardware faults. Hardware faults are frequently repeatable: once one
bank of memory or one section of DRAM is damaged, it will keep giving
errors. If we don't report an error, the next error from that piece of
hardware is almost certain to cause data loss, so even a false positive
result should be treated as a good indicator of a true positive detection
result in the future.

If protection against data loss really does need to be so invasive that
we need to WAL-log all changes, then let's make it a table-level option.
If people want to pay the price, we should at least give them the option
of doing so. We can think of ways of optimising it later. Since I was
the one who opposed this on the basis of performance, I want to rescind
that objection and say let's make it an option for those who wish to
trade performance for some visibility of possible data loss errors.

-- Simon Riggs           www.2ndQuadrant.com