Re: Block-level CRC checks - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Block-level CRC checks |
Date | |
Msg-id | 1259623701.13774.10218.camel@ebony |
In response to | Re: Block-level CRC checks (Aidan Van Dyk <aidan@highrise.ca>) |
Responses | Re: Block-level CRC checks |
List | pgsql-hackers |
On Mon, 2009-11-30 at 16:49 -0500, Aidan Van Dyk wrote:
> * Simon Riggs <simon@2ndQuadrant.com> [091130 16:28]:
> >
> > You've written that as if you are spotting a problem. It sounds to me
> > that this is exactly the situation we would like to detect and this is a
> > perfect way of doing that.
> >
> > What do you see is the purpose here apart from spotting corruptions?
> >
> > Do we think error rates are so low we can recover the corruption by
> > doing something clever with the CRC? I envisage most corruptions as
> > being unrecoverable except from backup/WAL/replicated servers.
> >
> > It's been a long day, so perhaps I've misunderstood.
>
> No, I believe the torn-page problem is exactly the thing that made the
> checksum talks stall out last time... The torn page isn't currently a
> problem on only-hint-bit-dirty writes, because if you get
> half-old/half-new, the only changes is the hint bit - no big loss, the
> data is still the same.
>
> But, with a form of check-sums, when you read it it next time, is it
> corrupt? According to the check-sum, yes, but in reality, the *data* is
> still valid, just that the check sum is/isn't correctly matching the
> half-changed hint bits...

A good argument, but we're missing some proportion. There are at most
240 hint bits in an 8192 byte block. So that is less than 0.5% of the
data block where a single bit error would not corrupt data, and 0% of
the data block where a 2+ bit error would not corrupt data. Put it
another way, more than 99.5% of possible errors would cause data loss,
so I would at least like the option of being told about them.

The other perspective is that these errors are unlikely to be caused by
cosmic rays and other quantum effects; they are more likely to be caused
by hardware errors. Hardware errors are frequently repeatable, so one
bank of memory or one section of DRAM is damaged and will give errors.
If we don't report an error, the next error from that piece of hardware
is almost certain to cause data loss, so even a false positive result
should be treated as a good indicator of a true positive detection
result in the future.

If protection against data loss really does need to be so invasive that
we need to WAL-log all changes, then let's make it a table-level option.
If people want to pay the price, we should at least give them the option
of doing so. We can think of ways of optimising it later.

Since I was the one who opposed this on the basis of performance, I want
to rescind that objection and say let's make it an option for those that
wish to trade performance for some visibility of possible data loss
errors.

--
 Simon Riggs           www.2ndQuadrant.com
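
To put rough numbers on the proportion argument above, here is a minimal Python sketch. It takes the 240 hint bits and the 8192 byte block from the message as given. Everything else is an assumption for illustration only: the all-zero page, the bit position 100 standing in for a hint bit, and the use of `zlib.crc32` as the checksum. None of it reflects PostgreSQL's page layout or any proposed patch.

```python
import zlib

BLOCK_SIZE_BYTES = 8192   # block size quoted in the message
MAX_HINT_BITS = 240       # upper bound on hint bits quoted in the message


def hint_bit_fraction(block_size_bytes=BLOCK_SIZE_BYTES, hint_bits=MAX_HINT_BITS):
    """Return the fraction of all bits in a block that are hint bits."""
    return hint_bits / (block_size_bytes * 8)


def flip_bit(block, bit_index):
    """Return a copy of block with the bit at bit_index inverted."""
    out = bytearray(block)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


fraction = hint_bit_fraction()
print(f"hint bits make up {fraction:.4%} of the block")

# Hypothetical page: all zeros, with bit 100 standing in for a hint bit.
page = bytes(BLOCK_SIZE_BYTES)
stored_crc = zlib.crc32(page)

# Simulate a write where only the hint bit reached disk after the CRC was stored.
torn_page = flip_bit(page, 100)
checksum_mismatch = zlib.crc32(torn_page) != stored_crc
print("checksum mismatch after hint-bit flip:", checksum_mismatch)
```

The first figure comes out well under the 0.5% bound. The second line shows the false positive Aidan describes: the checksum no longer matches although only a hint bit differs.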