Re: Block-level CRC checks - Mailing list pgsql-hackers
| From | Simon Riggs | 
|---|---|
| Subject | Re: Block-level CRC checks | 
| Date | |
| Msg-id | 1259653966.13774.11898.camel@ebony | 
| In response to | Re: Block-level CRC checks (Tom Lane <tgl@sss.pgh.pa.us>) | 
| Responses | Re: Block-level CRC checks | 
| List | pgsql-hackers | 

On Mon, 2009-11-30 at 20:02 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Mon, 2009-11-30 at 16:49 -0500, Aidan Van Dyk wrote:
> >> No, I believe the torn-page problem is exactly the thing that made the
> >> checksum talks stall out last time... The torn page isn't currently a
> >> problem on only-hint-bit-dirty writes, because if you get
> >> half-old/half-new, the only changes is the hint bit - no big loss, the
> >> data is still the same.
>
> > A good argument, but we're missing some proportion.
>
> No, I think you are. The problem with the described behavior is exactly
> that it converts a non-problem into a problem --- a big problem, in
> fact: uncorrectable data loss. Loss of hint bits is expected and
> tolerated in the current system design. But a block with bad CRC is not
> going to have any automated recovery path.
>
> So the difficulty is that in the name of improving system reliability
> by detecting infrequent corruption events, we'd be decreasing system
> reliability by *creating* infrequent corruption events, added onto
> whatever events we were hoping to detect. There is no strong argument
> you can make that this isn't a net loss --- you'd need to pull some
> error-rate numbers out of the air to even try to make the argument,
> and in any case the fact remains that more data gets lost with the CRC
> than without it. The only thing the CRC is really buying is giving
> the PG project a more plausible argument for blaming data loss on
> somebody else; it's not helping the user whose data got lost.
>
> It's hard to justify the amount of work and performance hit we'd take
> to obtain a "feature" like that.

I think there is a clear justification for an additional option.

There is no "creation" of corruption events. This scheme detects corruption events that *have* occurred.
Now I understand that we previously would have recovered seamlessly from such events, but they were corruption events nonetheless and I think they need to be reported. (For why, see Conclusion #2, below).

The frequency of such events against other corruption events is important here. You are right that there is effectively one new *type* of corruption event, but without error-rate numbers you can't say that this shows substantially "more data gets lost with the CRC than without it".

So let me say this again: the argument that inaction is a safe response here relies upon error-rate numbers going in your favour. You don't persuade us of one argument purely by observing that the alternate proposition requires a certain threshold error-rate - both propositions do. So it's a straight "what is the error-rate?" discussion, and ISTM that there is good evidence of what that is.

---

So, what is the probability of single-bit errors affecting hint bits? The hint bits can occupy any portion of the block, so their positions are random. They occupy less than 0.5% of the block, so they must account for a very small proportion of hardware-induced errors.

Since most reasonable servers use Error Correcting Memory, I would expect not to see a high level of single-bit errors, even though we know they are occurring in the underlying hardware (Conclusion #1, Schroeder et al, 2009).

What is the chance that a correctable corruption event is in no way linked to another non-correctable event later? We would need to argue that corruptions are a purely stochastic process in all cases, yet, again, there is evidence of a clear and strong linkage from correctable to non-correctable errors (Conclusion #2 and Conclusion #7, Schroeder et al, 2009).

Schroeder et al
http://www.cs.toronto.edu/~bianca/papers/sigmetrics09.pdf
(thanks Greg!)

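As a rough sketch of the arithmetic behind the hint-bit question above: if each hardware-induced bit flip lands on a uniformly random bit of the block (an assumption, not something measured here), then the share of flips that can hit a hint bit is bounded by the share of the block that hint bits occupy. The 0.5% bound is the figure quoted above; the 8192-byte block is PostgreSQL's default block size; the function names are made up for illustration.

```python
# Back-of-envelope arithmetic only; not a measurement of any real system.

BLOCK_BYTES = 8192         # PostgreSQL's default block size (BLCKSZ)
HINT_FRACTION_MAX = 0.005  # "less than 0.5% of the block", taken from the text


def hint_bit_budget(block_bytes=BLOCK_BYTES, fraction=HINT_FRACTION_MAX):
    """Upper bound on how many bits of one block can be hint bits."""
    return block_bytes * 8 * fraction


def expected_hint_hits(n_flips, fraction=HINT_FRACTION_MAX):
    """Expected number of flips that land on a hint bit.

    Assumes every flip picks a bit position uniformly at random,
    independently of the others (hypothetical error model).
    """
    return n_flips * fraction


def prob_flip_misses_hints(fraction=HINT_FRACTION_MAX):
    """Probability that one uniformly random flip hits non-hint data."""
    return 1.0 - fraction


if __name__ == "__main__":
    print("hint bits per block (upper bound):", hint_bit_budget())
    print("expected hint-bit hits per 1000 flips:", expected_hint_hits(1000))
    print("P(single flip misses hint bits):", prob_flip_misses_hints())
```

Under that uniform model, at most about 5 in every 1000 single-bit errors would fall on hint bits; the rest would already be non-ignorable without any CRC.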
Based on that paper, ISTM that ignorable hint bit corruptions are likely to account for a very small proportion of all corruptions, and of those, "70-80%" would show up as non-ignorable corruptions within a month anyway. So the immediate effect on reliability is tiny, if any. The effect on detection is huge, which eventually produces significantly higher reliability overall.

> The only thing the CRC is really buying is giving
> the PG project a more plausible argument for blaming data loss on
> somebody else; it's not helping the user whose data got lost.

This isn't about blame, it's about detection. If we know something has happened we can do something about it. Experienced people know that hardware goes wrong, they just want to be told so they can fix it.

I blocked development of a particular proposal earlier for performance reasons, but did not intend to block progress completely. It seems likely the checks will cause a performance hit. So make them an option.

--
Simon Riggs www.2ndQuadrant.com