Re: Block-level CRC checks - Mailing list pgsql-hackers

From Aidan Van Dyk
Subject Re: Block-level CRC checks
Date
Msg-id 20091130214913.GU17573@oak.highrise.ca
In response to Re: Block-level CRC checks  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Block-level CRC checks
List pgsql-hackers
* Simon Riggs <simon@2ndQuadrant.com> [091130 16:28]:
> 
> You've written that as if you are spotting a problem. It sounds to me
> that this is exactly the situation we would like to detect and this is a
> perfect way of doing that.
> 
> What do you see is the purpose here apart from spotting corruptions?
> 
> Do we think error rates are so low we can recover the corruption by
> doing something clever with the CRC? I envisage most corruptions as
> being unrecoverable except from backup/WAL/replicated servers. 
> 
> It's been a long day, so perhaps I've misunderstood.

No, I believe the torn-page problem is exactly the thing that made the
checksum talks stall out last time...  A torn page isn't currently a
problem on only-hint-bit-dirty writes, because if you get
half-old/half-new, the only change is the hint bits - no big loss, the
data is still the same.

But, with a form of checksums, when you read it next time, is it
corrupt?  According to the checksum, yes, but in reality, the *data* is
still valid; it's just that the stored checksum no longer matches the
half-changed hint bits...
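
To make the scenario concrete, here is a minimal sketch in Python.  It is
not PostgreSQL's page layout or any proposed patch: the 4-byte CRC at the
start of the page, the 512-byte sector size, the hint-bit offset and the
helper names are all assumptions made up for illustration.

```python
import zlib

PAGE_SIZE = 8192
SECTOR = 512        # assumed atomic write unit of the disk
CRC_LEN = 4         # hypothetical: CRC32 stored in the first 4 bytes
HINT_OFF = 6000     # hypothetical: byte holding one tuple's hint bits
HINT_BIT = 0x01


def with_crc(page: bytes) -> bytes:
    """Return the page with a CRC32 of its body stored in the first 4 bytes."""
    body = page[CRC_LEN:]
    return zlib.crc32(body).to_bytes(CRC_LEN, "little") + body


def crc_ok(page: bytes) -> bool:
    """Check the stored CRC against the page body."""
    stored = int.from_bytes(page[:CRC_LEN], "little")
    return stored == zlib.crc32(page[CRC_LEN:])


def without_hints(page: bytes) -> bytes:
    """Page body with the (hypothetical) hint bit masked out."""
    body = bytearray(page)
    body[HINT_OFF] &= ~HINT_BIT & 0xFF
    return bytes(body[CRC_LEN:])


# Old on-disk page: some data, hint bit not yet set.
raw = bytearray(i % 251 for i in range(PAGE_SIZE))
raw[HINT_OFF] &= ~HINT_BIT & 0xFF
old_page = with_crc(bytes(raw))

# New page: identical data, only the hint bit set, CRC recomputed.
raw[HINT_OFF] |= HINT_BIT
new_page = with_crc(bytes(raw))

# Torn write: only the first sector (holding the new CRC) reached disk;
# the rest of the page, including the hint-bit byte, is still old.
torn_page = new_page[:SECTOR] + old_page[SECTOR:]

torn_crc_ok = crc_ok(torn_page)
same_data = without_hints(torn_page) == without_hints(new_page)
```

Under these assumptions `torn_crc_ok` comes out false while `same_data`
comes out true: the checksum reports corruption even though nothing but
the hint bit differs from a fully written page.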

And then many not-so-attractive workarounds were thrown around,
with nothing nice falling into place...

a.

-- 
Aidan Van Dyk                                             Create like a god,
aidan@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.
