fault tolerance...

From: Christopher Quinn
Hello,

I've been wondering how pgsql goes about guaranteeing data
integrity in the face of soft failures - in particular,
whether it uses an alternative to the double root block
technique. In that technique, the final indication that new
log records are valid is a write of some meta information
(including the location of the last log record written) to
alternating disk blocks at fixed disk locations.
This is the only technique I know of - does pgsql use
something analogous?
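
To make that concrete, here is a minimal sketch of what I
mean - the layout, names and checksum are all invented for
illustration, nothing to do with pgsql:

#include <stdint.h>
#include <stdio.h>

struct root_block {
    uint64_t seqno;        /* increases with every root write */
    uint64_t last_rec_pos; /* byte offset of the newest log record */
    uint32_t checksum;     /* detects a torn/partial root write */
};

static uint32_t root_checksum(const struct root_block *rb)
{
    /* stand-in for a real CRC over the block contents */
    return (uint32_t)(rb->seqno * 2654435761u ^ rb->last_rec_pos);
}

/* Commit point: overwrite the OLDER of the two fixed blocks, so
 * the newer root survives intact if power fails mid-write. */
static void publish(struct root_block roots[2], uint64_t rec_pos)
{
    int older = roots[0].seqno <= roots[1].seqno ? 0 : 1;
    struct root_block nb;
    nb.seqno = roots[1 - older].seqno + 1;
    nb.last_rec_pos = rec_pos;
    nb.checksum = root_checksum(&nb);
    roots[older] = nb;     /* on a real disk: write block, then fsync */
}

/* Recovery: trust the newest root whose checksum verifies. */
static const struct root_block *recover(const struct root_block roots[2])
{
    const struct root_block *best = NULL;
    for (int i = 0; i < 2; i++)
        if (roots[i].checksum == root_checksum(&roots[i]) &&
            (best == NULL || roots[i].seqno > best->seqno))
            best = &roots[i];
    return best;
}

int main(void)
{
    struct root_block roots[2] = {{0, 0, 0}, {0, 0, 0}};
    publish(roots, 4096);
    publish(roots, 8192);
    roots[1].last_rec_pos = 12345;  /* simulate a torn second write */
    const struct root_block *rb = recover(roots);
    if (rb != NULL)
        printf("valid log ends at offset %llu\n",
               (unsigned long long)rb->last_rec_pos);
    return 0;
}

The point being that one intact, verifiable root always
survives a power cut, because the two blocks are never both
mid-write at the same time.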

Also, I note from the developer docs the comment on caching
disk drives: can anyone supply a reference on this subject
(I have been on the lookout for one for a long time without
success), and perhaps more generally on what exactly can go
wrong with a disk write interrupted by power failure?

Lastly, is there any form of integrity checking of data at
the disk block level? I have vague recollections of seeing
mention of CRC/XOR in relation to Oracle or DB2.
Whether or not pgsql uses such a scheme, I am curious to
know the rationale for its use - it makes me wonder what,
if anything, can be relied on 100%!

Thanks,
Chris Quinn



Re: fault tolerance...

From: Tom Lane
Christopher Quinn <cq@htec.demon.co.uk> writes:
> I've been wondering how pgsql goes about guaranteeing data
> integrity in the face of soft failures - in particular,
> whether it uses an alternative to the double root block
> technique. In that technique, the final indication that new
> log records are valid is a write of some meta information
> (including the location of the last log record written) to
> alternating disk blocks at fixed disk locations.
> This is the only technique I know of - does pgsql use
> something analogous?

The WAL log uses per-record CRCs plus sequence numbers (both per-record
and per-page) as a way of determining where valid information stops.
I don't see any need for relying on a "root block" in the sense you
describe.
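
In outline, end-of-log detection looks something like the
sketch below. This is a simplification with invented names,
not the actual xlog.c code:

#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* hypothetical record header - the real XLogRecord differs */
struct wal_record {
    uint32_t crc;    /* CRC-32 of everything after this field */
    uint32_t seqno;  /* consecutive per-record sequence number */
    uint32_t len;    /* payload bytes following the header */
};

/* plain bitwise CRC-32, good enough for the sketch */
static uint32_t crc32_calc(const uint8_t *p, size_t n)
{
    uint32_t c = 0xffffffffu;
    while (n-- > 0) {
        c ^= *p++;
        for (int k = 0; k < 8; k++)
            c = (c >> 1) ^ (0xedb88320u & (0u - (c & 1u)));
    }
    return ~c;
}

/* Scan forward from the start of the log buffer; the first record
 * that fails its CRC, or does not carry the expected sequence
 * number, marks the end of valid log data. */
static size_t find_end_of_wal(const uint8_t *buf, size_t buflen)
{
    size_t off = 0;
    uint32_t expect = 1;          /* first record's sequence number */

    while (off + sizeof(struct wal_record) <= buflen) {
        struct wal_record rec;
        memcpy(&rec, buf + off, sizeof rec); /* avoid unaligned reads */

        if (rec.len > buflen - off - sizeof rec)
            break;                /* header claims more than exists */
        if (rec.seqno != expect)
            break;                /* old bytes from a previous cycle */
        if (rec.crc != crc32_calc(buf + off + sizeof rec.crc,
                                  sizeof rec - sizeof rec.crc + rec.len))
            break;                /* torn write or garbage */
        off += sizeof rec + rec.len;
        expect++;
    }
    return off;                   /* replay stops here */
}

Stale bytes left over from earlier log traffic fail either
the sequence test or the CRC test, so replay stops at the
true end of the log without needing a fixed-location root
block.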

> Lastly, is there any form of integrity checking of data at
> the disk block level? I have vague recollections of seeing
> mention of CRC/XOR in relation to Oracle or DB2.

At present we rely on the disk drive to not drop data once it's been
successfully fsync'd (at least not without detecting a read error later).
There was some discussion of adding per-page CRCs as a second-layer
check, but no one seems very excited about it.  The performance costs
would be nontrivial and we have not seen all that many reports of field
failures in which a CRC would have improved matters.
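
If we did add it, the mechanics would be roughly like this
(sketch only, invented page layout; no such code exists in
pgsql today):

#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 8192  /* pgsql's default block size */

/* the same bitwise CRC-32 helper as in the sketch above */
uint32_t crc32_calc(const uint8_t *p, size_t n);

/* Stamp the page just before it is written out.  The CRC field
 * (here: the first 4 bytes, an invented layout) must be excluded
 * from its own computation. */
void page_set_crc(uint8_t *page)
{
    uint32_t crc = crc32_calc(page + 4, PAGE_SIZE - 4);
    memcpy(page, &crc, 4);
}

/* Verify right after the page is read back in; a zero result means
 * the block was silently corrupted and must not be trusted. */
int page_crc_ok(const uint8_t *page)
{
    uint32_t stored;
    memcpy(&stored, page, 4);
    return stored == crc32_calc(page + 4, PAGE_SIZE - 4);
}

The overhead is one pass over all 8K of every page at each
write and each verified read, which is where the nontrivial
cost comes from.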
        regards, tom lane


Re: fault tolerance...

From: Christopher Quinn
Tom Lane wrote:
> The WAL log uses per-record CRCs plus sequence numbers (both per-record
> and per-page) as a way of determining where valid information stops.
> I don't see any need for relying on a "root block" in the sense you
> describe.
> 

Yes, I see.
I imagine that if a raw device were used for the log (not a
file, so no EOF to mark the end of valid data), the stale
record space after the last valid record might contain bytes
which appear to form another valid record ... if it weren't
for the security of a CRC.
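
One further guard I can imagine - just a guess on my part,
with invented names - would be to stamp each log page with
the address it was written for, so leftovers from recycled
space announce themselves as stale before any record on
them is inspected:

#include <stdint.h>

/* invented header layout, purely to illustrate the idea */
struct log_page_header {
    uint64_t page_addr;  /* position this page claims in the log */
    uint32_t magic;      /* log format identifier */
};

/* A page left over from an earlier pass across recycled space
 * carries the wrong page_addr, so it is rejected outright. */
static int log_page_is_current(const struct log_page_header *hdr,
                               uint64_t addr)
{
    return hdr->magic == 0x57414c31u && hdr->page_addr == addr;
}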


> There was some discussion of adding per-page CRCs as a second-layer
> check, but no one seems very excited about it.  The performance costs
> would be nontrivial and we have not seen all that many reports of field
> failures in which a CRC would have improved matters.
> 

Access to hard data on such corruption, or on its
theoretical likelihood, would be nice!
Did you refer to any particular material in deciding what
measures to implement to achieve the level of data security
pgsql currently offers?

Thanks,
Chris