Thread: fault tolerance...
Hello,

I've been wondering how pgsql goes about guaranteeing data integrity in the face of soft failures. In particular, whether it uses an alternative to the double root block technique - that is, writing, as a final indication of the validity of new log records, some meta information (including the location of the last log record written) to alternate disk blocks at fixed disk locations. This is the only technique I know of - does pgsql use something analogous? (I have sketched my understanding of the technique at the end of this message.)

Also, I note from the developer docs the comment on caching disk drives: can anyone supply a reference on this subject (I have been on the lookout for a long time without success), and perhaps more generally on the subject of what exactly can go wrong with a disk write interrupted by a power failure?

Lastly, is there any form of integrity checking on disk-block-level data? I have vague recollections of seeing mention of CRC/XOR in relation to Oracle or DB2. Whether or not pgsql uses any such scheme, I am curious to know the rationale for its use - it makes me wonder what, if anything, can be relied on 100%!

Thanks,
Chris Quinn
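P.S. In case my description of the double root block technique is unclear, here is a rough sketch in C of how I understand it to work. Everything here is invented for illustration - the fixed block numbers and the read_block/write_block/calc_crc32 helpers are not from any real system:

#include <stddef.h>
#include <stdint.h>

#define ROOT_BLOCK_A  0         /* two fixed, known disk locations */
#define ROOT_BLOCK_B  1

typedef struct RootBlock
{
    uint64_t seqno;             /* monotonically increasing write counter */
    uint64_t last_log_pos;      /* location of the last log record written */
    uint32_t checksum;          /* covers seqno and last_log_pos */
} RootBlock;

/* assumed helpers: synchronous block I/O and any CRC-32 routine */
extern void     read_block(int blkno, void *buf, size_t len);
extern void     write_block(int blkno, const void *buf, size_t len);
extern uint32_t calc_crc32(const void *buf, size_t len);

static int
root_valid(const RootBlock *rb)
{
    return rb->checksum == calc_crc32(rb, offsetof(RootBlock, checksum));
}

/* After appending log records, publish the new end-of-log by writing
 * whichever root block is next in the alternation. */
void
publish_end_of_log(uint64_t seqno, uint64_t last_log_pos)
{
    RootBlock   rb = { seqno, last_log_pos, 0 };

    rb.checksum = calc_crc32(&rb, offsetof(RootBlock, checksum));
    write_block(seqno % 2 ? ROOT_BLOCK_B : ROOT_BLOCK_A, &rb, sizeof rb);
}

/* On recovery, the valid root with the larger seqno wins. */
uint64_t
recover_end_of_log(void)
{
    RootBlock   a, b;

    read_block(ROOT_BLOCK_A, &a, sizeof a);
    read_block(ROOT_BLOCK_B, &b, sizeof b);
    if (root_valid(&a) && (!root_valid(&b) || a.seqno > b.seqno))
        return a.last_log_pos;
    if (root_valid(&b))
        return b.last_log_pos;
    return 0;                   /* neither root valid: empty log */
}

The point of alternating is that a power failure can tear at most the block currently being written, leaving the other block's older but intact copy to recover from.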
Christopher Quinn <cq@htec.demon.co.uk> writes:
> I've been wondering how pgsql goes about guaranteeing data
> integrity in the face of soft failures. In particular
> whether it uses an alternative to the double root block
> technique - which is writing, as a final indication of the
> validity of new log records, to alternate disk blocks at
> fixed disk locations some meta information including the
> location of the last log record written.
> This is the only technique I know of - does pgsql use
> something analogous?

The WAL log uses per-record CRCs plus sequence numbers (both per-record
and per-page) as a way of determining where valid information stops.
I don't see any need for relying on a "root block" in the sense you
describe.

> Lastly, is there any form of integrity checking on disk
> block level data? I have vague recollections of seeing
> mention of crc/xor in relation to Oracle or DB2.

At present we rely on the disk drive to not drop data once it's been
successfully fsync'd (at least not without detecting a read error
later). There was some discussion of adding per-page CRCs as a
second-layer check, but no one seems very excited about it. The
performance costs would be nontrivial, and we have not seen all that
many reports of field failures in which a CRC would have improved
matters.

			regards, tom lane
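P.S. Schematically, the check amounts to something like the following. This is a simplified sketch, not our actual XLOG structures - the record layout and the calc_crc32 helper are invented for illustration:

#include <stddef.h>
#include <stdint.h>

typedef struct LogRecord
{
    uint64_t lsn;               /* sequence number: must match the record's
                                 * expected position in the log */
    uint32_t length;            /* payload bytes following the header */
    uint32_t crc;               /* CRC over lsn, length, and payload */
} LogRecord;

/* assumed helper: any incremental CRC-32, seeded with 'init' */
extern uint32_t calc_crc32(uint32_t init, const void *buf, size_t len);

/*
 * A record is trusted only if it carries the sequence number we expect
 * at this position AND its CRC verifies; stale or torn bytes fail one
 * of the two tests with overwhelming probability.
 */
int
record_is_valid(const LogRecord *rec, uint64_t expected_lsn)
{
    uint32_t    c;

    if (rec->lsn != expected_lsn)
        return 0;
    c = calc_crc32(0, &rec->lsn, sizeof rec->lsn);
    c = calc_crc32(c, &rec->length, sizeof rec->length);
    c = calc_crc32(c, (const char *) rec + sizeof *rec, rec->length);
    return c == rec->crc;
}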
Tom Lane wrote:
> Christopher Quinn <cq@htec.demon.co.uk> writes:
>
> The WAL log uses per-record CRCs plus sequence numbers (both per-record
> and per-page) as a way of determining where valid information stops.
> I don't see any need for relying on a "root block" in the sense you
> describe.

Yes, I see. I imagine that if a raw device were used for the log (no file, hence no EOF to mark the end of the valid data), the space after the last valid record might still contain old bytes which appear to form another valid record ... if it weren't for the security of a CRC. (I have sketched the recovery scan I have in mind at the end of this message.)

> There was some discussion of adding per-page CRCs as a second-layer
> check, but no one seems very excited about it. The performance costs
> would be nontrivial and we have not seen all that many reports of field
> failures in which a CRC would have improved matters.

Access to hard data on such corruption, or on its theoretical likelihood, would be nice! Did you reference any material yourself in deciding which measures to implement to achieve the level of data security pgsql currently offers?

Thanks,
Chris
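P.S. To check my understanding, here is a rough sketch of the recovery scan I have in mind - again with invented structures (the same LogRecord layout and record_is_valid check as in the sketch above), not pgsql code:

#include <stddef.h>
#include <stdint.h>

typedef struct LogRecord        /* same invented layout as above */
{
    uint64_t lsn;
    uint32_t length;
    uint32_t crc;
} LogRecord;

extern int  record_is_valid(const LogRecord *rec, uint64_t expected_lsn);
extern void apply_record(const LogRecord *rec);     /* hypothetical redo */

/*
 * Scan forward from the recovery start point; the first record that
 * fails its bounds or CRC/sequence check marks the end of the usable
 * log.
 */
void
replay_log(const char *log, size_t log_size, uint64_t start_lsn)
{
    size_t      pos = 0;
    uint64_t    lsn = start_lsn;

    while (pos + sizeof(LogRecord) <= log_size)
    {
        const LogRecord *rec = (const LogRecord *) (log + pos);

        if (pos + sizeof(LogRecord) + rec->length > log_size ||
            !record_is_valid(rec, lsn))
            break;              /* end of valid log: stop replay here */

        apply_record(rec);
        pos += sizeof(LogRecord) + rec->length;
        lsn++;                  /* the next record must carry the next
                                 * sequence number */
    }
}

Replay stops at the first record which fails its CRC or carries the wrong sequence number, so leftover bytes on a raw device are never mistaken for live log data.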