Thread: RE: CRCs (was: beta testing version)
> > That's why an end marker must follow all valid records.
...
> > That requires an extra out-of-sequence write.

Yes, and it also increases the probability of corrupting data that has
already been committed to the log.

> (I'd also like to see CRCs on all the table blocks as well; is there
> a place to put them?)

Do we need it? The "physical log" feature suggested by Andreas will protect
us from non-atomic data block writes.

Vadim
P.S.: I would volunteer to integrate CRC routines into postgres if somebody
points me in the right direction in the source code.

Horst
> > (I'd also like to see CRCs on all the table blocks as well; is there
> > a place to put them?)
>
> Do we need it? "physical log" feature suggested by Andreas will protect
> us from non atomic data block writes.

CRCs are necessary because of glitches, hardware failures, operating system
bugs, viruses, etc.: many factors can alter data stored on the hard disk
independently of PostgreSQL.

I learned this lesson the hard way when I wrote a database application for a
hospital, where data integrity is vital. Logging CRCs with each record gave us
proof that data had been corrupted by "external" factors (we never found out
what it was). It was only a few bytes in a database with several hundred
thousand records, but still intolerable.

Medicine is heading in a direction where decisions will be backed up by
computerized algorithms, which in turn depend on exact data. A one-bit glitch
in a terabyte database can make the difference between life and death. These
glitches will happen, no doubt. That doesn't matter, as long as you have some
means of proving your data integrity and some mechanism for alerting you when
shit has happened.

At present I am coordinating another medical project. We have chosen
PostgreSQL as our backend, and the main problem we have is creating efficient
CRC triggers for our own homegrown integrity logging. (I wish PostgreSQL
supported generic triggers that are valid system-wide, or at least valid for
all tables inheriting from the same table.)

Horst
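To make the idea of logging a CRC with each record concrete, here is a minimal sketch. It is not the code from the hospital application described above. The record layout, the function names `record_crc` and `verify_record`, and the use of the 32-bit `zlib.crc32` from Python's standard library are all assumptions chosen for illustration.

```python
import zlib


def record_crc(fields):
    """Return a CRC-32 over a record's text fields.

    The serialization is hypothetical: each field is UTF-8 encoded and
    prefixed with its length so that field boundaries affect the checksum.
    """
    payload = b"".join(
        len(encoded).to_bytes(4, "big") + encoded
        for encoded in (field.encode("utf-8") for field in fields)
    )
    return zlib.crc32(payload)


def verify_record(fields, stored_crc):
    """Recompute the CRC and compare it with the value logged at write time."""
    return record_crc(fields) == stored_crc
```

A caller would store the result of `record_crc` next to the row (or in a separate integrity log) at write time and call `verify_record` when reading. A mismatch shows that the stored bytes changed after the CRC was taken, but not what changed them.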
On Thu, Dec 07, 2000 at 12:22:12PM -0800, Mikheev, Vadim wrote:
> > > That's why an end marker must follow all valid records.
> > That requires an extra out-of-sequence write.
> Yes, and also increase probability to corrupt already committed
> to log data.

Are you referring to the case where the drive loses power in mid-write?
That is solved by either arranging for the markers to always be placed at
the start of a block, or by plugging in a UPS.

-- 
Bruce Guenter <bruceg@em.ca>
http://em.ca/~bruceg/
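As a rough illustration of how a per-record CRC lets a reader notice a write that was cut off by a power loss, consider the sketch below. It does not describe PostgreSQL's log format. The header layout (4-byte length plus 4-byte CRC-32) and the names `append_record` and `read_valid_records` are hypothetical.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC-32 of the payload.
HEADER = struct.Struct(">II")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one record (header followed by payload) to an in-memory log."""
    log += HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def read_valid_records(log: bytes):
    """Return payloads up to, but not including, the first bad record."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = log[start:start + length]
        # A short payload means a torn write; a CRC mismatch means corruption.
        if len(payload) != length or zlib.crc32(payload) != crc:
            break
        records.append(bytes(payload))
        pos = start + length
    return records
```

In this sketch a record that was only partly written fails its own check, so the reader stops there without relying on a separate end marker. Whether that trade-off suits a real log depends on details the thread leaves open.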
On Thu, Dec 07, 2000 at 12:22:12PM -0800, Mikheev, Vadim wrote:
> > > That's why an end marker must follow all valid records.
> ...
> >
> > That requires an extra out-of-sequence write.
>
> Yes, and also increase probability to corrupt already committed
> to log data.
>
> > (I'd also like to see CRCs on all the table blocks as well; is there
> > a place to put them?)
>
> Do we need it? "physical log" feature suggested by Andreas will protect
> us from non atomic data block writes.

There are myriad sources of corruption, including RAM bit rot and
software bugs. The earlier and more reliably it's caught, the better.
The goal is to be able to say that a power outage won't invisibly
corrupt your database.

Here are sources for a 64-bit CRC computation, under a BSD license:

  http://gcc.gnu.org/ml/gcc/1999-11n/msg00592.html

Nathan Myers
ncm@zembu.com
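For readers who want to see what a 64-bit CRC over a table block might look like, here is a small bit-at-a-time sketch. It is not the code at the URL above. The generator polynomial (the one from ECMA-182), the zero initial value, the 8192-byte block size in the usage note, and the names `crc64` and `block_is_intact` are assumptions made for illustration.

```python
POLY = 0x42F0E1EBA9EA3693   # ECMA-182 generator polynomial (assumed choice)
MASK = 0xFFFFFFFFFFFFFFFF   # keep the register at 64 bits


def crc64(data: bytes, crc: int = 0) -> int:
    """Bitwise, MSB-first CRC-64 with no reflection and no final XOR.

    Passing a previous result as `crc` continues the computation, so a
    block can be checksummed in pieces.
    """
    for byte in data:
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ POLY) & MASK
            else:
                crc = (crc << 1) & MASK
    return crc


def block_is_intact(block: bytes, stored_crc: int) -> bool:
    """Compare a block's recomputed CRC with the value stored alongside it."""
    return crc64(block) == stored_crc
```

A writer would compute `crc64` over, say, an 8192-byte block before it goes to disk and keep the result somewhere in or beside the block; a reader calls `block_is_intact` after fetching it. A table-driven version would be much faster; the bitwise loop is used here only because it is short and easy to check by hand.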