Thread: RE: CRCs (was: beta testing version)

RE: CRCs (was: beta testing version)

From

"Mikheev, Vadim"

Date:

07 December 2000, 15:50:02

> > That's why an end marker must follow all valid records.  
...
> 
> That requires an extra out-of-sequence write. 

Yes, and it also increases the probability of corrupting data already
committed to the log.
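
The alternative this thread's subject points at, a CRC stored with each log
record, lets recovery find the end of the valid log without a separately
written end marker. A minimal sketch of that idea follows. The record layout
(4-byte length, 4-byte CRC-32, payload) and the function names are
hypothetical, chosen for illustration only; nothing here is PostgreSQL code.

```python
import struct
import zlib

# Hypothetical record header: payload length, then CRC-32 (both little-endian).
HEADER = struct.Struct("<II")
LENGTH = struct.Struct("<I")


def append_record(log: bytearray, payload: bytes) -> None:
    """Append one record; the CRC covers the length field and the payload."""
    crc = zlib.crc32(LENGTH.pack(len(payload)) + payload)
    log += HEADER.pack(len(payload), crc)
    log += payload


def read_valid_records(log: bytes) -> list:
    """Return payloads up to the first truncated or corrupt record."""
    records = []
    pos = 0
    while pos + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, pos)
        start = pos + HEADER.size
        payload = bytes(log[start:start + length])
        # A short read or a CRC mismatch means a torn or damaged record:
        # treat it as the end of the usable log.
        if len(payload) < length:
            break
        if zlib.crc32(LENGTH.pack(length) + payload) != crc:
            break
        records.append(payload)
        pos = start + length
    return records
```

In this sketch every write is a plain sequential append; a record that was
only partly written before a crash fails its check and is ignored on replay.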

> (I'd also like to see CRCs on all the table blocks as well; is there
> a place to put them?)

Do we need it? The "physical log" feature suggested by Andreas will protect
us from non-atomic data block writes.

Vadim

Re: CRCs (was: beta testing version)

From

"Horst Herb"

Date:

07 December 2000, 16:36:51

P.S.: I would volunteer to integrate CRC routines into postgres if somebody
points me in the right direction in the source code.

Horst

Re: CRCs (was: beta testing version)

From

"Horst Herb"

Date:

07 December 2000, 16:37:30

> > (I'd also like to see CRCs on all the table blocks as well; is there
> > a place to put them?)
>
> Do we need it? The "physical log" feature suggested by Andreas will protect
> us from non-atomic data block writes.

CRCs are necessary because of glitches, hardware failures, operating system
bugs, viruses, etc.: many factors can alter data stored on the hard disk
independently of PostgreSQL. I learned this lesson the hard way when
I wrote a database application for a hospital, where data integrity is
vital.

Logging CRCs with each record gave us proof that data had been corrupted by
"external" factors (we never found out what it was). It was only a few bytes
in a database with several hundred thousand records, but still intolerable.
Medicine is heading in a direction where decisions will be backed by
computerized algorithms, which in turn depend on exact data. A one-bit glitch
in a terabyte database can make the difference between life and death. These
glitches will happen, no doubt. That doesn't matter, as long as you have some
means of proving your data integrity and some mechanism for alerting you
when shit has happened.
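
As a rough illustration of per-record CRC logging, here is a minimal sketch.
It assumes records are held as tuples keyed by an ID and uses CRC-32 from
Python's zlib; the function names and the field encoding are hypothetical
and are not the hospital application's actual scheme.

```python
import zlib


def record_crc(fields: tuple) -> int:
    """CRC-32 over a simple byte encoding of one record's fields."""
    # Fields are joined with the ASCII unit separator so that ("ab", "c")
    # and ("a", "bc") do not encode to the same bytes.
    encoded = "\x1f".join(str(f) for f in fields).encode("utf-8")
    return zlib.crc32(encoded)


def build_integrity_log(table: dict) -> dict:
    """Record the CRC of every row at the time it is known to be good."""
    return {key: record_crc(row) for key, row in table.items()}


def find_corrupted(table: dict, integrity_log: dict) -> list:
    """Return the keys of rows whose current CRC no longer matches the log."""
    return sorted(
        key for key, row in table.items()
        if integrity_log.get(key) != record_crc(row)
    )
```

A periodic audit would call find_corrupted() and raise an alert for any key
it returns. The integrity log has to be kept apart from the data it covers
and updated on every legitimate change, otherwise valid updates are reported
as corruption.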

At present I am coordinating another medical project. We have chosen
PostgreSQL as our backend, and the main problem we have is creating
efficient CRC triggers for our own homegrown integrity logging. (I wish
PostgreSQL supported generic triggers that are valid system-wide, or at
least valid for all tables inheriting from the same table.)

Horst

Re: CRCs (was: beta testing version)

From

Bruce Guenter

Date:

07 December 2000, 17:36:33

On Thu, Dec 07, 2000 at 12:22:12PM -0800, Mikheev, Vadim wrote:
> > > That's why an end marker must follow all valid records.
> > That requires an extra out-of-sequence write.
> Yes, and it also increases the probability of corrupting data already
> committed to the log.

Are you referring to the case where the drive loses power in mid-write?
That is solved by either arranging for the markers to always be placed
at the start of a block, or by plugging in a UPS.
--
Bruce Guenter <bruceg@em.ca>                       http://em.ca/~bruceg/

Re: CRCs (was: beta testing version)

From

ncm@zembu.com (Nathan Myers)

Date:

07 December 2000, 18:32:29

On Thu, Dec 07, 2000 at 12:22:12PM -0800, Mikheev, Vadim wrote:
> > > That's why an end marker must follow all valid records.  
> ...
> > 
> > That requires an extra out-of-sequence write. 
> 
> Yes, and it also increases the probability of corrupting data already
> committed to the log.
> 
> > (I'd also like to see CRCs on all the table blocks as well; is there
> > a place to put them?)
> 
> Do we need it? The "physical log" feature suggested by Andreas will protect
> us from non-atomic data block writes.

There are myriad sources of corruption, including RAM bit rot and
software bugs.  The earlier and more reliably it's caught, the better.
The goal is to be able to say that a power outage won't invisibly
corrupt your database.

Here are sources for a 64-bit CRC computation, under a BSD license:
 http://gcc.gnu.org/ml/gcc/1999-11n/msg00592.html
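
For readers who only want the shape of such a computation, a minimal bitwise
sketch follows. It is not the code at the link above: the polynomial
(the one from ECMA-182), the zero initial value, and the absence of a final
XOR are assumptions made for illustration, and a production version would
normally be table-driven for speed.

```python
# Assumed parameters, not taken from the linked sources.
CRC64_POLY = 0x42F0E1EBA9EA3693  # ECMA-182 generator polynomial
MASK64 = 0xFFFFFFFFFFFFFFFF


def crc64(data: bytes, crc: int = 0) -> int:
    """Bitwise, MSB-first CRC-64 with zero initial value and no final XOR."""
    for byte in data:
        # Bring the next byte into the top of the 64-bit register.
        crc ^= byte << 56
        for _ in range(8):
            if crc & (1 << 63):
                crc = ((crc << 1) ^ CRC64_POLY) & MASK64
            else:
                crc = (crc << 1) & MASK64
    return crc
```

Because the register starts at zero and there is no final XOR, a running
value can be passed back in as the crc argument, so a block can be checked
in pieces: crc64(b"cd", crc64(b"ab")) equals crc64(b"abcd").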

Nathan Myers
ncm@zembu.com