Re: CRC was: Re: beta testing version - Mailing list pgsql-hackers

From ncm@zembu.com (Nathan Myers)
Subject Re: CRC was: Re: beta testing version
Date
Msg-id 20001212123112.F30335@store.zembu.com
In response to Re: CRC was: Re: beta testing version  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Dec 07, 2000 at 07:36:33PM -0500, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> > 2. I disagree with the way the above statistics were computed.  That eleven 
> >    million-year figure gets whittled down pretty quickly when you 
> >    factor in all the sources of corruption, even without crashes.  
> >    (Power failures are only one of many sources of corruption.)  They 
> >    grow with the size and activity of the database.  Databases are 
> >    getting very large and busy indeed.
> 
> Sure, but the argument still holds.  If the net MTBF of your underlying
> system is less than a day, it's too unreliable to run a database that
> you want to trust.  Doesn't matter what the contributing failure
> mechanisms are.  In practice, I'd demand an MTBF of a lot more than a
> day before I'd accept a hardware system as satisfactory...

In many intended uses (such as Landmark's original plan?) it is not just 
one box, but hundreds or thousands.  With thousands of databases deployed, 
the aggregate MTBF across the whole fleet of commodity hardware (power 
outages included) is well under a day, and there's not much you can do 
about that.

In a large database (e.g. 64GB of 8K blocks) you have 8M, or 2^23,
blocks.  Each hash covers one block.  With a 32-bit checksum, when you
check one block, you have a 2^(-32) likelihood of missing an error,
assuming there is one.  Across 2^23 blocks those chances add up, so the
best you can claim is roughly a 2^23 * 2^(-32) = 2^(-9) chance that
some corrupted block goes undetected.
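
For concreteness, here is a back-of-the-envelope sketch of that
union-bound arithmetic.  The 8K block size and 64GB figure are just the
assumptions above; the code is mine, not anything in the tree.

#include <stdio.h>
#include <math.h>

/*
 * Union-bound estimate of the chance that at least one corrupted
 * block slips past an n-bit checksum, assuming every block holds
 * an error and errors look uniformly random to the checksum.
 */
int
main(void)
{
    double  nblocks = (64.0 * 1024 * 1024 * 1024) / 8192.0;   /* 2^23 */
    double  miss32 = nblocks * pow(2.0, -32);   /* ~2^-9  */
    double  miss64 = nblocks * pow(2.0, -64);   /* ~2^-41 */

    printf("blocks:  %.0f\n", nblocks);
    printf("32-bit:  about 2^%.0f\n", log(miss32) / log(2.0));
    printf("64-bit:  about 2^%.0f\n", log(miss64) / log(2.0));
    return 0;
}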

This is what I meant by "whittling".  A factor of ten or a thousand
here, another there, and pretty soon the possibility of undetected
corruption is something that can't reasonably be ruled out.


> > 3. Many users clearly hope to be able to pull the plug on their hardware 
> >    and get back up confidently.  While we can't promise they won't have 
> >    to go to their backups, we should at least be equipped to promise,
> >    with confidence, that they will know whether they need to.
> 
> And the difference in odds between 2^32 and 2^64 matters here?  I made
> a numerical case that it doesn't, and you haven't refuted it.  By your
> logic, we might as well say that we should be using a 128-bit CRC, or
> 256-bit, or heck, a few kilobytes.  It only takes a little longer to go
> up each step, right, so where should you stop?  I say MTBF measured in
> megayears ought to be plenty.  Show me the numerical argument that 64
> bits is the right place on the curve.

I agree that this is a reasonable question.  However, the magic of
exponential growth makes any dissatisfaction with a 64-bit checksum far
less likely than with a 32-bit one: the per-block miss probability
drops from 2^(-32) to 2^(-64), so the 2^(-9) figure above becomes
something like 2^(-41).

One way to forestall any such problems would be to arrange a
configure-time flag such as "--with-checksum crc-32" or
"--with-checksum md4", and make it clear where to plug in the checksum
of one's choice.  Then, ship 7.2 with just crc-32 and let somebody else
produce patches for alternatives if they want them.
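
Purely as an illustration of what "plug in" could mean (none of these
names exist anywhere; they are hypothetical), the hook might be no more
than a function pointer chosen by the configure flag:

#include <stddef.h>

/*
 * Hypothetical sketch only: a block-checksum hook selected at
 * configure time.  Neither these function names nor the CHECKSUM_*
 * macro exist; they just show where an alternative algorithm
 * would plug in.
 */
typedef void (*block_checksum_fn)(const void *data, size_t len,
                                  unsigned char *out, size_t outlen);

extern void crc32_block(const void *data, size_t len,
                        unsigned char *out, size_t outlen);
extern void md4_block(const void *data, size_t len,
                      unsigned char *out, size_t outlen);

#if defined(CHECKSUM_MD4)
static const block_checksum_fn block_checksum = md4_block;
#else                           /* default: CRC-32 */
static const block_checksum_fn block_checksum = crc32_block;
#endif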

BTW, I have been looking for Free 64-bit CRC codes/polynomials, and
the closest thing I have found so far is Mark Mitchell's hash,
translated from the Modula-3 system.  All the tape drive makers
advertise (but, AFAIK, don't publish) a 64-bit CRC.
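
For what it's worth, a bit-at-a-time 64-bit CRC is tiny once a
polynomial is picked; the one below is the polynomial usually
attributed to ECMA-182 (the DLT tape standard), shown only as an
example, not a recommendation:

#include <stddef.h>

/*
 * Minimal bit-at-a-time 64-bit CRC sketch.  MSB-first, no
 * reflection, zero initial value, no final XOR; a production
 * version would be table-driven and would nail down those
 * conventions.  The polynomial is illustrative only.
 */
typedef unsigned long long uint64;      /* assumes a 64-bit type */

#define CRC64_POLY ((uint64) 0x42F0E1EBA9EA3693ULL)

uint64
crc64_bitwise(const unsigned char *data, size_t len)
{
    uint64  crc = 0;
    size_t  i;
    int     bit;

    for (i = 0; i < len; i++)
    {
        crc ^= (uint64) data[i] << 56;  /* feed next byte, MSB first */
        for (bit = 0; bit < 8; bit++)
            crc = (crc & ((uint64) 1 << 63))
                ? (crc << 1) ^ CRC64_POLY
                : (crc << 1);
    }
    return crc;
}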

A reasonable approach would be to deliver CRC-32 in 7.2, and then
reconsider the default later if anybody contributes good alternatives.

Nathan Myers
ncm@zembu.com

