Re: CRC was: Re: beta testing version - Mailing list pgsql-hackers
From:           ncm@zembu.com (Nathan Myers)
Subject:        Re: CRC was: Re: beta testing version
Msg-id:         20001212123112.F30335@store.zembu.com
In response to: Re: CRC was: Re: beta testing version (Tom Lane <tgl@sss.pgh.pa.us>)
List:           pgsql-hackers
On Thu, Dec 07, 2000 at 07:36:33PM -0500, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> > 2. I disagree with the way the above statistics were computed. That
> >    eleven-million-year figure gets whittled down pretty quickly when
> >    you factor in all the sources of corruption, even without crashes.
> >    (Power failures are only one of many sources of corruption.) They
> >    grow with the size and activity of the database. Databases are
> >    getting very large and busy indeed.
>
> Sure, but the argument still holds. If the net MTBF of your underlying
> system is less than a day, it's too unreliable to run a database that
> you want to trust. Doesn't matter what the contributing failure
> mechanisms are. In practice, I'd demand an MTBF of a lot more than a
> day before I'd accept a hardware system as satisfactory...

In many intended uses (such as Landmark's original plan?) it is not just
one box, but hundreds or thousands. With thousands of databases deployed,
the MTBF (including power outages) for commodity hardware is well under a
day, and there's not much you can do about that.

In a large database (e.g. 64 GB, at 8 KB per block) you have 8M blocks.
Each hash covers one block. With a 32-bit checksum, when you check one
block, you have a 2^(-32) likelihood of missing an error, assuming there
is one. With 8M blocks, you can only claim a 2^(-9) chance. This is what
I meant by "whittling". A factor of ten or a thousand here, another
there, and pretty soon the possibility of undetected corruption is
something that can't reasonably be ruled out.

> > 3. Many users clearly hope to be able to pull the plug on their
> >    hardware and get back up confidently. While we can't promise they
> >    won't have to go to their backups, we should at least be equipped
> >    to promise, with confidence, that they will know whether they
> >    need to.
>
> And the difference in odds between 2^32 and 2^64 matters here? I made
> a numerical case that it doesn't, and you haven't refuted it.
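[The "whittling" arithmetic above can be checked directly. This sketch assumes PostgreSQL's default 8 KB block size, which is what makes 64 GB come out to 8M blocks; the variable names are illustrative, not from the thread.]

```python
# Back-of-the-envelope check of the whittling argument:
# per-block miss rate of 2^-32, union-bounded over all blocks.
db_size = 64 * 2**30            # 64 GB database
block_size = 8 * 2**10          # 8 KB per block (PostgreSQL default)
blocks = db_size // block_size  # 8M blocks = 2^23

p_miss_one = 2.0**-32           # chance one corrupt block slips past CRC-32
p_miss_any = blocks * p_miss_one  # union bound over 2^23 blocks

print(blocks)                   # 8388608
print(p_miss_any)               # 2^-9, about 0.002
```

The same bound with a 64-bit checksum gives 2^23 * 2^-64 = 2^-41, which is the exponential-growth point made below.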
> By your logic, we might as well say that we should be using a 128-bit
> CRC, or 256-bit, or heck, a few kilobytes. It only takes a little
> longer to go up each step, right, so where should you stop? I say
> MTBF measured in megayears ought to be plenty. Show me the numerical
> argument that 64 bits is the right place on the curve.

I agree that this is a reasonable question. However, the magic of
exponential growth makes any dissatisfaction with a 64-bit checksum far
less likely than with a 32-bit checksum.

It would forestall any such problems to arrange a configure-time flag
such as "--with-checksum crc-32" or "--with-checksum md4", and make it
clear where to plug in the checksum of one's choice. Then, ship 7.2
with just crc-32 and let somebody else produce patches for alternatives
if they want them.

BTW, I have been looking for Free 64-bit CRC codes/polynomials, and the
closest thing I have found so far was Mark Mitchell's hash, translated
from the Modula-3 system. All the tape drive makers advertise (but,
AFAIK, don't publish) a 64-bit CRC.

A reasonable approach would be to deliver CRC-32 in 7.2, and then
reconsider the default later if anybody contributes good alternatives.

Nathan Myers
ncm@zembu.com
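[For concreteness, here is a minimal table-driven CRC-32 sketch using the common reflected polynomial 0xEDB88320 (the CRC that zlib computes). The thread does not fix a particular polynomial or implementation; this one is my illustration. The same table-driven loop extends to 64 bits given a 64-bit polynomial and masks.]

```python
import zlib  # only used here to cross-check the hand-rolled version

def make_table(poly=0xEDB88320):
    """Build the 256-entry lookup table for a reflected CRC-32."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table

TABLE = make_table()

def crc32(data: bytes, crc: int = 0) -> int:
    """One table lookup per input byte; matches zlib.crc32."""
    crc ^= 0xFFFFFFFF
    for b in data:
        crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF

# Standard check value for this CRC: crc32(b"123456789") == 0xCBF43926.
print(hex(crc32(b"123456789")))
```

A 64-bit variant is the same loop with 64-bit table entries and masks; the relevant difference for this thread is only that the per-block undetected-error probability drops from 2^(-32) to 2^(-64).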