Thread: RE: CRC was: Re: beta testing version
> > This may be implemented very fast (if someone points me where
> > I can find CRC func). And I could implement "physical log"
> > till next Monday.
>
> I have been experimenting with CRCs for the past six months in
> our database for internal logging purposes. Downloaded a lot of
> hash libraries, tried different algorithms, and implemented a few
> myself. Which algorithm do you want? Have a look at the openssl
> libraries (www.openssl.org) for a start - if you don't find what
> you want let me know.

Thanks.

> As the logging might include large data blocks, especially
> now that we can TOAST our data,

TOAST breaks data into a few 2K (or so) tuples to be inserted
separately. But the first btree split after a checkpoint will require
logging of a 2x8K record -:(

> I would strongly suggest to use strong hashes like RIPEMD or
> MD5 instead of CRC-32 and the like. Sure, it takes more time
> to calculate and more space on the hard disk, but then: a database
> without data integrity (and means of _proving_ integrity) is
> pretty worthless.

Other opinions? Also, we shouldn't forget licence issues.

Vadim
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes: >> I would strongly suggest to use strong hashes like RIPEMD or >> MD5 instead of CRC-32 and the like. > Other opinions? Also, we shouldn't forget licence issues. I agree with whoever commented that crypto hashes are silly for this application. A 64-bit CRC *might* be enough stronger than a 32-bit CRC to be worth the extra calculation, but frankly I doubt that too. Remember that we are already sitting atop hardware that's really pretty reliable, despite the carping that's been going on in this thread. All that we have to do is detect the infrequent case where a block of data didn't get written due to system failure. It's wildly pessimistic to think that we might get called on to do so as much as once a day (if you are trying to run a reliable database, and are suffering power failures once a day, and haven't bought a UPS, you're a lost cause). A 32-bit CRC will fail to detect such an error with a probability of about 1 in 2^32. So, a 32-bit CRC will have an MBTF of 2^32 days, or 11 million years, on the wildly pessimistic side --- real installations probably 100 times better. That's plenty for me, and improving the odds to 2^64 or 2^128 is not worth any slowdown IMHO. regards, tom lane
On Thu, Dec 07, 2000 at 04:35:00PM -0500, Tom Lane wrote:
> Remember that we are already sitting atop hardware that's really
> pretty reliable, despite the carping that's been going on in this
> thread. All that we have to do is detect the infrequent case where a
> block of data didn't get written due to system failure. It's wildly
> pessimistic to think that we might get called on to do so as much as
> once a day (if you are trying to run a reliable database, and are
> suffering power failures once a day, and haven't bought a UPS, you're
> a lost cause). A 32-bit CRC will fail to detect such an error with a
> probability of about 1 in 2^32. So a 32-bit CRC will have an MTBF of
> 2^32 days, or 11 million years, on the wildly pessimistic side ---
> real installations are probably 100 times better. That's plenty for
> me, and improving the odds to 2^64 or 2^128 is not worth any
> slowdown IMHO.

1. Computing a CRC-64 takes only about twice as long as a CRC-32, for
   2^32 times the confidence. That's pretty cheap confidence.

2. I disagree with the way the above statistics were computed. That
   eleven-million-year figure gets whittled down pretty quickly when
   you factor in all the sources of corruption, even without crashes.
   (Power failures are only one of many sources of corruption.) The
   odds grow with the size and activity of the database, and databases
   are getting very large and busy indeed.

3. Many users clearly hope to be able to pull the plug on their
   hardware and get back up confidently. While we can't promise they
   won't have to go to their backups, we should at least be equipped
   to promise, with confidence, that they will know whether they need
   to.

4. For a way to mark the "current final" log entry, you want a lot
   more confidence, because you read a lot more of them, and reading
   beyond the end may cause you to corrupt a currently-valid database,
   which seems a lot worse than just using a corrupted database.

Still, I agree that a 32-bit CRC is better than none at all.

Nathan Myers
ncm@zembu.com
ncm@zembu.com (Nathan Myers) writes:
> 2. I disagree with the way the above statistics were computed. That
>    eleven-million-year figure gets whittled down pretty quickly when
>    you factor in all the sources of corruption, even without crashes.
>    (Power failures are only one of many sources of corruption.) The
>    odds grow with the size and activity of the database, and
>    databases are getting very large and busy indeed.

Sure, but the argument still holds. If the net MTBF of your underlying
system is less than a day, it's too unreliable to run a database that
you want to trust. It doesn't matter what the contributing failure
mechanisms are. In practice, I'd demand an MTBF of a lot more than a
day before I'd accept a hardware system as satisfactory...

> 3. Many users clearly hope to be able to pull the plug on their
>    hardware and get back up confidently. While we can't promise they
>    won't have to go to their backups, we should at least be equipped
>    to promise, with confidence, that they will know whether they
>    need to.

And the difference in odds between 2^32 and 2^64 matters here? I made
a numerical case that it doesn't, and you haven't refuted it. By your
logic, we might as well say that we should be using a 128-bit CRC, or
256-bit, or heck, a few kilobytes. It only takes a little longer to go
up each step, right, so where should you stop? I say an MTBF measured
in megayears ought to be plenty. Show me the numerical argument that
64 bits is the right place on the curve.

> 4. For a way to mark the "current final" log entry, you want a lot
>    more confidence, because you read a lot more of them,

You only need to make the distinction during a restart, so I don't
think that argument is correct.

			regards, tom lane
I believe that there are many good points to having CRC facilities
"built in", and I have failed to detect any arguments against it. In
my domain (the medical domain) we simply can't use data without
"proof" of integrity ("proof" as in the highest possible level of
confidence within reasonable effort).

Therefore, I propose defining new data types like "CRC32", "CRC64",
"RIPEMD", whatever (rather than pluggable arbitrary CRCs). As with the
SERIAL data type, a CRC data type would automatically generate a
trigger function that calculates a CRC across the tuple (omitting the
CRC property of course, and maybe the OID as well) before each update,
and store it in itself.

Is there anything wrong with this idea? Can somebody help me by
pointing me in the right direction to implement it? (The person who
implemented the SERIAL data type, maybe?)

Regards,
Horst
> Therefore, I propose defining new data types like "CRC32", "CRC64",
> "RIPEMD", whatever (rather than pluggable arbitrary CRCs).

I suspect that you are really looking at the problem from the wrong
end. CRC checking should not need to be done by the database user,
with a fancy type. The postgres server itself should guarantee data
integrity - you shouldn't have to worry about it in userland.

This is, in fact, what the recent discussion on this list has been
proposing...

Chris
> I suspect that you are really looking at the problem from the wrong
> end. CRC checking should not need to be done by the database user,
> with a fancy type. The postgres server itself should guarantee data
> integrity - you shouldn't have to worry about it in userland.

I agree in principle. However, performance sometimes is more important
than integrity. Think of a data logger of uncritical data, or an
online forum. There are plenty of occasions where you don't have to
worry about a single bit on or off, but a lot to worry about regarding
performance. Look at all those people using M$ Access or MySQL who
don't give a damn about data integrity. As opposed to them, there will
always be other "typical" database applications where 100% integrity
is paramount. Then it is nice to have a choice of CRCs, where the
database designer can choose according to his/her specific
performance/integrity trade-off. This is why I would prefer the
"datatype" solution.

> This is, in fact, what the recent discussion on this list has been
> proposing...

AFAIK the thread on "built in" CRCs referred only to CRCs in the
transaction log. This here is a different thing. CRCs in the
transaction log are crucial to prove the integrity of the log; CRCs as
a datatype are necessary to prove the integrity of database entries at
row level. Always remember that a postgres database on the hard disk
can be manipulated accidentally / maliciously without postgres even
running. These are the cases where you need row-level CRCs.

Horst
"Horst Herb" <hherb@malleenet.net.au> writes: > AFAIK the thread for "built in" crcs referred only to CRCs in the > transaction log. This here is a different thing. CRCs in the transaction log > are crucial to proof integrity of the log, CRCs as datatype are neccessary > to proof integrity of database entries at row level. I think a row-level CRC is rather pointless. Perhaps it'd be a good idea to have a disk-page-level CRC, though. That would check the rows on the page *and* allow catching errors in the page and tuple overhead structures, which row-level CRCs would not cover. I suspect TOAST breaks your notion of computing a CRC at trigger time anyway --- some of the fields may be toasted already, some not. If you're sufficiently paranoid that you insist you need a row-level CRC, it seems to me that you'd want to generate it and later check it in your application, not in the database. That's the only way you get end-to-end coverage. Surely you don't trust your TCP connection to the server, either? regards, tom lane
> I think a row-level CRC is rather pointless. Perhaps it'd be a good
> idea to have a disk-page-level CRC, though. That would check the rows
> on the page *and* allow catching errors in the page and tuple
> overhead structures, which row-level CRCs would not cover.

Row level is necessary to be able to check integrity at the
application level.

> I suspect TOAST breaks your notion of computing a CRC at trigger time
> anyway --- some of the fields may be toasted already, some not.

The workaround is a logging table where you store the CRCs as well.
Later on, an "integrity daemon" can compare whether they match or not.

> If you're sufficiently paranoid that you insist you need a row-level
> CRC, it seems to me that you'd want to generate it and later check it
> in your application, not in the database. That's the only way you get

Oh, sure, that is the way we do it now. And no, nothing to do with
paranoia. We were badly burnt previously by the assumption that a
decent SQL server is a "guarantee" of data integrity. Shit simply
happens.

> end-to-end coverage. Surely you don't trust your TCP connection to
> the server, either?

TCP _IS_ heavily checksummed. But yes, we _do_ calculate checksums at
the client, recalculate at the server, and compare after the
transaction is completed. As we have only a few writes between heavy
read access, the performance penalty of doing this (for our purposes)
is minimal.

Horst
On Thu, Dec 07, 2000 at 04:01:23PM -0800, Nathan Myers wrote:
> 1. Computing a CRC-64 takes only about twice as long as a CRC-32, for
>    2^32 times the confidence. That's pretty cheap confidence.

Incidentally, I benchmarked the previously mentioned 64-bit
fingerprint, the standard 32-bit CRC, MD5 and SHA, and the fastest
algorithm on my Celeron and on a PIII was MD5. The 64-bit fingerprint
was only a hair slower, the CRC was (quite surprisingly) about 40%
slower, and the implementation of SHA that I had available was a real
dog. Taking an arbitrary 32 bits of an MD5 would likely be less
collision-prone than using a 32-bit CRC, and it appears faster as
well.
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
   Date: Fri, 8 Dec 2000 12:19:39 -0600
   From: Bruce Guenter <bruceg@em.ca>

   Incidentally, I benchmarked the previously mentioned 64-bit
   fingerprint, the standard 32-bit CRC, MD5 and SHA, and the fastest
   algorithm on my Celeron and on a PIII was MD5. The 64-bit
   fingerprint was only a hair slower, the CRC was (quite
   surprisingly) about 40% slower, and the implementation of SHA that
   I had available was a real dog. Taking an arbitrary 32 bits of an
   MD5 would likely be less collision-prone than using a 32-bit CRC,
   and it appears faster as well.

I just want to confirm that you used something like the fast 32-bit
CRC algorithm, appended. The one posted earlier was accurate but slow.

Ian

/*
 * Copyright (C) 1986 Gary S. Brown. You may use this program, or
 * code or tables extracted from it, as desired without restriction.
 */

/* Modified slightly by Ian Lance Taylor, ian@airs.com, for use with
   Taylor UUCP. */

#include "uucp.h"
#include "prot.h"

/* First, the polynomial itself and its table of feedback terms. The
   polynomial is
   X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+X^0

   Note that we take it "backwards" and put the highest-order term in
   the lowest-order bit. The X^32 term is "implied"; the LSB is the
   X^31 term, etc. The X^0 term (usually shown as "+1") results in
   the MSB being 1.

   Note that the usual hardware shift register implementation, which
   is what we're using (we're merely optimizing it by doing eight-bit
   chunks at a time) shifts bits into the lowest-order term. In our
   implementation, that means shifting towards the right. Why do we
   do it this way? Because the calculated CRC must be transmitted in
   order from highest-order term to lowest-order term. UARTs transmit
   characters in order from LSB to MSB. By storing the CRC this way,
   we hand it to the UART in the order low-byte to high-byte; the UART
   sends each low-bit to high-bit; and the result is transmission bit
   by bit from highest- to lowest-order term without requiring any bit
   shuffling on our part. Reception works similarly.

   The feedback terms table consists of 256 32-bit entries. Notes:

   The table can be generated at runtime if desired; code to do so
   is shown later. It might not be obvious, but the feedback
   terms simply represent the results of eight shift/xor opera-
   tions for all combinations of data and CRC register values.
   [this code is no longer present--ian]

   The values must be right-shifted by eight bits by the "updcrc"
   logic; the shift must be unsigned (bring in zeroes). On some
   hardware you could probably optimize the shift in assembler by
   using byte-swap instructions. */
static const unsigned long aicrc32tab[] = { /* CRC polynomial 0xedb88320 */
  0x00000000L, 0x77073096L, 0xee0e612cL, 0x990951baL,
  0x076dc419L, 0x706af48fL, 0xe963a535L, 0x9e6495a3L,
  0x0edb8832L, 0x79dcb8a4L, 0xe0d5e91eL, 0x97d2d988L,
  0x09b64c2bL, 0x7eb17cbdL, 0xe7b82d07L, 0x90bf1d91L,
  0x1db71064L, 0x6ab020f2L, 0xf3b97148L, 0x84be41deL,
  0x1adad47dL, 0x6ddde4ebL, 0xf4d4b551L, 0x83d385c7L,
  0x136c9856L, 0x646ba8c0L, 0xfd62f97aL, 0x8a65c9ecL,
  0x14015c4fL, 0x63066cd9L, 0xfa0f3d63L, 0x8d080df5L,
  0x3b6e20c8L, 0x4c69105eL, 0xd56041e4L, 0xa2677172L,
  0x3c03e4d1L, 0x4b04d447L, 0xd20d85fdL, 0xa50ab56bL,
  0x35b5a8faL, 0x42b2986cL, 0xdbbbc9d6L, 0xacbcf940L,
  0x32d86ce3L, 0x45df5c75L, 0xdcd60dcfL, 0xabd13d59L,
  0x26d930acL, 0x51de003aL, 0xc8d75180L, 0xbfd06116L,
  0x21b4f4b5L, 0x56b3c423L, 0xcfba9599L, 0xb8bda50fL,
  0x2802b89eL, 0x5f058808L, 0xc60cd9b2L, 0xb10be924L,
  0x2f6f7c87L, 0x58684c11L, 0xc1611dabL, 0xb6662d3dL,
  0x76dc4190L, 0x01db7106L, 0x98d220bcL, 0xefd5102aL,
  0x71b18589L, 0x06b6b51fL, 0x9fbfe4a5L, 0xe8b8d433L,
  0x7807c9a2L, 0x0f00f934L, 0x9609a88eL, 0xe10e9818L,
  0x7f6a0dbbL, 0x086d3d2dL, 0x91646c97L, 0xe6635c01L,
  0x6b6b51f4L, 0x1c6c6162L, 0x856530d8L, 0xf262004eL,
  0x6c0695edL, 0x1b01a57bL, 0x8208f4c1L, 0xf50fc457L,
  0x65b0d9c6L, 0x12b7e950L, 0x8bbeb8eaL, 0xfcb9887cL,
  0x62dd1ddfL, 0x15da2d49L, 0x8cd37cf3L, 0xfbd44c65L,
  0x4db26158L, 0x3ab551ceL, 0xa3bc0074L, 0xd4bb30e2L,
  0x4adfa541L, 0x3dd895d7L, 0xa4d1c46dL, 0xd3d6f4fbL,
  0x4369e96aL, 0x346ed9fcL, 0xad678846L, 0xda60b8d0L,
  0x44042d73L, 0x33031de5L, 0xaa0a4c5fL, 0xdd0d7cc9L,
  0x5005713cL, 0x270241aaL, 0xbe0b1010L, 0xc90c2086L,
  0x5768b525L, 0x206f85b3L, 0xb966d409L, 0xce61e49fL,
  0x5edef90eL, 0x29d9c998L, 0xb0d09822L, 0xc7d7a8b4L,
  0x59b33d17L, 0x2eb40d81L, 0xb7bd5c3bL, 0xc0ba6cadL,
  0xedb88320L, 0x9abfb3b6L, 0x03b6e20cL, 0x74b1d29aL,
  0xead54739L, 0x9dd277afL, 0x04db2615L, 0x73dc1683L,
  0xe3630b12L, 0x94643b84L, 0x0d6d6a3eL, 0x7a6a5aa8L,
  0xe40ecf0bL, 0x9309ff9dL, 0x0a00ae27L, 0x7d079eb1L,
  0xf00f9344L, 0x8708a3d2L, 0x1e01f268L, 0x6906c2feL,
  0xf762575dL, 0x806567cbL, 0x196c3671L, 0x6e6b06e7L,
  0xfed41b76L, 0x89d32be0L, 0x10da7a5aL, 0x67dd4accL,
  0xf9b9df6fL, 0x8ebeeff9L, 0x17b7be43L, 0x60b08ed5L,
  0xd6d6a3e8L, 0xa1d1937eL, 0x38d8c2c4L, 0x4fdff252L,
  0xd1bb67f1L, 0xa6bc5767L, 0x3fb506ddL, 0x48b2364bL,
  0xd80d2bdaL, 0xaf0a1b4cL, 0x36034af6L, 0x41047a60L,
  0xdf60efc3L, 0xa867df55L, 0x316e8eefL, 0x4669be79L,
  0xcb61b38cL, 0xbc66831aL, 0x256fd2a0L, 0x5268e236L,
  0xcc0c7795L, 0xbb0b4703L, 0x220216b9L, 0x5505262fL,
  0xc5ba3bbeL, 0xb2bd0b28L, 0x2bb45a92L, 0x5cb36a04L,
  0xc2d7ffa7L, 0xb5d0cf31L, 0x2cd99e8bL, 0x5bdeae1dL,
  0x9b64c2b0L, 0xec63f226L, 0x756aa39cL, 0x026d930aL,
  0x9c0906a9L, 0xeb0e363fL, 0x72076785L, 0x05005713L,
  0x95bf4a82L, 0xe2b87a14L, 0x7bb12baeL, 0x0cb61b38L,
  0x92d28e9bL, 0xe5d5be0dL, 0x7cdcefb7L, 0x0bdbdf21L,
  0x86d3d2d4L, 0xf1d4e242L, 0x68ddb3f8L, 0x1fda836eL,
  0x81be16cdL, 0xf6b9265bL, 0x6fb077e1L, 0x18b74777L,
  0x88085ae6L, 0xff0f6a70L, 0x66063bcaL, 0x11010b5cL,
  0x8f659effL, 0xf862ae69L, 0x616bffd3L, 0x166ccf45L,
  0xa00ae278L, 0xd70dd2eeL, 0x4e048354L, 0x3903b3c2L,
  0xa7672661L, 0xd06016f7L, 0x4969474dL, 0x3e6e77dbL,
  0xaed16a4aL, 0xd9d65adcL, 0x40df0b66L, 0x37d83bf0L,
  0xa9bcae53L, 0xdebb9ec5L, 0x47b2cf7fL, 0x30b5ffe9L,
  0xbdbdf21cL, 0xcabac28aL, 0x53b39330L, 0x24b4a3a6L,
  0xbad03605L, 0xcdd70693L, 0x54de5729L, 0x23d967bfL,
  0xb3667a2eL, 0xc4614ab8L, 0x5d681b02L, 0x2a6f2b94L,
  0xb40bbe37L, 0xc30c8ea1L, 0x5a05df1bL, 0x2d02ef8dL
};
/*
 * IUPDC32 macro derived from article Copyright (C) 1986 Stephen Satchell.
 *  NOTE: First argument must be in range 0 to 255.
 *        Second argument is referenced twice.
 *
 * Programmers may incorporate any or all code into their programs,
 * giving proper credit within the source. Publication of the
 * source routines is permitted so long as proper credit is given
 * to Stephen Satchell, Satchell Evaluations and Chuck Forsberg,
 * Omen Technology.
 */

#define IUPDC32(b, ick) \
  (aicrc32tab[((int) (ick) ^ (b)) & 0xff] ^ (((ick) >> 8) & 0x00ffffffL))

unsigned long
icrc (z, c, ick)
     const char *z;
     size_t c;
     unsigned long ick;
{
  while (c > 4)
    {
      ick = IUPDC32 (*z++, ick);
      ick = IUPDC32 (*z++, ick);
      ick = IUPDC32 (*z++, ick);
      ick = IUPDC32 (*z++, ick);
      c -= 4;
    }
  while (c-- != 0)
    ick = IUPDC32 (*z++, ick);
  return ick;
}
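A quick usage sketch of the routine above (my example, not part of
Ian's post): with the conventional CRC-32 usage of starting from
all-one bits and complementing the result, the check value for the
string "123456789" comes out to 0xcbf43926.

#include <stdio.h>
#include <string.h>

extern unsigned long icrc (const char *z, size_t c, unsigned long ick);

int
main (void)
{
  const char msg[] = "123456789";
  /* Start from all-ones and complement at the end, per the usual
     CRC-32 convention; the standard check value is 0xcbf43926. */
  unsigned long crc =
    icrc (msg, strlen (msg), 0xffffffffL) ^ 0xffffffffL;
  printf ("crc32 = 0x%08lx\n", crc & 0xffffffffL);
  return 0;
}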
"Horst Herb" <hherb@malleenet.net.au> writes: >> Surely you don't trust your TCP connection to the >> server, either? > TCP _IS_ heavily checksummed. Yes, and so are the disk drives that you are asserting you don't trust. My point is that in both cases, there are lots and lots of failure mechanisms that won't be caught by the transport or storage CRC. The same applies to anything other than an end-to-end check. regards, tom lane
On Fri, Dec 08, 2000 at 10:36:39AM -0800, Ian Lance Taylor wrote:
>    Incidentally, I benchmarked the previously mentioned 64-bit
>    fingerprint, the standard 32-bit CRC, MD5 and SHA, and the fastest
>    algorithm on my Celeron and on a PIII was MD5. The 64-bit
>    fingerprint was only a hair slower, the CRC was (quite
>    surprisingly) about 40% slower, and the implementation of SHA that
>    I had available was a real dog. Taking an arbitrary 32 bits of an
>    MD5 would likely be less collision-prone than using a 32-bit CRC,
>    and it appears faster as well.
>
> I just want to confirm that you used something like the fast 32-bit
> CRC algorithm, appended. The one posted earlier was accurate but
> slow.

Yes. I just rebuilt the framework using this exact code, and it
performed identically to the previous CRC code (which didn't have an
unrolled inner loop). These were compiled with -O6 with egcs 1.1.2.
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
On Fri, Dec 08, 2000 at 01:58:12PM -0500, Tom Lane wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> > ... Taking an arbitrary 32 bits of an MD5 would likely be less
> > collision-prone than using a 32-bit CRC, and it appears faster as
> > well.
>
> ... but that would be an algorithm that you know NOTHING about the
> properties of. What is your basis for asserting it's better than CRC?

MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
impossible to produce a collision using any other method than brute
force attempts. In other words, any stream of input to the hash that
is longer than the hash length (16 bytes for MD5) is equally probable
to produce a given hash code.

> CRC is pretty well studied and its error-detection behavior is known
> (and good). MD5 has been studied less thoroughly AFAIK, and in any
> case what's known about its behavior is that the *entire* MD5 output
> provides a good signature for a datastream. If you pick some ad-hoc
> method like taking a randomly chosen subset of MD5's output bits,
> you really don't know anything at all about what the error-detection
> properties of the method are.

Actually, in my reading regarding the properties of MD5, I read an
article that stated that if a smaller number of bits was desired, one
could either (and here's where my memory fails me) just select the
middle N bits from the resulting hash, or fold the hash using XOR
until the desired number of bits was reached. I'll see if I can find a
reference...

RFC 2289 (http://www.ietf.org/rfc/rfc2289.txt) includes an algorithm
for folding MD5 digests down to 64 bits by XORing the top half with
the bottom half. See appendix A.
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
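A minimal sketch of that folding step (my illustration; RFC 2289
appendix A specifies the exact byte ordering for OTP purposes, which
this doesn't try to reproduce):

#include <stdint.h>
#include <string.h>

/* Fold a 16-byte MD5 digest down to 64 bits by XORing the first
   eight bytes with the last eight, in the spirit of RFC 2289
   appendix A. */
uint64_t
fold_md5_to_64 (const unsigned char digest[16])
{
  uint64_t hi, lo;
  memcpy (&hi, digest, sizeof hi);
  memcpy (&lo, digest + 8, sizeof lo);
  return hi ^ lo;
}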
On Fri, Dec 08, 2000 at 12:19:39PM -0600, Bruce Guenter wrote:
> On Thu, Dec 07, 2000 at 04:01:23PM -0800, Nathan Myers wrote:
> > 1. Computing a CRC-64 takes only about twice as long as a CRC-32,
> >    for 2^32 times the confidence. That's pretty cheap confidence.
>
> Incidentally, I benchmarked the previously mentioned 64-bit
> fingerprint, the standard 32-bit CRC, MD5 and SHA, and the fastest
> algorithm on my Celeron and on a PIII was MD5. The 64-bit fingerprint
> was only a hair slower, the CRC was (quite surprisingly) about 40%
> slower, and the implementation of SHA that I had available was a real
> dog. Taking an arbitrary 32 bits of an MD5 would likely be less
> collision-prone than using a 32-bit CRC, and it appears faster as
> well.

This is very interesting. MD4 is faster than MD5. (MD5, described as
"MD4 with suspenders on", does some extra stuff to protect against
more-obscure attacks, of no interest to us.) Which 64-bit CRC code did
you use, Mark Mitchell's?

Are you really saying MD5 was faster than CRC-32?

I don't know of any reason to think that 32 bits of an MD5 would be
better distributed than a CRC-32, or that having computed the 64 bits
there would be any point in throwing away half.

Current evidence suggests that MD4 would be a good choice for a hash
algorithm.

Nathan Myers
ncm@zembu.com
Bruce Guenter <bruceg@em.ca> writes:
> ... Taking an arbitrary 32 bits of an MD5 would likely be less
> collision-prone than using a 32-bit CRC, and it appears faster as
> well.

... but that would be an algorithm that you know NOTHING about the
properties of. What is your basis for asserting it's better than CRC?

CRC is pretty well studied and its error-detection behavior is known
(and good). MD5 has been studied less thoroughly AFAIK, and in any
case what's known about its behavior is that the *entire* MD5 output
provides a good signature for a datastream. If you pick some ad-hoc
method like taking a randomly chosen subset of MD5's output bits, you
really don't know anything at all about what the error-detection
properties of the method are.

I am reminded of Knuth's famous advice about random number generators:
"Random numbers should not be generated with a method chosen at
random. Some theory should be used." Error-detection codes, like
random-number generators, have decades of theory behind them.
Seat-of-the-pants tinkering, even if it starts with a known-good
method, is not likely to produce an improvement.

			regards, tom lane
Bruce Guenter <bruceg@em.ca> writes:
> MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
> impossible to produce a collision using any other method than brute
> force attempts.

True but irrelevant. What we need to worry about is the probability
that a random error will be detected, not the computational effort
that a malicious attacker would need in order to insert an
undetectable error.

MD5 is designed for a purpose that really doesn't have much to do with
error detection, when you think about it. It says "you will have a
hard time computing a different string that produces the same hash as
some prespecified string". This is not the same as promising
better-than-random odds against a damaged copy of some string having
the same hash as the original. CRC, on the other hand, is specifically
designed for error detection, and for localized errors (such as a
corrupted byte or two) it does a provably better-than-random job. For
nonlocalized errors you don't get a guarantee, but you do get
same-as-random odds of detection (ie, 1 in 2^N for an N-bit CRC).

I really doubt that MD5 can beat a CRC with the same number of output
bits for the purpose of error detection; given the lack of guarantee
about short burst errors, I doubt it's even as good. (Wild-pointer
stomps on disk buffers are an example of the sort of thing that may
look like a burst error.)

Now, if you are worried about crypto-capable gremlins living in your
file system, maybe what you want is MD5. But I'm inclined to think
that CRC is more appropriate for the job at hand.

			regards, tom lane
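For reference, the standard coding-theory guarantees behind that
claim, for a CRC whose generator polynomial has degree $n$:

$$
P(\text{undetected burst of length} \le n\ \text{bits}) = 0,
\qquad
P(\text{undetected random error}) \approx 2^{-n}.
$$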
On Fri, Dec 08, 2000 at 11:10:19AM -0800, I wrote:
> Current evidence suggests that MD4 would be a good choice for a hash
> algorithm.

Thinking about it, I suspect that any CRC implementation that can't
outrun MD5 by a wide margin is seriously sub-optimal. Can you post any
more details about how the tests were run? I'd like to try it.

Nathan Myers
ncm@zembu.com
On Fri, Dec 08, 2000 at 11:10:19AM -0800, Nathan Myers wrote:
> This is very interesting. MD4 is faster than MD5. (MD5, described as
> "MD4 with suspenders on", does some extra stuff to protect against
> more-obscure attacks, of no interest to us.) Which 64-bit CRC code
> did you use, Mark Mitchell's?

Yes.

> Are you really saying MD5 was faster than CRC-32?

Yes. I expect it's because the operations used in MD5 are easily
parallelized, and operate on blocks of 64 bytes at a time, while the
CRC is mostly non-parallelizable, uses a table lookup, and operates on
single bytes.
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
ncm@zembu.com (Nathan Myers) writes:
> Thinking about it, I suspect that any CRC implementation that can't
> outrun MD5 by a wide margin is seriously sub-optimal.

I was finding that hard to believe, too, at least for CRC-32 (CRC-64
would take more code, so I'm not so sure about it).

Is that 64-bit code you pointed us to before actually a CRC, or
something else? It doesn't call itself a CRC, and I was having a hard
time extracting anything definite (like the polynomial) from all the
bit-pushing underbrush :-(

			regards, tom lane
Bruce Guenter <bruceg@em.ca> writes:
>> Are you really saying MD5 was faster than CRC-32?

> Yes. I expect it's because the operations used in MD5 are easily
> parallelized, and operate on blocks of 64 bytes at a time, while the
> CRC is mostly non-parallelizable, uses a table lookup, and operates
> on single bytes.

What MD5 implementation did you use? The one I have handy (the
original RSA reference version) sure looks like it's more computation
per byte than a CRC.

			regards, tom lane
On Fri, Dec 08, 2000 at 04:21:21PM -0500, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> > Thinking about it, I suspect that any CRC implementation that can't
> > outrun MD5 by a wide margin is seriously sub-optimal.
> I was finding that hard to believe, too, at least for CRC-32 (CRC-64
> would take more code, so I'm not so sure about it).

Would you like to see the simple benchmarking setup I used? The amount
of code involved (once all the hashes are factored in) is fairly
large, so I'm somewhat hesitant to just send it to the mailing list.

> Is that 64-bit code you pointed us to before actually a CRC, or
> something else? It doesn't call itself a CRC, and I was having a hard
> time extracting anything definite (like the polynomial) from all the
> bit-pushing underbrush :-(

It isn't a CRC. It's a fingerprint. As you've mentioned, it doesn't
have the guarantees against burst errors that a CRC would have, but it
does have as-good-as-random collision avoidance over any random data
corruption. At least, that's what the author claims. My math isn't
nearly good enough to verify such claims.
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
On Fri, Dec 08, 2000 at 03:38:09PM -0500, Tom Lane wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> > MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
> > impossible to produce a collision using any other method than brute
> > force attempts.
> True but irrelevant. What we need to worry about is the probability
> that a random error will be detected,

Which I indicated immediately after the sentence you quoted. The
probability that a random error will be detected is the same as the
probability of a collision in the hash given two different inputs. The
brute-force note means that the probability of a collision is as good
as random.

> MD5 is designed for a purpose that really doesn't have much to do
> with error detection, when you think about it. It says "you will have
> a hard time computing a different string that produces the same hash
> as some prespecified string". This is not the same as promising
> better-than-random odds against a damaged copy of some string having
> the same hash as the original.

It does provide as-good-as-random odds against a damaged copy of some
string having the same hash as the original -- nobody has been able to
exhibit any collisions in MD5 (see http://cr.yp.to/papers/hash127.ps,
page 18 for notes on this).

> CRC, on the other hand, is specifically designed for error detection,
> and for localized errors (such as a corrupted byte or two) it does a
> provably better-than-random job. For nonlocalized errors you don't
> get a guarantee, but you do get same-as-random odds of detection (ie,
> 1 in 2^N for an N-bit CRC).

For the log, the CRC's primary function (as far as I understand it)
would be to guard against an inconsistent transaction being treated as
consistent data. Such inconsistent transactions would be partially
written, resulting in errors much larger than a small burst. For
guarding the actual record data, I agree with you 100% -- what we're
likely to see is a few localized bytes with flipped bits due to
hardware failure of one kind or another. However, if the data is
really critical, an ECC may be more appropriate, but that would make
the data significantly larger (9/8 for the algorithms I've seen).

> I really doubt that MD5 can beat a CRC with the same number of output
> bits for the purpose of error detection;

Agreed. However, MD5 provides four times as many bits as the standard
32-bit CRC. (I think I initially suggested you could take an arbitrary
32 bits out of MD5 to provide a check code "as good as CRC-32". I now
take that back. Due to the burst-error guarantees of CRCs, nothing
else could be as good, unless the alternate algorithm also made some
guarantees, which MD5 definitely doesn't.)

> (Wild-pointer stomps on disk buffers are an example of the sort of
> thing that may look like a burst error.)

Actually, wild-pointer incidents involving disk buffers at the kernel
level, from my experience, are characterized by content from one file
appearing in another, which is distinctly different from a burst
error, and more like what would be seen if a log record were partially
written.
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
On Fri, Dec 08, 2000 at 04:30:58PM -0500, Tom Lane wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> >> Are you really saying MD5 was faster than CRC-32?
> > Yes. I expect it's because the operations used in MD5 are easily
> > parallelized, and operate on blocks of 64 bytes at a time, while
> > the CRC is mostly non-parallelizable, uses a table lookup, and
> > operates on single bytes.
> What MD5 implementation did you use?

I used the GPL'ed implementation written by Ulrich Drepper in 1995.
The code from OpenSSL looks identical in terms of the operations
performed.

> The one I have handy (the original RSA reference version) sure looks
> like it's more computation per byte than a CRC.

The algorithm itself does use more computation per byte. However, the
algorithm works on blocks of 64 bytes at a time, and the operations
should be easily pipelined. On the other hand, the CRC code is largely
serial and highly dependent on a table lookup operation.
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
Bruce Guenter <bruceg@em.ca> writes:
> Would you like to see the simple benchmarking setup I used? The
> amount of code involved (once all the hashes are factored in) is
> fairly large, so I'm somewhat hesitant to just send it to the
> mailing list.

I agree, don't send it to the whole list. But I'd like a copy.

			regards, tom lane
Bruce Guenter <bruceg@em.ca> writes:
>> I agree, don't send it to the whole list. But I'd like a copy.

> Here you go.

As near as I could tell, the test as you have it (one CRC computation
per fread) is purely I/O bound. I changed the main loop to this:

int main() {
  static char buf[8192];
  size_t rd;
  hash_t hash;

  while (rd = fread(buf, 1, sizeof buf, stdin)) {
    int i;
    for (i = 0; i < 1000; i++) {
      init(&hash);
      update(&hash, buf, rd);
    }
  }
  return 0;
}

so as to get a reasonable amount of computation per fread. On an
otherwise idle HP 9000 C180 machine, I get the following numbers on a
1MB input file:

	time benchcrc <random32
	real      35.3
	user      35.0
	sys        0.0

	time benchmd5 <random32
	real      37.6
	user      37.3
	sys        0.0

This is a lot closer than I'd have expected, but it sure ain't
"MD5 40% faster" as you reported. I wonder why the difference in
results between your platform and mine?

BTW, I used gcc 2.95.2 to compile, -O6, no other switches.

			regards, tom lane
A couple further observations while playing with this benchmark ---

1. This MD5 implementation is not too robust. On my machine it dumps
core if given a non-word-aligned data buffer. We could probably work
around that, but it bespeaks a little *too* much hand optimization...

2. It's a bad idea to ignore the startup/termination costs of the
algorithms. Of course startup/termination is trivial for CRC, but it's
not so trivial for MD5. I changed the code so that the md5 update()
routine also calls md5_finish_ctx(), so that each inner loop
represents a complete MD5 calculation for a message of the size of the
main routine's fread buffer. I then experimented with different buffer
sizes. At a buffer size of 1K:

	time benchcrc <random32
	real      35.4
	user      35.1
	sys        0.0

	time benchmd5 <random32
	real      41.4
	user      41.1
	sys        0.0

At a buffer size of 100 bytes:

	time benchcrc <random32
	real      36.3
	user      36.0
	sys        0.0

	time benchmd5 <random32
	real    1:09.7
	user    1:09.2
	sys        0.0

(The total amount of data processed is 1000 MB in either case, but
it's divided into more messages in the second case.)

I'm not sure exactly what Vadim has in mind for computing CRCs on the
WAL log. If he's thinking of a CRC for each log message, the MD5 stuff
would be at a definite disadvantage. For disk-page checksums (8K or
more) this isn't too much of an issue, however.

			regards, tom lane
On Fri, Dec 08, 2000 at 09:28:38PM -0500, Tom Lane wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> >> I agree, don't send it to the whole list. But I'd like a copy.
> > Here you go.
> As near as I could tell, the test as you have it (one CRC computation
> per fread) is purely I/O bound.

Nope. They got 99-100% CPU time with the original version.

> I changed the main loop to this:
> [...hash each block repeatedly...]

Good idea. It might have been even better to just read the block once
and hash it even more times.

> On an otherwise idle HP 9000 C180 machine, I get the following
> numbers on a 1MB input file:
>
> 	time benchcrc <random32
> 	real 35.3  user 35.0  sys 0.0
>
> 	time benchmd5 <random32
> 	real 37.6  user 37.3  sys 0.0
>
> This is a lot closer than I'd have expected, but it sure ain't
> "MD5 40% faster" as you reported. I wonder why the difference
> in results between your platform and mine?

The difference is likely because PA-RISC (like most other RISC
architectures) lacks a "roll" opcode that is very prevalent in the MD5
algorithm. Intel CPUs have it.

With a new version modified to repeat the inner loop 100,000 times, I
got the following:

time benchcrc <random
21.35user 0.01system 0:21.39elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (79major+11minor)pagefaults 0swaps

time benchmd5 <random
12.79user 0.01system 0:12.79elapsed 100%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (80major+11minor)pagefaults 0swaps

time benchcrc <random
21.32user 0.06system 0:21.52elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (79major+11minor)pagefaults 0swaps

time benchmd5 <random
12.79user 0.01system 0:12.80elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (80major+11minor)pagefaults 0swaps
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
Bruce Guenter <bruceg@em.ca> writes:
>> This is a lot closer than I'd have expected, but it sure ain't
>> "MD5 40% faster" as you reported. I wonder why the difference
>> in results between your platform and mine?

> The difference is likely because PA-RISC (like most other RISC
> architectures) lacks a "roll" opcode that is very prevalent in the
> MD5 algorithm.

A good theory, but unfortunately not a correct theory. PA-RISC can do
a circular shift in one cycle using the "shift right double"
instruction, with the same register specified as both high and low
halves of the 64-bit operand. And gcc does know about that.

After some groveling through assembly code, it seems that the CRC-32
implementation is about as tight as it could get: two loads, two XORs,
and two EXTRUs per byte (one used to implement the right shift, the
other to implement masking with 0xFF). And the wall clock timing does
indeed come out to just about six cycles per byte.

The MD5 code also looks pretty tight. Each basic OP requires either
two or three logical operations (and/or/xor/not) depending on which
round you're looking at, plus four additions and a circular shift.
PA-RISC needs two cycles to load an arbitrary 32-bit constant, but
other than that I see no wasted cycles here:

	ldil L'-1444681467,%r20
	xor %r3,%r14,%r19
	ldo R'-1444681467(%r20),%r20
	and %r1,%r19,%r19
	addl %r15,%r20,%r20
	xor %r14,%r19,%r19
	addl %r19,%r26,%r19
	addl %r20,%r19,%r15
	shd %r15,%r15,27,%r15
	addl %r15,%r3,%r15

Note gcc has been smart enough to assign all the correct_words[] array
elements to registers, else we'd lose another cycle to a load
operation --- fortunately PA-RISC has lots of registers.

There are 64 of these basic OPs needed in each round, and each round
processes 64 input bytes, so basically you can figure one OP per byte.
Ignoring loop overhead and so forth, it's nine or ten cycles per byte
for MD5 versus six for CRC.

I'm at a loss to see how a Pentium would arrive at a better result for
MD5 than for CRC. For one thing, it's going to be at a disadvantage
because it hasn't got enough registers. I'd be interested to see the
assembly code...

			regards, tom lane
On Fri, Dec 08, 2000 at 10:17:00PM -0500, Tom Lane wrote:
> A couple further observations while playing with this benchmark ---
>
> 1. This MD5 implementation is not too robust. On my machine it dumps
> core if given a non-word-aligned data buffer. We could probably work
> around that, but it bespeaks a little *too* much hand optimization...

The operations in the MD5 core are based on word-sized chunks.
Obviously, the implementation only does word-sized loads and stores
for that data, and you got a bus error.

> 2. It's a bad idea to ignore the startup/termination costs of the
> algorithms.

Yes. I had included the startup costs in my benchmark, but not the
termination costs, which are large for MD5 as you point out.

> Of course startup/termination is trivial for CRC, but it's not so
> trivial for MD5. I changed the code so that the md5 update() routine
> also calls md5_finish_ctx(), so that each inner loop represents a
> complete MD5 calculation for a message of the size of the main
> routine's fread buffer. I then experimented with different buffer
> sizes. At a buffer size of 1K:

On my Celeron, at 1K blocks MD5 is still significantly faster than
CRC, but is slightly slower at 100-byte blocks. For comparison, I
added RIPEMD-160, but it's far slower than any of them (twice as long
as CRC).
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
On Sat, Dec 09, 2000 at 06:46:23PM -0500, Tom Lane wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> > The difference is likely because PA-RISC (like most other RISC
> > architectures) lacks a "roll" opcode that is very prevalent in the
> > MD5 algorithm.
>
> A good theory, but unfortunately not a correct theory. PA-RISC can do
> a circular shift in one cycle using the "shift right double"
> instruction, with the same register specified as both high and low
> halves of the 64-bit operand. And gcc does know about that.

Interesting. I was under the impression that virtually no RISC CPU had
a rotate instruction. Do any others?

> After some groveling through assembly code, it seems that the CRC-32
> implementation is about as tight as it could get: two loads, two
> XORs, and two EXTRUs per byte (one used to implement the right shift,
> the other to implement masking with 0xFF).

Same with the x86 core:

	movb %dl,%al
	xorb (%ecx),%al
	andl $255,%eax
	shrl $8,%edx
	incl %ecx
	xorl (%esi,%eax,4),%edx

> And the wall clock timing does indeed come out to just about six
> cycles per byte.

On my Celeron, the timing for those six opcodes is a whopping 13
cycles per byte. Obviously there's some major performance hit in the
memory instructions, because there's no more than 4 cycles worth of
dependent instructions in that snippet. BTW, for reference, P3 timings
are almost identical to those of the Celeron, so the problem isn't
something outside the built-in caches common to the two chips.

> The MD5 code also looks pretty tight. Each basic OP requires either
> two or three logical operations (and/or/xor/not) depending on which
> round you're looking at, plus four additions and a circular shift.
> PA-RISC needs two cycles to load an arbitrary 32-bit constant, but
> other than that I see no wasted cycles here:
>
> 	ldil L'-1444681467,%r20
> 	xor %r3,%r14,%r19
> 	ldo R'-1444681467(%r20),%r20
> 	and %r1,%r19,%r19
> 	addl %r15,%r20,%r20
> 	xor %r14,%r19,%r19
> 	addl %r19,%r26,%r19
> 	addl %r20,%r19,%r15
> 	shd %r15,%r15,27,%r15
> 	addl %r15,%r3,%r15

Here's the x86 assembly code for what appears to be the same basic OP:

	movl %edx,%eax
	xorl %esi,%eax
	andl %edi,%eax
	xorl %esi,%eax
	movl -84(%ebp),%ecx
	leal -1444681467(%ecx,%eax),%eax
	addl %eax,%ebx
	roll $5,%ebx
	addl %edx,%ebx

This is a couple fewer instructions, mainly saving on doing any loads
to use the constant value. This takes almost exactly 9 cycles per
byte.

> There are 64 of these basic OPs needed in each round, and each round
> processes 64 input bytes, so basically you can figure one OP per
> byte. Ignoring loop overhead and so forth, it's nine or ten cycles
> per byte for MD5 versus six for CRC.

On Celeron/P3, CRC scores almost 13 cycles per byte, MD4 is about 6
cycles per byte, and MD5 is about 9 cycles per byte. On a Pentium MMX,
CRC is 7.25, MD4 is 7.5 and MD5 is 10.25. So, the newer CPUs actually
do worse on CRC than the older ones do. Weirder and weirder.

> I'm at a loss to see how a Pentium would arrive at a better result
> for MD5 than for CRC. For one thing, it's going to be at a
> disadvantage because it hasn't got enough registers.

I agree. It would appear that the table lookup is causing a major
bubble in the pipelines on the newer Celeron/P2/P3 CPUs.
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
On Sat, Dec 09, 2000 at 06:46:23PM -0500, Tom Lane wrote:
> I'm at a loss to see how a Pentium would arrive at a better result
> for MD5 than for CRC. For one thing, it's going to be at a
> disadvantage because it hasn't got enough registers. I'd be
> interested to see the assembly code...

Minutiae aside, it's clear that MD5 and CRC are "comparable",
regardless of CPU.

For a 32-bit hash, the proven characteristics of CRCs are critical in
some applications. With a good 64-bit hash, the probability of any
collision, whether from a burst error or otherwise, becomes much lower
than every other systematic source of error -- the details just don't
matter any more. If you miss the confidence that CRCs give you about
burst errors, consider how easy it would be to construct a collision
if you could just try changing a couple of adjacent bytes -- an
exhaustive search would be easy.

MD4 would be a better choice than MD5, despite that a theoretical
attack on MD4 has been described (albeit never executed). We don't
even care about real attacks, never mind theoretical ones. What
matters is that MD4 is entirely good enough, and faster to compute
than MD5.

I find these results very encouraging. BSD-licensed MD4 code is
readily available, e.g. from any of the BSDs themselves.

Nathan Myers
ncm@zembu.com
ncm@zembu.com (Nathan Myers) writes:
> Minutiae aside, it's clear that MD5 and CRC are "comparable",
> regardless of CPU.

We've established that the inner loops are pretty comparable. I'm
still concerned about the startup/termination costs of MD5 on short
records. The numbers Bruce and I were trading were mostly for records
long enough to make the startup costs negligible, but they're not
negligible for sub-100-byte records.

> For a 32-bit hash, the proven characteristics of CRCs are critical in
> some applications. With a good 64-bit hash, the probability of any
> collision, whether from a burst error or otherwise, becomes much
> lower than every other systematic source of error -- the details just
> don't matter any more.

That's a good point. Of course the critical phrase there is *good*
hash, ie, one without any systematic weaknesses, but as long as we
don't use a "method chosen at random" you're right, it hardly matters.

However, this just begs the question: can't the same be said of a
32-bit checksum? My argument the other day essentially was that 32
bits is plenty for what we need to do, and I have not heard a
refutation.

One thing we should look at before going with a 64-bit method is the
extra storage space for the larger checksum. We can clearly afford an
extra 32 bits for a checksum on an 8K disk page, but if Vadim is
envisioning checksumming each individual XLOG record then the extra
space is more annoying.

Also, there's the KISS issue. When it takes less than a dozen lines to
do a CRC, versus pages to do MD5, you have to ask yourself what the
extra code space is buying you... also whether you want to get into
licensing issues by borrowing someone else's code. The MD5 code that
Bruce was using is GPL, not BSD, and so couldn't go into the Postgres
core anyway.

> MD4 would be a better choice than MD5, despite that a theoretical
> attack on MD4 has been described (albeit never executed). We don't
> even care about real attacks, never mind theoretical ones. What
> matters is that MD4 is entirely good enough, and faster to compute
> than MD5.
>
> I find these results very encouraging. BSD-licensed MD4 code is
> readily available, e.g. from any of the BSDs themselves.

MD4 would be worth looking at, especially if it has less
startup/shutdown overhead than MD5. I think a 64-bit true CRC might
also be worth looking at, just for completeness. But I don't know
where to find code for one.

			regards, tom lane
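For completeness, a minimal sketch of what a table-driven 64-bit CRC
could look like. The polynomial here (the ECMA-182 polynomial, in its
reflected form) is my assumption purely for illustration -- it is not
something this thread settled on:

#include <stddef.h>
#include <stdint.h>

/* Reflected form of the ECMA-182 CRC-64 polynomial (an assumption
   for illustration only). */
#define CRC64_POLY UINT64_C(0xC96C5795D7870F42)

static uint64_t crc64_table[256];

/* Build the byte-at-a-time feedback table, the same way the 32-bit
   table above could be generated at runtime. */
static void
crc64_init (void)
{
  int i, j;
  for (i = 0; i < 256; i++)
    {
      uint64_t crc = (uint64_t) i;
      for (j = 0; j < 8; j++)
        crc = (crc & 1) ? (crc >> 1) ^ CRC64_POLY : crc >> 1;
      crc64_table[i] = crc;
    }
}

/* Same shift/xor update as IUPDC32, widened to 64 bits. */
static uint64_t
crc64 (const void *buf, size_t len, uint64_t crc)
{
  const unsigned char *p = buf;
  while (len--)
    crc = crc64_table[(crc ^ *p++) & 0xff] ^ (crc >> 8);
  return crc;
}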
Bruce Guenter <bruceg@em.ca> writes:
>> A good theory, but unfortunately not a correct theory. PA-RISC can
>> do a circular shift in one cycle using the "shift right double"
>> instruction,

> Interesting. I was under the impression that virtually no RISC CPU
> had a rotate instruction. Do any others?

Darn if I know. A RISC purist would probably say that PA-RISC isn't
all that reduced ... for example, the reason it needs six cycles not
seven for the CRC inner loop is that the LOAD instruction has an
option to postincrement the pointer register (like a C "*ptr++").

> Same with the x86 core:
> 	movb %dl,%al
> 	xorb (%ecx),%al
> 	andl $255,%eax
> 	shrl $8,%edx
> 	incl %ecx
> 	xorl (%esi,%eax,4),%edx

> On my Celeron, the timing for those six opcodes is a whopping 13
> cycles per byte. Obviously there's some major performance hit in the
> memory instructions, because there's no more than 4 cycles worth of
> dependent instructions in that snippet.

Yes. It looks like we're looking at pipeline stalls for the memory
reads. I expect PA-RISC would have the same problem were it not that
the CRC table and data buffer are almost certainly loaded into level-2
cache memory. Curious that you don't get the same result --- what is
the memory cache architecture on your box?

As Nathan remarks nearby, this is just minutiae, but I'm interested
anyway...

			regards, tom lane
* Tom Lane <tgl@sss.pgh.pa.us> [001210 12:00] wrote:
> Yes. It looks like we're looking at pipeline stalls for the memory
> reads. I expect PA-RISC would have the same problem were it not that
> the CRC table and data buffer are almost certainly loaded into
> level-2 cache memory. Curious that you don't get the same result ---
> what is the memory cache architecture on your box?
>
> As Nathan remarks nearby, this is just minutiae, but I'm interested
> anyway...

I would try unrolling the loop some (if possible) and retesting.
--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
On Sun, Dec 10, 2000 at 02:53:43PM -0500, Tom Lane wrote:
> > On my Celeron, the timing for those six opcodes is a whopping 13
> > cycles per byte. Obviously there's some major performance hit in
> > the memory instructions, because there's no more than 4 cycles
> > worth of dependent instructions in that snippet.
> Yes. It looks like we're looking at pipeline stalls for the memory
> reads.

In particular, for the single-byte memory read. By loading in 32-bit
words at a time, the cost drops to about 7 cycles per byte. I imagine
on a 64-bit CPU, loading 64-bit words at a time would drop the cost
even further.

	  word1 = *(unsigned long*)z;
	  while (c > 4)
	    {
	      unsigned long next;
	      z += 4;
	      ick = IUPDC32 (word1, ick); word1 >>= 8;
	      c -= 4;
	      ick = IUPDC32 (word1, ick); word1 >>= 8;
	      next = *(unsigned long*)z;  /* start the next load early */
	      ick = IUPDC32 (word1, ick); word1 >>= 8;
	      ick = IUPDC32 (word1, ick);
	      word1 = next;  /* switch only after the current word's
	                        four bytes are done */
	    }

I tried loading two words at a time, starting to load the second word
well before it's used, but that didn't actually reduce the time taken.

> As Nathan remarks nearby, this is just minutiae, but I'm interested
> anyway...

Yup.
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
On Sun, Dec 10, 2000 at 12:24:59PM -0800, Alfred Perlstein wrote:
> I would try unrolling the loop some (if possible) and retesting.

The inner loop was already unrolled, but was only processing single
bytes at a time. By loading in 32-bit words at once, it reduced the
cost to only 7 cycles per byte (from 13).
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
PG should include support for SHA1 anyway. MD5 is not being used in
new stuff anymore, I think. I actually have an SHA1 implementation
that links into PG, if anyone is interested (especially if it could
get included in a future release).

e
> Interesting. I was under the impression that virtually no RISC CPU
> had a rotate instruction. Do any others?

(FYI; doesn't really contribute to the thread :/) Most or all do.
There are no "pure RISC" chips in production; all have had some
optimized complex operations added for performance and for code
compactness.

- Thomas
On Sun, Dec 10, 2000 at 02:36:38PM -0500, Tom Lane wrote:
> > MD4 would be a better choice than MD5, despite that a theoretical
> > attack on MD4 has been described (albeit never executed). We don't
> > even care about real attacks, never mind theoretical ones. What
> > matters is that MD4 is entirely good enough, and faster to compute
> > than MD5.
> >
> > I find these results very encouraging. BSD-licensed MD4 code is
> > readily available, e.g. from any of the BSDs themselves.
>
> MD4 would be worth looking at, especially if it has less
> startup/shutdown overhead than MD5. I think a 64-bit true CRC might
> also be worth looking at, just for completeness. But I don't know
> where to find code for one.

The startup/shutdown for MD4 is identical to that of MD5, but the
inner loop is much smaller (a total of 48 operations instead of 64,
with fewer constants). The MD4 inner loop runs about 1.5 times the
speed of MD5's.
--
Bruce Guenter <bruceg@em.ca>                   http://em.ca/~bruceg/
On Thu, Dec 07, 2000 at 07:36:33PM -0500, Tom Lane wrote:
> ncm@zembu.com (Nathan Myers) writes:
> > 2. I disagree with the way the above statistics were computed. That
> >    eleven-million-year figure gets whittled down pretty quickly
> >    when you factor in all the sources of corruption, even without
> >    crashes. (Power failures are only one of many sources of
> >    corruption.) The odds grow with the size and activity of the
> >    database, and databases are getting very large and busy indeed.
>
> Sure, but the argument still holds. If the net MTBF of your
> underlying system is less than a day, it's too unreliable to run a
> database that you want to trust. It doesn't matter what the
> contributing failure mechanisms are. In practice, I'd demand an MTBF
> of a lot more than a day before I'd accept a hardware system as
> satisfactory...

In many intended uses (such as Landmark's original plan?) it is not
just one box, but hundreds or thousands. With thousands of databases
deployed, the MTBF (including power outages) for commodity hardware is
well under a day, and there's not much you can do about that.

In a large database (e.g. 64GB) you have 8M blocks. Each hash covers
one block. With a 32-bit checksum, when you check one block, you have
a 2^(-32) likelihood of missing an error, assuming there is one. With
8M blocks, you can only claim a 2^(-9) chance. This is what I meant by
"whittling". A factor of ten or a thousand here, another there, and
pretty soon the possibility of undetected corruption is something that
can't reasonably be ruled out.

> > 3. Many users clearly hope to be able to pull the plug on their
> >    hardware and get back up confidently. While we can't promise
> >    they won't have to go to their backups, we should at least be
> >    equipped to promise, with confidence, that they will know
> >    whether they need to.
>
> And the difference in odds between 2^32 and 2^64 matters here? I made
> a numerical case that it doesn't, and you haven't refuted it. By your
> logic, we might as well say that we should be using a 128-bit CRC, or
> 256-bit, or heck, a few kilobytes. It only takes a little longer to
> go up each step, right, so where should you stop? I say an MTBF
> measured in megayears ought to be plenty. Show me the numerical
> argument that 64 bits is the right place on the curve.

I agree that this is a reasonable question. However, the magic of
exponential growth makes any dissatisfaction with a 64-bit checksum
far less likely than with a 32-bit checksum.

It would forestall any such problems to arrange a configure-time flag
such as "--with-checksum crc-32" or "--with-checksum md4", and make it
clear where to plug in the checksum of one's choice. Then, ship 7.2
with just crc-32 and let somebody else produce patches for
alternatives if they want them.

BTW, I have been looking for Free 64-bit CRC codes/polynomials, and
the closest thing I have found so far was Mark Mitchell's hash,
translated from the Modula-3 system. All the tape drive makers
advertise (but don't publish, AFAIK) a 64-bit CRC.

A reasonable approach would be to deliver CRC-32 in 7.2, and then
reconsider the default later if anybody contributes good alternatives.

Nathan Myers
ncm@zembu.com
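Spelling out that whittling arithmetic: if each of the 64 GB
database's $2^{23}$ 8K blocks carried an error, the chance that at
least one of them slips past a 32-bit checksum is roughly

$$
1 - \left(1 - 2^{-32}\right)^{2^{23}}
\;\approx\; 2^{23} \cdot 2^{-32} \;=\; 2^{-9},
$$

which is the $2^{-9}$ figure above; a 64-bit checksum would put the
same bound at about $2^{-41}$.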
On Sat, Dec 09, 2000 at 12:03:52AM +1100, Horst Herb wrote:
> AFAIK the thread on "built in" CRCs referred only to CRCs in the
> transaction log.

We have been discussing checksums for both the table blocks and the
transaction log.

> Always remember that a postgres database on the hard disk can be
> manipulated accidentally / maliciously without postgres even running.
> These are the cases where you need row-level CRCs.

"There is no security without physical security." If somebody can
change the row contents, they can also change the row and/or block
checksum to match.

Nathan Myers
ncm@zembu.com
> > Always remember that a postgres database on the hard disk can be
> > manipulated accidentally / maliciously without postgres even
> > running. These are the cases where you need row-level CRCs.
>
> "There is no security without physical security." If somebody can
> change the row contents, they can also change the row and/or block
> checksum to match.

They may, but in a proper setup they won't be able to access the CRC
log files. That way, you can still detect alterations. I presume
anyway that most alterations would be accidental rather than
malicious, and in that case the CRC is extremely helpful.

Horst