On Fri, Dec 08, 2000 at 01:58:12PM -0500, Tom Lane wrote:
> Bruce Guenter <bruceg@em.ca> writes:
> > ... Taking an
> > arbitrary 32 bits of a MD5 would likely be less collision prone than
> > using a 32-bit CRC, and it appears faster as well.
>
> ... but that would be an algorithm that you know NOTHING about the
> properties of. What is your basis for asserting it's better than CRC?
MD5 is a cryptographic hash, which means (AFAIK) that ideally it is
impossible to produce a collision using any other method than brute
force attempts. In other words, any stream of input to the hash that is
longer than the hash length (8 bytes for MD5) is equally probable to
produce a given hash code.
> CRC is pretty well studied and its error-detection behavior is known
> (and good). MD5 has been studied less thoroughly AFAIK, and in any
> case what's known about its behavior is that the *entire* MD5 output
> provides a good signature for a datastream. If you pick some ad-hoc
> method like taking a randomly chosen subset of MD5's output bits,
> you really don't know anything at all about what the error-detection
> properties of the method are.
Actually, in my reading reagarding the properties of MD5, I read an
article that stated that if a smaller number of bits was desired, one
could either (and here's where my memory fails me) just select the
middle N bits from the resulting hash, or fold the hash using XOR until
the desired number of bits was reached. I'll see if I can find a
reference...
RFC2289 (http://www.ietf.org/rfc/rfc2289.txt) includes an algorithm for
folding MD5 digests down to 64 bits by XORing the top half with the
bottom half. See appendix A.
--
Bruce Guenter <bruceg@em.ca> http://em.ca/~bruceg/