Re: cyclical redundancy checksum algorithm(s)? - Mailing list pgsql-general

From Teodor Sigaev
Subject Re: cyclical redundancy checksum algorithm(s)?
Date
Msg-id 451B7F8E.30303@sigaev.ru
Whole thread Raw
In response to Re: cyclical redundancy checksum algorithm(s)?  ("Karen Hill" <karen_hill22@yahoo.com>)
List pgsql-general
>> You sure that's actually what he said?  A change in CRC proves the data
>> changed, but lack of a change does not prove it didn't.
>
> "To quickly determine if rows have changed, we rely on a cyclic
> redundancy checksum (CRC) algorithm.   If the CRC is identical for the
 >
>> "summary" functions, such as an MD5 hash.  I wouldn't trust it at all
>> with a 32-bit CRC, and not much with a 64-bit CRC.  Too much risk of
>> collision.

Small example of collisions for crc32:
0x38ee5531
         Hundemarke
         92294
0x59471e4f
         raciner
         tranchefiler
0x947bb6c0
         Betriebsteile
         4245


I had make a lot of work when choosing hash function for tsearch2. Also, I had
find that popular hash algorithms produce more collision for non-ascii
languages... CRC32 is more "smooth".
On dictionary with 332296 unique words CRC32 produces 11 collisions, perl's hash
function - 35, pgsql's hash_any - 12.

--
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
                                                    WWW: http://www.sigaev.ru/

pgsql-general by date:

Previous
From: Matthias.Pitzl@izb.de
Date:
Subject: Definition of return types for own functions?
Next
From: Teodor Sigaev
Date:
Subject: Re: Full Text fuzzy search