On Sat, Sep 13, 2014 at 1:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Andres Freund <andres@2ndquadrant.com> writes: > On 2014-09-13 08:52:33 +0300, Ants Aasma wrote: >> On Sat, Sep 13, 2014 at 6:59 AM, Arthur Silva <arthurprs@gmail.com> wrote: >>> That's not entirely true. CRC-32C beats pretty much everything with the same >>> length quality-wise and has both hardware implementations and highly >>> optimized software versions.
>> For better or for worse CRC is biased by detecting all single bit >> errors, the detection capability of larger errors is slightly >> diminished. The quality of the other algorithms I mentioned is also >> very good, while producing uniformly varying output.
> There's also much more literature about the various CRCs in comparison > to some of these hash allgorithms.
Indeed. CRCs have well-understood properties for error detection. Have any of these new algorithms been analyzed even a hundredth as thoroughly? No. I'm unimpressed by evidence-free claims that something else is "also very good".
Now, CRCs are designed for detecting the sorts of short burst errors that are (or were, back in the day) common on phone lines. You could certainly make an argument that that's not the type of threat we face for PG data. However, I've not seen anyone actually make such an argument, let alone demonstrate that some other algorithm would be better. To start with, you'd need to explain precisely what other error pattern is more important to defend against, and why.
regards, tom lane
Mysql went this way as well, changing the CRC polynomial in 5.6.
What we are looking for here is uniqueness thus better error detection. Not avalanche effect, nor cryptographically secure, nor bit distribution.
As far as I'm aware CRC32C is unbeaten collision wise and time proven.
I couldn't find tests with xxhash and crc32 on the same hardware so I spent some time putting together a benchmark (see attachment, to run it just start run.sh)
I included a crc32 implementation using ssr4.2 instructions (which works on pretty much any Intel processor built after 2008 and AMD built after 2012),
a portable Slice-By-8 software implementation and xxhash since it's the fastest software 32bit hash I know of.
Here're the results running the test program on my i5-4200M
The hardware version is insanely and works on the majority of Postgres setups and the fallback software implementations is 2.8x slower than the fastest 32bit hash around.