Re: [REVIEW] Re: Compression of full-page-writes - Mailing list pgsql-hackers

From	Arthur Silva
Subject	Re: [REVIEW] Re: Compression of full-page-writes
Date	September 14, 2014 03:51:44
Msg-id	CAO_YK0W5Jp2m2jhn20oF80nvxHWfAnhdQ-1ZABrrG4MKZuAiQg@mail.gmail.com
In response to	Re: [REVIEW] Re: Compression of full-page-writes (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [REVIEW] Re: Compression of full-page-writes
List	pgsql-hackers

On Sat, Sep 13, 2014 at 1:55 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

Andres Freund <andres@2ndquadrant.com> writes:
> On 2014-09-13 08:52:33 +0300, Ants Aasma wrote:
>> On Sat, Sep 13, 2014 at 6:59 AM, Arthur Silva <arthurprs@gmail.com> wrote:
>>> That's not entirely true. CRC-32C beats pretty much everything with the same
>>> length quality-wise and has both hardware implementations and highly
>>> optimized software versions.

>> For better or for worse CRC is biased by detecting all single bit
>> errors, the detection capability of larger errors is slightly
>> diminished. The quality of the other algorithms I mentioned is also
>> very good, while producing uniformly varying output.

> There's also much more literature about the various CRCs in comparison
> to some of these hash algorithms.

Indeed. CRCs have well-understood properties for error detection.
Have any of these new algorithms been analyzed even a hundredth as
thoroughly? No. I'm unimpressed by evidence-free claims that
something else is "also very good".

Now, CRCs are designed for detecting the sorts of short burst errors
that are (or were, back in the day) common on phone lines. You could
certainly make an argument that that's not the type of threat we face
for PG data. However, I've not seen anyone actually make such an
argument, let alone demonstrate that some other algorithm would be better.
To start with, you'd need to explain precisely what other error pattern
is more important to defend against, and why.

regards, tom lane

MySQL went this way as well, changing the CRC polynomial in 5.6.

What we are looking for here is uniqueness, and thus better error detection; not the avalanche effect, cryptographic security, or bit distribution.

As far as I'm aware, CRC32C is unbeaten collision-wise and is time-proven.

I couldn't find tests with xxhash and crc32 on the same hardware, so I spent some time putting together a benchmark (see the attachment; to run it, just start run.sh).

I included a crc32 implementation using SSE4.2 instructions (which works on pretty much any Intel processor built after 2008 and any AMD processor built after 2012), a portable Slice-By-8 software implementation, and xxhash, since it's the fastest software 32-bit hash I know of.
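For readers who want to see what is being computed, here is a minimal sketch of CRC-32C in Python. It is not the code from bench.zip: it is a plain bit-at-a-time version, far slower than Slice-By-8 or the SSE4.2 instruction, and it assumes the usual CRC-32C conventions (reflected Castagnoli polynomial, initial value and final XOR of 0xFFFFFFFF). The names `crc32c` and `CRC32C_POLY_REFLECTED` are just illustrative.

```python
# Castagnoli polynomial 0x1EDC6F41 in bit-reversed (reflected) form.
CRC32C_POLY_REFLECTED = 0x82F63B78


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bit-at-a-time CRC-32C. Pass a previous result as `crc` to continue a stream."""
    crc ^= 0xFFFFFFFF  # apply the initial value (or undo the previous final XOR)
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # final XOR


if __name__ == "__main__":
    # Standard check input for CRC catalogues.
    print(f"{crc32c(b'123456789'):08x}")  # e3069283
```

Slice-By-8 produces the same value but processes eight bytes per step using precomputed lookup tables, and the SSE4.2 crc32 instruction implements the same polynomial in hardware.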

Here are the results of running the test program on my i5-4200M:

crc sb8: 90444623
elapsed: 0.513688s
speed: 1.485220 GB/s

crc hw: 90444623
elapsed: 0.048327s
speed: 15.786877 GB/s

xxhash: 7f4a8d5
elapsed: 0.182100s
speed: 4.189663 GB/s

The hardware version is insanely fast and works on the majority of Postgres setups, and the fallback software implementation is about 2.8x slower than the fastest 32-bit hash around.
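The ratios can be checked directly from the throughput numbers reported above. A small sketch (the dictionary keys simply reuse the labels from the benchmark output; `ratio` is an illustrative helper, not part of the attachment):

```python
# Throughput figures in GB/s, copied from the benchmark results above.
speeds = {"crc sb8": 1.485220, "crc hw": 15.786877, "xxhash": 4.189663}


def ratio(faster: str, slower: str) -> float:
    """How many times faster `faster` ran than `slower`."""
    return speeds[faster] / speeds[slower]


if __name__ == "__main__":
    print(f"xxhash vs crc sb8: {ratio('xxhash', 'crc sb8'):.2f}x")
    print(f"crc hw vs xxhash:  {ratio('crc hw', 'xxhash'):.2f}x")
    print(f"crc hw vs crc sb8: {ratio('crc hw', 'crc sb8'):.2f}x")
```

On these numbers xxhash is roughly 2.8x faster than the Slice-By-8 CRC, while the hardware CRC is roughly 3.8x faster than xxhash and about 10.6x faster than Slice-By-8.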

Hopefully it'll be useful in the discussion.

Attachment

bench.zip
