Re: [REVIEW] Re: Compression of full-page-writes - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: [REVIEW] Re: Compression of full-page-writes
Date
Msg-id CAGTBQpYPBaxWKSUMab4PHo-jgs7nqXSte772BsdZghE_Odd-Dw@mail.gmail.com
Whole thread Raw
In response to Re: [REVIEW] Re: Compression of full-page-writes  ("ktm@rice.edu" <ktm@rice.edu>)
List pgsql-hackers
On Sat, Sep 13, 2014 at 10:27 PM, ktm@rice.edu <ktm@rice.edu> wrote:
>> Here're the results running the test program on my i5-4200M
>>
>> crc sb8: 90444623
>> elapsed: 0.513688s
>> speed: 1.485220 GB/s
>>
>> crc hw: 90444623
>> elapsed: 0.048327s
>> speed: 15.786877 GB/s
>>
>> xxhash: 7f4a8d5
>> elapsed: 0.182100s
>> speed: 4.189663 GB/s
>>
>> The hardware version is insanely and works on the majority of Postgres
>> setups and the fallback software implementations is 2.8x slower than the
>> fastest 32bit hash around.
>>
>> Hopefully it'll be useful in the discussion.
>
> Thank you for running this sample benchmark. It definitely shows that the
> hardware version of the CRC is very fast, unfortunately it is really only
> available on x64 Intel/AMD processors which leaves all the rest lacking.
> For current 64-bit hardware, it might be instructive to also try using
> the XXH64 version and just take one half of the hash. It should come in
> at around 8.5 GB/s, or very nearly the speed of the hardware accelerated
> CRC. Also, while I understand that CRC has a very venerable history and
> is well studied for transmission type errors, I have been unable to find
> any research on its applicability to validating file/block writes to a
> disk drive. While it is to quote you "unbeaten collision wise", xxhash,
> both the 32-bit and 64-bit version are its equal. Since there seems to
> be a lack of research on disk based error detection versus CRC polynomials,
> it seems likely that any of the proposed hash functions are on an equal
> footing in this regard. As Andres commented up-thread, xxhash comes along
> for "free" with lz4.


Bear in mind that

a) taking half of the CRC will invalidate all error detection
capability research, and it may also invalidate its properties,
depending on the CRC itself.

b) bit corruption as is the target kind of error for CRC are resurging
in SSDs, as can be seen in table 4 of a link that I think appeared on
this same list:
https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf

I would totally forget of taking half of whatever CRC. That's looking
for pain, in that it will invalidate all existing and future research
on that hash/CRC type.



pgsql-hackers by date:

Previous
From: Arthur Silva
Date:
Subject: Re: [REVIEW] Re: Compression of full-page-writes
Next
From: Peter Geoghegan
Date:
Subject: Re: PoC: Partial sort