On 3/15/13 5:32 AM, Ants Aasma wrote:
> Best case using the CRC32 instruction would be 6.8 bytes/cycle [1].
> But this got me thinking about how to do this faster...
> [1] http://www.drdobbs.com/parallel/fast-parallelized-crc-computation-using/229401411
The optimization work you went through here looked very nice.
Unfortunately, a few things seem to be pushing toward using a CRC16
instead of the Fletcher approach. It seems possible to execute a CRC16
in a reasonable amount of time, in the same neighborhood as the Fletcher
one.
And there is some hope that hardware acceleration for CRCs will be
available through a system API or compiler feature one day, making them
even cheaper.
Ants, do you think you could take a similar look at optimizing a CRC16
calculation? I'm back to where I can do a full performance comparison
run again starting tomorrow, with the latest version of this patch, and
I'd like to do that with a CRC16 implementation or two. I'm not sure if
it's possible to get a quicker implementation because the target is a
CRC16, or whether it's useful to consider truncating a CRC32 into a CRC16.
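
To make the two options concrete, here is a minimal, unoptimized sketch
of what I mean by "a native CRC16" versus "a CRC32 truncated to 16
bits". Everything in it is an assumption for illustration: the function
names are hypothetical, the CRC16 uses the CCITT polynomial 0x1021 with
an initial value of 0xFFFF, and the CRC32 uses the common reflected
polynomial 0xEDB88320. Note that the SSE4.2 CRC32 instruction uses the
Castagnoli polynomial instead, so it would produce different values. The
bit-at-a-time loops are only there to pin down the arithmetic; a real
candidate would be table-driven or better.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: bit-at-a-time CRC-16 with the CCITT polynomial
 * 0x1021, initial value 0xFFFF, no bit reflection, no final XOR.
 */
static uint16_t
crc16_ccitt(const unsigned char *buf, size_t len)
{
    uint16_t crc = 0xFFFF;

    for (size_t i = 0; i < len; i++)
    {
        /* Fold the next input byte into the high byte of the register. */
        crc ^= (uint16_t) ((uint16_t) buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
        {
            if (crc & 0x8000)
                crc = (uint16_t) ((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t) (crc << 1);
        }
    }
    return crc;
}

/*
 * Hypothetical helper: bit-at-a-time CRC-32 with the reflected
 * polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF.
 */
static uint32_t
crc32_bitwise(const unsigned char *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++)
    {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
        {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Truncation variant: keep only the low 16 bits of the CRC-32. */
static uint16_t
crc32_truncated16(const unsigned char *buf, size_t len)
{
    return (uint16_t) (crc32_bitwise(buf, len) & 0xFFFFu);
}
```

Which 16 bits to keep when truncating (low half, high half, or an XOR
of the two) is another open choice; the low half here is just the
simplest one to write down.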
--
Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com