Hi Heikki.
I've attached two regenerated CRC patches, split up as before.
1. The slicing-by-8 patch contains numerous changes:
a. A configure test for __builtin_bswap32
b. A comment referencing the slicing-by-8 paper (which is behind a
paywall, unfortunately, so I haven't been able to read it). Are more
comments needed? If so, where, and what kind?
c. A byte-reversed big-endian version of the 8*256 table. Linux has
only one table, which it byte-swaps with __constant_swab32, but for
us it's simpler to have two tables.
d. Thanks to (c), we can avoid the bswap32 in the hot loop.
e. On big-endian systems, FIN_CRC32C now bswap32()s the CRC before
finalising it. (The only reason we don't need to do the same in
INIT_CRC32C is that the initialiser, 0xFFFFFFFF, is unchanged by
byte-swapping.)
2. The sse4.2 patch has only some minor compile fixes.
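To make (c), (d) and (e) concrete, here is a minimal standalone sketch of the idea, not the code from the patch. All names in it (tab_le, tab_be, crc32c_sb8_be and so on) are made up for illustration, and it builds the tables at run time and reads words byte by byte so that it behaves the same on any host. The CRC is kept byte-reversed for the whole loop, the table entries are byte-reversed to match, and the only swap happens once at the end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY 0x82F63B78u     /* reflected Castagnoli polynomial */

static uint32_t tab_le[8][256];     /* tables in their natural form */
static uint32_t tab_be[8][256];     /* same entries, byte-reversed */

static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

static void crc32c_init_tables(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int bit = 0; bit < 8; bit++)
            c = (c & 1) ? (c >> 1) ^ CRC32C_POLY : c >> 1;
        tab_le[0][i] = c;
    }
    for (int k = 1; k < 8; k++)
        for (uint32_t i = 0; i < 256; i++) {
            uint32_t prev = tab_le[k - 1][i];
            tab_le[k][i] = (prev >> 8) ^ tab_le[0][prev & 0xFF];
        }
    for (int k = 0; k < 8; k++)
        for (uint32_t i = 0; i < 256; i++)
            tab_be[k][i] = swap32(tab_le[k][i]);
}

/* Read four bytes the way a native word load would on a big-endian host. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Reference: one byte at a time, CRC held in its natural form. */
static uint32_t crc32c_bytewise(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = tab_le[0][(crc ^ *p++) & 0xFF] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}

/* Big-endian style slicing-by-8: no byte swap inside the loop. */
static uint32_t crc32c_sb8_be(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;     /* identical when byte-reversed */
    while (len >= 8) {
        uint32_t a = crc ^ load_be32(p);
        uint32_t b = load_be32(p + 4);
        crc = tab_be[7][a >> 24] ^ tab_be[6][(a >> 16) & 0xFF] ^
              tab_be[5][(a >> 8) & 0xFF] ^ tab_be[4][a & 0xFF] ^
              tab_be[3][b >> 24] ^ tab_be[2][(b >> 16) & 0xFF] ^
              tab_be[1][(b >> 8) & 0xFF] ^ tab_be[0][b & 0xFF];
        p += 8;
        len -= 8;
    }
    while (len--)                   /* leftover bytes, still byte-reversed */
        crc = tab_be[0][(crc >> 24) ^ *p++] ^ (crc << 8);
    return swap32(crc) ^ 0xFFFFFFFFu;   /* the single swap, as in (e) */
}

/* Both routines should agree on the standard CRC-32C check string. */
static void crc32c_self_check(void)
{
    static const unsigned char msg[] = "123456789";
    crc32c_init_tables();
    assert(crc32c_bytewise(msg, 9) == 0xE3069283u);
    assert(crc32c_sb8_be(msg, 9) == crc32c_bytewise(msg, 9));
}
```

In the real patch the tables are static constants and the loads are native word loads, so this is only meant to show why the swap can move out of the loop.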
I have built and tested both patches individually on little-endian
(amd64) and big-endian (ppc) systems. I verified that the _sse
implementation is chosen at startup on the former, and _sb8 on the
latter, and that both implementations function correctly with respect
to HEAD.
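For anyone who wants to picture the startup selection, here is a rough sketch of that kind of dispatch, again not the patch's actual code. The function names are hypothetical, the portable routine is a bit-at-a-time stand-in for the slicing-by-8 one, and the hardware path assumes GCC or clang on x86 (__builtin_cpu_supports and the SSE4.2 crc32 intrinsic); elsewhere only the portable path is compiled.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <nmmintrin.h>
#define HAVE_SSE42_CRC 1            /* illustrative macro, not from the patch */
#endif

typedef uint32_t (*crc32c_fn)(uint32_t crc, const void *data, size_t len);

/* Portable fallback: bit-at-a-time, standing in for slicing-by-8. */
static uint32_t crc32c_portable(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len--) {
        crc ^= *p++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1) ? (crc >> 1) ^ 0x82F63B78u : crc >> 1;
    }
    return crc;
}

#ifdef HAVE_SSE42_CRC
/* Hardware path: one crc32 instruction per input byte. */
static uint32_t __attribute__((target("sse4.2")))
crc32c_hw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
}
#endif

/* Pick an implementation based on what the CPU reports at run time. */
static crc32c_fn crc32c_choose(void)
{
#ifdef HAVE_SSE42_CRC
    if (__builtin_cpu_supports("sse4.2"))
        return crc32c_hw;
#endif
    return crc32c_portable;
}

/* Full CRC-32C of a buffer; the choice is made once, on first use. */
static uint32_t crc32c(const void *data, size_t len)
{
    static crc32c_fn impl;
    if (!impl)
        impl = crc32c_choose();
    return impl(0xFFFFFFFFu, data, len) ^ 0xFFFFFFFFu;
}
```

Comparing crc32c_choose() against crc32c_portable on the same input is one simple way to check that the two paths agree.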
Please let me know if there's anything else I need to do.
-- Abhijit