Hi.
OK, here are the patches with the various suggestions applied.
I found that alignment didn't seem to make much difference for the
CRC32* instructions, so I changed the code to process len/8 eight-byte
words followed by len%8 single bytes, the way the Linux kernel does.
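
A rough sketch of that loop structure, for illustration only (this is
not the code from the patches). The names crc32c_sketch, crc32c_step_u8,
crc32c_step_u64 and load_le64 are hypothetical, and the two step
functions are portable bitwise stand-ins for the one-byte and eight-byte
CRC32 instructions so that the example runs anywhere. It assumes the
usual CRC-32C conventions (initial value and final XOR of 0xFFFFFFFF).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the one-byte CRC32 instruction:
 * bitwise CRC-32C using the reflected polynomial 0x82F63B78. */
static uint32_t crc32c_step_u8(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int i = 0; i < 8; i++)
        crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
    return crc;
}

/* Hypothetical stand-in for the eight-byte CRC32 instruction:
 * consumes the word's bytes from least to most significant. */
static uint32_t crc32c_step_u64(uint32_t crc, uint64_t word)
{
    for (int i = 0; i < 8; i++)
        crc = crc32c_step_u8(crc, (uint8_t)(word >> (8 * i)));
    return crc;
}

/* Assemble eight bytes in little-endian order with no alignment
 * requirement. On x86 this corresponds to a plain unaligned load. */
static uint64_t load_le64(const unsigned char *p)
{
    uint64_t word = 0;
    for (int i = 0; i < 8; i++)
        word |= (uint64_t)p[i] << (8 * i);
    return word;
}

uint32_t crc32c_sketch(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    size_t nwords = len / 8;    /* full eight-byte words */
    size_t ntail = len % 8;     /* leftover single bytes */

    /* No alignment prologue: start with eight-byte steps right away. */
    while (nwords-- > 0) {
        crc = crc32c_step_u64(crc, load_le64(p));
        p += 8;
    }

    /* Finish the remaining len%8 bytes one at a time. */
    while (ntail-- > 0)
        crc = crc32c_step_u8(crc, *p++);

    return crc ^ 0xFFFFFFFFu;
}
```

With this, crc32c_sketch("123456789", 9) gives 0xE3069283, the standard
CRC-32C check value; that input splits into one eight-byte word plus one
trailing byte, so both loops are exercised.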
-- Abhijit