Substituting Checksum Algorithm (was: Enabling Checksums) - Mailing list pgsql-hackers

Starting a new thread to more narrowly address the topic.

Attached is my reorganization of Ants's patch here:

http://www.postgresql.org/message-id/CA
+CSw_vinyf-w45i=M1m__MpJZY=e8S4Nt_KNnpEbtWjTOaSUA@mail.gmail.com

My changes:

* wrest the core FNV algorithm from the specific concerns of a data page
   - PageCalcChecksum16 mixes the block number, reduces to 16 bits,
     and avoids the pd_checksum field
   - the checksum algorithm is just a pure block checksum with a 32-bit
     result
* moved the FNV checksum into a separate file, checksum.c
* added Ants's suggested compilation flags for better optimization
* slight update to the paragraph in the README that discusses concerns
specific to a data page

I do have a couple questions/concerns about Ants's patch though:

* The README mentions a slight bias; does that come from the mod
(2^16-1)? That's how I updated the README, so I wanted to make sure.
* How was the FNV_PRIME chosen?
* I can't match the individual algorithm step as described in the README
to the actual code. And the comments in the README don't make it clear
enough which one is right (or maybe they both are, and I'm just tired).

The README says:

   hash = (hash ^ value) * ((hash ^ value) >> 3)

(which is obviously missing the FNV_PRIME factor) and the code says:

   +#define CHECKSUM_COMP(checksum, value) do {\
   +       uint32 __tmp = (checksum) ^ (value);\
   +       (checksum) = __tmp * FNV_PRIME ^ (__tmp >> 3);\
   +} while (0)

I'm somewhat on the fence about the "shift right". It was discussed in
this part of the thread:

http://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396@phlo.org

I think we should be able to state with a little more clarity in the
README why there is a problem with plain FNV-1a, and why this
modification is both effective and safe.

I'd lean toward simplicity and closer adherence to the published version
of the algorithm rather than detecting a few more obscure error
patterns. It looks like the modification slows down the algorithm, too.

Regards,
    Jeff Davis


Attachment

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Enabling Checksums
Next
From: Florian Pflug
Date:
Subject: Re: Substituting Checksum Algorithm (was: Enabling Checksums)