Re: Substituting Checksum Algorithm (was: Enabling Checksums) - Mailing list pgsql-hackers

From Ants Aasma
Subject Re: Substituting Checksum Algorithm (was: Enabling Checksums)
Date
Msg-id CA+CSw_temxBkeyQYykJfLea11LvTT0vYVjp4y6sVjnAraVWbRA@mail.gmail.com
Whole thread Raw
In response to Substituting Checksum Algorithm (was: Enabling Checksums)  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Substituting Checksum Algorithm (was: Enabling Checksums)
List pgsql-hackers
<p dir="ltr">On Apr 23, 2013 10:17 AM, "Jeff Davis" <<a href="mailto:pgsql@j-davis.com">pgsql@j-davis.com</a>>
wrote:<br/> > Attached is my reorganization of Ants's patch here:<br /> ><br /> > <a
href="http://www.postgresql.org/message-id/CA">http://www.postgresql.org/message-id/CA</a><br/> >
+CSw_vinyf-w45i=M1m__MpJZY=<a
href="mailto:e8S4Nt_KNnpEbtWjTOaSUA@mail.gmail.com">e8S4Nt_KNnpEbtWjTOaSUA@mail.gmail.com</a><pdir="ltr">Thanks for
yourhelp. Some notes below.<p dir="ltr">> My changes:<br /> ><br /> > * wrest the core FNV algorithm from the
specificconcerns of a data page<br /> >    - PageCalcChecksum16 mixes the block number, reduces to 16 bits,<br />
>     and avoids the pd_checksum field<br /> >    - the checksum algorithm is just a pure block checksum with a
32-bit<br/> >      result<br /> > * moved the FNV checksum into a separate file, checksum.c<p dir="ltr">I think
thefunction should not be called checksum_fnv as it implies that we use the well known straightforward implementation.
Maybechecksum_block or some other generic name.<p dir="ltr">> * added Ants's suggested compilation flags for better
optimization<pdir="ltr">-msse4.1 is not safe to use in default builds. On the other hand it doesn't hurt to just
specifyit in CFLAGS for the whole compile (possibly as -march=native). We should just omit it and mention somewhere
thatSSE4.1 enabled builds will have better checksum performance.<p dir="ltr">> * slight update to the paragraph in
theREADME that discusses concerns<br /> > specific to a data page<br /> ><br /> > I do have a couple
questions/concernsabout Ants's patch though:<br /> ><br /> > * The README mentions a slight bias; does that come
fromthe mod<br /> > (2^16-1)? That's how I updated the README, so I wanted to make sure.<p dir="ltr">Yes.<p
dir="ltr">>* How was the FNV_PRIME chosen?<p dir="ltr">I still haven't found the actual source for this value. It's
specifiedas the value to use for 32bit FNV-1a.<p dir="ltr">> * I can't match the individual algorithm step as
describedin the README<br /> > to the actual code. And the comments in the README don't make it clear<br /> >
enoughwhich one is right (or maybe they both are, and I'm just tired).<p dir="ltr">I will try to reword.<p
dir="ltr">>The README says:<br /> ><br /> >    hash = (hash ^ value) * ((hash ^ value) >> 3)<br />
><br/> > (which is obviously missing the FNV_PRIME factor) and the code says:<br /> ><br /> >    +#define
CHECKSUM_COMP(checksum,value) do {\<br /> >    +       uint32 __tmp = (checksum) ^ (value);\<br /> >    +      
(checksum)= __tmp * FNV_PRIME ^ (__tmp >> 3);\<br /> >    +} while (0)<br /> ><br /> > I'm somewhat on
thefence about the "shift right". It was discussed in<br /> > this part of the thread:<br /> ><br /> > <a
href="http://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396@phlo.org">http://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396@phlo.org</a><br
/>><br /> > I think we should be able to state with a little more clarity in the<br /> > README why there is a
problemwith plain FNV-1a, and why this<br /> > modification is both effective and safe.<p dir="ltr">Florian already
mentionedwhy it's effective. I have an intuition why it's safe, will try to come up with a well reasoned argument.<p
dir="ltr">Regards,<br/> Antd Aasma 

pgsql-hackers by date:

Previous
From: Florian Pflug
Date:
Subject: Re: Substituting Checksum Algorithm (was: Enabling Checksums)
Next
From: Andres Freund
Date:
Subject: Re: Substituting Checksum Algorithm (was: Enabling Checksums)