Re: Substituting Checksum Algorithm (was: Enabling Checksums) - Mailing list pgsql-hackers
From | Ants Aasma |
---|---|
Subject | Re: Substituting Checksum Algorithm (was: Enabling Checksums) |
Date | |
Msg-id | CA+CSw_temxBkeyQYykJfLea11LvTT0vYVjp4y6sVjnAraVWbRA@mail.gmail.com Whole thread Raw |
In response to | Substituting Checksum Algorithm (was: Enabling Checksums) (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Substituting Checksum Algorithm (was: Enabling
Checksums)
|
List | pgsql-hackers |
<p dir="ltr">On Apr 23, 2013 10:17 AM, "Jeff Davis" <<a href="mailto:pgsql@j-davis.com">pgsql@j-davis.com</a>> wrote:<br/> > Attached is my reorganization of Ants's patch here:<br /> ><br /> > <a href="http://www.postgresql.org/message-id/CA">http://www.postgresql.org/message-id/CA</a><br/> > +CSw_vinyf-w45i=M1m__MpJZY=<a href="mailto:e8S4Nt_KNnpEbtWjTOaSUA@mail.gmail.com">e8S4Nt_KNnpEbtWjTOaSUA@mail.gmail.com</a><pdir="ltr">Thanks for yourhelp. Some notes below.<p dir="ltr">> My changes:<br /> ><br /> > * wrest the core FNV algorithm from the specificconcerns of a data page<br /> > - PageCalcChecksum16 mixes the block number, reduces to 16 bits,<br /> > and avoids the pd_checksum field<br /> > - the checksum algorithm is just a pure block checksum with a 32-bit<br/> > result<br /> > * moved the FNV checksum into a separate file, checksum.c<p dir="ltr">I think thefunction should not be called checksum_fnv as it implies that we use the well known straightforward implementation. Maybechecksum_block or some other generic name.<p dir="ltr">> * added Ants's suggested compilation flags for better optimization<pdir="ltr">-msse4.1 is not safe to use in default builds. On the other hand it doesn't hurt to just specifyit in CFLAGS for the whole compile (possibly as -march=native). We should just omit it and mention somewhere thatSSE4.1 enabled builds will have better checksum performance.<p dir="ltr">> * slight update to the paragraph in theREADME that discusses concerns<br /> > specific to a data page<br /> ><br /> > I do have a couple questions/concernsabout Ants's patch though:<br /> ><br /> > * The README mentions a slight bias; does that come fromthe mod<br /> > (2^16-1)? That's how I updated the README, so I wanted to make sure.<p dir="ltr">Yes.<p dir="ltr">>* How was the FNV_PRIME chosen?<p dir="ltr">I still haven't found the actual source for this value. It's specifiedas the value to use for 32bit FNV-1a.<p dir="ltr">> * I can't match the individual algorithm step as describedin the README<br /> > to the actual code. And the comments in the README don't make it clear<br /> > enoughwhich one is right (or maybe they both are, and I'm just tired).<p dir="ltr">I will try to reword.<p dir="ltr">>The README says:<br /> ><br /> > hash = (hash ^ value) * ((hash ^ value) >> 3)<br /> ><br/> > (which is obviously missing the FNV_PRIME factor) and the code says:<br /> ><br /> > +#define CHECKSUM_COMP(checksum,value) do {\<br /> > + uint32 __tmp = (checksum) ^ (value);\<br /> > + (checksum)= __tmp * FNV_PRIME ^ (__tmp >> 3);\<br /> > +} while (0)<br /> ><br /> > I'm somewhat on thefence about the "shift right". It was discussed in<br /> > this part of the thread:<br /> ><br /> > <a href="http://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396@phlo.org">http://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396@phlo.org</a><br />><br /> > I think we should be able to state with a little more clarity in the<br /> > README why there is a problemwith plain FNV-1a, and why this<br /> > modification is both effective and safe.<p dir="ltr">Florian already mentionedwhy it's effective. I have an intuition why it's safe, will try to come up with a well reasoned argument.<p dir="ltr">Regards,<br/> Antd Aasma
pgsql-hackers by date: