On Fri, Apr 26, 2013 at 10:57 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, Apr 26, 2013 at 7:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I'm expecting to spend some time on this over the weekend, once I've
>> re-read the thread and patches to see if there is something to commit.
>>
>> That's my last time window, so this looks like the last chance to make
>> changes before beta.
>
> I updated the patch and split it into two parts (attached).
>
> The first patch is the checksum algorithm itself. I have done
> some documentation updates and moved it into the C file (rather
> than the README), but further explanation of the "shift right 3"
> modification will need to be by Ants or Florian.
>
> The second patch adds the configure-time check for the right
> compilation flags, and uses them when compiling checksum.c. I
> called the new variable CFLAGS_EXTRA, for lack of a better idea,
> so feel free to come up with a new name. It doesn't check for, or
> use, -msse4.1, but that can be specified by the user by
> configuring with CFLAGS_EXTRA="-msse4.1".
>
> I don't know of any more required changes, aside from
> documentation improvements.
I have updated the base patch. This is supposed to go under the
cflags-vector patch that Jeff posted yesterday.
I had the opportunity to run a few more tests of the hash. Based on
the tests I switched the shift-right operation from 3 to 17bits (the
original value was chosen by gut feel). Avalanche tests showed that
this value removed bias the quickest. You can see the difference in
the attached image, colors are still black 0% bias, blue 5%, green
33%, yellow 75%, red 100%. The final box in the diagram is covered by
the final mixing iteration. The take away from these diagrams is 17
mixes better than 3. 17 still has some residual bias for the final
iteration on the page. The effective information content values in
checksum for 16 high order bits on final 32 32bit words on the page
are: 16.0 15.1 14.1 13.3 12.6 12.1 11.8 11.7 11.5 11.5 11.1 11.0 10.9
10.6 10.6 10.4. Error detection capability for the highest bit is
therefore 1:1351. Based on this I also switched to using two
iterations of zeroes at the end, this way the lowest information
content is 15.976bits or 1:64473.
Documentation changes:
* reworded the algorithm description so the order of steps is more apparent.
* added a link to the FNV reference page.
* fixed note about FNV being 4 bytes at a time. Official variant is 1
byte at a time.
* added a segment of why the algorithm was chosen and its error
detection capabilities.
* added a segment about how the code affects vectorization.
Issue to decide before commiting:
* Whether to use 32 or 64 parallel checksums. The tradeoff for 64 is a
small performance hit (10-20%) on todays CPUs for a small performance
gain on Haswell processors coming to market this year and up to a
theoretical 2x performance gain on future CPUs. Changing this is just
a matter of changing N_SUMS and updating documentation to match.
Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de