Thread: Substituting Checksum Algorithm (was: Enabling Checksums)

Substituting Checksum Algorithm (was: Enabling Checksums)

From
Jeff Davis
Date:
Starting a new thread to more narrowly address the topic.

Attached is my reorganization of Ants's patch here:

http://www.postgresql.org/message-id/CA
+CSw_vinyf-w45i=M1m__MpJZY=e8S4Nt_KNnpEbtWjTOaSUA@mail.gmail.com

My changes:

* wrest the core FNV algorithm from the specific concerns of a data page
   - PageCalcChecksum16 mixes the block number, reduces to 16 bits,
     and avoids the pd_checksum field
   - the checksum algorithm is just a pure block checksum with a 32-bit
     result
* moved the FNV checksum into a separate file, checksum.c
* added Ants's suggested compilation flags for better optimization
* slight update to the paragraph in the README that discusses concerns
specific to a data page

I do have a couple questions/concerns about Ants's patch though:

* The README mentions a slight bias; does that come from the mod
(2^16-1)? That's how I updated the README, so I wanted to make sure.
* How was the FNV_PRIME chosen?
* I can't match the individual algorithm step as described in the README
to the actual code. And the comments in the README don't make it clear
enough which one is right (or maybe they both are, and I'm just tired).

The README says:

   hash = (hash ^ value) * ((hash ^ value) >> 3)

(which is obviously missing the FNV_PRIME factor) and the code says:

   +#define CHECKSUM_COMP(checksum, value) do {\
   +       uint32 __tmp = (checksum) ^ (value);\
   +       (checksum) = __tmp * FNV_PRIME ^ (__tmp >> 3);\
   +} while (0)

I'm somewhat on the fence about the "shift right". It was discussed in
this part of the thread:

http://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396@phlo.org

I think we should be able to state with a little more clarity in the
README why there is a problem with plain FNV-1a, and why this
modification is both effective and safe.

I'd lean toward simplicity and closer adherence to the published version
of the algorithm rather than detecting a few more obscure error
patterns. It looks like the modification slows down the algorithm, too.

Regards,
    Jeff Davis


Attachment

Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Florian Pflug
Date:
On Apr23, 2013, at 09:17 , Jeff Davis <pgsql@j-davis.com> wrote:
> I'd lean toward simplicity and closer adherence to the published version
> of the algorithm rather than detecting a few more obscure error
> patterns. It looks like the modification slows down the algorithm, too.

The pattern that plain FNV1 misses are not that obscure, unfortunately.
With plain FNV1A, the n-th bit of an input word (i.e. 32-bit block) only
affects bits n through 31 of the checksum. In particular, the most
significant bit of every 32-bit block only affects the MSB of the checksum,
making the algorithm miss any even number of flipped MSBs. More generally,
any form of data corruption that affects only the top N bits are missed
at least once out of 2^N times, since changing only those bits cannot
yield more than 2^N different checksum values.

Such corruption pattern may not be especially likely, given that we're
mainly protecting against disk corruption, not memory corruption. But
quantifying how unlikely exactly seems hard, thus providing at least some
form of protection against such errors seems prudent.

In addition, even with the shift-induced slowdown, FNV1+SHIFT still
performs similarly to hardware-accelerated CRC, reaching about 6bytes/cycle
on modern Intel CPUS. This really is plenty fast - if I'm not mistaken, it
translates to well over 10 GB/s.

So overall -1 for removing the shift.

best regards,
Florian Pflug




Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Ants Aasma
Date:
<p dir="ltr">On Apr 23, 2013 10:17 AM, "Jeff Davis" <<a href="mailto:pgsql@j-davis.com">pgsql@j-davis.com</a>>
wrote:<br/> > Attached is my reorganization of Ants's patch here:<br /> ><br /> > <a
href="http://www.postgresql.org/message-id/CA">http://www.postgresql.org/message-id/CA</a><br/> >
+CSw_vinyf-w45i=M1m__MpJZY=<a
href="mailto:e8S4Nt_KNnpEbtWjTOaSUA@mail.gmail.com">e8S4Nt_KNnpEbtWjTOaSUA@mail.gmail.com</a><pdir="ltr">Thanks for
yourhelp. Some notes below.<p dir="ltr">> My changes:<br /> ><br /> > * wrest the core FNV algorithm from the
specificconcerns of a data page<br /> >    - PageCalcChecksum16 mixes the block number, reduces to 16 bits,<br />
>     and avoids the pd_checksum field<br /> >    - the checksum algorithm is just a pure block checksum with a
32-bit<br/> >      result<br /> > * moved the FNV checksum into a separate file, checksum.c<p dir="ltr">I think
thefunction should not be called checksum_fnv as it implies that we use the well known straightforward implementation.
Maybechecksum_block or some other generic name.<p dir="ltr">> * added Ants's suggested compilation flags for better
optimization<pdir="ltr">-msse4.1 is not safe to use in default builds. On the other hand it doesn't hurt to just
specifyit in CFLAGS for the whole compile (possibly as -march=native). We should just omit it and mention somewhere
thatSSE4.1 enabled builds will have better checksum performance.<p dir="ltr">> * slight update to the paragraph in
theREADME that discusses concerns<br /> > specific to a data page<br /> ><br /> > I do have a couple
questions/concernsabout Ants's patch though:<br /> ><br /> > * The README mentions a slight bias; does that come
fromthe mod<br /> > (2^16-1)? That's how I updated the README, so I wanted to make sure.<p dir="ltr">Yes.<p
dir="ltr">>* How was the FNV_PRIME chosen?<p dir="ltr">I still haven't found the actual source for this value. It's
specifiedas the value to use for 32bit FNV-1a.<p dir="ltr">> * I can't match the individual algorithm step as
describedin the README<br /> > to the actual code. And the comments in the README don't make it clear<br /> >
enoughwhich one is right (or maybe they both are, and I'm just tired).<p dir="ltr">I will try to reword.<p
dir="ltr">>The README says:<br /> ><br /> >    hash = (hash ^ value) * ((hash ^ value) >> 3)<br />
><br/> > (which is obviously missing the FNV_PRIME factor) and the code says:<br /> ><br /> >    +#define
CHECKSUM_COMP(checksum,value) do {\<br /> >    +       uint32 __tmp = (checksum) ^ (value);\<br /> >    +      
(checksum)= __tmp * FNV_PRIME ^ (__tmp >> 3);\<br /> >    +} while (0)<br /> ><br /> > I'm somewhat on
thefence about the "shift right". It was discussed in<br /> > this part of the thread:<br /> ><br /> > <a
href="http://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396@phlo.org">http://www.postgresql.org/message-id/99343716-5F5A-45C8-B2F6-74B9BA357396@phlo.org</a><br
/>><br /> > I think we should be able to state with a little more clarity in the<br /> > README why there is a
problemwith plain FNV-1a, and why this<br /> > modification is both effective and safe.<p dir="ltr">Florian already
mentionedwhy it's effective. I have an intuition why it's safe, will try to come up with a well reasoned argument.<p
dir="ltr">Regards,<br/> Antd Aasma 

Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Andres Freund
Date:
On 2013-04-23 00:17:28 -0700, Jeff Davis wrote:
> + # important optimization flags for checksum.c
> + ifeq ($(GCC),yes)
> + checksum.o: CFLAGS += -msse4.1 -funroll-loops -ftree-vectorize
> + endif

I am pretty sure we can't do those unconditionally:
- -funroll-loops and -ftree-vectorize weren't always part of gcc afair, so we would need a configure check for those
- SSE4.1 looks like a total no-go, its not available everywhere. We *can* add runtime detection of that with gcc fairly
easilyand one-time if we wan't to go there (later?) using 'ifunc's, but that needs a fair amount of infrastructure
work.
- We can rely on SSE1/2 on amd64, but I think thats automatically enabled there.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Ants Aasma
Date:
On Tue, Apr 23, 2013 at 11:47 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-04-23 00:17:28 -0700, Jeff Davis wrote:
>> + # important optimization flags for checksum.c
>> + ifeq ($(GCC),yes)
>> + checksum.o: CFLAGS += -msse4.1 -funroll-loops -ftree-vectorize
>> + endif
>
> I am pretty sure we can't do those unconditionally:
> - -funroll-loops and -ftree-vectorize weren't always part of gcc afair,
>   so we would need a configure check for those

-funroll-loops is available from at least GCC 2.95. -ftree-vectorize
is GCC 4.0+. From what I read from the documentation on ICC -axSSE4.1
should generate a plain and accelerated version and do a runtime
check., I don't know if ICC vectorizes the specific loop in the patch,
but I would expect it to given that Intels vectorization has generally
been better than GCCs and the loop is about as simple as it gets. I
don't know the relevant options for other compilers.

> - SSE4.1 looks like a total no-go, its not available everywhere. We
>   *can* add runtime detection of that with gcc fairly easily and
>   one-time if we wan't to go there (later?) using 'ifunc's, but that
>   needs a fair amount of infrastructure work.
> - We can rely on SSE1/2 on amd64, but I think thats automatically
>   enabled there.

This is why I initially went for the lower strength 16bit checksum
calculation - requiring only SSE2 would have made supporting the
vectorized version on amd64 trivial. By now my feeling is that it's
not prudent to compromise in quality to save some infrastructure
complexity. If we set a hypothetical VECTORIZATION_FLAGS variable at
configure time, the performance is still there for those who need it
and can afford CPU specific builds.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Jeff Davis
Date:
On Tue, 2013-04-23 at 11:44 +0300, Ants Aasma wrote:
> I will try to reword.

Did you have a chance to clarify this, as well as some of the other
documentation issues Simon mentioned here?

http://www.postgresql.org/message-id/CA+U5nMKVEu8UDXQe
+Nk=d7Nqm4ypFSzaEf0esaK4j31LyQCOBQ@mail.gmail.com

I'm not sure if others are waiting on me for a new patch or not. I can
give the documentation issues a try, but I was hesitant to do so because
you've done the research.

The problems that I can correct are fairly trivial.

Regards,Jeff Davis




Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Ants Aasma
Date:
<p dir="ltr">On Apr 25, 2013 10:38 PM, "Jeff Davis" <<a href="mailto:pgsql@j-davis.com">pgsql@j-davis.com</a>>
wrote:<br/> ><br /> > On Tue, 2013-04-23 at 11:44 +0300, Ants Aasma wrote:<br /> > > I will try to
reword.<br/> ><br /> > Did you have a chance to clarify this, as well as some of the other<br /> >
documentationissues Simon mentioned here?<br /> ><br /> > <a
href="http://www.postgresql.org/message-id/CA+U5nMKVEu8UDXQe">http://www.postgresql.org/message-id/CA+U5nMKVEu8UDXQe</a><br
/>> +Nk=<a
href="mailto:d7Nqm4ypFSzaEf0esaK4j31LyQCOBQ@mail.gmail.com">d7Nqm4ypFSzaEf0esaK4j31LyQCOBQ@mail.gmail.com</a><p
dir="ltr">Notyet. I am busy at the moment due to events in private life. I might be able to find some time over the
weekendto go over the documentation but no guarantees.<p dir="ltr">> I'm not sure if others are waiting on me for a
newpatch or not. I can<br /> > give the documentation issues a try, but I was hesitant to do so because<br /> >
you'vedone the research.<br /> ><br /> > The problems that I can correct are fairly trivial.<p dir="ltr">The
unresolvedcode issue that I know of is moving the compiler flags behind a configure check. I would greatly appreciate
itif you could take a look at that. My config-fu is weak and it would take me some time to figure out how to do that.<p
dir="ltr">Regards,<br/> Ants Aasma 

Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Florian Pflug
Date:
On Apr26, 2013, at 10:28 , Ants Aasma <ants.aasma@eesti.ee> wrote:
> On Apr 25, 2013 10:38 PM, "Jeff Davis" <pgsql@j-davis.com> wrote:
> >
> > On Tue, 2013-04-23 at 11:44 +0300, Ants Aasma wrote:
> > > I will try to reword.
> >
> > Did you have a chance to clarify this, as well as some of the other
> > documentation issues Simon mentioned here?
> >
> > http://www.postgresql.org/message-id/CA+U5nMKVEu8UDXQe
> > +Nk=d7Nqm4ypFSzaEf0esaK4j31LyQCOBQ@mail.gmail.com
>
> Not yet. I am busy at the moment due to events in private life. I might be able to find some time over the weekend to
goover the documentation but no guarantees. 

I can try to write up the reasoning behind the choice of FNV1+SHIFT3 as a checksum function, but I'm quite busy too so
I'mnot 100% certain I'll get to it. If that's OK with you Ants, that is. 

> > I'm not sure if others are waiting on me for a new patch or not. I can
> > give the documentation issues a try, but I was hesitant to do so because
> > you've done the research.
> >
> > The problems that I can correct are fairly trivial.
>
> The unresolved code issue that I know of is moving the compiler flags behind a configure check. I would greatly
appreciateit if you could take a look at that. My config-fu is weak and it would take me some time to figure out how to
dothat. 

Do we necessarily have to do that before beta? If not, let's concentrate on getting the basic path in, and let's add
thegcc-specific compiler options later. 

best regards,
Florian Pflug




Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Andres Freund
Date:
On 2013-04-26 13:11:00 +0200, Florian Pflug wrote:
> > The unresolved code issue that I know of is moving the compiler flags behind a configure check. I would greatly
appreciateit if you could take a look at that. My config-fu is weak and it would take me some time to figure out how to
dothat.
 
> 
> Do we necessarily have to do that before beta? If not, let's concentrate on getting the basic path in, and let's add
thegcc-specific compiler options later.
 

If we want them we should do it before beta, otherwise we won't notice
problems that the flags cause (misoptimizations, problems on compiler
versions, ...). So either now or in 9.4.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Tom Lane
Date:
Andres Freund <andres@2ndquadrant.com> writes:
>> On 2013-04-26 13:11:00 +0200, Florian Pflug wrote:
>>> The unresolved code issue that I know of is moving the compiler flags behind a configure check. I would greatly
appreciateit if you could take a look at that. My config-fu is weak and it would take me some time to figure out how to
dothat.
 

>> Do we necessarily have to do that before beta? If not, let's concentrate on getting the basic path in, and let's add
thegcc-specific compiler options later.
 

> If we want them we should do it before beta, otherwise we won't notice
> problems that the flags cause (misoptimizations, problems on compiler
> versions, ...). So either now or in 9.4.

Yeah, beta1 is the point in the cycle where we have the most leverage
for discovering portability problems.  We should not be leaving anything
involving configure tests as "to fix later".
        regards, tom lane



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Simon Riggs
Date:
On 26 April 2013 14:40, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
>>> On 2013-04-26 13:11:00 +0200, Florian Pflug wrote:
>>>> The unresolved code issue that I know of is moving the compiler flags behind a configure check. I would greatly
appreciateit if you could take a look at that. My config-fu is weak and it would take me some time to figure out how to
dothat.
 
>
>>> Do we necessarily have to do that before beta? If not, let's concentrate on getting the basic path in, and let's
addthe gcc-specific compiler options later.
 
>
>> If we want them we should do it before beta, otherwise we won't notice
>> problems that the flags cause (misoptimizations, problems on compiler
>> versions, ...). So either now or in 9.4.
>
> Yeah, beta1 is the point in the cycle where we have the most leverage
> for discovering portability problems.  We should not be leaving anything
> involving configure tests as "to fix later".

I'm expecting to spend some time on this over the weekend, once I've
re-read the thread and patches to see if there is something to commit.

That's my last time window, so this looks like the last chance to make
changes before beta.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Jeff Davis
Date:
On Fri, Apr 26, 2013 at 7:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I'm expecting to spend some time on this over the weekend, once I've
> re-read the thread and patches to see if there is something to commit.
>
> That's my last time window, so this looks like the last chance to make
> changes before beta.

I updated the patch and split it into two parts (attached).

The first patch is the checksum algorithm itself. I have done
some documentation updates and moved it into the C file (rather
than the README), but further explanation of the "shift right 3"
modification will need to be by Ants or Florian.

The second patch adds the configure-time check for the right
compilation flags, and uses them when compiling checksum.c. I
called the new variable CFLAGS_EXTRA, for lack of a better idea,
so feel free to come up with a new name. It doesn't check for, or
use, -msse4.1, but that can be specified by the user by
configuring with CFLAGS_EXTRA="-msse4.1".

I don't know of any more required changes, aside from
documentation improvements.

Regards,
     Jeff Davis

Attachment

Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Greg Smith
Date:
On 4/26/13 3:57 PM, Jeff Davis wrote:
> The second patch adds the configure-time check for the right
> compilation flags, and uses them when compiling checksum.c. I
> called the new variable CFLAGS_EXTRA, for lack of a better idea,
> so feel free to come up with a new name. It doesn't check for, or
> use, -msse4.1, but that can be specified by the user by
> configuring with CFLAGS_EXTRA="-msse4.1".

Thank you, that is the last piece I was looking at but couldn't nail 
down on my own.  With that I should be able to duplicate both the 
slicing by 8 CRC speedup Ants sent over (which also expected some 
optimization changes) and trying something FNV based this weekend.

I think I need to do two baselines:  master without checksums, and 
master with extra optimizations but still without checksums.  It may be 
the case that using better compile time optimizations gives a general 
speedup that's worth considering regardless.  The optimizations seem to 
have a very significant impact on the checksum feature, but I'd like to 
quantify how they change the code a little bit before even getting into 
that.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Jeff Davis
Date:
On Fri, 2013-04-26 at 16:40 -0400, Greg Smith wrote:
> I think I need to do two baselines:  master without checksums, and 
> master with extra optimizations but still without checksums.  It may be 
> the case that using better compile time optimizations gives a general 
> speedup that's worth considering regardless.  The optimizations seem to 
> have a very significant impact on the checksum feature, but I'd like to 
> quantify how they change the code a little bit before even getting into 
> that.

The patch only affects optimization flags used when compiling
checksum.c, so it should have no effect on other areas of the code.

If you want to compile the whole source with those flags, then just do:
 CFLAGS="-msse4.1 -funroll-loops -ftree-vectorize" ./configure

Changing the optimization flags for existing code will have a larger
impact and should be considered separately from checksums.

Regards,Jeff Davis





Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Andres Freund
Date:
On 2013-04-26 12:57:09 -0700, Jeff Davis wrote:
> I updated the patch and split it into two parts (attached).

> The second patch adds the configure-time check for the right
> compilation flags, and uses them when compiling checksum.c. I
> called the new variable CFLAGS_EXTRA, for lack of a better idea,
> so feel free to come up with a new name. It doesn't check for, or
> use, -msse4.1, but that can be specified by the user by
> configuring with CFLAGS_EXTRA="-msse4.1".

CFLAGS_VECTORIZATION? EXTRA sounds to generic to me.

> --- a/config/c-compiler.m4
> +++ b/config/c-compiler.m4
> @@ -242,6 +242,30 @@ undefine([Ac_cachevar])dnl
>  
>  
>  
> +# PGAC_PROG_CC_CFLAGS_EXTRA_OPT
> +# -----------------------
> +# Given a string, check if the compiler supports the string as a
> +# command-line option. If it does, add the string to CFLAGS_EXTRA.
> +AC_DEFUN([PGAC_PROG_CC_CFLAGS_EXTRA_OPT],
> +[define([Ac_cachevar], [AS_TR_SH([pgac_cv_prog_cc_cflags_extra_$1])])dnl
> +AC_CACHE_CHECK([whether $CC supports $1], [Ac_cachevar],
> +[pgac_save_CFLAGS_EXTRA=$CFLAGS_EXTRA
> +CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA $1"
> +ac_save_c_werror_flag=$ac_c_werror_flag
> +ac_c_werror_flag=yes
> +_AC_COMPILE_IFELSE([AC_LANG_PROGRAM()],
> +                   [Ac_cachevar=yes],
> +                   [Ac_cachevar=no])
> +ac_c_werror_flag=$ac_save_c_werror_flag
> +CFLAGS_EXTRA="$pgac_save_CFLAGS_EXTRA"])
> +if test x"$Ac_cachevar" = x"yes"; then
> +  CFLAGS_EXTRA="$CFLAGS_EXTRA $1"
> +fi
> +undefine([Ac_cachevar])dnl
> +])# PGAC_PROG_CC_CFLAGS_EXTRA_OPT

I think it would be better to have a PGAC_PROG_CC_VAR_OPT or so which
assigns the flag to some passed variable name. Then we can reuse it for
different vars and I have the feeling those will come. And having a
CFLAGS_VECTOR_OPT would just be stupid ;)

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Jeff Davis
Date:
On Sat, 2013-04-27 at 00:20 +0200, Andres Freund wrote:
> CFLAGS_VECTORIZATION? EXTRA sounds to generic to me.

I went with CFLAGS_VECTOR to be a little shorter while still keeping
some meaning.

> I think it would be better to have a PGAC_PROG_CC_VAR_OPT or so which
> assigns the flag to some passed variable name. Then we can reuse it for
> different vars and I have the feeling those will come. And having a
> CFLAGS_VECTOR_OPT would just be stupid ;)

Good suggestion; done.

Thank you for the review. New renamed patch attached for the build
options only (the other patch for the FNV checksum algorithm is
unchanged).

Regards,
    Jeff Davis

Attachment

Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Ants Aasma
Date:
On Fri, Apr 26, 2013 at 10:57 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Fri, Apr 26, 2013 at 7:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I'm expecting to spend some time on this over the weekend, once I've
>> re-read the thread and patches to see if there is something to commit.
>>
>> That's my last time window, so this looks like the last chance to make
>> changes before beta.
>
> I updated the patch and split it into two parts (attached).
>
> The first patch is the checksum algorithm itself. I have done
> some documentation updates and moved it into the C file (rather
> than the README), but further explanation of the "shift right 3"
> modification will need to be by Ants or Florian.
>
> The second patch adds the configure-time check for the right
> compilation flags, and uses them when compiling checksum.c. I
> called the new variable CFLAGS_EXTRA, for lack of a better idea,
> so feel free to come up with a new name. It doesn't check for, or
> use, -msse4.1, but that can be specified by the user by
> configuring with CFLAGS_EXTRA="-msse4.1".
>
> I don't know of any more required changes, aside from
> documentation improvements.

I have updated the base patch. This is supposed to go under the
cflags-vector patch that Jeff posted yesterday.

I had the opportunity to run a few more tests of the hash. Based on
the tests I switched the shift-right operation from 3 to 17bits (the
original value was chosen by gut feel). Avalanche tests showed that
this value removed bias the quickest. You can see the difference in
the attached image, colors are still black 0% bias, blue 5%, green
33%, yellow 75%, red 100%. The final box in the diagram is covered by
the final mixing iteration. The take away from these diagrams is 17
mixes better than 3. 17 still has some residual bias for the final
iteration on the page. The effective information content values in
checksum for 16 high order bits on final 32 32bit words on the page
are: 16.0 15.1 14.1 13.3 12.6 12.1 11.8 11.7 11.5 11.5 11.1 11.0 10.9
10.6 10.6 10.4. Error detection capability for the highest bit is
therefore 1:1351. Based on this I also switched to using two
iterations of zeroes at the end, this way the lowest information
content is 15.976bits or 1:64473.

Documentation changes:
* reworded the algorithm description so the order of steps is more apparent.
* added a link to the FNV reference page.
* fixed note about FNV being 4 bytes at a time. Official variant is 1
byte at a time.
* added a segment of why the algorithm was chosen and its error
detection capabilities.
* added a segment about how the code affects vectorization.

Issue to decide before commiting:
* Whether to use 32 or 64 parallel checksums. The tradeoff for 64 is a
small performance hit (10-20%) on todays CPUs for a small performance
gain on Haswell processors coming to market this year and up to a
theoretical 2x performance gain on future CPUs. Changing this is just
a matter of changing N_SUMS and updating documentation to match.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

Attachment

Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Simon Riggs
Date:
On 28 April 2013 13:22, Ants Aasma <ants@cybertec.at> wrote:

> I have updated the base patch. This is supposed to go under the
> cflags-vector patch that Jeff posted yesterday.


I've committed your patch, with two changes
* one comment extended
* adding the .h file from Jeff's last main patch

Please can Ants and Jeff review the commit and confirm that is what we
wanted. Thanks.

I'm about to light up the build farm with a trial commit of the
compiler instructions stuff.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Simon Riggs
Date:
On 30 April 2013 06:57, Simon Riggs <simon@2ndquadrant.com> wrote:

> I'm about to light up the build farm with a trial commit of the
> compiler instructions stuff.

Amazingly that seemed to work.

ISTM that we also need this patch to put memory barriers in place
otherwise the code might be rearranged.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Andres Freund
Date:
On 2013-04-30 11:55:29 +0100, Simon Riggs wrote:
> ISTM that we also need this patch to put memory barriers in place
> otherwise the code might be rearranged.
> 
> --
>  Simon Riggs                   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services

> --- a/src/backend/storage/page/bufpage.c
> +++ b/src/backend/storage/page/bufpage.c
> @@ -960,11 +960,14 @@ PageCalcChecksum16(Page page, BlockNumber blkno)
>       * Save pd_checksum and set it to zero, so that the checksum calculation
>       * isn't affected by the checksum stored on the page. We do this to
>       * allow optimization of the checksum calculation on the whole block
> -     * in one go.
> +     * in one go. Memory barriers are required to avoid rearrangement here.
>       */
>      save_checksum = phdr->pd_checksum;
> +    pg_memory_barrier();
>      phdr->pd_checksum = 0;
> +    pg_memory_barrier();
>      checksum = checksum_block(page, BLCKSZ);
> +    pg_memory_barrier();
>      phdr->pd_checksum = save_checksum;
>  
>      /* mix in the block number to detect transposed pages */

Why? I am not sure which rearrangement you're fearing? In all cases
where there is danger of concurrent write access to the page we should
already be working on a copy?
Also, if we need a memory barrier I can only see a point in the 2nd
one. The first and third shouldn't ever be able to change anything since
we are only writing to local memory?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Ants Aasma
Date:
On Tue, Apr 30, 2013 at 1:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 30 April 2013 06:57, Simon Riggs <simon@2ndquadrant.com> wrote:
>
>> I'm about to light up the build farm with a trial commit of the
>> compiler instructions stuff.
>
> Amazingly that seemed to work.

Thanks for committing. Sorry about missing the .h file from the patch.
The two commits look good to me.

I can confirm that compiling with CFLAGS="-O2 -march=native" will
vectorize the committed code on GCC 4.7.

I also checked the situation on clang. clang-3.2 isn't able to
vectorize the loop even with vectorization options. I will check what
is stopping it. If any volunteer has a working build setup with ICC or
MSVC and is willing to run a couple of test compiles, I think we can
achieve vectorization there too.

> ISTM that we also need this patch to put memory barriers in place
> otherwise the code might be rearranged.

The compiler and CPU both have to preserve correctness when
rearranging code, so I don't think we care about it here. It might
matter if these routine could be called concurrently by multiple
backends for a single buffer, but in that case memory barriers won't
be enough, we'd need full exclusion.

Regards,
Ants Aasma
--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Simon Riggs
Date:
On 30 April 2013 12:23, Ants Aasma <ants@cybertec.at> wrote:

>> ISTM that we also need this patch to put memory barriers in place
>> otherwise the code might be rearranged.
>
> The compiler and CPU both have to preserve correctness when
> rearranging code, so I don't think we care about it here. It might
> matter if these routine could be called concurrently by multiple
> backends for a single buffer, but in that case memory barriers won't
> be enough, we'd need full exclusion.

Certainly happy not to commit anything else...

Thanks to Ants and Andres.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> ISTM that we also need this patch to put memory barriers in place
> otherwise the code might be rearranged.

This is simply silly.
        regards, tom lane



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Simon Riggs
Date:
On 30 April 2013 15:51, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> ISTM that we also need this patch to put memory barriers in place
>> otherwise the code might be rearranged.
>
> This is simply silly.

You crack me up sometimes. Yes, it is; seem to be having a bad day for thinkos.

--Simon Riggs                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Greg Smith
Date:
I re-ran the benchmark that's had me most worried against the committed
code and things look good so far.  I've been keeping quiet because my
tests recently have all agreed with what Ants already described.  This
is more a confirmation summary than new data.

The problem case has been Jeff's test 2 "worst-case overhead for
calculating checksum while reading data" from the OS cache.  I wrapped
that into a test harness and gave results similar to Jeff's at
http://www.postgresql.org/message-id/5133D732.4090801@2ndQuadrant.com
based on the originally proposed Fletcher-16 checksum.

I made some system improvements since then such that the absolute
runtime improved for most of the tests I'm running.  But the percentage
changes didn't seem off enough to bother re-running the Fletcher tests
again.  Details are in attached spreadsheet, to summarize:

-The original Fletcher-16 code slowed this test case down 24 to 32%,
depending on whether you look at the average of 3 runs or the median.

-The initial checksum commit with the truncated WAL CRC was almost an
order of magnitude worse:  146% to 224% slowdown.  The test case that
took ~830ms was taking as much as 2652ms with that method.  I'm still
not sure why the first run of this test is always so much faster than
the second and third.  But since it happens so often I think it's fair
to consider that worst case really important.

-Committed FNV-1a implementation is now slightly better than Fletcher-16
speed wise:  19 to 27% slowdown.

-Slicing by 8 CRC I didn't test because once I'd fully come around to
agree with Ants's position it didn't seem likely to be useful.  I don't
want to lose track of that idea though, it might be the right path for a
future implementation with 32 bit checksums.

Since the >=25% slowdown on this test with Fletcher-16 turned into more
like a 2% drop on more mixed workloads, I'd expect we're back to where
that's again the case with the new FNV-1a.  I plan to step back to
looking at more of those cases, but it will take a few days at least to
start sorting that out.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com

Attachment

Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Martijn van Oosterhout
Date:
On Tue, Apr 30, 2013 at 01:05:30PM -0400, Greg Smith wrote:
> I re-ran the benchmark that's had me most worried against the
> committed code and things look good so far.  I've been keeping quiet
> because my tests recently have all agreed with what Ants already
> described.  This is more a confirmation summary than new data.

I came across this today: Data Integrity Extensions, basically a
standard for have an application calculate a checksum of a block and
submitting it together with the block so that the disk can verify that
the block it is writing matches what the application sent.

It appears SCSI has standardised on a CRC-16 checksum with polynomial
0x18bb7 .

http://www.t10.org/ftp/t10/document.03/03-290r0.pdf
https://oss.oracle.com/~mkp/docs/dix-draft.pdf

Not directly relavent to PostgreSQL now, but possibly in the future.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.  -- Arthur Schopenhauer

Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Greg Smith
Date:
On 4/30/13 5:26 PM, Martijn van Oosterhout wrote:
> I came across this today: Data Integrity Extensions, basically a
> standard for have an application calculate a checksum of a block and
> submitting it together with the block so that the disk can verify that
> the block it is writing matches what the application sent.
>
> It appears SCSI has standardised on a CRC-16 checksum with polynomial
> 0x18bb7 .

To be pedantic for a minute (for the first time *ever* on pgsql-hackers) 
it's not quite all of SCSI.  iSCSI has joined btrfs by settling on 
CRC-32C with the Castagnoli polynomial, as mentioned in that first 
reference.  CRC-32C is also the one with the SSE4.2 instructions to help 
too.  All the work around the T10/Data Integrity Field standard that's 
going on is nice.  I think it's going to leave a lot of PostgreSQL users 
behind though.  I'd bet a large sum of money that five years from now, 
there will still be more than 10X as many PostgreSQL servers on EC2 as 
on T10/DIF capable hardware.

I feel pretty good that this new FNV-1a implementation is a good 
trade-off spot that balances error detection and performance impact.  If 
you want a 16 bit checksum that seems ready for beta today, we can't do 
much better.  Fletcher-16 had too many detection holes, the WAL checksum 
was way too expensive.  Optimized FNV-1a is even better than unoptimized 
Fletcher-16 without as many detection issues.  Can't even complain about 
the code bloat for this part either--checksum.c is only 68 lines if you 
take out its documentation.

The WAL logging of hint bits is where the scary stuff to me for this 
feature has always been at.  My gut feel is that doing that needed to 
start being available as an option anyway.  Just this month we've had 
two customer issues pop up where we had to look for block differences 
between a master and a standby.  The security update forced some normal 
update stragglers to where they now have the 9.1.6 index corruption fix, 
and we're looking for cases where standby indexes might have been 
corrupted by it.  In this case the comparisons can just avoid anything 
but indexes, so hint bits are thankfully not involved.

But having false positives pop out of comparing a master and standby due 
to hint bits makes this sort of process much harder in general.  Being 
able to turn checksums on, and then compare more things between master 
and standby without expecting any block differences, that will make both 
routine quality auditing and forensics of broken clusters so much easier.

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Noah Misch
Date:
Orthogonal to this thread, but:

On Tue, Apr 30, 2013 at 06:39:09PM -0400, Greg Smith wrote:
> The WAL logging of hint bits is where the scary stuff to me for this  
> feature has always been at.  My gut feel is that doing that needed to  
> start being available as an option anyway.  Just this month we've had  
> two customer issues pop up where we had to look for block differences  
> between a master and a standby.  The security update forced some normal  
> update stragglers to where they now have the 9.1.6 index corruption fix,  
> and we're looking for cases where standby indexes might have been  
> corrupted by it.  In this case the comparisons can just avoid anything  
> but indexes, so hint bits are thankfully not involved.

B-tree indexes have hints; see callers of ItemIdMarkDead().

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com



Re: Substituting Checksum Algorithm (was: Enabling Checksums)

From
Andres Freund
Date:
On 2013-04-30 18:39:09 -0400, Greg Smith wrote:
> The WAL logging of hint bits is where the scary stuff to me for this feature
> has always been at.  My gut feel is that doing that needed to start being
> available as an option anyway.  Just this month we've had two customer
> issues pop up where we had to look for block differences between a master
> and a standby.  The security update forced some normal update stragglers to
> where they now have the 9.1.6 index corruption fix, and we're looking for
> cases where standby indexes might have been corrupted by it.  In this case
> the comparisons can just avoid anything but indexes, so hint bits are
> thankfully not involved.
> 
> But having false positives pop out of comparing a master and standby due to
> hint bits makes this sort of process much harder in general.  Being able to
> turn checksums on, and then compare more things between master and standby
> without expecting any block differences, that will make both routine quality
> auditing and forensics of broken clusters so much easier.

I don't think the current implementation helps you with that. We only
log the first hint bit set after a checkpoint, you will still get
inconsistent bits set after that. So you might have some fewer
inconsistencies but not enough to weed them out manually or such.
c.f. MarkBufferDirtyHint() and XLogSaveBufferForHint().

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services