Re: Enabling Checksums - Mailing list pgsql-hackers

From Florian Pflug
Subject Re: Enabling Checksums
Date
Msg-id 21E23B6A-C5A7-4A54-AB24-D6EA7FD405FE@phlo.org
Whole thread Raw
In response to Re: Enabling Checksums  (Ants Aasma <ants@cybertec.at>)
Responses Re: Enabling Checksums
List pgsql-hackers
On Apr13, 2013, at 17:14 , Ants Aasma <ants@cybertec.at> wrote:
> Based on current analysis, it is particularly good at detecting single
> bit errors, as good at detecting burst errors as can be expected from
> 16 bits and not horrible at detecting burst writes of zeroes. It is
> quite bad at detecting multiple uncorrelated single bit errors and
> extremely bad at detecting repeating patterns of errors in low order
> bits.

I've read the patch and tried to understand why it's that bad at
detecting repeating patterns of errors in low order bits, and to see
if there might be a way to fix that without too much of a performance
impact.

Here's what I gather the algorithm does:
 It treats the input data, a page of L bytes, as a Nx64 matrix V of 16-bit quantities (N = L/64, obviously). It then
firstcomputes (using two primes p (PRIME1) and q (PRIME2)) 
   S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0     + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … +
V[2,64]*p^62*q^0    + …     + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0     (mod 2^16)     = sum
V[i,j]*p^(64-i)*q^(64-j)
  Note that it does that by first computing the row-wise sums without  the q^i coefficient, and then (in what the code
callsthe aggregation  phase) combines those row-wise sums into a total, adding the q^i-  coefficients along the way. 
  The final hash value is then
    H = S * p + B * q mod 2^16
  where B is a salt value intended to detect swapped pages (currently  B is simply the page index)

This raises two question. First, why are there two primes? You could
just as well using a single prime q and set p=q^64 mod 2^16. You then
get S = sum V[i,j] * q^(64*(64-i) + (64-j)   = sum V[i,j] * q^(4096 - 64*(i-1) - j)
You get higher prime powers that way, but you can easily choose a prime
that yields distinct values mod 2^16 for exponents up to 16383. Your
PRIME2, for example, does. (It wraps around for 16384, i.e.
PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since
16384 is the Carmichael function's value at 2^16)

Second, why does it use addition instead of XOR? It seems that FNV
usually XORs the terms together instead of adding them?

Regarding the bad behaviour for multiple low-bit errors - can you
explain why it behaves badly in that case? I currently fail to see
why that would be. I *can* see that the lowest bit of the hash depends
only on the lowest bit of the input words, but as long as the lowest
bits of the input words also affect other bits of the hash, that shouldn't
matter. Which I think they do, but I might be missing something...

Here, btw, is a page on FNV hashing. It mentions a few rules for
picking suitable primes

http://www.isthe.com/chongo/tech/comp/fnv

best regards,
Florian Pflug




pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Enabling Checksums
Next
From: Bruce Momjian
Date:
Subject: Re: [GENERAL] currval and DISCARD ALL