Home > mailing lists

Re: Enabling Checksums - Mailing list pgsql-hackers

From	Florian Pflug
Subject	Re: Enabling Checksums
Date	April 16, 2013 23:20:40
Msg-id	21E23B6A-C5A7-4A54-AB24-D6EA7FD405FE@phlo.org Whole thread Raw
In response to	Re: Enabling Checksums (Ants Aasma <ants@cybertec.at>)
Responses	Re: Enabling Checksums
List	pgsql-hackers

Tree view

On Apr13, 2013, at 17:14 , Ants Aasma <ants@cybertec.at> wrote:
> Based on current analysis, it is particularly good at detecting single
> bit errors, as good at detecting burst errors as can be expected from
> 16 bits and not horrible at detecting burst writes of zeroes. It is
> quite bad at detecting multiple uncorrelated single bit errors and
> extremely bad at detecting repeating patterns of errors in low order
> bits.

I've read the patch and tried to understand why it's that bad at
detecting repeating patterns of errors in low order bits, and to see
if there might be a way to fix that without too much of a performance
impact.

Here's what I gather the algorithm does:
 It treats the input data, a page of L bytes, as a Nx64 matrix V of 16-bit quantities (N = L/64, obviously). It then
firstcomputes (using two primes p (PRIME1) and q (PRIME2)) 
   S = V[1,1]*p^63*q^63 + V[1,2]*p^63*q^62 + … + V[1,64]*p^63*q^0     + V[2,1]*p^62*q^63 + V[2,2]*p^62*q^62 + … +
V[2,64]*p^62*q^0    + …     + V[N,1]*p^0 *q^63 + V[N,2]*p^0 *q^62 + … + V[N,64]*p^0 *q^0     (mod 2^16)     = sum
V[i,j]*p^(64-i)*q^(64-j)
  Note that it does that by first computing the row-wise sums without  the q^i coefficient, and then (in what the code
callsthe aggregation  phase) combines those row-wise sums into a total, adding the q^i-  coefficients along the way. 
  The final hash value is then
    H = S * p + B * q mod 2^16
  where B is a salt value intended to detect swapped pages (currently  B is simply the page index)

This raises two question. First, why are there two primes? You could
just as well using a single prime q and set p=q^64 mod 2^16. You then
get S = sum V[i,j] * q^(64*(64-i) + (64-j)   = sum V[i,j] * q^(4096 - 64*(i-1) - j)
You get higher prime powers that way, but you can easily choose a prime
that yields distinct values mod 2^16 for exponents up to 16383. Your
PRIME2, for example, does. (It wraps around for 16384, i.e.
PRIME2^16384 = 1 mod 2^16, but that's true for every possible prime since
16384 is the Carmichael function's value at 2^16)

Second, why does it use addition instead of XOR? It seems that FNV
usually XORs the terms together instead of adding them?

Regarding the bad behaviour for multiple low-bit errors - can you
explain why it behaves badly in that case? I currently fail to see
why that would be. I *can* see that the lowest bit of the hash depends
only on the lowest bit of the input words, but as long as the lowest
bits of the input words also affect other bits of the hash, that shouldn't
matter. Which I think they do, but I might be missing something...

Here, btw, is a page on FNV hashing. It mentions a few rules for
picking suitable primes

http://www.isthe.com/chongo/tech/comp/fnv

best regards,
Florian Pflug

pgsql-hackers by date:

From: Simon Riggs
Date: 16 April 2013, 22:49:37
Subject: Re: Enabling Checksums

From: Bruce Momjian
Date: 16 April 2013, 23:48:37
Subject: Re: [GENERAL] currval and DISCARD ALL

Re: Enabling Checksums - Mailing list pgsql-hackers

Previous

Next