Re: Enabling Checksums - Mailing list pgsql-hackers

From: Garick Hamlin
Subject: Re: Enabling Checksums
Date:
Msg-id: 20130306162121.GB12808@isc.upenn.edu
In response to: Re: Enabling Checksums  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses: Re: Enabling Checksums  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Wed, Mar 06, 2013 at 01:34:21PM +0200, Heikki Linnakangas wrote:
> On 06.03.2013 10:41, Simon Riggs wrote:
>> On 5 March 2013 18:02, Jeff Davis <pgsql@j-davis.com> wrote:
>>
>>> Fletcher is probably significantly faster than CRC-16, because I'm just
>>> doing int32 addition in a tight loop.
>>>
>>> Simon originally chose Fletcher, so perhaps he has more to say.
>>
>> IIRC the research showed Fletcher was significantly faster for only a
>> small loss in error detection rate.
>>
>> It was sufficient to make our error detection > 1 million times
>> better, possibly more. That seems sufficient to enable early detection
>> of problems, since if we missed the first error, a second is very
>> likely to be caught (etc). So I am assuming that we're trying to catch
>> a pattern of errors early, rather than guarantee we can catch the very
>> first error.
>
> Fletcher's checksum is good in general, I was mainly worried about  
> truncating the Fletcher-64 into two 8-bit values. I can't spot any  
> obvious weakness in it, but if it's indeed faster and as good as a  
> straightforward Fletcher-16, I wonder why that method is not more widely  
> used.

I was wondering about the effectiveness of the resulting truncated hash
function as well.
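
For concreteness, here is a minimal sketch of the kind of Fletcher-style sum
under discussion -- two accumulators updated with plain integer additions over
the page, folded down to 16 bits at the end.  This is not the patch's actual
code; the modulus and the final fold are placeholders for illustration.

#include <stdint.h>
#include <stddef.h>

/*
 * Illustrative Fletcher-style checksum over 32-bit words, folded to 16 bits.
 * Not the patch's exact algorithm: the modulus choice and the final fold are
 * placeholders.  The point is the shape of the loop -- two running sums
 * updated with cheap integer additions.  For an 8 KB page (2048 words) the
 * 64-bit accumulators cannot overflow.
 */
static uint16_t
fletcher_like_checksum16(const uint32_t *words, size_t nwords)
{
    uint64_t sum1 = 0;
    uint64_t sum2 = 0;
    size_t   i;

    for (i = 0; i < nwords; i++)
    {
        sum1 += words[i];   /* first-order sum */
        sum2 += sum1;       /* second-order sum adds position sensitivity */
    }

    /* Fold each sum down to 8 bits and concatenate into a 16-bit result. */
    sum1 %= 255;
    sum2 %= 255;
    return (uint16_t) ((sum2 << 8) | sum1);
}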

> Another thought is that perhaps something like CRC32C would be faster to  
> calculate on modern hardware, and could be safely truncated to 16-bits  
> using the same technique you're using to truncate the Fletcher's  
> Checksum. Greg's tests showed that the overhead of CRC calculation is  
> significant in some workloads, so it would be good to spend some time to  
> optimize that. It'd be difficult to change the algorithm in a future  
> release without breaking on-disk compatibility, so let's make sure we  
> pick the best one.

If picking a CRC, why not pick a short, optimal one rather than truncating CRC32C?

I've been reading about optimal checksums for small messages for other
reasons, and found this paper quite good.

http://www.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf

I was interested in small messages and small checksums, so this paper may not be
as much help here.

Other than CRCs and Fletcher sums, Pearson hashing with a 16-bit block might
be worth considering.  Either a Pearson hash or a 16-bit CRC is small enough to
implement with a lookup table rather than a formula.
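
As an illustration of the table-driven approach, here is a sketch of a CRC-16
computed a byte at a time from a 256-entry table.  The polynomial below is the
common CCITT 0x1021, picked only as a placeholder -- the point of Koopman's
paper is that the polynomial should be chosen to match the message length.

#include <stdint.h>
#include <stddef.h>

/* 256-entry table for an MSB-first CRC-16; 0x1021 (CCITT) is a placeholder. */
static uint16_t crc16_table[256];

static void
crc16_init(void)
{
    int i, bit;

    for (i = 0; i < 256; i++)
    {
        uint16_t crc = (uint16_t) (i << 8);

        for (bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t) ((crc << 1) ^ 0x1021)
                                 : (uint16_t) (crc << 1);
        crc16_table[i] = crc;
    }
}

static uint16_t
crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;      /* initial value, another arbitrary choice */
    size_t   i;

    for (i = 0; i < len; i++)
        crc = (uint16_t) ((crc << 8) ^ crc16_table[((crc >> 8) ^ data[i]) & 0xFF]);
    return crc;
}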

I've been wondering what kind of errors we expect.  Single bit flips?  Large
swaths of corrupted bytes?  Are we more worried about collisions (the odds
that total garbage has the same checksum) or about the odds of detecting a
flip of n bits?  I would think that, since the message is large and a write
to the wrong location seems about as likely as a bit flip, a Pearson hash
would be good.
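
Something along these lines -- a sketch of a Pearson hash widened to 16 bits.
The permutation table is built from a fixed-seed shuffle purely for
illustration (any fixed permutation of 0..255 works), and running the hash
twice with different starting values is just one simple way to get a second
byte.

#include <stdint.h>
#include <stddef.h>

static uint8_t pearson_table[256];

static void
pearson_init(void)
{
    uint32_t seed = 0x12345678;     /* arbitrary fixed seed */
    int      i;

    for (i = 0; i < 256; i++)
        pearson_table[i] = (uint8_t) i;

    /* Fisher-Yates shuffle driven by a small LCG, so the table stays a permutation. */
    for (i = 255; i > 0; i--)
    {
        int     j;
        uint8_t tmp;

        seed = seed * 1103515245 + 12345;
        j = (int) ((seed >> 16) % (uint32_t) (i + 1));
        tmp = pearson_table[i];
        pearson_table[i] = pearson_table[j];
        pearson_table[j] = tmp;
    }
}

static uint8_t
pearson8(const uint8_t *data, size_t len, uint8_t h)
{
    size_t i;

    for (i = 0; i < len; i++)
        h = pearson_table[h ^ data[i]];     /* one table lookup per byte */
    return h;
}

static uint16_t
pearson16(const uint8_t *data, size_t len)
{
    /* Two passes with different starting values give the two result bytes. */
    uint8_t lo = pearson8(data, len, 0);
    uint8_t hi = pearson8(data, len, 1);

    return (uint16_t) ((hi << 8) | lo);
}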

Any choice seems like it would be a nice improvement for noticing a storage
stack problem.  The difference between candidates would be subtle: can I
accurately estimate the odds of undetected corruption that occurred since the
condition was first detected, or does the checksum/hash perform too poorly
for that?

Garick



