Re: Enabling Checksums - Mailing list pgsql-hackers

From: Garick Hamlin
Subject: Re: Enabling Checksums
Date:
Msg-id: 20130306162121.GB12808@isc.upenn.edu
In response to: Re: Enabling Checksums  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses: Re: Enabling Checksums  (Andres Freund <andres@2ndquadrant.com>)
List: pgsql-hackers
On Wed, Mar 06, 2013 at 01:34:21PM +0200, Heikki Linnakangas wrote:
> On 06.03.2013 10:41, Simon Riggs wrote:
>> On 5 March 2013 18:02, Jeff Davis <pgsql@j-davis.com> wrote:
>>
>>> Fletcher is probably significantly faster than CRC-16, because I'm just
>>> doing int32 addition in a tight loop.
>>>
>>> Simon originally chose Fletcher, so perhaps he has more to say.
>>
>> IIRC the research showed Fletcher was significantly faster for only a
>> small loss in error detection rate.
>>
>> It was sufficient to make our error detection > 1 million times
>> better, possibly more. That seems sufficient to enable early detection
>> of problems, since if we missed the first error, a second is very
>> likely to be caught (etc). So I am assuming that we're trying to catch
>> a pattern of errors early, rather than guarantee we can catch the very
>> first error.
>
> Fletcher's checksum is good in general, I was mainly worried about  
> truncating the Fletcher-64 into two 8-bit values. I can't spot any  
> obvious weakness in it, but if it's indeed faster and as good as a  
> straightforward Fletcher-16, I wonder why that method is not more widely  
> used.

I was wondering about the effectiveness of the resulting truncated hash
function as well.
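
For concreteness, here is a minimal sketch of the kind of Fletcher-style sum
under discussion -- two accumulators updated with plain integer additions over
the page, folded down to 16 bits at the end.  This is not the patch's actual
code; the modulus and the final fold are placeholders for illustration.

#include <stdint.h>
#include <stddef.h>

/*
 * Illustrative Fletcher-style checksum over 32-bit words, folded to 16 bits.
 * Not the patch's exact algorithm: the modulus choice and the final fold are
 * placeholders.  The point is the shape of the loop -- two running sums
 * updated with cheap integer additions.  For an 8 KB page (2048 words) the
 * 64-bit accumulators cannot overflow.
 */
static uint16_t
fletcher_like_checksum16(const uint32_t *words, size_t nwords)
{
    uint64_t sum1 = 0;
    uint64_t sum2 = 0;
    size_t   i;

    for (i = 0; i < nwords; i++)
    {
        sum1 += words[i];   /* first-order sum */
        sum2 += sum1;       /* second-order sum adds position sensitivity */
    }

    /* Fold each sum down to 8 bits and concatenate into a 16-bit result. */
    sum1 %= 255;
    sum2 %= 255;
    return (uint16_t) ((sum2 << 8) | sum1);
}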

> Another thought is that perhaps something like CRC32C would be faster to  
> calculate on modern hardware, and could be safely truncated to 16-bits  
> using the same technique you're using to truncate the Fletcher's  
> Checksum. Greg's tests showed that the overhead of CRC calculation is  
> significant in some workloads, so it would be good to spend some time to  
> optimize that. It'd be difficult to change the algorithm in a future  
> release without breaking on-disk compatibility, so let's make sure we  
> pick the best one.

If picking a CRC, why not pick a short, optimal one rather than truncating CRC32C?

I've been reading about optimal checksums for small messages for other
reasons, and found this paper quite good.

http://www.ece.cmu.edu/~koopman/roses/dsn04/koopman04_crc_poly_embedded.pdf

I was interested in small messages and small checksums, so this paper may not be
as much help here.

Other than CRCs and Fletcher sums, Pearson hashing with a 16-bit block might
be worth considering.  Either a Pearson hash or a 16-bit CRC is small enough to
implement with a lookup table rather than a formula.
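
As an illustration of the table-driven approach, here is a sketch of a CRC-16
computed a byte at a time from a 256-entry table.  The polynomial below is the
common CCITT 0x1021, picked only as a placeholder -- the point of Koopman's
paper is that the polynomial should be chosen to match the message length.

#include <stdint.h>
#include <stddef.h>

/* 256-entry table for an MSB-first CRC-16; 0x1021 (CCITT) is a placeholder. */
static uint16_t crc16_table[256];

static void
crc16_init(void)
{
    int i, bit;

    for (i = 0; i < 256; i++)
    {
        uint16_t crc = (uint16_t) (i << 8);

        for (bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t) ((crc << 1) ^ 0x1021)
                                 : (uint16_t) (crc << 1);
        crc16_table[i] = crc;
    }
}

static uint16_t
crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFF;      /* initial value, another arbitrary choice */
    size_t   i;

    for (i = 0; i < len; i++)
        crc = (uint16_t) ((crc << 8) ^ crc16_table[((crc >> 8) ^ data[i]) & 0xFF]);
    return crc;
}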

I've been wondering what kind of errors we expect.  Single bit flips?  Large
swaths of corrupted bytes?  Are we more worried about collisions (the odds
that total garbage has the same checksum) or about the odds of detecting a
flip of n bits?  I would think that, since the message is large and a write
to the wrong location seems about as likely as a bit flip, a Pearson hash
would be good.
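
Something along these lines -- a sketch of a Pearson hash widened to 16 bits.
The permutation table is built from a fixed-seed shuffle purely for
illustration (any fixed permutation of 0..255 works), and running the hash
twice with different starting values is just one simple way to get a second
byte.

#include <stdint.h>
#include <stddef.h>

static uint8_t pearson_table[256];

static void
pearson_init(void)
{
    uint32_t seed = 0x12345678;     /* arbitrary fixed seed */
    int      i;

    for (i = 0; i < 256; i++)
        pearson_table[i] = (uint8_t) i;

    /* Fisher-Yates shuffle driven by a small LCG, so the table stays a permutation. */
    for (i = 255; i > 0; i--)
    {
        int     j;
        uint8_t tmp;

        seed = seed * 1103515245 + 12345;
        j = (int) ((seed >> 16) % (uint32_t) (i + 1));
        tmp = pearson_table[i];
        pearson_table[i] = pearson_table[j];
        pearson_table[j] = tmp;
    }
}

static uint8_t
pearson8(const uint8_t *data, size_t len, uint8_t h)
{
    size_t i;

    for (i = 0; i < len; i++)
        h = pearson_table[h ^ data[i]];     /* one table lookup per byte */
    return h;
}

static uint16_t
pearson16(const uint8_t *data, size_t len)
{
    /* Two passes with different starting values give the two result bytes. */
    uint8_t lo = pearson8(data, len, 0);
    uint8_t hi = pearson8(data, len, 1);

    return (uint16_t) ((hi << 8) | lo);
}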

Any choice seems like it would be a nice improvement for noticing a storage
stack problem.  The difference between candidates would be subtle: can I
accurately estimate the odds of undetected corruption that occurred since the
condition was first detected, or does the checksum/hash perform too poorly
for that?

Garick



