Re: Substituting Checksum Algorithm (was: Enabling Checksums) - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Substituting Checksum Algorithm (was: Enabling Checksums)
Msg-id 5180480D.7090507@2ndQuadrant.com
In response to Re: Substituting Checksum Algorithm (was: Enabling Checksums)  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: Substituting Checksum Algorithm (was: Enabling Checksums)  (Noah Misch <noah@leadboat.com>)
Re: Substituting Checksum Algorithm (was: Enabling Checksums)  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 4/30/13 5:26 PM, Martijn van Oosterhout wrote:
> I came across this today: Data Integrity Extensions, basically a
> standard for having an application calculate a checksum of a block and
> submitting it together with the block so that the disk can verify that
> the block it is writing matches what the application sent.
>
> It appears SCSI has standardised on a CRC-16 checksum with polynomial
> 0x18bb7 .

To be pedantic for a minute (for the first time *ever* on pgsql-hackers),
it's not quite all of SCSI.  iSCSI has joined btrfs in settling on
CRC-32C with the Castagnoli polynomial, as mentioned in that first
reference.  CRC-32C is also the one that the SSE4.2 instructions
accelerate.  All the work going on around the T10/Data Integrity Field
standard is nice, but I think it's going to leave a lot of PostgreSQL
users behind.  I'd bet a large sum of money that five years from now,
there will still be more than 10X as many PostgreSQL servers on EC2 as
on T10/DIF capable hardware.
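
For anyone who wants to see what CRC-32C actually computes, here is a
minimal bit-at-a-time sketch in Python.  It is only an illustration of
the Castagnoli CRC (reflected polynomial 0x82F63B78), not code from any
patch in this thread; the function name crc32c is an assumption, and a
real implementation would use lookup tables or the SSE4.2 instruction
rather than this loop.

```python
# Illustrative bitwise CRC-32C (Castagnoli). Slow but easy to follow.
CRC32C_POLY_REFLECTED = 0x82F63B78  # bit-reversed form of 0x1EDC6F41


def crc32c(data: bytes) -> int:
    """Return the CRC-32C of data as an unsigned 32-bit integer."""
    crc = 0xFFFFFFFF  # standard initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial if a 1 fell out.
            if crc & 1:
                crc = (crc >> 1) ^ CRC32C_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # standard final inversion


if __name__ == "__main__":
    # "123456789" is the conventional CRC check string.
    print(hex(crc32c(b"123456789")))  # 0xe3069283
```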

I feel pretty good that this new FNV-1a implementation is a good
trade-off that balances error detection against performance impact.  If
you want a 16 bit checksum that seems ready for beta today, we can't do
much better.  Fletcher-16 had too many detection holes, and the WAL
checksum was way too expensive.  Optimized FNV-1a performs even better
than unoptimized Fletcher-16, without as many detection issues.  I can't
even complain about code bloat for this part: checksum.c is only 68
lines if you take out its documentation.
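
As a rough illustration of the idea, here is textbook 32-bit FNV-1a
folded down to 16 bits, in Python.  This is NOT the optimized variant
in checksum.c, which is structured differently for speed; the function
names and the XOR-fold step are assumptions made only for this sketch.

```python
# Textbook FNV-1a (32-bit), then an XOR-fold to 16 bits.
# Sketch only: not the algorithm used by the checksum patch.
FNV_OFFSET_BASIS = 0x811C9DC5
FNV_PRIME = 0x01000193


def fnv1a_32(data: bytes) -> int:
    """Standard 32-bit FNV-1a: XOR in each byte, then multiply."""
    h = FNV_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV_PRIME) & 0xFFFFFFFF  # keep it to 32 bits
    return h


def fold16(h: int) -> int:
    """Fold a 32-bit hash to 16 bits by XORing its two halves."""
    return ((h >> 16) ^ h) & 0xFFFF


def checksum16(page: bytes) -> int:
    """Hypothetical 16-bit page checksum built from the two steps above."""
    return fold16(fnv1a_32(page))


if __name__ == "__main__":
    page = bytes(8192)  # an all-zero 8 kB page
    print(hex(checksum16(page)))
```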

For me, the scary part of this feature has always been the WAL logging
of hint bits.  My gut feel is that it needed to become available as an
option anyway.  Just this month we've had two customer issues pop up
where we had to look for block differences between a master and a
standby.  The security update forced some of the usual update
stragglers onto a release that has the 9.1.6 index corruption fix, and
we're looking for cases where standby indexes might have been corrupted
by the bug it fixes.  In this case the comparisons can be limited to
indexes, so hint bits are thankfully not involved.

But false positives caused by hint bits make this sort of
master/standby comparison much harder in general.  Being able to turn
checksums on and then compare more things between master and standby
without expecting any block differences will make both routine quality
auditing and forensics of broken clusters so much easier.
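
The kind of block comparison described above could be sketched roughly
like this in Python.  It is a hypothetical helper, not a tool we
actually used: it assumes you already have copies of the same relation
file from the master and the standby, and that the cluster uses the
default 8 kB block size.  The names read_blocks and differing_blocks
are made up for the example.

```python
# Compare two copies of a relation file block by block and report
# which block numbers differ. Sketch under the assumptions above.
from itertools import zip_longest

BLOCK_SIZE = 8192  # PostgreSQL's default BLCKSZ


def read_blocks(path, block_size=BLOCK_SIZE):
    """Yield successive fixed-size blocks from the file at path."""
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield block


def differing_blocks(path_a, path_b, block_size=BLOCK_SIZE):
    """Return the block numbers whose contents differ between two files.

    A block present in only one file counts as a difference.
    """
    pairs = zip_longest(read_blocks(path_a, block_size),
                        read_blocks(path_b, block_size))
    return [blkno for blkno, (a, b) in enumerate(pairs) if a != b]
```

Without WAL-logged hint bits, a raw comparison like this reports
heap blocks that differ only in their hint bits, which is exactly the
false-positive problem.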

-- 
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com
