Re: Proposal for Updating CRC32C with AVX-512 Algorithm. - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Proposal for Updating CRC32C with AVX-512 Algorithm. |
Date | |
Msg-id | mclb5qcupjfkzctmjkaygy4b7ecnimttwhqljenrustz7om4ed@iko77qgbs3bc Whole thread Raw |
In response to | Re: Proposal for Updating CRC32C with AVX-512 Algorithm. (John Naylor <johncnaylorls@gmail.com>) |
Responses |
Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
|
List | pgsql-hackers |
Hi, On 2024-12-14 12:08:57 +0700, John Naylor wrote: > On Thu, Jun 13, 2024 at 2:37 AM Andres Freund <andres@anarazel.de> wrote: > > > > It's hard to understand, but a nonetheless helpful page is > > https://users.ece.cmu.edu/~koopman/crc/crc32.html which lists properties for > > crc32c: > > https://users.ece.cmu.edu/~koopman/crc/c32/0x8f6e37a0_len.txt > > which lists > > (0x8f6e37a0; 0x11edc6f41) <=> (0x82f63b78; 0x105ec76f1) {2147483615,2147483615,5243,5243,177,177,47,47,20,20,8,8,6,6,1,1}| gold | (*op) iSCSI; CRC-32C; CRC-32/4 > > > > This cryptic notion AFAIU indicates that for our polynomial we can detect 2bit > > errors up to a length of 2147483615 bytes, 3 bit errors up to 2147483615, 3 > > and 4 bit errors up to 5243, 5 and 6 bit errors up to 177, 7/8 bit errors up > > to 47. > > One aspect of that cryptic notation that you seemed to have missed is > "(*op)" -- explained as: > > *p - primitive polynomial. This has optimal length for HD=3, and good > HD=2 performance above that length. > *o - odd bit errors detected. This has a factor of (x+1) and detects > all odd bit errors (implying that even number of bit errors have an > elevated undetected error rate) > *op - odd bit errors detected plus primitive. This is a primitive > polynomial times (x+1). It has optimal length for HD=4, and detects > all odd bit errors. > > This means it's not really a 32-bit checksum -- it's a 1-bit checksum > plus a 31-bit checksum. The 1-bit checksum can detect any odd number > of bit-flips. Do we really want to throw that property away? I think it's pretty much irrelevant for our usecase. What the WAL checksum needs to protect against are cases like a record spanning >1 disk sectors or >1 OS pages and one of those sectors/pages not having made it to disk, while the rest has made it (and thus shows old contents). That means we have to detect runs of "wrong content" that are *never* in the single bit range (since sector boundaries never fall within a bit), *never* within a 4 byte range (because that's what we IIRC align records to, and again, sector boundaries don't fall within aligned 4 byte quantities). Because the likely causes of failure are parts of the correct record and then a tail or an intermittent long chunk (>= 1 sector) of wrong content, detecting certain number of bit flips just doesn't help. Bit flips are an important thing to detect and correct when they are something that can happen in isolation. E.g. a bunch of interference in an ethernet cable. Or the charge in an individual flash cell being a tiny bit above/below some threshold. But that's just not what we have with WAL. It's also worth noting that just about *all* permanent storage already has applied sector-level checksums, protecting against (and correcting) bit flips at that level. > Sure, for an even number bitflips beyond a small number, we're left > with the luck ordinary collisions, and CRC is not particularly great, I.e. just about *all* failure scenarios for WAL. > but for two messages of the same length, I'm also not sure it's all > that bad, either Our records rarely have the same length, no? Greetings, Andres Freund
pgsql-hackers by date: