Re: Proposal for Updating CRC32C with AVX-512 Algorithm. - Mailing list pgsql-hackers
From: John Naylor
Subject: Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
Date:
Msg-id: CANWCAZYQnppe=XHxXGwYEvuaqx7_v91sHk54kqWYRyinzvhbVA@mail.gmail.com
In response to: Re: Proposal for Updating CRC32C with AVX-512 Algorithm. (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Sat, Dec 14, 2024 at 10:24 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2024-12-14 12:08:57 +0700, John Naylor wrote:
> > On Thu, Jun 13, 2024 at 2:37 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > It's hard to understand, but a nonetheless helpful page is
> > > https://users.ece.cmu.edu/~koopman/crc/crc32.html which lists properties for
> > > crc32c:
> > > https://users.ece.cmu.edu/~koopman/crc/c32/0x8f6e37a0_len.txt
> > > which lists
> > > (0x8f6e37a0; 0x11edc6f41) <=> (0x82f63b78; 0x105ec76f1) {2147483615,2147483615,5243,5243,177,177,47,47,20,20,8,8,6,6,1,1} | gold | (*op) iSCSI; CRC-32C; CRC-32/4
> > >
> > > This cryptic notation AFAIU indicates that for our polynomial we can detect 2 bit
> > > errors up to a length of 2147483615 bytes, 3 bit errors up to 2147483615, 3
> > > and 4 bit errors up to 5243, 5 and 6 bit errors up to 177, 7/8 bit errors up
> > > to 47.
> >
> > One aspect of that cryptic notation that you seem to have missed is
> > "(*op)" -- explained as:
> >
> > *p - primitive polynomial. This has optimal length for HD=3, and good
> > HD=2 performance above that length.
> > *o - odd bit errors detected. This has a factor of (x+1) and detects
> > all odd bit errors (implying that even numbers of bit errors have an
> > elevated undetected error rate)
> > *op - odd bit errors detected plus primitive. This is a primitive
> > polynomial times (x+1). It has optimal length for HD=4, and detects
> > all odd bit errors.
> >
> > This means it's not really a 32-bit checksum -- it's a 1-bit checksum
> > plus a 31-bit checksum. The 1-bit checksum can detect any odd number
> > of bit-flips. Do we really want to throw that property away?
>
> I think it's pretty much irrelevant for our use case.
>
> What the WAL checksum needs to protect against are cases like a record
> spanning >1 disk sectors or >1 OS pages and one of those sectors/pages not
> having made it to disk, while the rest has made it (and thus shows old
> contents).
> That means we have to detect runs of "wrong content" that are *never* in the
> single bit range (since sector boundaries never fall within a bit), *never*
> within a 4 byte range (because that's what we IIRC align records to, and
> again, sector boundaries don't fall within aligned 4 byte quantities).
>
> Because the likely causes of failure are parts of the correct record and then
> a tail or an intermittent long chunk (>= 1 sector) of wrong content, detecting
> a certain number of bit flips just doesn't help.

Granted, but my point was, if a sector of wrong content is wrong by an
odd number of bits, the 1-bit part of the checksum will always catch
it. Every bit flip causes the popcount of the result to flip from even
to odd (or vice versa), so the odd case can never collide:

--original
select crc32c(repeat('A', 512)::bytea);
   crc32c
------------
 3817965270

select bit_count(b'11100011100100011000011011010110') % 2;
 ?column?
----------
        0

--odd number of bitflips
select crc32c(('A' || repeat('C', 511))::bytea);
  crc32c
-----------
 113262028

select bit_count(b'110110000000011110111001100') % 2;
 ?column?
----------
        1

--even number of bitflips
select crc32c(('A' || repeat('B', 511))::bytea);
   crc32c
------------
 1953030209

select bit_count(b'1110100011010001110000001000001') % 2;
 ?column?
----------
        0

If the number of bitflips is even, then the 1-bit part will tell us
nothing, and the guarantees of the 31-bit part will not help the WAL
case for the reasons you describe. So as I understand it, the trade-off
for WAL error detection is:

CRC:
  odd: 100%
  even: the collision-avoidance probability of a mediocre hash function

good hash function:
  odd: the collision-avoidance probability of a good hash function
  even: the collision-avoidance probability of a good hash function

Stated this way, it's possible we don't have the best solution, but
it's also not immediately obvious to me that the second way is so much
better that it's worth the effort to change it.
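For anyone wanting to reproduce this outside the server, here's a
bit-at-a-time sketch of standard (reflected) CRC-32C -- not
PostgreSQL's actual implementation, just the textbook algorithm -- with
the same message pairs as the SQL above. Because the polynomial has the
(x+1) factor, flipping an odd number of message bits always flips the
parity of the CRC, while an even number never does:

```python
def crc32c(data: bytes) -> int:
    # Bit-at-a-time CRC-32C using the reflected polynomial 0x82F63B78,
    # with the standard initial value and final XOR of 0xFFFFFFFF.
    poly = 0x82F63B78
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def parity(x: int) -> int:
    # Popcount mod 2 -- the "1-bit checksum" contributed by the (x+1) factor.
    return bin(x).count("1") % 2

m = b"A" * 512
odd_flips = b"A" + b"C" * 511   # 'A'^'C' = 0x02: 1 flipped bit/byte, 511 total (odd)
even_flips = b"A" + b"B" * 511  # 'A'^'B' = 0x03: 2 flipped bits/byte, 1022 total (even)

print(parity(crc32c(m)) != parity(crc32c(odd_flips)))   # True: odd flips change parity
print(parity(crc32c(m)) == parity(crc32c(even_flips)))  # True: even flips don't
```

Note this parity argument only holds when comparing messages of the
same length, which is the case in all three examples above.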
If we did go to a hash function, it'd be ideal to have the collision
guarantees of an "almost universal" hash function. For any two messages
of length at most 'n', the claimed probability of collision is at most,
for example:

VHASH [1]: n * 2**-61
CLHASH [1]: 2.0004 * 2**-64 (for same-length strings)
umash [2]: ceil(n / 4096) * 2**-55
polymur hash [3]: n * 2**-60.2

...but these are all 64-bit hashes, and some have further traits that
make them impractical for us. I'm not aware of any 32-bit universal
hashes. If there were, the bound might be

n * 2**-(31 or less?)

...which for n = 8192 and larger is starting not to look as good. But
for a normal hash function, we only have statistical tests, which are
only practical for small lengths.

> It's also worth noting that just about *all* permanent storage already has
> applied sector-level checksums, protecting against (and correcting) bit flips
> at that level.

Sure.

> > but for two messages of the same length, I'm also not sure it's all
> > that bad, either
>
> Our records rarely have the same length, no?

Right, I failed to consider the case where the length is in the garbled
part of the message.

[1] https://arxiv.org/pdf/1503.03465
[2] https://github.com/backtrace-labs/umash
[3] https://github.com/orlp/polymur-hash

--
John Naylor
Amazon Web Services