Re: Proposal for Updating CRC32C with AVX-512 Algorithm. - Mailing list pgsql-hackers

From John Naylor
Subject Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
Date
Msg-id CANWCAZYQnppe=XHxXGwYEvuaqx7_v91sHk54kqWYRyinzvhbVA@mail.gmail.com
In response to Re: Proposal for Updating CRC32C with AVX-512 Algorithm.  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Sat, Dec 14, 2024 at 10:24 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2024-12-14 12:08:57 +0700, John Naylor wrote:
> > On Thu, Jun 13, 2024 at 2:37 AM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > It's hard to understand, but a nonetheless helpful page is
> > > https://users.ece.cmu.edu/~koopman/crc/crc32.html which lists properties for
> > > crc32c:
> > > https://users.ece.cmu.edu/~koopman/crc/c32/0x8f6e37a0_len.txt
> > > which lists
> > > (0x8f6e37a0; 0x11edc6f41) <=> (0x82f63b78; 0x105ec76f1)
> > > {2147483615,2147483615,5243,5243,177,177,47,47,20,20,8,8,6,6,1,1} | gold | (*op) iSCSI; CRC-32C; CRC-32/4
> > >
> > > This cryptic notation AFAIU indicates that for our polynomial we can
> > > detect 2 and 3 bit errors up to a length of 2147483615 bits, 4 and 5 bit
> > > errors up to 5243, 6 and 7 bit errors up to 177, and 8 and 9 bit errors
> > > up to 47.
> >
> > One aspect of that cryptic notation that you seemed to have missed is
> > "(*op)" -- explained as:
> >
> > *p - primitive polynomial. This has optimal length for HD=3, and good
> > HD=2 performance above that length.
> > *o - odd bit errors detected. This has a factor of (x+1) and detects
> > all odd bit errors (implying that even number of bit errors have an
> > elevated undetected error rate)
> > *op - odd bit errors detected plus primitive. This is a primitive
> > polynomial times (x+1). It has optimal length for HD=4, and detects
> > all odd bit errors.
> >
> > This means it's not really a 32-bit checksum -- it's a 1-bit checksum
> > plus a 31-bit checksum. The 1-bit checksum can detect any odd number
> > of bit-flips. Do we really want to throw that property away?
>
> I think it's pretty much irrelevant for our use case.
>
> What the WAL checksum needs to protect against are cases like a record
> spanning >1 disk sectors or >1 OS pages and one of those sectors/pages not
> having made it to disk, while the rest has made it (and thus shows old
> contents).
>
> That means we have to detect runs of "wrong content" that are *never* in the
> single bit range (since sector boundaries never fall within a bit), *never*
> within a 4 byte range (because that's what we IIRC align records to, and
> again, sector boundaries don't fall within aligned 4 byte quantities).
>
> Because the likely causes of failure are parts of the correct record and then
> a tail or an intermittent long chunk (>= 1 sector) of wrong content, detecting
> a certain number of bit flips just doesn't help.

Granted, but my point was, if a sector of wrong content is wrong by an
odd number of bits, the 1-bit part of the checksum will always catch
it. Every bit flip causes the popcount of the result to flip from even
to odd (or vice versa), so the odd case can never collide:

--original
select crc32c(repeat('A', 512)::bytea);
   crc32c
------------
 3817965270

select bit_count(b'11100011100100011000011011010110') % 2;
 ?column?
----------
        0

--odd number of bitflips
select crc32c(('A' || repeat('C', 511))::bytea);
  crc32c
-----------
 113262028

select bit_count(b'110110000000011110111001100') % 2;
 ?column?
----------
        1

--even number of bitflips
select crc32c(('A' || repeat('B', 511))::bytea);
   crc32c
------------
 1953030209

select bit_count(b'1110100011010001110000001000001') % 2;
 ?column?
----------
        0
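To double-check this outside psql, here is a small pure-Python sketch of CRC-32C (the bitwise reflected form with polynomial 0x82F63B78; the function names and test strings are mine, not anything from PostgreSQL) showing the same parity behavior:

```python
def crc32c(data: bytes) -> int:
    """Bitwise, reflected CRC-32C (Castagnoli), polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

def parity(x: int) -> int:
    return bin(x).count("1") % 2

# Standard CRC-32C check value.
assert crc32c(b"123456789") == 0xE3069283
# Matches the psql output above.
assert crc32c(b"A" * 512) == 3817965270

# The (x+1) factor makes every single input bit flip invert the parity of
# the CRC, so an odd number of flips can never leave the CRC unchanged.
assert parity(crc32c(b"A" + b"C" * 511)) != parity(crc32c(b"A" * 512))  # 511 flips
assert parity(crc32c(b"A" + b"B" * 511)) == parity(crc32c(b"A" * 512))  # 1022 flips
```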


If the number of bitflips is even, then the 1-bit part will tell us
nothing, and the guarantees of the 31-bit part will not help the WAL
case for the reasons you describe. So, as I understand it, the
trade-off for WAL error detection is:

CRC
odd:  100%
even: the collision-avoidance probability of a mediocre hash function

good hash function:
odd:  the collision-avoidance probability of a good hash function
even: the collision-avoidance probability of a good hash function

Stated this way, it's possible we don't have the best solution, but
it's also not immediately obvious to me that the second way is so much
better that it's worth the effort to change it.

If we did go to a hash function, it'd be ideal to have the collision
guarantees of an "almost universal" hash function. For any two
messages of length at most n, the claimed probability of collision
is at most, for example:

VHASH [1]: n * 2**-61
CLHASH [1]: 2.0004 * 2**-64 (for same-length strings)
umash [2]: ceil(n / 4096) * 2**-55
polymur hash [3]: n * 2**-60.2

...but these are all 64-bit hashes, and some have further traits that
make them impractical for us. I'm not aware of any 32-bit universal
hashes. If there were, the bound might be

n * 2** -(31 or less?)

...which, for n=8192 and larger, starts to look less good. And for a
normal hash function, we only have statistical tests, which are
practical only for small lengths.
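As a point of comparison, the textbook construction behind bounds of that shape is polynomial hashing over a prime field. A toy sketch (names and constants are mine, not from any of the libraries above): two distinct same-length messages, hashed with the same random key, collide for fewer than n keys, giving collision probability below n / (2**61 - 1).

```python
import secrets

MERSENNE_61 = (1 << 61) - 1  # prime modulus 2**61 - 1

def poly_hash(data: bytes, key: int) -> int:
    """Treat the message bytes as polynomial coefficients and evaluate
    at 'key' mod 2**61 - 1 (Horner's rule).

    For two distinct same-length messages, the difference is a nonzero
    polynomial of degree < n with fewer than n roots, so a uniformly
    random key collides with probability < n / (2**61 - 1). (Same-length
    only: leading zero bytes would otherwise alias, so a real scheme
    also mixes in the length.)
    """
    h = 0
    for byte in data:
        h = (h * key + byte) % MERSENNE_61
    return h

# Drawing a fresh random key is what makes the bound hold.
key = secrets.randbelow(MERSENNE_61)
print(poly_hash(b"example WAL record", key))
```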

> It's also worth noting that just about *all* permanent storage already has
> applied sector-level checksums, protecting against (and correcting) bit flips
> at that level.

Sure.

> > but for two messages of the same length, I'm also not sure it's all
> > that bad, either
>
> Our records rarely have the same length, no?

Right, I failed to consider the case where the length is in the
garbled part of the message.

[1] https://arxiv.org/pdf/1503.03465
[2] https://github.com/backtrace-labs/umash
[3] https://github.com/orlp/polymur-hash

--
John Naylor
Amazon Web Services


