Home > mailing lists

Re: Proposal for Updating CRC32C with AVX-512 Algorithm. - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
Date	December 14, 2024 18:24:08
Msg-id	mclb5qcupjfkzctmjkaygy4b7ecnimttwhqljenrustz7om4ed@iko77qgbs3bc Whole thread Raw
In response to	Re: Proposal for Updating CRC32C with AVX-512 Algorithm. (John Naylor <johncnaylorls@gmail.com>)
Responses	Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
List	pgsql-hackers

Tree view

Hi,

On 2024-12-14 12:08:57 +0700, John Naylor wrote:
> On Thu, Jun 13, 2024 at 2:37 AM Andres Freund <andres@anarazel.de> wrote:
> >
> > It's hard to understand, but a nonetheless helpful page is
> > https://users.ece.cmu.edu/~koopman/crc/crc32.html which lists properties for
> > crc32c:
> > https://users.ece.cmu.edu/~koopman/crc/c32/0x8f6e37a0_len.txt
> > which lists
> > (0x8f6e37a0; 0x11edc6f41) <=> (0x82f63b78; 0x105ec76f1)
{2147483615,2147483615,5243,5243,177,177,47,47,20,20,8,8,6,6,1,1}| gold | (*op) iSCSI; CRC-32C; CRC-32/4

> >
> > This cryptic notion AFAIU indicates that for our polynomial we can detect 2bit
> > errors up to a length of 2147483615 bytes, 3 bit errors up to 2147483615, 3
> > and 4 bit errors up to 5243, 5 and 6 bit errors up to 177, 7/8 bit errors up
> > to 47.
>
> One aspect of that cryptic notation that you seemed to have missed is
> "(*op)" -- explained as:
>
> *p - primitive polynomial. This has optimal length for HD=3, and good
> HD=2 performance above that length.
> *o - odd bit errors detected. This has a factor of (x+1) and detects
> all odd bit errors (implying that even number of bit errors have an
> elevated undetected error rate)
> *op - odd bit errors detected plus primitive. This is a primitive
> polynomial times (x+1). It has optimal length for HD=4, and detects
> all odd bit errors.
>
> This means it's not really a 32-bit checksum -- it's a 1-bit checksum
> plus a 31-bit checksum. The 1-bit checksum can detect any odd number
> of bit-flips. Do we really want to throw that property away?

I think it's pretty much irrelevant for our usecase.

What the WAL checksum needs to protect against are cases like a record
spanning >1 disk sectors or >1 OS pages and one of those sectors/pages not
having made it to disk, while the rest has made it (and thus shows old
contents).

That means we have to detect runs of "wrong content" that are *never* in the
single bit range (since sector boundaries never fall within a bit), *never*
within a 4 byte range (because that's what we IIRC align records to, and
again, sector boundaries don't fall within aligned 4 byte quantities).

Because the likely causes of failure are parts of the correct record and then
a tail or an intermittent long chunk (>= 1 sector) of wrong content, detecting
certain number of bit flips just doesn't help.

Bit flips are an important thing to detect and correct when they are something
that can happen in isolation. E.g. a bunch of interference in an ethernet
cable. Or the charge in an individual flash cell being a tiny bit above/below
some threshold.  But that's just not what we have with WAL.

It's also worth noting that just about *all* permanent storage already has
applied sector-level checksums, protecting against (and correcting) bit flips
at that level.

> Sure, for an even number bitflips beyond a small number, we're left
> with the luck ordinary collisions, and CRC is not particularly great,

I.e. just about *all* failure scenarios for WAL.

> but for two messages of the same length, I'm also not sure it's all
> that bad, either

Our records rarely have the same length, no?

Greetings,

Andres Freund

pgsql-hackers by date:

From: Junwang Zhao
Date: 14 December 2024, 16:53:05
Subject: Re: SQL Property Graph Queries (SQL/PGQ)

From: Peter Eisentraut
Date: 14 December 2024, 20:15:13
Subject: Re: pg_attribute_noreturn(), MSVC, C11

Re: Proposal for Updating CRC32C with AVX-512 Algorithm. - Mailing list pgsql-hackers

Previous

Next