On Wed, Jun 12, 2024 at 12:37:46PM -0700, Andres Freund wrote:
> I wonder if this isn't going in the wrong direction. We're using CRCs for
> something they're not well suited for in my understanding - and are paying a
> reasonably high price for it, given that even hardware accelerated CRCs aren't
> blazingly fast.
I tend to agree, especially since we should be more concerned about all
bytes after a certain point being garbage than about bit flips. (I think
we should also care about bit flips, but I hope those are much less common
than half-written WAL records.)
> With that I perhaps have established that CRC guarantees aren't useful for us.
> But not yet why we should use something else: Given that we already aren't
> relying on hard guarantees, we could instead just use a fast hash like xxh3.
> https://github.com/Cyan4973/xxHash which is fast both for large and small
> amounts of data.
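For comparison purposes, the one-shot xxh3 call is about as simple as it
gets; something like this (untested sketch against the stable xxHash >= 0.8
API, and hash_buffer is just an illustrative wrapper):

    /* Untested sketch: one-shot xxh3 over a buffer (assumes xxHash >= 0.8,
     * where XXH3 is part of the stable API). */
    #include <stddef.h>
    #include <stdint.h>
    #include <xxhash.h>

    static uint64_t
    hash_buffer(const void *data, size_t len)
    {
        /* XXH3_64bits() is the unseeded one-shot variant */
        return (uint64_t) XXH3_64bits(data, len);
    }
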
Would it be out of the question to reuse the page checksum code (i.e., an
FNV-1a derivative)? The chart in your link claims that xxh3 is
substantially faster than "FNV64", but I wonder if the latter was
vectorized. I don't know how our CRC-32C implementations (and proposed
implementations) compare, either.
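
For what it's worth, the page checksum isn't a plain serial FNV-1a; it
keeps a number of independent partial sums specifically so the inner loop
can auto-vectorize. Roughly like this (paraphrased from memory, lane count
and constants illustrative; the real code is in
src/include/storage/checksum_impl.h and differs in the seeding and final
mixing):

    #include <stddef.h>
    #include <stdint.h>

    #define N_SUMS      32
    #define FNV_PRIME   16777619

    static uint32_t
    block_hash(const uint32_t *data, size_t nwords)
    {
        uint32_t    sums[N_SUMS] = {0}; /* real code seeds each lane differently */
        uint32_t    result = 0;
        size_t      i, j;

        /* assumes nwords is a multiple of N_SUMS */
        for (i = 0; i < nwords; i += N_SUMS)
            for (j = 0; j < N_SUMS; j++)
            {
                uint32_t    tmp = sums[j] ^ data[i + j];

                sums[j] = tmp * FNV_PRIME ^ (tmp >> 17);
            }

        /* xor-fold the lanes together */
        for (j = 0; j < N_SUMS; j++)
            result ^= sums[j];

        return result;
    }

That lane structure is the main reason I'd be skeptical of a scalar "FNV64"
entry in that chart.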
--
nathan