On Fri, 13 Dec 2024 at 00:14, Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> On Thu, Dec 12, 2024 at 10:45:29AM -0500, Andres Freund wrote:
> > Frankly, we should just move away from using CRCs. They're good for cases
> > where short runs of bit flips are much more likely than other kinds of errors
> > and where the amount of data covered by them has a low upper bound. That's not
> > at all the case for WAL records. It'd not matter too much if CRCs were cheap
> > to compute - but they aren't. We should instead move to some more generic
> > hashing algorithm, decent ones are much faster.
>
> Upthread [0], I wondered aloud about trying to reuse the page checksum code
> for this. IIRC there was a lot of focus on performance when that was
> added, and IME it catches problems decently well.
>
> [0] https://postgr.es/m/ZrUcX2kq-0doNBea%40nathan
It was carefully built so that the compiler can auto-vectorize it for
power-of-2 block sizes, which makes it run fast on any CPU with fast
vectorized 32-bit multiplication instructions.
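
For reference, the kernel has roughly this shape (paraphrased from
memory of checksum_impl.h, so treat it as a sketch rather than the
exact code). The point is that each of the N_SUMS accumulators is
independent, so the inner loop over lanes vectorizes cleanly:

#include <stdint.h>
#include <stddef.h>

#define N_SUMS    32            /* independent lanes -> SIMD friendly */
#define FNV_PRIME 16777619u

/* FNV-1a-style mix with an extra shift for better avalanche */
#define CHECKSUM_COMP(sum, value) \
    do { \
        uint32_t tmp_ = (sum) ^ (value); \
        (sum) = tmp_ * FNV_PRIME ^ (tmp_ >> 17); \
    } while (0)

/* len must be a multiple of sizeof(uint32_t) * N_SUMS (true for BLCKSZ).
 * Simplified: the real code also seeds each lane with a distinct offset
 * and runs two extra mixing rounds before folding. */
static uint32_t
checksum_block(const uint32_t *data, size_t len)
{
    uint32_t    sums[N_SUMS] = {0};
    uint32_t    result = 0;
    size_t      nrounds = len / (sizeof(uint32_t) * N_SUMS);

    /* no cross-lane dependency, so the compiler can auto-vectorize
     * this with 32-bit multiplies */
    for (size_t i = 0; i < nrounds; i++)
        for (int j = 0; j < N_SUMS; j++)
            CHECKSUM_COMP(sums[j], data[i * N_SUMS + j]);

    /* xor-fold the lanes into one 32-bit result */
    for (int j = 0; j < N_SUMS; j++)
        result ^= sums[j];

    return result;
}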
Performance is great: compiled with -march=native it gets 15.8
bytes/cycle on Zen 3, compared to 19.5 for t1ha0_aes_avx2, 7.9 for
aes-ni hash, and 2.15 for fasthash32. However, it isn't particularly
good for small (<1K) blocks, for both hash quality and performance
reasons.
One idea would be to use fasthash for short lengths and an extended
version of the page checksum for larger values (sketched below). But
before committing to that approach, I think the quality of the page
checksum algorithm is due for a revisit. Quality and robustness were
not the highest priorities when developing it.
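
To make the idea concrete, the dispatch could look something like the
following. All the names, the seed parameter, and the 1 kB cutoff are
made up for illustration; none of this is an existing API:

#include <stdint.h>
#include <stddef.h>

/* hypothetical entry points for the two hash flavours */
extern uint32_t fasthash32(const void *data, size_t len, uint32_t seed);
extern uint32_t checksum_block_ext(const void *data, size_t len,
                                   uint32_t seed);

static inline uint32_t
wal_record_hash(const void *data, size_t len, uint32_t seed)
{
    /* Small records can't fill the lane-parallel checksum's lanes,
     * so fall back to a scalar hash with good small-key behaviour. */
    if (len < 1024)
        return fasthash32(data, len, seed);

    /* Large records: the vectorized block checksum wins on throughput. */
    return checksum_block_ext(data, len, seed);
}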
--
Ants Aasma
Lead Database Consultant
www.cybertec-postgresql.com