Re: Improve CRC32C performance on SSE4.2 - Mailing list pgsql-hackers

From John Naylor
Subject Re: Improve CRC32C performance on SSE4.2
Date
Msg-id CANWCAZabmia25iu8Z_qRhLKoOV1VxhcSMkJuzDomQrA2RWdTUA@mail.gmail.com
In response to Re: Improve CRC32C performance on SSE4.2  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Improve CRC32C performance on SSE4.2
List pgsql-hackers
On Tue, Mar 4, 2025 at 2:11 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Overall, I wish we could avoid splitting things into separate files and
> adding more header file gymnastics, but maybe there isn't much better we
> can do without overhauling the CPU feature detection code.

I've tried to make this aspect nicer. v13-0002 adds deliberately
compact, simple loops to the inlined dispatch function to handle
constant input, and leaves the existing code alone. This avoids code
churn and keeps the copied code short. It still needs a bit more
commentary, but I hope it is a more digestible prerequisite for the
CLMUL algorithm. As a reminder, that will be simpler if we can always
assume non-constant input goes through a function pointer.
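For anyone following along, here is a minimal sketch of the dispatch shape described above. It is not the code in v13-0002. It assumes "constant input" means a length the compiler can see at compile time (checked with GCC/Clang's `__builtin_constant_p`), and every name in it (`crc32c_byte`, `crc32c_generic`, `crc32c_impl`, `crc32c_dispatch`) is hypothetical. When `__SSE4_2__` isn't defined, a portable bitwise loop stands in for the SSE4.2 `crc32` instruction, so the sketch builds anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE4_2__)
#include <nmmintrin.h>
#endif

/* Fold one byte into a reflected CRC-32C (Castagnoli, 0x82F63B78).
 * No initial or final inversion is done here; callers handle that. */
static inline uint32_t
crc32c_byte(uint32_t crc, unsigned char b)
{
#if defined(__SSE4_2__)
	return _mm_crc32_u8(crc, b);	/* hardware CRC32 instruction */
#else
	crc ^= b;
	for (int k = 0; k < 8; k++)
		crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	return crc;
#endif
}

/* Hypothetical out-of-line path for arbitrary input; stands in for
 * whatever runtime-selected implementation the pointer targets. */
static uint32_t
crc32c_generic(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	assert(data != NULL || len == 0);
	while (len-- > 0)
		crc = crc32c_byte(crc, *p++);
	return crc;
}

/* Hypothetical function pointer; in a real build it would be chosen
 * once after CPU feature detection.  Fixed here for simplicity. */
static uint32_t (*crc32c_impl)(uint32_t, const void *, size_t) = crc32c_generic;

/* Inline dispatch: if the compiler can prove len is a small constant,
 * use a compact loop it can unroll; otherwise call through the
 * function pointer. */
static inline uint32_t
crc32c_dispatch(uint32_t crc, const void *data, size_t len)
{
#if defined(__GNUC__) || defined(__clang__)
	if (__builtin_constant_p(len) && len < 32)
	{
		const unsigned char *p = data;

		for (; len > 0; len--)
			crc = crc32c_byte(crc, *p++);
		return crc;
	}
#endif
	return crc32c_impl(crc, data, len);
}
```

Starting from 0xFFFFFFFF and XORing the result with 0xFFFFFFFF gives standard CRC-32C, e.g. 0xE3069283 for "123456789". Both branches compute the same value, so the constant-length path is purely an optimization. A real version would probably step 8 or 4 bytes at a time with the wider intrinsics; the bytewise loop just keeps the sketch short.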

I've re-attached the modified perf test from v12 in case anyone
wants to play with it (v13-0003), but named it so that the CF bot won't
pick it up, since it breaks the tests in the original perf test (it's
not for commit anyway).

Adding back AVX-512 should be fairly mechanical, since Raghuveer and
Nathan have already done the work needed for that.

--
John Naylor
Amazon Web Services

