Re: Improve CRC32C performance on SSE4.2 - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Improve CRC32C performance on SSE4.2
Date
Msg-id Z8X-0wg7wXztjMQ2@nathan
In response to Re: Improve CRC32C performance on SSE4.2  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: Improve CRC32C performance on SSE4.2
List pgsql-hackers
On Fri, Feb 28, 2025 at 07:11:29PM +0700, John Naylor wrote:
> 0002: For SSE4.2 builds, arrange so that constant input uses an
> inlined path so that the compiler can emit unrolled loops anywhere.
> This is particularly important for the WAL insertion lock, so this is
> possibly committable on its own just for that.

Nice.

> 0004: the PCLMUL path for SSE4.2 builds. This uses a function pointer
> for long-ish input and the same above inlined path for short input
> (whether constant or not). So it gets the best of both worlds.

I spent some time staring at pg_crc32.h with all these patches applied, and
IIUC it leads to the following behavior:

* For compiled-in SSE 4.2 builds, we branch based on the length.  For
  smaller inputs, we are using an inlined version of the SSE 4.2 code.
  For larger inputs, we call a function pointer so that we can use the
  PCLMUL version when it is available.  This could lead to a small
  regression for machines with SSE 4.2 but not PCLMUL, but that may be
  uncommon enough at this point to not worry about.

* For runtime-check SSE 4.2 builds, we choose slicing-by-8, SSE 4.2, or
  SSE 4.2 with PCLMUL, and we always use a function pointer.
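
To make the first case concrete, here is a rough sketch of the length-based
dispatch as I understand it.  Everything named sketch_* is hypothetical, the
64-byte cutoff is a made-up value, and a portable bit-at-a-time loop stands in
for both the inlined SSE 4.2 code and the PCLMUL function, so this only
illustrates the control flow, not the patch itself:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff between "small" and "large" inputs (assumed value). */
#define SKETCH_INLINE_MAX_LEN 64

typedef uint32_t (*sketch_crc32c_fn) (uint32_t crc, const void *data, size_t len);

/*
 * Stand-in for the inlined SSE 4.2 path: a portable bit-at-a-time CRC-32C
 * update using the reflected Castagnoli polynomial.
 */
static inline uint32_t
sketch_crc32c_small(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len-- > 0)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78U & (0U - (crc & 1)));
    }
    return crc;
}

/*
 * Stand-in for the out-of-line implementation (PCLMUL where available,
 * plain SSE 4.2 otherwise).  Here it simply reuses the portable loop.
 */
static uint32_t
sketch_crc32c_large(uint32_t crc, const void *data, size_t len)
{
    return sketch_crc32c_small(crc, data, len);
}

/* In a real build this pointer would be set after a CPU feature check. */
static sketch_crc32c_fn sketch_crc32c_ptr = sketch_crc32c_large;

/* Branch on length: inlined code for small inputs, indirect call otherwise. */
static inline uint32_t
sketch_comp_crc32c(uint32_t crc, const void *data, size_t len)
{
    if (len <= SKETCH_INLINE_MAX_LEN)
        return sketch_crc32c_small(crc, data, len);
    return sketch_crc32c_ptr(crc, data, len);
}
```

When len is a compile-time constant at the call site, the compiler can drop
the branch entirely and unroll the small path, which is the point of 0002.
For large inputs on a machine without PCLMUL, the indirect call is the extra
cost relative to always inlining.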

The main question I have is whether we can simplify this by always using a
runtime check and by inlining slicing-by-8 for small inputs.  That would be
dependent on the performance of slicing-by-8 and SSE 4.2 being comparable
for small inputs.

Overall, I wish we could avoid splitting things into separate files and
adding more header file gymnastics, but maybe there isn't much better we
can do without overhauling the CPU feature detection code.

-- 
nathan


