Re: Proposal for Updating CRC32C with AVX-512 Algorithm. - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Proposal for Updating CRC32C with AVX-512 Algorithm.
Date
Msg-id 20240612201135.kk77tiqcux77lgev@awork3.anarazel.de
Whole thread Raw
In response to Proposal for Updating CRC32C with AVX-512 Algorithm.  ("Amonson, Paul D" <paul.d.amonson@intel.com>)
Responses RE: Proposal for Updating CRC32C with AVX-512 Algorithm.
RE: Proposal for Updating CRC32C with AVX-512 Algorithm.
List pgsql-hackers
Hi,

On 2024-05-01 15:56:08 +0000, Amonson, Paul D wrote:
> Comparing the current SSE4.2 implementation of the CRC32C algorithm in
> Postgres, to an optimized AVX-512 algorithm [0] we observed significant
> gains. The result was a ~6.6X average multiplier of increased performance
> measured on 3 different Intel products. Details below. The AVX-512 algorithm
> in C is a port of the ISA-L library [1] assembler code.
>
> Workload call size distribution details (write heavy):
>    * Average was approximately around 1,010 bytes per call
>    * ~80% of the calls were under 256 bytes
>    * ~20% of the calls were greater than or equal to 256 bytes up to the max buffer size of 8192

This is extremely workload dependent, it's not hard to find workloads with
lots of very small record and very few big ones...  What you observed might
have "just" been the warmup behaviour where more full page writes have to be
written.

There a very frequent call computing COMP_CRC32C over just 20 bytes, while
holding a crucial lock.  If we were to do introduce something like this
AVX-512 algorithm, it'd probably be worth to dispatch differently in case of
compile-time known small lengths.


How does the latency of the AVX-512 algorithm compare to just using the CRC32C
instruction?


FWIW, I tried the v2 patch on my Xeon Gold 5215 workstation, and dies early on
with SIGILL:

Program terminated with signal SIGILL, Illegal instruction.
#0  0x0000000000d5946c in _mm512_clmulepi64_epi128 (__A=..., __B=..., __C=0)
    at /home/andres/build/gcc/master/install/lib/gcc/x86_64-pc-linux-gnu/15/include/vpclmulqdqintrin.h:42
42      return (__m512i) __builtin_ia32_vpclmulqdq_v8di ((__v8di)__A,
(gdb) bt
#0  0x0000000000d5946c in _mm512_clmulepi64_epi128 (__A=..., __B=..., __C=0)
    at /home/andres/build/gcc/master/install/lib/gcc/x86_64-pc-linux-gnu/15/include/vpclmulqdqintrin.h:42
#1  pg_comp_crc32c_avx512 (crc=<optimized out>, data=<optimized out>, length=<optimized out>)
    at ../../../../../home/andres/src/postgresql/src/port/pg_crc32c_avx512.c:163
#2  0x0000000000819343 in ReadControlFile () at
../../../../../home/andres/src/postgresql/src/backend/access/transam/xlog.c:4375
#3  0x000000000081c4ac in LocalProcessControlFile (reset=<optimized out>) at
../../../../../home/andres/src/postgresql/src/backend/access/transam/xlog.c:4817
#4  0x0000000000a8131d in PostmasterMain (argc=argc@entry=85, argv=argv@entry=0x341b08f0)
    at ../../../../../home/andres/src/postgresql/src/backend/postmaster/postmaster.c:902
#5  0x00000000009b53fe in main (argc=85, argv=0x341b08f0) at
../../../../../home/andres/src/postgresql/src/backend/main/main.c:197


Cascade lake doesn't have vpclmulqdq, so we shouldn't be getting here...

This is on an optimied build with meson, with -march=native included in
c_flags.

Relevant configure output:

Checking if "XSAVE intrinsics without -mxsave" : links: NO (cached)
Checking if "XSAVE intrinsics with -mxsave" : links: YES (cached)
Checking if "AVX-512 popcount without -mavx512vpopcntdq -mavx512bw" : links: NO (cached)
Checking if "AVX-512 popcount with -mavx512vpopcntdq -mavx512bw" : links: YES (cached)
Checking if "_mm512_clmulepi64_epi128 ... with -msse4.2 -mavx512vl -mvpclmulqdq" : links: YES
Checking if "x86_64: popcntq instruction" compiles: YES (cached)


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: "David E. Wheeler"
Date:
Subject: Re: Shouldn't jsonpath .string() Unwrap?
Next
From: Jeff Davis
Date:
Subject: Re: Addressing SECURITY DEFINER Function Vulnerabilities in PostgreSQL Extensions