Re: Improve CRC32C performance on SSE4.2 - Mailing list pgsql-hackers

From Soumyadeep Chakraborty
Subject Re: Improve CRC32C performance on SSE4.2
Date
Msg-id CAE-ML+-X8mnx-AsD-9QtB7rkWvCmcb4+VJWOrg0KPu5K2mucSA@mail.gmail.com
In response to Re: Improve CRC32C performance on SSE4.2  (Andy Fan <zhihuifan1213@163.com>)
List pgsql-hackers
On Tue, Jun 17, 2025 at 1:55 AM John Naylor <johncnaylorls@gmail.com> wrote:

I took the minimal repro from [1] and compared the code that clang 17
generates at -O0 [2] and at -O3 [3]. At -O3 (and in fact also at -O1 and
-O2), the following source compiles to the assembly shown beneath it:

castval = _mm512_castsi128_si512(_mm_cvtsi32_si128(crc0));
x0 = _mm512_xor_si512(castval, x0);

vinserti128  ymm0, ymm0, xmmword ptr [rip + .LCPI1_0], 0
vpxorq  zmm0, zmm0, zmmword ptr [rdi]

Reading vpxorq's pseudocode [4], it seems that it zeroes out the upper
bits of the destination:

DEST[MAXVL-1:VL] := 0

The same holds for clang 17 -O0 if we use _mm512_zextsi128_si512
instead [5]: vpxor and vbroadcast128 are used, which also seem to
zero out the upper bits.

So, -O1..-O3 were indeed emitting instructions that zero-extend, thus
avoiding the undefined behavior.
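
To make the distinction concrete, here is a rough scalar model of the two
intrinsics. It is only a sketch and does not use the real AVX-512 types:
model512, MODEL_GARBAGE, model_cast, model_zext and model_xor are
hypothetical names invented for illustration. It assumes the documented
semantics, namely that _mm512_castsi128_si512 leaves bits 128..511
undefined while _mm512_zextsi128_si512 zeroes them, and it uses a fixed
garbage pattern to stand in for "undefined".

```c
#include <stdint.h>

/* Scalar model of a 512-bit register: eight 64-bit lanes,
 * where lane[0] holds bits 0..63. */
typedef struct { uint64_t lane[8]; } model512;

/* Hypothetical stand-in for "undefined" bits; a real compiler or CPU
 * could leave any value there, including zero. */
#define MODEL_GARBAGE UINT64_C(0xDEADBEEFDEADBEEF)

/* Models _mm512_castsi128_si512(_mm_cvtsi32_si128(crc)):
 * bits 0..127 are defined, bits 128..511 are undefined. */
static model512 model_cast(uint32_t crc)
{
    model512 r;

    r.lane[0] = crc;    /* _mm_cvtsi32_si128 zeroes bits 32..127 */
    r.lane[1] = 0;
    for (int i = 2; i < 8; i++)
        r.lane[i] = MODEL_GARBAGE;
    return r;
}

/* Models _mm512_zextsi128_si512(_mm_cvtsi32_si128(crc)):
 * bits 128..511 are guaranteed to be zero. */
static model512 model_zext(uint32_t crc)
{
    model512 r = {{0}};

    r.lane[0] = crc;
    return r;
}

/* Models _mm512_xor_si512: lane-wise XOR over all 512 bits. */
static model512 model_xor(model512 a, model512 b)
{
    model512 r;

    for (int i = 0; i < 8; i++)
        r.lane[i] = a.lane[i] ^ b.lane[i];
    return r;
}
```

In this model, model_xor(model_zext(crc0), x0) changes only the low 32
bits of x0, whereas model_xor(model_cast(crc0), x0) also disturbs bits
128..511. That corruption of x0 is what the zero-extending instruction
sequences happen to avoid.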

Regards,
Deep (VMware)
