Re: Improve CRC32C performance on SSE4.2 - Mailing list pgsql-hackers

From John Naylor
Subject Re: Improve CRC32C performance on SSE4.2
Date
Msg-id CANWCAZZK0hVk-N71JsuXPOB0ALy8BBOfRjSA4Nz2Kpt2RCLU0Q@mail.gmail.com
In response to Re: Improve CRC32C performance on SSE4.2  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: Improve CRC32C performance on SSE4.2
List pgsql-hackers
On Tue, Jun 17, 2025 at 6:40 AM Andy Fan <zhihuifan1213@163.com> wrote:
>
> "Devulapalli, Raghuveer" <raghuveer.devulapalli@intel.com> writes:
>
> > Great catch! From the intrinsic manual:
> >
> > Cast vector of type __m128i to type __m512i; the upper 384 bits of the
> > result are undefined.

Thanks Raghuveer and Nathan, for the diagnosis!

> Just be curious, what kind of optimization (like what -O2 does) could
> mask this issue?

In case Andy is asking about "how" rather than "under what
circumstances", my guess is that -O1 and above may simply have chosen
instructions that also happen to zero-extend, which are common. -O0
doesn't represent the naive, straightforward structure of what the
programmer wrote; it's more like an "exploded" representation suitable
for later optimization passes. That's why it always looks goofy.

> > Replacing that with _mm512_zextsi128_si512 fixes the problem.

Here's a patch for testing, which also reverts the previous
workaround. Help welcome, but I still promise to test it in the near
future regardless.
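
For anyone who wants to see the difference in isolation, here is a
minimal sketch (not the attached patch) contrasting the two intrinsics.
The helper names widen_cast, widen_zext and zext_upper_lanes_are_zero
are made up for illustration, and the sketch assumes GCC or Clang on
x86-64 with a compiler recent enough to provide _mm512_zextsi128_si512.
It only checks the upper lanes of the zero-extended result, since the
upper lanes of the cast result are undefined and may or may not happen
to be zero in any given build.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAVE_X86 1		/* hypothetical guard macro for this sketch */
#endif

#ifdef SKETCH_HAVE_X86
/* Cast: low 128 bits hold the input; the upper 384 bits are undefined. */
__attribute__((target("avx512f")))
static void
widen_cast(const uint64_t in[2], uint64_t out[8])
{
	__m128i		x = _mm_loadu_si128((const __m128i *) in);

	_mm512_storeu_si512((void *) out, _mm512_castsi128_si512(x));
}

/* Zero-extend: low 128 bits hold the input; the upper 384 bits are zero. */
__attribute__((target("avx512f")))
static void
widen_zext(const uint64_t in[2], uint64_t out[8])
{
	__m128i		x = _mm_loadu_si128((const __m128i *) in);

	_mm512_storeu_si512((void *) out, _mm512_zextsi128_si512(x));
}
#endif

/*
 * Returns 1 if the zero-extended value has all-zero upper lanes, 0 if it
 * does not, and -1 if AVX-512F cannot be used on this machine or compiler.
 */
static int
zext_upper_lanes_are_zero(void)
{
#ifdef SKETCH_HAVE_X86
	const uint64_t in[2] = {0x0123456789abcdefULL, 0xfedcba9876543210ULL};
	uint64_t	cast_out[8];
	uint64_t	zext_out[8];

	/* Runtime check, so the sketch is safe to run on older CPUs. */
	if (!__builtin_cpu_supports("avx512f"))
		return -1;

	widen_cast(in, cast_out);
	widen_zext(in, zext_out);

	/* Both intrinsics preserve the low 128 bits. */
	assert(cast_out[0] == in[0] && cast_out[1] == in[1]);
	assert(zext_out[0] == in[0] && zext_out[1] == in[1]);

	/* Only the zext result may be relied on above bit 127. */
	for (int i = 2; i < 8; i++)
	{
		if (zext_out[i] != 0)
			return 0;
	}
	return 1;
#else
	return -1;
#endif
}
```

Calling zext_upper_lanes_are_zero() from a small main() should return 1
on AVX-512 hardware and -1 elsewhere. cast_out[2] through cast_out[7]
are deliberately never inspected, because their contents are whatever
the compiler's instruction choice left there.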

--
John Naylor
Amazon Web Services
