On Tue, Jun 17, 2025 at 6:40 AM Andy Fan <zhihuifan1213@163.com> wrote:
>
> "Devulapalli, Raghuveer" <raghuveer.devulapalli@intel.com> writes:
>
> > Great catch! From the intrinsic manual:
> >
> > Cast vector of type __m128i to type __m512i; the upper 384 bits of the
> > result are undefined.

Thanks, Raghuveer and Nathan, for the diagnosis!

> Just be curious, what kind of optimization (like what -O2 does) could
> mask this issue?

In case Andy is asking about "how" rather than "under what
circumstances", my guess is that -O1 and above just happened to choose
instructions that also zero-extend, which are common. -O0 output doesn't
represent the naive, straightforward structure of what the programmer
wrote; it's more like an "exploded" representation suitable for later
optimization passes. That's why it always looks goofy.
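
To make the difference concrete, here is a minimal sketch. It is not the
attached patch, the helper names are made up for illustration, and it
assumes a reasonably recent GCC or Clang on x86-64. It widens a 128-bit
value with _mm512_zextsi128_si512 and checks that the upper 384 bits come
back as zero. With _mm512_castsi128_si512 the same check could pass or
fail depending on what the compiler happened to emit, since those bits
are undefined.

```c
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): widen a 128-bit vector to
 * 512 bits with the zero-extending intrinsic and inspect the result.
 * Returns 1 if the low 128 bits are preserved and the upper 384 bits
 * are zero, 0 otherwise.
 */
__attribute__((target("avx512f")))
static int
upper_bits_zero_after_zext(void)
{
	uint64_t	out[8];
	__m128i		lo = _mm_set1_epi8(0x5A);

	/*
	 * _mm512_zextsi128_si512 guarantees zeros in bits 128..511.
	 * _mm512_castsi128_si512 would leave them undefined, so a check
	 * like the one below would only pass by luck.
	 */
	__m512i		wide = _mm512_zextsi128_si512(lo);

	assert(sizeof(out) == sizeof(wide));
	_mm512_storeu_si512(out, wide);

	for (int i = 2; i < 8; i++)
		if (out[i] != 0)
			return 0;

	return out[0] == UINT64_C(0x5A5A5A5A5A5A5A5A) &&
		out[1] == UINT64_C(0x5A5A5A5A5A5A5A5A);
}

/*
 * Hypothetical wrapper: returns -1 when the CPU (or OS) lacks AVX-512F
 * support, so the AVX-512 code path is never executed there.
 */
int
zext_check(void)
{
	if (!__builtin_cpu_supports("avx512f"))
		return -1;
	return upper_bits_zero_after_zext();
}
```

Calling zext_check() should return 1 on AVX-512 hardware and -1
elsewhere. The target attribute is used here only so the sketch builds
without extra compiler flags.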
> > Replacing that with _mm512_zextsi128_si512 fixes the problem.

Here's a patch for testing, which also reverts the previous
workaround. Help is welcome, but I promise to test it in the near
future regardless.

--
John Naylor
Amazon Web Services