On Thu, Mar 21, 2024 at 12:09:44PM -0500, Nathan Bossart wrote:
> On Thu, Mar 21, 2024 at 11:30:30AM +0700, John Naylor wrote:
>> Further, now that the algorithm is more SIMD-appropriate, I wonder
>> what doing 4 registers at a time is actually buying us for either SSE2
>> or AVX2. It might just be a matter of scale, but that would be good to
>> understand.
>
> I'll follow up with these numbers shortly.

It looks like the 4-register code still outperforms the 2-register code,
except for a handful of cases where there aren't many elements.
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com