On Tue, Apr 02, 2024 at 05:01:32PM -0500, Nathan Bossart wrote:
> In v21, 0001 is just the above inlining idea, which seems worth doing
> independent of $SUBJECT. 0002 and 0003 are the AVX-512 patches, which I've
> modified similarly to 0001, i.e., I've inlined the "fast" version in the
> function pointer to avoid the function call overhead when there are fewer
> than 64 bytes. All of this overhead juggling should result in choosing the
> optimal popcount implementation depending on how many bytes there are to
> process, roughly speaking.

Sorry for the noise. I noticed a couple of silly mistakes immediately
after sending v21.
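
For anyone skimming the thread, the dispatch shape described in the quoted
text can be sketched roughly as below. This is only an illustration under
stated assumptions: every name here (popcount_small, popcount_large,
popcount_dispatch) is hypothetical and not an identifier from the patches,
and a portable 8-bytes-at-a-time loop stands in for the AVX-512 code. The
64-byte threshold is the only detail taken from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical small-input path: inlined, so short buffers pay no call. */
static inline uint64_t
popcount_small(const char *buf, size_t bytes)
{
	uint64_t	popcnt = 0;

	while (bytes--)
	{
		unsigned char c = (unsigned char) *buf++;

		/* Clear the lowest set bit until none remain. */
		while (c)
		{
			c &= (unsigned char) (c - 1);
			popcnt++;
		}
	}
	return popcnt;
}

/* Stand-in for the wide implementation (AVX-512 in the real patches). */
static uint64_t
popcount_large_portable(const char *buf, size_t bytes)
{
	uint64_t	popcnt = 0;

	/* Process 8 bytes per iteration; memcpy avoids alignment assumptions. */
	while (bytes >= sizeof(uint64_t))
	{
		uint64_t	word;

		memcpy(&word, buf, sizeof(word));
		while (word)
		{
			word &= word - 1;
			popcnt++;
		}
		buf += sizeof(word);
		bytes -= sizeof(word);
	}

	/* Count whatever is left over byte by byte. */
	return popcnt + popcount_small(buf, bytes);
}

/* Function pointer that would be chosen at startup based on CPU support. */
static uint64_t (*popcount_large) (const char *buf, size_t bytes) =
	popcount_large_portable;

/* Inlined wrapper: only buffers of 64 bytes or more go through the pointer. */
static inline uint64_t
popcount_dispatch(const char *buf, size_t bytes)
{
	assert(buf != NULL || bytes == 0);

	if (bytes < 64)
		return popcount_small(buf, bytes);
	return popcount_large(buf, bytes);
}
```

The point of this shape is that callers with short buffers never reach the
function pointer, so the indirect call cost is only paid where the wide
implementation has enough bytes to amortize it.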
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com