On Mon, Mar 30, 2026 at 10:01 PM Ants Aasma <ants.aasma@cybertec.at> wrote:
>
> On Mon, 30 Mar 2026 at 15:01, John Naylor <johncnaylorls@gmail.com> wrote:
> > I don't remember the last time anyone did measurements, so I went
> > ahead and did that:
> >
> > master: 945ms
> > 32 AVX2: 335ms
> > 64 AVX2: 220ms
>
> I'm guessing this is on a recent Intel. Any extra width is helpful on
> Intel as they doubled vpmulld latency from under us after we had
> settled on this algorithm.

It's actually ancient and due to be replaced soon, but still several
years after the adoption of this algorithm.

> FWIW I think AVX2 (x86-64-v3) is fine.

Glad to hear it, although the patch doesn't use that build flag, so
it's not impossible there is some additional difference in the
compiler's model. Still, given the variation you found, I'll make sure
the commit message says "several times faster" so it's not specific to
my hardware.
--
John Naylor
Amazon Web Services