Hi Raghuveer,
You raised some interesting points, which deserve a thoughtful
response. After sleeping on it, however, I came to the conclusion that
a sweeping change in runtime checks, with either of our approaches,
has downsides and unresolved questions. Perhaps we can come back to it
at a later time. For this release cycle, I took a step back and tried
to think of the least invasive way to solve the immediate problem,
which is: How to allow existing builds with "-msse4.2" to take
advantage of CLMUL while not adding overhead. Here's what I came up
with in v11:
0001: same benchmark module as before
0002: For SSE4.2 builds, arrange for constant input to use an inlined
path, so that the compiler can emit unrolled loops anywhere.
This is particularly important for the WAL insertion lock, so this is
possibly committable on its own just for that.
0003: the PCLMUL path, only for runtime-check builds
0004: the PCLMUL path for SSE4.2 builds. This uses a function pointer
for long-ish input and the same inlined path as above for short input
(whether constant or not). So it gets the best of both worlds.
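To make the shape of 0004 concrete, here is a minimal sketch of that
dispatch. It is not the patch itself: all names and the cutoff value
are hypothetical, and a portable bitwise CRC-32C stands in for both the
SSE4.2 instruction path and the PCLMUL path, so that it compiles
without any -m flags.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cutoff; the real value would have to come from benchmarks. */
#define CRC_INLINE_MAX 32

/* Reflected CRC-32C (Castagnoli) polynomial. */
#define CRC32C_POLY 0x82F63B78u

typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *data, size_t len);

/*
 * Short-input path. Because it is inline, the compiler can unroll it when
 * len is a compile-time constant. A real SSE4.2 build would use the CRC32
 * instruction here; this bitwise loop is only a portable stand-in.
 */
static inline uint32_t
crc32c_bytewise(uint32_t crc, const void *data, size_t len)
{
	const unsigned char *p = data;

	while (len--)
	{
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc & 1) ? (crc >> 1) ^ CRC32C_POLY : (crc >> 1);
	}
	return crc;
}

/* Stand-in for the long-input implementation (PCLMUL in the real thing). */
static uint32_t
crc32c_long_fallback(uint32_t crc, const void *data, size_t len)
{
	return crc32c_bytewise(crc, data, len);
}

static uint32_t crc32c_choose(uint32_t crc, const void *data, size_t len);

/* The pointer starts at the chooser, which replaces it on first use. */
static crc32c_fn crc32c_long = crc32c_choose;

static uint32_t
crc32c_choose(uint32_t crc, const void *data, size_t len)
{
	/* Real code would test the CPU for PCLMUL support at this point. */
	crc32c_long = crc32c_long_fallback;
	return crc32c_long(crc, data, len);
}

/* Short input stays inline; long-ish input pays for one indirect call. */
static inline uint32_t
crc32c_dispatch(uint32_t crc, const void *data, size_t len)
{
	if (len < CRC_INLINE_MAX)
		return crc32c_bytewise(crc, data, len);
	return crc32c_long(crc, data, len);
}
```

The caller supplies the initial value and applies the final inversion,
so a complete checksum looks like
crc32c_dispatch(0xFFFFFFFF, buf, len) ^ 0xFFFFFFFF. The point of the
split is that the indirect call is only paid on inputs long enough to
amortize it.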
There is also a separate issue:
On Tue, Feb 25, 2025 at 6:05 PM John Naylor <johncnaylorls@gmail.com> wrote:
>
> Another thing I found in Agner's manuals: AMD Zen, even as recently as
> Zen 4, don't have as good a microarchitecture for PCLMUL, so if anyone
> with such a machine would like to help test the cutoff
David Rowley shared some results off-list: Zen 4 does very well with
this algorithm even at 64 bytes of input, but Zen 2 regresses for
inputs up to maybe 256 bytes. Having a large cutoff to cover all
bases makes this less useful, and that was one of my reservations
about AVX-512. However, with the corsix generator I found it's
possible to specify AVX-512 with a single accumulator (rather than 4),
which still gives a minimum input of 64 bytes, so I'll plan to put
something together to demonstrate.
(Older machines could use the 3-way stream algorithm as a fallback on
long inputs, as I've mentioned elsewhere, assuming that's legally
unencumbered.)
--
John Naylor
Amazon Web Services