On Fri, 5 Jul 2024 at 12:56, Joel Jacobson <joel@compiler.org> wrote:
>
> Interesting that you got such bad bench results for v6-mul_var_int64.patch
> for var1ndigits=4; that patch is actually the winner on AMD Ryzen 9 7950X3D.
Interesting.
> On Intel Core i9-14900K the winner is v6-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.
That must be random noise, since
v6-optimize-numeric-mul_var-small-var1-arbitrary-var2.patch doesn't
invoke mul_var_small() for 4-digit inputs.
> On Apple M3 Max, HEAD is the winner.
Importantly, mul_var_int64() is around 1.25x slower than HEAD there, and it
was even worse on my machine (about 1.46x slower than HEAD).

Attached is a v7 mul_var_small() patch adding 4-digit support. On my
machine, this gives a speedup of about 1.18x over HEAD:

SELECT SUM(var1*var2) FROM bench_mul_var_var1ndigits_4;
Time: 5617.150 ms (00:05.617) -- HEAD
Time: 8203.081 ms (00:08.203) -- v6-mul_var_int64.patch
Time: 4750.212 ms (00:04.750) -- v7-mul_var_small.patch

The other advantage, of course, is that it doesn't require 128-bit
integer support.
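
To illustrate why handling a short var1 doesn't need 128-bit integers,
here is a minimal sketch (not the code from the v7 patch) of a schoolbook
multiplication of a var1 with at most 4 digits by a var2 of any length. It
assumes numeric's representation of digits as int16 values in base
NBASE = 10000, stored most significant first. The function name
mul_small_sketch() and the choice of 32-bit accumulators are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NBASE 10000             /* numeric digits are base 10000 */

typedef int16_t NumericDigit;   /* one base-NBASE digit, 0..9999 */

/*
 * Hypothetical sketch, not the v7 patch: multiply var1 (1 to 4 digits)
 * by var2 (any length). Digits are stored most significant first, and
 * res must have room for n1 + n2 digits.
 */
void
mul_small_sketch(const NumericDigit *var1, int n1,
                 const NumericDigit *var2, int n2,
                 NumericDigit *res)
{
    uint32_t    carry = 0;

    assert(n1 >= 1 && n1 <= 4 && n2 >= 1);

    /* Fill result digits from least significant (res[n1+n2-1]) upwards. */
    for (int k = n1 + n2 - 1; k >= 1; k--)
    {
        /*
         * Each column sums at most 4 products of digits <= 9999 (each
         * below 10^8) plus a carry below 10^5, so it stays well below
         * 2^32: a 32-bit accumulator suffices, no 128-bit type needed.
         */
        uint32_t    term = carry;

        for (int i = 0; i < n1; i++)
        {
            int         j = k - 1 - i;  /* var1[i]*var2[j] lands in res[i+j+1] */

            if (j >= 0 && j < n2)
                term += (uint32_t) var1[i] * (uint32_t) var2[j];
        }
        res[k] = (NumericDigit) (term % NBASE);
        carry = term / NBASE;
    }
    /* The product is below NBASE^(n1+n2), so the final carry is one digit. */
    res[0] = (NumericDigit) carry;
}
```

For example, multiplying var1 = {1234} by var2 = {5678} gives {700, 6652},
i.e. 1234 * 5678 = 7006652. The 32-bit bound relies on var1 being short:
the per-column sum grows with var1ndigits, which is why a general routine
needs wider accumulators or periodic carry propagation.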
Regards,
Dean