On Tue, 2 Jul 2024 at 08:50, Joel Jacobson <joel@compiler.org> wrote:
>
> On Tue, Jul 2, 2024, at 00:19, Dean Rasheed wrote:
>
> > Attachments:
> > * optimize-numeric-mul_var-small-var1-arbitrary-var2.patch.txt
>
Shortly after posting that, I realised that there was a small bug. This bit:

    case 2:
        newdig = (int) var1digits[1] * var2digits[res_ndigits - 4];

isn't quite right in the case where rscale is less than the scale of
the full result and the result is therefore truncated. In that case,
the least significant result digit has a contribution from one more
var2 digit, so it needs to be:

    newdig = (int) var1digits[1] * var2digits[res_ndigits - 4];
    if (res_ndigits - 3 < var2ndigits)
        newdig += (int) var1digits[0] * var2digits[res_ndigits - 3];

That doesn't noticeably affect the performance though. Update attached.
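
To make the indexing concrete, here is a minimal standalone sketch, not
taken from the patch. It assumes that the product var1digits[i1] *
var2digits[i2] lands in result position i1 + i2 + 2, which is what the
indices in the snippet above imply. The function names and the
brute-force cross-check are hypothetical, for illustration only.

```c
#include <stdint.h>

/* Stand-in for PostgreSQL's NumericDigit (int16 when DEC_DIGITS is 4). */
typedef int16_t NumericDigit;

/*
 * Hypothetical helper: raw (pre-carry) sum of products feeding the least
 * significant result digit when var1 has exactly two digits.
 *
 * Assumes 4 <= res_ndigits <= var2ndigits + 3, so that every index used
 * below is in range.
 */
static int
last_digit_two(const NumericDigit *var1digits,
			   const NumericDigit *var2digits,
			   int var2ndigits, int res_ndigits)
{
	int			newdig;

	newdig = (int) var1digits[1] * var2digits[res_ndigits - 4];

	/* This term only exists when the result has been truncated. */
	if (res_ndigits - 3 < var2ndigits)
		newdig += (int) var1digits[0] * var2digits[res_ndigits - 3];

	return newdig;
}

/*
 * Brute-force reference: add up every product whose assumed result
 * position, i1 + i2 + 2, equals the last result index, res_ndigits - 1.
 */
static int
last_digit_ref(const NumericDigit *var1digits, int var1ndigits,
			   const NumericDigit *var2digits, int var2ndigits,
			   int res_ndigits)
{
	int			sum = 0;

	for (int i1 = 0; i1 < var1ndigits; i1++)
	{
		for (int i2 = 0; i2 < var2ndigits; i2++)
		{
			if (i1 + i2 + 2 == res_ndigits - 1)
				sum += (int) var1digits[i1] * var2digits[i2];
		}
	}

	return sum;
}
```

With var1 = {12, 34} and var2 = {1, 2, 3, 4, 5}, both functions give 170
for the full result (res_ndigits = 8) and 150 when truncated to
res_ndigits = 6. Without the extra term, the truncated case would give
only 102.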
> I've benchmarked your patch on my three machines with great results.
> I added a setseed() step, to make the benchmarks reproducible,
> shouldn't matter much since it should statistically average out, but I thought why not.

Nice. The results on the Apple M3 Max look particularly impressive.

I think it'd probably be worth trying to extend this to 3 or maybe 4
var1 digits, since that would cover many of the "everyday" sized
numeric values that people commonly use. I don't think we should go
beyond that though.

Regards,
Dean