On Tue, 9 Jul 2024 at 10:11, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> OK, I have committed this.
>
Before considering the other patches to optimise for larger inputs, I
think it's worth optimising the existing mul_var() code as much as
possible.

One thing I noticed while testing the earlier patches on this thread
was that they were significantly faster if they used unsigned integers
rather than signed integers. I think the reason is that operations
like "x / 10000" and "x % 10000" use fewer CPU instructions (on every
platform, according to godbolt.org) if x is unsigned.
In addition, because unsigned 32-bit accumulators have roughly twice
the headroom of signed ones before they can overflow, this reduces the
number of times the digit array needs to be renormalised, which seems
to be the biggest factor.
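To make the renormalisation point concrete, here is a simplified,
self-contained sketch of the accumulation scheme mul_var() uses (not
the actual PostgreSQL code): partial products go into a uint32 dig[]
array, and carries are only propagated when the next row could
overflow. The function name, the fixed 64-entry buffer, and the digit
layout (most significant first, as in numeric.c) are assumptions of
the sketch.

```c
#include <stdint.h>

#define NBASE 10000

/*
 * Sketch of schoolbook multiplication in base NBASE, accumulating
 * partial products in uint32 dig[] and deferring carry propagation.
 * Digit arrays are most-significant-digit-first.  Assumes
 * n1 + n2 <= 64 to keep the example simple.
 */
static void
mul_digits(const uint16_t *d1, int n1,
           const uint16_t *d2, int n2,
           uint16_t *result)        /* length n1 + n2 */
{
    uint32_t    dig[64] = {0};
    uint32_t    maxdig = 0;         /* max dig[] value, in units of NBASE-1 */
    uint32_t    carry;
    int         i, j;

    for (i = n1 - 1; i >= 0; i--)
    {
        uint32_t    var1digit = d1[i];

        if (var1digit == 0)
            continue;

        /*
         * If adding this row of products could overflow a uint32,
         * propagate carries first.  The unsigned threshold is about
         * twice the signed one, so with unsigned accumulators this
         * pass runs roughly half as often.
         */
        maxdig += var1digit;
        if (maxdig > (UINT32_MAX - UINT32_MAX / NBASE) / (NBASE - 1))
        {
            carry = 0;
            for (j = n1 + n2 - 1; j >= 0; j--)
            {
                uint32_t    newdig = dig[j] + carry;

                carry = newdig / NBASE;
                dig[j] = newdig % NBASE;
            }
            maxdig = 1 + var1digit;
        }

        /* Add this row of partial products */
        for (j = n2 - 1; j >= 0; j--)
            dig[i + j + 1] += var1digit * d2[j];
    }

    /* Final carry propagation into the result digits */
    carry = 0;
    for (j = n1 + n2 - 1; j >= 0; j--)
    {
        uint32_t    newdig = dig[j] + carry;

        carry = newdig / NBASE;
        result[j] = (uint16_t) (newdig % NBASE);
    }
}
```

The `x / NBASE` and `x % NBASE` operations in the carry loops are
exactly the divisions that compile to fewer instructions when x is
unsigned.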

Another small optimisation that seems to be just about worthwhile is
to pull the first digit of var1 out of the main loop, so that its
contributions can be set directly in dig[], rather than being added to
it. This allows palloc() to be used to allocate dig[], rather than
palloc0(), and only requires the upper part of dig[] to be initialised
to zeros, rather than all of it.
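A sketch of that second optimisation, again simplified and not the
actual patch (the helper name init_dig() is made up, and malloc()/free()
stand in for palloc()/pfree()): the first processed var1 digit's
products are written directly into dig[], so only the upper entries
that row doesn't reach need zero-initialising.

```c
#include <stdint.h>
#include <stdlib.h>

#define NBASE 10000

/*
 * Allocate and initialise the dig[] accumulator.  Digit arrays are
 * most-significant-digit-first, and the main loop processes var1's
 * least significant digit first, so that row's products land in
 * dig[n1] .. dig[n1 + n2 - 1]; they are set directly rather than
 * added to a zeroed array.
 */
static uint32_t *
init_dig(const uint16_t *d1, int n1,
         const uint16_t *d2, int n2)
{
    int         res_ndigits = n1 + n2;
    uint32_t   *dig = malloc(res_ndigits * sizeof(uint32_t));
    uint32_t    var1digit = d1[n1 - 1];
    int         j;

    /* Set (not add) the first row of partial products */
    for (j = n2 - 1; j >= 0; j--)
        dig[n1 + j] = var1digit * d2[j];

    /* Only the upper part of dig[] still needs zeroing */
    for (j = 0; j < n1; j++)
        dig[j] = 0;

    return dig;
}
```

The remaining digits of var1 would then be added into dig[] by the
main loop exactly as before.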

Together, these seem to give a decent speed-up:
NBASE digits | HEAD rate | patch rate
--------------+---------------+---------------
5 | 5.8797105e+06 | 6.047134e+06
12 | 4.140017e+06 | 4.3429845e+06
25 | 2.5711072e+06 | 2.7530615e+06
50 | 1.0367389e+06 | 1.3370771e+06
100 | 367924.1 | 462437.38
250 | 77231.32 | 104593.95
2500 | 881.48694 | 1197.4739
15000 | 25.064987 | 32.78391

The largest gains are above around 50 NBASE digits, where the time
spent renormalising dig[] becomes significant.

Regards,
Dean