Hello hackers,
This patch adds a mul_var_large() that is dispatched to from mul_var()
for var1ndigits >= 8, regardless of rscale.
The main idea with mul_var_large() is to reduce the "n" in O(n^2) by a factor
of two.
This is achieved by first converting the (ndigits) number of int16 NBASE digits,
to (ndigits/2) number of int32 NBASE^2 digits, as well as upgrading the
int32 variables to int64-variables so that the products and carry values fit.
The existing mul_var() algorithm is then executed without structural change.
Finally, the int32 NBASE^2 result digits are converted back to twice the number
of int16 NBASE digits.
This adds overhead of approximately 4 * O(n), due to the conversion.
Benchmarks indicates mul_var_large() starts to be beneficial when
var1 is at least 8 ndigits, or perhaps a little more.
Definitively an interesting speed-up for 100 ndigits and above.
Benchmarked on Apple M3 Max so far:
-- var1ndigits == var2ndigits == 10
SELECT COUNT(*) FROM n_10 WHERE product = var1 * var2;
Time: 3957.740 ms (00:03.958) -- HEAD
Time: 3943.795 ms (00:03.944) -- mul_var_large
-- var1ndigits == var2ndigits == 100
SELECT COUNT(*) FROM n_100 WHERE product = var1 * var2;
Time: 1532.594 ms (00:01.533) -- HEAD
Time: 1065.974 ms (00:01.066) -- mul_var_large
-- var1ndigits == var2ndigits == 1000
SELECT COUNT(*) FROM n_1000 WHERE product = var1 * var2;
Time: 3055.372 ms (00:03.055) -- HEAD
Time: 2295.888 ms (00:02.296) -- mul_var_large
-- var1ndigits and var2ndigits completely random,
-- with random number of decimal digits
SELECT COUNT(*) FROM n_mixed WHERE product = var1 * var2;
Time: 46796.240 ms (00:46.796) -- HEAD
Time: 27970.006 ms (00:27.970) -- mul_var_large
-- var1ndigits == var2ndigits == 16384
SELECT COUNT(*) FROM n_max WHERE product = var1 * var2;
Time: 3191.145 ms (00:03.191) -- HEAD
Time: 1836.404 ms (00:01.836) -- mul_var_large
Regards,
Joel