On Tue, 2 Jul 2024 at 21:10, Joel Jacobson <joel@compiler.org> wrote: > > I found the bug in the case 3 code, > and it turns out the same type of bug also exists in the case 2 code: > > case 2: > newdig = (int) var1digits[1] * var2digits[res_ndigits - 4]; > > The problem here is that res_ndigits could become less than 4,
Yes. It can't be less than 3 though (per an earlier test), so the case 2 code was correct.
I've been hacking on this a bit and trying to tidy it up. Firstly, I moved it to a separate function, because it was starting to look messy having so much extra code in mul_var(). Then I added a bunch more comments to explain what's going on, and the limits of the various variables. Note that most of the boundary checks are actually unnecessary -- in particular all the ones in or after the main loop, provided you pull out the first 2 result digits from the main loop in the 3-digit case. That does seem to work very well, but...
I wasn't entirely happy with how messy that code is getting, so I tried a different approach. Similar to div_var_int(), I tried writing a mul_var_int() function instead. This can be used for 1 and 2 digit factors, and we could add a similar mul_var_int64() function on platforms with 128-bit integers. The code looks quite a lot neater, so it's probably less likely to contain bugs (though I have just written it in a hurry,so it might still have bugs). In testing, it seemed to give a decent speedup, but perhaps a little less than before. But that's to be balanced against having more maintainable code, and also a function that might be useful elsewhere in numeric.c.
Anyway, here are both patches for comparison. I'll stop hacking for a while and let you see what you make of these.