On Sun, Jan 22, 2023 at 10:42 PM Joel Jacobson <joel@compiler.org> wrote:
> I did write the code like you suggest first, but changed it,
> since I realised the extra "else if" needed could be eliminated,
> and thought div_var_int64() wouldn't be slower than div_var_int() since
> I thought 64-bit instructions in general are as fast as 32-bit instructions,
> on 64-bit platforms.
According to Agner's instruction tables [1], integer division on Skylake (for example) has a latency of 26 cycles for 32-bit operands, and 42-95 cycles for 64-bit.
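
To illustrate the trade-off, here is a minimal sketch of the extra branch under discussion. It is not the actual div_var_int() or div_var_int64() code: the names short_div_u32(), short_div_u64() and short_div() are made up for this example, and NBASE = 10000 with int16_t digits is an assumption about the digit layout. The idea is to take the cheaper 32-bit division whenever the divisor is small enough that the running carry cannot overflow 32 bits, and fall back to 64-bit division otherwise.

```c
#include <stddef.h>
#include <stdint.h>

#define NBASE 10000				/* assumption: base-10000 digits */

/*
 * Hypothetical 32-bit short division. The caller must ensure
 * 0 < divisor <= UINT32_MAX / NBASE, so that carry * NBASE + digit
 * cannot overflow (carry is always < divisor). Returns the remainder.
 */
static uint32_t
short_div_u32(const int16_t *digits, size_t ndigits, uint32_t divisor,
			  int16_t *quot)
{
	uint32_t	carry = 0;

	for (size_t i = 0; i < ndigits; i++)
	{
		carry = carry * NBASE + (uint32_t) digits[i];
		quot[i] = (int16_t) (carry / divisor);	/* 32-bit divide */
		carry %= divisor;
	}
	return carry;
}

/*
 * Hypothetical 64-bit variant with the same loop. The caller must ensure
 * 0 < divisor <= UINT64_MAX / NBASE.
 */
static uint64_t
short_div_u64(const int16_t *digits, size_t ndigits, uint64_t divisor,
			  int16_t *quot)
{
	uint64_t	carry = 0;

	for (size_t i = 0; i < ndigits; i++)
	{
		carry = carry * NBASE + (uint64_t) digits[i];
		quot[i] = (int16_t) (carry / divisor);	/* 64-bit divide */
		carry %= divisor;
	}
	return carry;
}

/* The extra branch: use 32-bit division whenever the divisor allows it. */
static uint64_t
short_div(const int16_t *digits, size_t ndigits, uint64_t divisor,
		  int16_t *quot)
{
	if (divisor <= UINT32_MAX / NBASE)
		return short_div_u32(digits, ndigits, (uint32_t) divisor, quot);
	else
		return short_div_u64(digits, ndigits, divisor, quot);
}
```

The branch is evaluated once per call, while the division runs once per digit, so the cost of the extra test should be small next to the per-digit saving. Whether that holds in practice would need to be measured.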
[1] https://www.agner.org/optimize/instruction_tables.pdf