Hello Tom,
>> Which architecture has single cycle division? I think it's way above
>> that, based on profiles I've seen. And Agner seems to back me up:
>> https://www.agner.org/optimize/instruction_tables.pdf
>> That lists a 32/64 idiv with a latency of ~26/~42-95 cycles, even on a
>> modern uarch like skylake-x.
>
> Huh. I figured Intel would have thrown sufficient transistors at that
> problem by now.
It is not just a matter of transistor count: division is
intrinsically iterative (division algorithms use various kinds of
iteration), involving some level of guessing and other arithmetic, so
the latency can only be bad, and the possibility of implementing it in 1
cycle at 3 GHz looks pretty remote.
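
To illustrate what "iterative" means here, below is a minimal sketch of
textbook restoring (shift-and-subtract) division in C. It is only an
illustration, not what Intel implements: real dividers retire several
quotient bits per step, and the function name div_restoring is made up
for this example. The point is that each step needs the partial
remainder produced by the previous one, so the steps cannot simply be
run side by side with more transistors.

```c
#include <stdint.h>

/*
 * Hypothetical illustration: restoring division, one quotient bit per
 * iteration. The caller must ensure d != 0.
 */
uint32_t
div_restoring(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint64_t r = 0;     /* partial remainder, widened so the shift cannot overflow */
    uint32_t q = 0;     /* quotient, built from the most significant bit down */

    for (int i = 31; i >= 0; i--)
    {
        /* bring down the next dividend bit */
        r = (r << 1) | ((n >> i) & 1u);

        /* trial subtraction: this decision depends on the previous r */
        if (r >= d)
        {
            r -= d;
            q |= (uint32_t) 1 << i;
        }
    }

    *rem = (uint32_t) r;
    return q;
}
```

For a 32-bit dividend this loop takes 32 dependent steps. Hardware
shortens the chain by producing more bits per step, but the serial
dependency remains.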
--
Fabien.