> At least on Sparc processors, v8 and newer, any double precision math
> (including longs) is performed with a single instruction, just like for
> a 32 bit datum. Loads and stores of 8 byte datums are also handled via
> a single instruction. The urban myth that 64bit math is
> different/better on a 64 bit processor is just that; yes, some lower
> end processors would emulate/trap those instructions but that's an
> implementation detail, not architecture. I believe that this is all
> true for other RISC processors as well.
>
> The 64bit API on UltraSparcs does bring along some extra FP registers
> IIRC.
It's very different on x86.
64-bit x86, like the Opteron, has 16 general-purpose registers, whereas
registers are very scarce on base 32-bit x86 (only 8). This alone is very
important, since fewer values have to be spilled to the stack. There are
other factors as well.
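
One of those other factors, as a rough sketch: a 32-bit x86 has no single
instruction for a 64-bit integer add, so the compiler splits it into two
32-bit steps (an add for the low halves, then an add-with-carry for the high
halves). The Python below only models that split; the name add64_on_32bit
and the masking are my own illustration, not anything a compiler emits.

```python
MASK32 = 0xFFFFFFFF  # Python ints are unbounded, so mask to model a 32-bit register


def add64_on_32bit(a, b):
    """Add two unsigned 64-bit values using only 32-bit wide steps."""
    # Split each operand into low and high 32-bit halves.
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32

    # Step 1: add the low halves and remember the carry out of bit 31.
    lo = a_lo + b_lo
    carry = lo >> 32
    lo &= MASK32

    # Step 2: add the high halves plus the carry, wrapping at 64 bits overall.
    hi = (a_hi + b_hi + carry) & MASK32

    return (hi << 32) | lo
```

On a 64-bit x86 the same add is one instruction on one register, which also
frees up the second register the 32-bit version needs per operand.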
> Solaris, at least, provided support for far more than 4GB of physical
> memory on 32 bit kernels. A newer 64 bit kernel might be more
> efficient, but that's just because the time was taken to support large
> page sizes and more efficient data structures. It's nothing intrinsic
> to a 32 vs 64 bit kernel.
Well, on a large working set, a processor which can directly address more
than 4GB of memory will be a lot faster than one which can't and instead has
to play with the MMU and paging units to reach the rest!
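
To make that concrete, here is a toy model, not a measurement. A 32-bit
pointer covers 2**32 bytes (4GB), so anything beyond that has to be reached
by moving a mapping window around. The function below just counts how often
such a window would have to move for a given access pattern; count_remaps
and window_size are hypothetical names, and real remapping costs depend
entirely on the OS and hardware.

```python
def count_remaps(offsets, window_size):
    """Count how often the mapped window must move to serve each access."""
    remaps = 0
    current = None  # index of the window currently mapped, if any
    for off in offsets:
        window = off // window_size  # which window this offset falls in
        if window != current:
            # The data is outside the mapped window: pay for a remap.
            current = window
            remaps += 1
    return remaps
```

Sequential access inside one window costs a single mapping, while access
that jumps around a large working set pays for a remap on nearly every
touch. With a 64-bit address space the whole set can simply stay mapped.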