On 12/11/06, Daniel van Ham Colchete <daniel.colchete@gmail.com> wrote:
> But, trust me on this one. It's worth it. Think of this: PostgreSQL
> and GNU libc use a lot of complex algorithms: btrees, hashes,
> checksums, string functions, etc... And you have a lot of ways to
> compile them into binary code. Now you have the Pentium 4's
> vectorization, which lets you run plenty of instructions in parallel,
> but AMD doesn't have this. Intel also has SSE2, which makes
> double-precision floating-point operations a lot faster; AMD doesn't
> have this either (at least on 32 bits). Now imagine that you're Red Hat
> and that you have to deliver one CD to AMD and Intel servers. That
> means you can't use any AMD-specific or Intel-specific technology at
> the binary level.
AMD processors since the K6-2, and I think Intel ones since the P-Pro,
are essentially RISC cores behind a hardware decoder that translates
x86 instructions into micro-ops and reorders them on the fly.
Instruction choice and ordering were extremely important on older
32-bit architectures (like the 486) but matter much less these days. I
think you will find that an optimized glibc might be faster in
specific contrived cases, but the whole is unfortunately less than the
sum of its parts.
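
FWIW, the way distributors square the "one CD" circle isn't per-CPU
builds, it's runtime dispatch: probe CPUID once and pick an
implementation, which is roughly what glibc does for its string
functions. A sketch in that spirit (the dot-product functions are made
up for illustration; real code would compile each variant in its own
file so the baseline path stays SSE2-free):

    #include <stddef.h>
    #include <emmintrin.h>  /* SSE2 intrinsics; compile with -msse2 */

    /* CPUID leaf 1: SSE2 support is EDX bit 26.
       (Clobbering ebx is a problem under -fPIC; real code saves it.) */
    static int have_sse2(void)
    {
        unsigned int eax, ebx, ecx, edx;
        __asm__ ("cpuid"
                 : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
                 : "a"(1));
        return (edx >> 26) & 1;
    }

    static double dot_scalar(const double *a, const double *b, size_t n)
    {
        double sum = 0.0;
        size_t i;
        for (i = 0; i < n; i++)
            sum += a[i] * b[i];
        return sum;
    }

    static double dot_sse2(const double *a, const double *b, size_t n)
    {
        __m128d acc = _mm_setzero_pd();  /* two doubles per register */
        double out[2], sum;
        size_t i;
        for (i = 0; i + 2 <= n; i += 2)
            acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(a + i),
                                             _mm_loadu_pd(b + i)));
        _mm_storeu_pd(out, acc);
        sum = out[0] + out[1];
        for (; i < n; i++)               /* odd element, if any */
            sum += a[i] * b[i];
        return sum;
    }

    /* Pick the implementation once at startup. */
    double (*dot)(const double *, const double *, size_t);

    void dot_init(void)
    {
        dot = have_sse2() ? dot_sse2 : dot_scalar;
    }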
While SSE2 might speed up things like video decoding, for most
programs it's of little benefit and, IMO, a waste of time. Also, as
others pointed out, things like cache hits/misses and I/O
considerations matter much more than raw instruction execution speed.
We ran Gentoo here for months and did not find it fast enough to merit
the bleeding-edge quirks it brings to production environments.
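
To put a number on the cache point: the same arithmetic run in two
traversal orders can differ by more than any -march flag will ever buy
you. A quick self-contained test (the array size is arbitrary; pick
something bigger than your L2):

    #include <stdio.h>
    #include <time.h>

    #define N 2048
    static double a[N][N];      /* 32 MB, well past a typical L2 */

    int main(void)
    {
        int i, j;
        double sum = 0.0;
        clock_t t0;

        t0 = clock();
        for (i = 0; i < N; i++)     /* sequential walk: cache friendly */
            for (j = 0; j < N; j++)
                sum += a[i][j];
        printf("row-major:    %.2fs\n",
               (double)(clock() - t0) / CLOCKS_PER_SEC);

        t0 = clock();
        for (j = 0; j < N; j++)     /* strided walk: a miss per access */
            for (i = 0; i < N; i++)
                sum += a[i][j];
        printf("column-major: %.2fs\n",
               (double)(clock() - t0) / CLOCKS_PER_SEC);

        return sum != 0.0;  /* keep sum live so the loops aren't elided */
    }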
If you dig assembly, there was an interesting tackle of the spinlock
code on the hackers list last year, IIRC.
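
If anyone wants to play along at home, the heart of that code is a
single atomic exchange. Here's a sketch of the classic x86
test-and-set in GCC inline asm (the same primitive PostgreSQL's
s_lock() is built on, not the exact patch from that thread):

    typedef unsigned char slock_t;

    /* Atomically swap 1 into the lock; returns the old value, so
       0 means we acquired it. xchg is implicitly locked on x86. */
    static __inline__ int tas(volatile slock_t *lock)
    {
        slock_t old = 1;
        __asm__ __volatile__ ("xchgb %0, %1"
                              : "+q"(old), "+m"(*lock)
                              :
                              : "memory");
        return (int)old;
    }

    static void spin_lock(volatile slock_t *lock)
    {
        while (tas(lock))
            ;                   /* real code spins with backoff here */
    }

    static void spin_unlock(volatile slock_t *lock)
    {
        __asm__ __volatile__ ("" : : : "memory");  /* compiler barrier */
        *lock = 0;              /* a plain store is enough on x86 */
    }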
merlin