Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> (BTW, my reading of the articles I cited, as well as my own runs of the
> test programs therein, suggest that in order to get a really good
> performance improvement you need to hand-code calls to the POPCNT
> instruction in assembly rather than rely on the compiler intrinsics.

That observation led me to think about using asm() instead of
__builtin_popcount + -mpopcnt, and I realized there are several
fewer moving parts if we do it that way: we don't need to worry
about the compiler switch, we don't need to take it on faith that
the switch actually changes the emitted code, and we don't need a
separate source file to limit the scope of the switch. And really, requiring
__builtin_popcount + -mpopcnt is pretty much restricting the
optimization to GCC-alikes anyway, so requiring asm() probably
doesn't eliminate any toolchains that would've handled the other way.
Hence, I made it work like that. Committed with that change and some
cosmetic cleanups.
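
For illustration only, a minimal sketch of the asm() approach described above. This is not the committed code: the names popcount32, popcount32_asm, popcount32_slow and popcnt_available, and the HAVE_X86_ASM_POPCNT macro, are hypothetical, and the fast path assumes a GCC-compatible compiler on x86 that provides <cpuid.h>. The point it tries to show is that the instruction is spelled out in the asm() string, so no -mpopcnt switch and no separate source file are involved; a run-time CPUID check decides whether the instruction may be executed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: only GCC-alike compilers on x86 take the asm() path. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_X86_ASM_POPCNT 1
#endif

/* Portable fallback: clear the lowest set bit until none remain. */
static int
popcount32_slow(uint32_t word)
{
    int count = 0;

    while (word != 0)
    {
        word &= word - 1;
        count++;
    }
    return count;
}

#ifdef HAVE_X86_ASM_POPCNT
/*
 * Emit POPCNT directly.  The compiler never has to choose the
 * instruction itself, so this builds without -mpopcnt.
 */
static int
popcount32_asm(uint32_t word)
{
    uint32_t res;

    __asm__ __volatile__("popcntl %1, %0" : "=r"(res) : "rm"(word) : "cc");
    return (int) res;
}

/* CPUID leaf 1 reports POPCNT support in ECX bit 23. */
static bool
popcnt_available(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return (ecx & (1u << 23)) != 0;
}
#endif

/* Use the instruction when the CPU has it, else the portable loop. */
int
popcount32(uint32_t word)
{
#ifdef HAVE_X86_ASM_POPCNT
    static int use_asm = -1;    /* -1 means "not checked yet" */

    if (use_asm < 0)
        use_asm = popcnt_available() ? 1 : 0;
    if (use_asm)
        return popcount32_asm(word);
#endif
    return popcount32_slow(word);
}
```

On toolchains that do not define the macro, only the portable loop is compiled, which matches the expectation that requiring asm() excludes roughly the same compilers as requiring the builtin would.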

regards, tom lane