Re: Using POPCNT and other advanced bit manipulation instructions - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Using POPCNT and other advanced bit manipulation instructions
Msg-id 14024.1550119448@sss.pgh.pa.us
In response to Re: Using POPCNT and other advanced bit manipulation instructions  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: Using POPCNT and other advanced bit manipulation instructions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Thomas Munro <thomas.munro@enterprisedb.com> writes:
> On Thu, Feb 14, 2019 at 4:38 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'd be inclined to rip out all of the run-time-detection logic here;
>> I doubt any of it is buying anything that's worth the price of an
>> indirect call.

> No view on that but apparently there were Intel Atom and AMD C chips
> sold in the early part of this decade that lack POPCNT so I suspect
> the distros can't ship software that requires it with no fallback.

Ah, I was not looking at the business with the optional -mpopcnt
compiler flag.  I agree that we probably should not assume that
code compiled with that will run anywhere.  But it's silly to build
all this infrastructure and then throw away the opportunity to
optimize for anything but late-model Intel.

A survey of the buildfarm results so far says that __builtin_clz
and __builtin_ctz exist just about everywhere, and even
__builtin_popcount is available on some non-Intel architectures.
It is reasonable to assume that those builtins, where they exist,
are faster than the C equivalents, and that this holds even on
old-school Intel hardware.

The way this should have been done is to have a separate file
that's compiled with -mpopcnt if the compiler has that (and
has the builtins), and for the mainline file to have "slow"
versions that use the less-optimized builtins if available,
and only fall back to raw C code if not HAVE__BUILTIN_WHATEVER.
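
As a rough illustration of the mainline-file half of that layout, here
is a minimal sketch of a "slow" popcount that prefers the builtin and
falls back to raw C only when the builtin is missing.  The function
name popcount32_sketch is hypothetical and not taken from the patch;
HAVE__BUILTIN_POPCOUNT is assumed to be set by configure, as in the
snippet quoted further down.  The separately compiled -mpopcnt file is
not shown.

```c
#include <stdint.h>

/*
 * Hypothetical mainline ("slow") popcount.  The name is made up for
 * illustration; HAVE__BUILTIN_POPCOUNT is assumed to come from configure.
 */
static inline int
popcount32_sketch(uint32_t word)
{
#ifdef HAVE__BUILTIN_POPCOUNT
    /* Builtin compiled without -mpopcnt: safe on any CPU. */
    return __builtin_popcount(word);
#else
    /* Raw C fallback: clear the lowest set bit until none remain. */
    int         count = 0;

    while (word != 0)
    {
        word &= word - 1;
        count++;
    }
    return count;
#endif
}
```

Either branch returns the same answer, so callers need not care which
one was compiled in; only the speed differs.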

Also, in

#if defined(HAVE__GET_CPUID) && defined(HAVE__BUILTIN_POPCOUNT)

static bool
pg_popcount_available(void)
{
    unsigned int exx[4] = { 0, 0, 0, 0 };

#if defined(HAVE__GET_CPUID)
    __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
#elif defined(HAVE__CPUID)
    __cpuid(exx, 1);
#else
#error cpuid instruction not available
#endif

    return (exx[2] & (1 << 23)) != 0;    /* POPCNT */
}
#endif

it's obvious to the naked eye that the __cpuid() and #error
branches are unreachable because of the outer #if.  I don't
think that was the design intention.
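
One possible reading of the intended design, offered only as a sketch,
is that the outer test should accept either cpuid probe, so that the
inner #elif becomes reachable.  The stub that returns false when
neither probe exists is an assumption added here for completeness and
is not from the patch.

```c
#include <stdbool.h>

#if (defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)) && \
    defined(HAVE__BUILTIN_POPCOUNT)

#if defined(HAVE__GET_CPUID)
#include <cpuid.h>              /* GCC/Clang: __get_cpuid() */
#else
#include <intrin.h>             /* MSVC: __cpuid() */
#endif

static bool
pg_popcount_available(void)
{
    unsigned int exx[4] = {0, 0, 0, 0};

#if defined(HAVE__GET_CPUID)
    __get_cpuid(1, &exx[0], &exx[1], &exx[2], &exx[3]);
#else
    /* MSVC's intrinsic takes an int array, hence the cast. */
    __cpuid((int *) exx, 1);
#endif

    return (exx[2] & (1 << 23)) != 0;   /* POPCNT: leaf 1, ECX bit 23 */
}

#else

/* Assumed stub: with no cpuid probe or no builtin, report "unavailable". */
static bool
pg_popcount_available(void)
{
    return false;
}

#endif
```

With the outer condition written this way, the #error branch is no
longer needed, since the no-probe case is handled by the stub.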

            regards, tom lane

