Re: Using POPCNT and other advanced bit manipulation instructions - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Using POPCNT and other advanced bit manipulation instructions
Date
Msg-id 4022.1550185074@sss.pgh.pa.us
Whole thread Raw
In response to Re: Using POPCNT and other advanced bit manipulation instructions  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: Using POPCNT and other advanced bit manipulation instructions
Re: Using POPCNT and other advanced bit manipulation instructions
List pgsql-hackers
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> That leads me to the attached patch.  It creates a new file
> pg_popcount.c which is the only one compiled with -mpopcnt (if
> available); if there's no compiler switch to enable POPCNT, we just
> don't compile the file.  I'm not sure that's kosher -- in particular I'm
> not sure if it can fail when POPCNT is enabled by other flags and
> -mpopcnt is not needed at all.  I think our c-compiler.m4 stuff is a bit
> too simplistic there: it just assumes that -mpopcnt is always required.

Yes, the configure test for this stuff is really pretty broken.
It's conflating two nearly independent questions: (1) does the compiler
have __builtin_popcount(), and (2) does the compiler accept -mpopcnt.
It is certainly the case that (1) may hold without (2); in fact, every
recent non-x86_64 gcc is a counterexample to how that's done in HEAD.

I think we need a clean test for __builtin_popcount(), and to be willing
to use it if available, independently of -mpopcnt.  Then separately we
should test to see if -mpopcnt works, probably with the same
infrastructure we use for checking for other compiler flags, viz

   # Optimization flags for specific files that benefit from vectorization
   PGAC_PROG_CC_VAR_OPT(CFLAGS_VECTOR, [-funroll-loops])
   PGAC_PROG_CC_VAR_OPT(CFLAGS_VECTOR, [-ftree-vectorize])
+  # Optimization flags for bit-twiddling
+  PGAC_PROG_CC_VAR_OPT(CFLAGS_POPCNT, [-mpopcnt])
   # We want to suppress clang's unhelpful unused-command-line-argument warnings

Then the correct test to see if we want to build pg_popcount.c (BTW,
please pick a less generic name for that) and the choose function
is whether we have *both* HAVE__BUILTIN_POPCOUNT and nonempty
CFLAGS_POPCNT.

I don't think this'd be fooled by user-specified CFLAGS.  The worst
possible outcome is that it builds a function that we intended would
use POPCNT but it's falling back to some other implementation, in
case the compiler has a switch named -mpopcnt but it doesn't do what
we think it does, or the user overrode things by adding -mno-popcnt.
That would really be nearly cost-free, other than the overhead of
the choose function the first time through: both of the execution
functions would be reducing to __builtin_popcount(), for whatever
version of that the compiler is giving us, so the choice wouldn't
matter.

            regards, tom lane


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Using POPCNT and other advanced bit manipulation instructions
Next
From: Michael Paquier
Date:
Subject: Re: [PATCH] xlogreader: do not read a file block twice