Thread: pgsql: Fix compiler builtin usage in new pg_bitutils.c

pgsql: Fix compiler builtin usage in new pg_bitutils.c

From
Alvaro Herrera
Date:
Fix compiler builtin usage in new pg_bitutils.c

Split out these new functions in three parts: one in a new file that
uses the compiler builtin and gets compiled with the -mpopcnt compiler
option if it exists; another one that uses the compiler builtin but not
the compiler option; and finally the fallback with open-coded
algorithms.

Split out the configure logic: in the original commit, it was selecting
to use the -mpopcnt compiler switch together with deciding whether to
use the compiler builtin, but those two things are really separate.
Split them out.  Also, expose whether the builtin exists to
Makefile.global, so that src/port's Makefile can decide whether to
compile the hw-optimized file.

Remove CPUID test for CTZ/CLZ.  Make pg_{right,left}most_ones use either
the compiler intrinsic or open-coded algo; trying to use the
HW-optimized version is a waste of time.  Make them static inline
functions.

Discussion: https://postgr.es/m/20190213221719.GA15976@alvherre.pgsql

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/fc6c72747ae6db6909fcd7d1adbc3d923ec1fffa

Modified Files
--------------
config/c-compiler.m4            |  22 +--
configure                       |  66 +++++--
configure.in                    |   6 +-
src/Makefile.global.in          |   3 +
src/include/port/pg_bitutils.h  | 176 ++++++++++++++++++-
src/port/Makefile               |  16 +-
src/port/pg_bitutils.c          | 378 +++++-----------------------------------
src/port/pg_bitutils_hwpopcnt.c |  36 ++++
8 files changed, 327 insertions(+), 376 deletions(-)


Re: pgsql: Fix compiler builtin usage in new pg_bitutils.c

From
Alvaro Herrera
Date:
Hmm, this should fix the build, but I'm rushing out to lunch -- maybe
I'm missing something.

diff --git a/src/port/pg_bitutils.c b/src/port/pg_bitutils.c
index 97bfcebe4e1..e0198f3ab35 100644
--- a/src/port/pg_bitutils.c
+++ b/src/port/pg_bitutils.c
@@ -90,9 +90,11 @@ pg_popcount_available(void)
 static int
 pg_popcount32_choose(uint32 word)
 {
+#if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
     if (pg_popcount_available())
         pg_popcount32 = pg_popcount32_hw;
     else
+#endif
         pg_popcount32 = pg_popcount32_builtin;
 
     return pg_popcount32(word);
@@ -178,9 +180,11 @@ pg_popcount(const char *buf, int bytes)
 static int
 pg_popcount64_choose(uint64 word)
 {
+#if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
     if (pg_popcount_available())
         pg_popcount64 = pg_popcount64_hw;
     else
+#endif
         pg_popcount64 = pg_popcount64_builtin;
 
     return pg_popcount64(word);

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Fix compiler builtin usage in new pg_bitutils.c

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Hmm, this should fix the build, but I'm rushing out to lunch -- maybe
> I'm missing something.

> diff --git a/src/port/pg_bitutils.c b/src/port/pg_bitutils.c
> index 97bfcebe4e1..e0198f3ab35 100644
> --- a/src/port/pg_bitutils.c
> +++ b/src/port/pg_bitutils.c
> @@ -90,9 +90,11 @@ pg_popcount_available(void)
>  static int
>  pg_popcount32_choose(uint32 word)
>  {
> +#if defined(HAVE__GET_CPUID) || defined(HAVE__CPUID)
>      if (pg_popcount_available())
>          pg_popcount32 = pg_popcount32_hw;
>      else
> +#endif
>          pg_popcount32 = pg_popcount32_builtin;

Meh.  I don't see why this entire function should exist if there
is nothing for it to do.  I'm inclined to think that somewhere
there needs to be a symbol NEED_POPCOUNT_CHOOSING that is only
enabled if we have all three of HAVE__BUILTIN_POPCOUNT, nonempty
CFLAGS_POPCOUNT, and HAVE__GET_CPUID || HAVE__CPUID.  Possibly
we should gate building of pg_bitutils_hwpopcnt.c that way too?
Not much point in building that file if we have no way to figure
out when to use it.

            regards, tom lane


Re: pgsql: Fix compiler builtin usage in new pg_bitutils.c

From
Alvaro Herrera
Date:
On 2019-Feb-15, Tom Lane wrote:

> Meh.  I don't see why this entire function should exist if there
> is nothing for it to do.  I'm inclined to think that somewhere
> there needs to be a symbol NEED_POPCOUNT_CHOOSING that is only
> enabled if we have all three of HAVE__BUILTIN_POPCOUNT, nonempty
> CFLAGS_POPCOUNT, and HAVE__GET_CPUID || HAVE__CPUID.  Possibly
> we should gate building of pg_bitutils_hwpopcnt.c that way too?
> Not much point in building that file if we have no way to figure
> out when to use it.

I'm done with this stuff.  Anybody feel free to run with it, but for a
barely noticeable performance improvement it's not going to be me.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Fix compiler builtin usage in new pg_bitutils.c

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> I'm done with this stuff.  Anybody feel free to run with it, but for a
> barely noticeable performance improvement it's not going to be me.

Sure, I'll pick it up.  I agree it's probably a marginal performance
change, but it seems a shame to give up when we were 80% of the way
there.

            regards, tom lane


Re: pgsql: Fix compiler builtin usage in new pg_bitutils.c

From
Alvaro Herrera
Date:
On 2019-Feb-15, Tom Lane wrote:

> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> > I'm done with this stuff.  Anybody feel free to run with it, but for a
> > barely noticeable performance improvement it's not going to be me.
> 
> Sure, I'll pick it up.  I agree it's probably a marginal performance
> change, but it seems a shame to give up when we were 80% of the way
> there.

(BTW, my reading of the articles I cited, as well as my own runs of the
test programs therein, suggest that in order to get a really good
performance improvement you need to hand-code calls to the POPCNT
instruction in assembly rather than rely on the compiler intrinsics.  I
think there is^Wwas almost enough infrastructure to do that.  I'm not
sure that the operations being optimized by these changes are
interesting enough, in the grand scheme of things.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: pgsql: Fix compiler builtin usage in new pg_bitutils.c

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2019-Feb-15, Tom Lane wrote:
>> Sure, I'll pick it up.  I agree it's probably a marginal performance
>> change, but it seems a shame to give up when we were 80% of the way
>> there.

> (BTW, my reading of the articles I cited, as well as my own runs of the
> test programs therein, suggest that in order to get a really good
> performance improvement you need to hand-code calls to the POPCNT
> instruction in assembly rather than rely on the compiler intrinsics.  I
> think there is^Wwas almost enough infrastructure to do that.  I'm not
> sure that the operations being optimized by these changes are
> interesting enough, in the grand scheme of things.)

Yeah, that last point is a fair one; if the situation were static,
I'm not sure it'd be worth the trouble either.  But I think that
once we have these functions we may find more uses for them.  IIRC,
there's already one patch in the CF queue that wanted to use
__builtin_clz for something or other.

            regards, tom lane


Re: pgsql: Fix compiler builtin usage in new pg_bitutils.c

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> (BTW, my reading of the articles I cited, as well as my own runs of the
> test programs therein, suggest that in order to get a really good
> performance improvement you need to hand-code calls to the POPCNT
> instruction in assembly rather than rely on the compiler intrinsics.

That observation led me to think about using asm() instead of
__builtin_popcount + -mpopcnt, and I realized there are several
fewer moving parts if we do it that way: we don't need to worry
about the compiler switch, and we don't need to rely on faith that
it actually changes the emitted code, and we don't need a separate
source file to limit the scope of the switch.  And really, requiring
__builtin_popcount + -mpopcnt is pretty much restricting the
optimization to GCC-alikes anyway, so requiring asm() probably
doesn't eliminate any toolchains that would've handled the other way.
Hence, I made it work like that.  Committed with that and some cosmetic
cleanups.

            regards, tom lane