call popcount32/64 directly on non-x86 platforms - Mailing list pgsql-hackers

From John Naylor
Subject call popcount32/64 directly on non-x86 platforms
Date
Msg-id CAFBsxsE7otwnfA36Ly44zZO+b7AEWHRFANxR1h1kxveEV=ghLQ@mail.gmail.com
Responses Re: call popcount32/64 directly on non-x86 platforms  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers
Currently, all platforms must indirect through a function pointer to call popcount on a word-sized input, even though we don't arrange for a fast implementation on non-x86 to make it worthwhile.

0001 moves some declarations around so that "slow" popcount functions are called directly on non-x86 platforms.
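
As a rough illustration of the idea (not the patch itself), the sketch below keeps a function pointer only where a faster implementation could be selected at run time, and otherwise maps the public name straight onto the portable function. The `demo_` names and the `__x86_64__` test are assumptions made for this example; the real tree uses its own configure-time checks.

```c
#include <stdint.h>

/* Portable fallback: clears the lowest set bit until the word is zero. */
static int
demo_popcount32_slow(uint32_t word)
{
	int			result = 0;

	while (word != 0)
	{
		word &= word - 1;		/* drop the lowest set bit */
		result++;
	}
	return result;
}

#if defined(__x86_64__)			/* hypothetical stand-in for the real check */
/*
 * x86: keep the indirection, since a run-time CPU check could later point
 * this at a faster implementation.
 */
static int	(*demo_popcount32) (uint32_t word) = demo_popcount32_slow;
#else
/*
 * Elsewhere there is no alternative to choose from, so callers reach the
 * portable function directly and the compiler is free to inline it.
 */
#define demo_popcount32(word) demo_popcount32_slow(word)
#endif
```

Callers write `demo_popcount32(x)` in both configurations; only the non-x86 build avoids the pointer load and indirect branch.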

0002 was an idea to simplify and unify the coding for the slow functions.
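
For readers unfamiliar with what a unified portable popcount can look like, here is a minimal sketch using the well-known branch-free bit-parallel (SWAR) technique, with the 32-bit variant defined in terms of the 64-bit one. This is an assumption about the general shape only; it is not taken from 0002, and the `demo_swar_` names are hypothetical.

```c
#include <stdint.h>

/* Bit-parallel popcount: sums bits in 2-, 4-, then 8-bit groups. */
static inline int
demo_swar_popcount64(uint64_t x)
{
	/* each 2-bit field now holds the count of its two bits */
	x = x - ((x >> 1) & UINT64_C(0x5555555555555555));
	/* each 4-bit field holds the count of its four bits */
	x = (x & UINT64_C(0x3333333333333333)) +
		((x >> 2) & UINT64_C(0x3333333333333333));
	/* each byte holds the count of its eight bits */
	x = (x + (x >> 4)) & UINT64_C(0x0F0F0F0F0F0F0F0F);
	/* the multiply accumulates all byte counts into the top byte */
	return (int) ((x * UINT64_C(0x0101010101010101)) >> 56);
}

/* The 32-bit case reuses the 64-bit code on a zero-extended input. */
static inline int
demo_swar_popcount32(uint32_t x)
{
	return demo_swar_popcount64((uint64_t) x);
}
```

Code like this does not depend on any compiler intrinsic, so the generated instructions do not vary with target CPU flags.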

Also attached is a test module for building microbenchmarks.

On a Power8 machine using gcc 4.8, and running
time ./inst/bin/psql -c 'select drive_popcount(100000, 1024)'

I get

master: 647ms
0001: 183ms
0002: 228ms

So 0001 is a clear winner on that platform. 0002 is still good, but slower than 0001 for some reason. It also turns out that on master, gcc does emit a popcount instruction (popcntw) from the intrinsic:

0000000000000000 <pg_popcount32_slow>:
   0:   f4 02 63 7c     popcntw r3,r3
   4:   b4 07 63 7c     extsw   r3,r3
   8:   20 00 80 4e     blr
        ...

The gcc docs mention a flag for enabling this instruction, but I'm not sure why it doesn't seem to be needed here:

https://gcc.gnu.org/onlinedocs/gcc/RS_002f6000-and-PowerPC-Options.html#RS_002f6000-and-PowerPC-Options

Maybe that's because the machine I used was ppc64le, but I'm not sure a ppc binary built like this is portable to other hardware. For that reason, maybe 0002 is a good idea. 
