On Wed, 20 Mar 2024 at 11:56, Amonson, Paul D <paul.d.amonson@intel.com> wrote:
> Changed in this patch set.
>
> * Rebased.
> * Direct *slow* calls via macros as shown in example patch.
> * Changed the choose filename to be platform specific as suggested.
> * Falls back to intermediate "Fast" methods if AVX512 is not available at runtime.
> * inline used where is makes sense, remember using "extern" negates "inline".
I'm not sure about this "extern negates inline" comment. It seems to
me the compiler is perfectly free to inline a static function into an
external function and it's free to inline the static function
elsewhere within the same .c file.
The final sentence of the following comment that the 0001 patch
removes explains this:
/*
* When the POPCNT instruction is not available, there's no point in using
* function pointers to vary the implementation between the fast and slow
* method. We instead just make these actual external functions when
* TRY_POPCNT_FAST is not defined. The compiler should be able to inline
* the slow versions here.
*/
Also, have a look at [1]. You'll see f_slow() wasn't even compiled
and the code was just inlined into f(). I just added the
__attribute__((noinline)) so that usage() wouldn't just perform
constant folding and just return 6.
I think, unless you have evidence that some common compiler isn't
inlining the static into the extern then we shouldn't add the macros.
It adds quite a bit of churn to the patch and will break out of core
code as you no longer have functions named pg_popcount32(),
pg_popcount64() and pg_popcount().
David
[1] https://godbolt.org/z/6joExb79d