On Tue, Aug 3, 2021 at 10:43 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
> (Side note, but sort of related to #1 above: non-x86 platforms have to indirect through a function pointer even
> though they have no fast implementation to make it worth their while. It would be better for them if the "slow"
> implementation was called static inline or at least a direct function call, but that's a separate thread.)
+1
I haven't looked into whether we could benefit from it in real use
cases, but it seems like it'd also be nice if pg_popcount() were a
candidate for auto-vectorisation and inlining. For example, NEON has
vector popcount, and for Intel/AMD there is a shuffle-based AVX2 trick
that at least Clang produces automatically[1]. We're obstructing that
by doing function dispatch at individual word level, and using inline
assembler instead of builtins.
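As a rough sketch of the kind of loop a compiler has a chance of
vectorising: the function name below is hypothetical, it assumes the
GCC/Clang __builtin_popcount* builtins, and it is not the current
pg_bitutils code. The point is only that the whole buffer loop is visible
to the optimiser, with no per-word call through a function pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical illustration only, not the existing pg_popcount().
 * Counts the set bits in a buffer with the loop and the popcount
 * builtin both visible to the compiler, so it is free to inline and
 * (depending on compiler, target and flags) vectorise the loop.
 */
static inline uint64_t
popcount_buf_sketch(const char *buf, size_t bytes)
{
	uint64_t	popcnt = 0;

	/* Process 8 bytes at a time; memcpy avoids alignment assumptions. */
	while (bytes >= sizeof(uint64_t))
	{
		uint64_t	word;

		memcpy(&word, buf, sizeof(word));
		popcnt += (uint64_t) __builtin_popcountll(word);
		buf += sizeof(word);
		bytes -= sizeof(word);
	}

	/* Handle the remaining tail bytes one at a time. */
	while (bytes-- > 0)
		popcnt += (uint64_t) __builtin_popcount((unsigned char) *buf++);

	return popcnt;
}
```

Whether this actually turns into NEON or AVX2 code depends on the
compiler version and the target flags in use, so it would need checking
on the platforms we care about.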
[1] https://arxiv.org/abs/1611.07612