On Mon, Mar 18, 2024 at 12:30:04PM -0500, Nathan Bossart wrote:
> Here is a more fleshed-out version of what I believe David is proposing.
> On my machine, the gains aren't quite as impressive (~8.8s to ~5.2s for the
> test_popcount benchmark). I assume this is because this patch turns
> pg_popcount() into a function pointer, which is what the AVX512 patches do,
> too. I left out the 32-bit section from pg_popcount_fast(), but I'll admit
> that I'm not yet 100% sure that we can assume we're on a 64-bit system
> there.
>
> IMHO this work is arguably a prerequisite for the AVX512 work, as turning
> pg_popcount() into a function pointer will likely regress performance for
> folks on systems without AVX512 otherwise.

Apologies for the noise. I noticed that we could (and probably should)
inline the pg_popcount32/64 calls in the "slow" version, too.
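
To illustrate the idea, here is a minimal standalone sketch, not the patch
itself. All of the sketch_* names are made up for this example, and the
"choose" step simply picks the slow path where real code would first check
for CPU support. The point is only that the per-word helper is a static
inline, so the one indirect call through the function pointer covers the
whole buffer rather than each word.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical per-word helper.  Being static inline, it is expanded into
 * the loop below, so no call is made per word.
 */
static inline int
sketch_popcount64_slow(uint64_t word)
{
	int			result = 0;

	while (word != 0)
	{
		word &= word - 1;		/* clear the lowest set bit */
		result++;
	}
	return result;
}

/* Hypothetical "slow" buffer version with the per-word step inlined. */
static uint64_t
sketch_popcount_slow(const char *buf, int bytes)
{
	uint64_t	popcnt = 0;

	/* Process 8 bytes at a time; memcpy avoids alignment assumptions. */
	while (bytes >= 8)
	{
		uint64_t	word;

		memcpy(&word, buf, sizeof(word));
		popcnt += sketch_popcount64_slow(word);
		buf += 8;
		bytes -= 8;
	}

	/* Process any remaining bytes one at a time. */
	while (bytes-- > 0)
		popcnt += sketch_popcount64_slow((unsigned char) *buf++);

	return popcnt;
}

static uint64_t sketch_popcount_choose(const char *buf, int bytes);

/* Function pointer; the first call goes through the "choose" routine. */
static uint64_t (*sketch_popcount) (const char *buf, int bytes) = sketch_popcount_choose;

static uint64_t
sketch_popcount_choose(const char *buf, int bytes)
{
	/*
	 * Assumption: real code would test for hardware support here.  This
	 * sketch always selects the slow implementation.
	 */
	sketch_popcount = sketch_popcount_slow;
	return sketch_popcount(buf, bytes);
}
```

Callers would just use sketch_popcount(buf, bytes). After the first call the
pointer refers directly to the selected implementation.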
--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com