Re: refactor architecture-specific popcount code - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: refactor architecture-specific popcount code
Date
Msg-id aYEqini0ukxQv2_D@nathan
Whole thread Raw
In response to Re: refactor architecture-specific popcount code  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: refactor architecture-specific popcount code
List pgsql-hackers
On Mon, Feb 02, 2026 at 09:16:42PM +0700, John Naylor wrote:
> It might be a good idea to do a little new testing, and I see a use
> for a special 8-byte path independent of AVX512: v6 seems to regress a
> little for single-words. But, it turns out that when gcc turns
> __builtin_popcountl into a single instruction, it's inline, but if it
> emits portable bitwise ops, it does so in a function called
> __popcountdi2(). That can be avoided by hand-coding in C for normal
> builds (and for 32-bit looks cleaner anyway), as in the attached 0005.

Oh, interesting.  I looked into this a little more [0].  Both gcc and clang
generate cnt instructions for aarch64, so we're good there.  However, clang
on x86-64 generates the bit-twiddling version, and gcc on x86-64 generates
a call to __popcountdi2() (which I imagine does something similar).  It's
not until you provide a compiler flag like -march=x86-64-v2 that gcc/clang
start generating popcnt instructions for x86-64, which makes sense.  0005
seems like the correct move to me...

[0] https://godbolt.org/z/he3WozG3E

-- 
nathan



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Next
From: Radim Marek
Date:
Subject: Non-deterministic buffer counts reported in execution with EXPLAIN ANALYZE BUFFERS