Re: refactor architecture-specific popcount code - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: refactor architecture-specific popcount code
Date
Msg-id aYO9llwttJEtl7er@nathan
Whole thread Raw
In response to Re: refactor architecture-specific popcount code  (John Naylor <johncnaylorls@gmail.com>)
Responses Re: refactor architecture-specific popcount code
List pgsql-hackers
On Wed, Feb 04, 2026 at 12:13:35PM +0700, John Naylor wrote:
> - /*
> - * We set the threshold to the point at which we'll first use special
> - * instructions in the optimized version.
> - */
> [...]
> 
> It seems like we should still have some kind of comment here. Even
> just to say the 8 value was found through testing. (It was some time
> ago, right?)

Yeah, it needs a comment.  I recall doing a lot of testing for small
inputs since those are what tended to regress.

> It was intentional that my PoC pg_popcount32 was simple, pure C and
> didn't have #ifdefs anymore, and I'd prefer to go back to that. This
> function is now only used for single-word bitmapsets on 32-bit
> platforms and in an assert, and even if that changes we've shown that
> inlining bitwise ops for a single word is already pretty good for
> performance. Plus, doesn't this cause gcc generate a function call on
> 32-bit x86 because "!defined(__x86_64__)" is true? That defeats the
> whole purpose of inlining in the first place. Simple is good here.

Agreed and done.

> +#elif defined(_MSC_VER)
> +   return __popcnt64(word);
> 
> The commit message says "converts the portable ones to inlined
> functions", but this was copied from a architecture specific file with
> a runtime check. I've seen an assertion in this thread that the
> hardware instruction is required for some Windows version, but it
> would be nice to have a link to documentation for the archives. More
> worryingly, this is almost certainly broken on 32-bit, and the
> buildfarm won't tell us -- please see commit
> 53ea2b7ad050ce4ad95c89bb55197209b65886a1 and bug report that led to
> it. Seems like material for a separate commit.

Sure.  I'm tempted to suggest that we only use the plain C version here,
too.  The SSE4.2 bms_num_members() test I did yesterday used it and showed
improvement at one word.  If we do that, we can rip out even more code
since we no longer need the popcount built-ins.

* tests plain C version on an Apple M3 *

Yeah, the plain C version might be marginally slower than the built-in
version for that test, but it still seems quite a bit faster than HEAD.

    HEAD  v8  v10
      40  25   29

We probably want to re-add the Neon version of pg_popcount64() so that
pg_popcount_neon() and pg_popcount_masked_neon() can use it, but that's
easy enough.

> - for (int i = 0; i < RT_BM_IDX(RT_NODE_MAX_SLOTS); i++)
> - cnt += bmw_popcount(n256->isset[i]);
> + cnt += pg_popcount((const char *) n256->isset,
> +    RT_NODE_MAX_SLOTS / BITS_PER_BYTE);
> 
> This can now be "cnt =". The initialization to zero is now
> unnecessary, but it's also harmless.

Fixed.  The only reason I didn't make that change earlier was to keep the
patch tidy.

I haven't updated the commit messages yet.  Once the code is ready to go,
I'll give those another try.

[0] https://postgr.es/m/aYJeGs-xz3NEEdSe%40nathan

-- 
nathan

Attachment

pgsql-hackers by date:

Previous
From: "Euler Taveira"
Date:
Subject: Re: Pasword expiration warning
Next
From: Nathan Bossart
Date:
Subject: Re: Pasword expiration warning