Re: Use POPCNT on MSVC - Mailing list pgsql-hackers

From David Rowley
Subject Re: Use POPCNT on MSVC
Date
Msg-id CAApHDvrONNcYxGV6C0O3ZmaL0BvXBWY+rBOCBuYcQVUOURwhkA@mail.gmail.com
Whole thread Raw
In response to Re: Use POPCNT on MSVC  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: Use POPCNT on MSVC  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
On Tue, 3 Aug 2021 at 22:43, John Naylor <john.naylor@enterprisedb.com> wrote:
> 1. the __popcnt64() intrinsic is put inside pg_popcount64_asm(), which is a bit of a misnomer since it's not
assembly.Renaming s/_asm/_fast/ would help it look better. But then looking around, other platforms have intrinsics
coded,but for some reason they're put in pg_popcount64_slow(), where the compiler will emit instructions we could
easilywrite ourselves in C (and without #ifdefs) since without the right CFLAGS these intrinsics won't emit a popcnt
instruction.Is MSVC different in that regard? If so, it might be worth a comment. 

Yeah, the names no longer fit so well after stuffing the MSVC
intrinsic into the asm function.  The reason I did it that way is down
to what I read in the docs. Namely:

"If you run code that uses these intrinsics on hardware that doesn't
support the popcnt instruction, the results are unpredictable."

So, it has to go somewhere where we're sure the CPU supports POPCNT
and that seemed like the correct place.

From what I understand of GCC and __builtin_popcountl(), the code
it'll output will depend on the -mpopcnt flag.  So having
__builtin_popcountl() does not mean the popcnt instruction will be
used. The Microsoft documentation indicates that's not the case for
__popcnt().

> 2. (defined(_MSC_VER) && defined(_WIN64)  lead to a runtime check for the CPUID, which is fine, but now next to it
HAVE_X86_64_POPCNTQlooks strange because the latter symbol comes from a specific configure test -- the two don't seem
equivalent,but maybe they are because of what MSVC does? That would be nice to spell out here. 

The problem there is that we define HAVE_X86_64_POPCNTQ based on the
outcome of configure so it does not get set for MSVC.  Maybe it's
worth coming up with some other constant that can be defined or we
could just do:

#if defined(_MSC_VER) && defined(_WIN64)
#define HAVE_X86_64_POPCNTQ
#endif

> I know the present state of affairs works a bit different than what you proposed a couple years ago and that
somethinghad to change for portability reasons I didn't quite understand, but I think if we change things here we
shouldat least try to have things fit together a bit more nicely. 

I think the reason for the asm is that __builtin_popcountl does not
mean popcnt will be used. Maybe we could have done something like
compile pg_bitutils.c with -mpopcnt, and have kept the run-time check,
but the problem there is that means that the compiler might end up
using that instruction in some other function in that file that we
don't want it to. It looks like my patch in [1] did pass the -mpopcnt
flag which Tom fixed.

> (Side note, but sort of related to #1 above: non-x86 platforms have to indirect through a function pointer even
thoughthey have no fast implementation to make it worth their while. It would be better for them if the "slow"
implementationwas called static inline or at least a direct function call, but that's a separate thread.) 

hmm yeah. I see there's a few more usages of pg_popcount() than when I
looked last.  It would be fairly easy to get rid of the
pg_popcount64/pg_popcount32 function call in that. A separate patch
though.

David

[1] https://www.postgresql.org/message-id/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Failed transaction statistics to measure the logical replication progress
Next
From: Greg Nancarrow
Date:
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump