On Tue, Aug 3, 2021 at 5:03 AM David Rowley <
dgrowleyml@gmail.com> wrote:
>
> Going by [1], it looks like we can use the __popcnt and __popcnt64
> intrinsic functions on MSVC if the CPU supports POPCNT. We already
> have code to check for that, we just need to enable it on MSVC.
>
> The attached patch seems to be all that's needed.
+1 for the concept, but the coding is a bit strange. Granted, the way we handle popcnt is a bit strange, but this makes it stranger:
1. the __popcnt64() intrinsic is put inside pg_popcount64_asm(), which is a bit of a misnomer since it's not assembly. Renaming s/_asm/_fast/ would help it look better. But then looking around, other platforms have intrinsics coded, but for some reason they're put in pg_popcount64_slow(), where the compiler will emit instructions we could easily write ourselves in C (and without #ifdefs) since without the right CFLAGS these intrinsics won't emit a popcnt instruction. Is MSVC different in that regard? If so, it might be worth a comment.
2. (defined(_MSC_VER) && defined(_WIN64) lead to a runtime check for the CPUID, which is fine, but now next to it HAVE_X86_64_POPCNTQ looks strange because the latter symbol comes from a specific configure test -- the two don't seem equivalent, but maybe they are because of what MSVC does? That would be nice to spell out here.
I know the present state of affairs works a bit different than what you proposed a couple years ago and that something had to change for portability reasons I didn't quite understand, but I think if we change things here we should at least try to have things fit together a bit more nicely.
(Side note, but sort of related to #1 above: non-x86 platforms have to indirect through a function pointer even though they have no fast implementation to make it worth their while. It would be better for them if the "slow" implementation was called static inline or at least a direct function call, but that's a separate thread.)