Thread: Use POPCNT on MSVC

Re: Use POPCNT on MSVC

From

John Naylor

Date:

03 August 2021, 13:43:25

On Tue, Aug 3, 2021 at 5:03 AM David Rowley <dgrowleyml@gmail.com> wrote:
>
> Going by [1], it looks like we can use the __popcnt and __popcnt64
> intrinsic functions on MSVC if the CPU supports POPCNT. We already
> have code to check for that, we just need to enable it on MSVC.
>
> The attached patch seems to be all that's needed.

+1 for the concept, but the coding is a bit strange. Granted, the way we handle popcnt is a bit strange, but this makes it stranger:

1. the __popcnt64() intrinsic is put inside pg_popcount64_asm(), which is a bit of a misnomer since it's not assembly. Renaming s/_asm/_fast/ would help it look better. But then looking around, other platforms have intrinsics coded, but for some reason they're put in pg_popcount64_slow(), where the compiler will emit instructions we could easily write ourselves in C (and without #ifdefs) since without the right CFLAGS these intrinsics won't emit a popcnt instruction. Is MSVC different in that regard? If so, it might be worth a comment.

2. (defined(_MSC_VER) && defined(_WIN64) lead to a runtime check for the CPUID, which is fine, but now next to it HAVE_X86_64_POPCNTQ looks strange because the latter symbol comes from a specific configure test -- the two don't seem equivalent, but maybe they are because of what MSVC does? That would be nice to spell out here.

I know the present state of affairs works a bit different than what you proposed a couple years ago and that something had to change for portability reasons I didn't quite understand, but I think if we change things here we should at least try to have things fit together a bit more nicely.

(Side note, but sort of related to #1 above: non-x86 platforms have to indirect through a function pointer even though they have no fast implementation to make it worth their while. It would be better for them if the "slow" implementation was called static inline or at least a direct function call, but that's a separate thread.)

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Use POPCNT on MSVC

From

Thomas Munro

Date:

04 August 2021, 04:05:06

On Tue, Aug 3, 2021 at 10:43 PM John Naylor
<john.naylor@enterprisedb.com> wrote:
> (Side note, but sort of related to #1 above: non-x86 platforms have to indirect through a function pointer even
thoughthey have no fast implementation to make it worth their while. It would be better for them if the "slow"
implementationwas called static inline or at least a direct function call, but that's a separate thread.) 

+1

I haven't looked into whether we could benefit from it in real use
cases, but it seems like it'd also be nice if pg_popcount() were a
candidate for auto-vectorisation and inlining.  For example, NEON has
vector popcount, and for Intel/AMD there is a shuffle-based AVX2 trick
that at least Clang produces automatically[1].  We're obstructing that
by doing function dispatch at individual word level, and using inline
assembler instead of builtins.

[1] https://arxiv.org/abs/1611.07612

Re: Use POPCNT on MSVC

From

David Rowley

Date:

04 August 2021, 06:36:00

On Tue, 3 Aug 2021 at 22:43, John Naylor <john.naylor@enterprisedb.com> wrote:
> 1. the __popcnt64() intrinsic is put inside pg_popcount64_asm(), which is a bit of a misnomer since it's not
assembly.Renaming s/_asm/_fast/ would help it look better. But then looking around, other platforms have intrinsics
coded,but for some reason they're put in pg_popcount64_slow(), where the compiler will emit instructions we could
easilywrite ourselves in C (and without #ifdefs) since without the right CFLAGS these intrinsics won't emit a popcnt
instruction.Is MSVC different in that regard? If so, it might be worth a comment. 

Yeah, the names no longer fit so well after stuffing the MSVC
intrinsic into the asm function.  The reason I did it that way is down
to what I read in the docs. Namely:

"If you run code that uses these intrinsics on hardware that doesn't
support the popcnt instruction, the results are unpredictable."

So, it has to go somewhere where we're sure the CPU supports POPCNT
and that seemed like the correct place.

From what I understand of GCC and __builtin_popcountl(), the code
it'll output will depend on the -mpopcnt flag.  So having
__builtin_popcountl() does not mean the popcnt instruction will be
used. The Microsoft documentation indicates that's not the case for
__popcnt().

> 2. (defined(_MSC_VER) && defined(_WIN64)  lead to a runtime check for the CPUID, which is fine, but now next to it
HAVE_X86_64_POPCNTQlooks strange because the latter symbol comes from a specific configure test -- the two don't seem
equivalent,but maybe they are because of what MSVC does? That would be nice to spell out here. 

The problem there is that we define HAVE_X86_64_POPCNTQ based on the
outcome of configure so it does not get set for MSVC.  Maybe it's
worth coming up with some other constant that can be defined or we
could just do:

#if defined(_MSC_VER) && defined(_WIN64)
#define HAVE_X86_64_POPCNTQ
#endif

> I know the present state of affairs works a bit different than what you proposed a couple years ago and that
somethinghad to change for portability reasons I didn't quite understand, but I think if we change things here we
shouldat least try to have things fit together a bit more nicely. 

I think the reason for the asm is that __builtin_popcountl does not
mean popcnt will be used. Maybe we could have done something like
compile pg_bitutils.c with -mpopcnt, and have kept the run-time check,
but the problem there is that means that the compiler might end up
using that instruction in some other function in that file that we
don't want it to. It looks like my patch in [1] did pass the -mpopcnt
flag which Tom fixed.

> (Side note, but sort of related to #1 above: non-x86 platforms have to indirect through a function pointer even
thoughthey have no fast implementation to make it worth their while. It would be better for them if the "slow"
implementationwas called static inline or at least a direct function call, but that's a separate thread.) 

hmm yeah. I see there's a few more usages of pg_popcount() than when I
looked last.  It would be fairly easy to get rid of the
pg_popcount64/pg_popcount32 function call in that. A separate patch
though.

David

[1] https://www.postgresql.org/message-id/CAKJS1f9WTAGG1tPeJnD18hiQW5gAk59fQ6WK-vfdAKEHyRg2RA@mail.gmail.com

Re: Use POPCNT on MSVC

From

John Naylor

Date:

04 August 2021, 22:01:50

On Tue, Aug 3, 2021 at 11:36 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Tue, 3 Aug 2021 at 22:43, John Naylor <john.naylor@enterprisedb.com> wrote:
> > 1. the __popcnt64() intrinsic is put inside pg_popcount64_asm(), which is a bit of a misnomer since it's not assembly. Renaming s/_asm/_fast/ would help it look better. But then looking around, other platforms have intrinsics coded, but for some reason they're put in pg_popcount64_slow(), where the compiler will emit instructions we could easily write ourselves in C (and without #ifdefs) since without the right CFLAGS these intrinsics won't emit a popcnt instruction. Is MSVC different in that regard? If so, it might be worth a comment.
>
> Yeah, the names no longer fit so well after stuffing the MSVC
> intrinsic into the asm function. The reason I did it that way is down
> to what I read in the docs. Namely:
>
> "If you run code that uses these intrinsics on hardware that doesn't
> support the popcnt instruction, the results are unpredictable."
>
> So, it has to go somewhere where we're sure the CPU supports POPCNT
> and that seemed like the correct place.
>
> From what I understand of GCC and __builtin_popcountl(), the code
> it'll output will depend on the -mpopcnt flag. So having
> __builtin_popcountl() does not mean the popcnt instruction will be
> used. The Microsoft documentation indicates that's not the case for
> __popcnt().

Okay, "unpredictable" sounds bad.

> > 2. (defined(_MSC_VER) && defined(_WIN64) lead to a runtime check for the CPUID, which is fine, but now next to it HAVE_X86_64_POPCNTQ looks strange because the latter symbol comes from a specific configure test -- the two don't seem equivalent, but maybe they are because of what MSVC does? That would be nice to spell out here.
>
> The problem there is that we define HAVE_X86_64_POPCNTQ based on the
> outcome of configure so it does not get set for MSVC. Maybe it's
> worth coming up with some other constant that can be defined or we
> could just do:
>
> #if defined(_MSC_VER) && defined(_WIN64)
> #define HAVE_X86_64_POPCNTQ
> #endif

That seems fine. I don't know PG can build with Arm on Windows, but for the cpuid to work, it seems safer to also check for __x86_64?

> I think the reason for the asm is that __builtin_popcountl does not
> mean popcnt will be used. Maybe we could have done something like
> compile pg_bitutils.c with -mpopcnt, and have kept the run-time check,
> but the problem there is that means that the compiler might end up
> using that instruction in some other function in that file that we
> don't want it to. It looks like my patch in [1] did pass the -mpopcnt
> flag which Tom fixed.

Ah, that makes sense. (If we someday offer a configure option for x86-64-v2, we can use intrinsics in the asm functions, and call them directly. Yet another different thread.)

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Use POPCNT on MSVC

From

David Rowley

Date:

09 August 2021, 03:31:16

On Thu, 5 Aug 2021 at 07:02, John Naylor <john.naylor@enterprisedb.com> wrote:
> > #if defined(_MSC_VER) && defined(_WIN64)
> > #define HAVE_X86_64_POPCNTQ
> > #endif
>
> That seems fine. I don't know PG can build with Arm on Windows, but for the cpuid to work, it seems safer to also
checkfor __x86_64?

That's a good point. I've adjusted it to do #if defined(_MSC_VER) &&
defined(_M_AMD64).

I've attached a v2 patch which I think is more along the lines of what
you had in mind.

David

Attachment

use_popcnt_on_msvc_v2.patch

Re: Use POPCNT on MSVC

From

John Naylor

Date:

09 August 2021, 03:58:07

On Sun, Aug 8, 2021 at 8:31 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> I've attached a v2 patch which I think is more along the lines of what
> you had in mind.

LGTM

--
John Naylor
EDB: http://www.enterprisedb.com

Re: Use POPCNT on MSVC

From

David Rowley

Date:

09 August 2021, 06:24:56

On Mon, 9 Aug 2021 at 12:58, John Naylor <john.naylor@enterprisedb.com> wrote:
>
> On Sun, Aug 8, 2021 at 8:31 PM David Rowley <dgrowleyml@gmail.com> wrote:
> >
> > I've attached a v2 patch which I think is more along the lines of what
> > you had in mind.
>
> LGTM

Thanks for the review.

Pushed.

David