Thread: support for SSE2 intrinsics
Recently there have been several threads where the problem at hand lends itself to using SSE2 SIMD intrinsics. These are convenient because on 64-bit x86 the instructions are always present and so don't need a runtime check. To integrate them into our code base, we will need to take some measures for portability, but after looking around it seems fairly lightweight:
1. Compiler invocation and symbols
Since SSE2 is part of the AMD64 spec, gcc enables it always:
$ gcc -dM -E - < /dev/null | grep SSE | sort
$ gcc -dM -E -msse2 - < /dev/null | grep SSE | sort
#define __MMX_WITH_SSE__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
Passing -m32 discards the "MATH" macros but keeps the rest:
$ gcc -dM -E -m32 - < /dev/null | grep SSE | sort
#define __SSE__ 1
#define __SSE2__ 1
Clang behaves similarly.
MSVC doesn't define __SSE2__ (although it does define __AVX__ etc), but we can just test for _M_X64 or _M_AMD64 (they are equivalent according to [1], and we have both in our code base already). We could test for __SSE2__ for 32-bit gcc-alikes in the build farm, but I don't think that would tell us anything interesting, so we can just test for __x86_64__.
2. The intrinsics header
From Peter Cordes on StackOverflow [2]:
```
immintrin.h is portable across all compilers, and includes all Intel SIMD intrinsics, and some scalar extensions like BMI2 _pdep_u32. (For AMD SSE4a and XOP (Bulldozer-family only, dropped for Zen), you need to include a different header as well.)
The only reason I can think of for including <emmintrin.h> specifically would be if you're using MSVC and want to leave intrinsics undefined for ISA extensions you don't want to depend on.
```
It seems then that MSVC will compile intrinsics without prompting, so to be safe we'd need to take the latter advice and use <emmintrin.h>.
3. Support for SSE2 intrinsics
This seems to be well-nigh universal AFAICT and doesn't need to be tested for at configure time. A quick search doesn't turn up anything weird for Msys or Cygwin. From [2] again, gcc older than 4.4 can generate poor code, but there is no mention that correctness is a problem.
4. Helper functions
In a couple proposed patches, there has been some interest in abstracting some SIMD functionality into functions to hide implementation details away. I agree there are cases where that would help readability and avoid duplication.
Given all this, the anti-climax is: it seems we can start with something like src/include/port/simd.h with:
#if (defined(__x86_64__) || defined(_M_AMD64))
#include <emmintrin.h>
#define USE_SSE2
#endif
(plus a comment summarizing the above)
That we can include into other files, and would be the place to put helper functions. Thoughts?
[1] https://docs.microsoft.com/en-us/archive/blogs/reiley/macro-revisited
[2] https://stackoverflow.com/questions/56049110/including-the-correct-intrinsic-header
--
John Naylor
EDB: http://www.enterprisedb.com
1. Compiler invocation and symbols
Since SSE2 is part of the AMD64 spec, gcc enables it always:
$ gcc -dM -E - < /dev/null | grep SSE | sort
$ gcc -dM -E -msse2 - < /dev/null | grep SSE | sort
#define __MMX_WITH_SSE__ 1
#define __SSE__ 1
#define __SSE2__ 1
#define __SSE2_MATH__ 1
#define __SSE_MATH__ 1
Passing -m32 discards the "MATH" macros but keeps the rest:
$ gcc -dM -E -m32 - < /dev/null | grep SSE | sort
#define __SSE__ 1
#define __SSE2__ 1
Clang behaves similarly.
MSVC doesn't define __SSE2__ (although it does define __AVX__ etc), but we can just test for _M_X64 or _M_AMD64 (they are equivalent according to [1], and we have both in our code base already). We could test for __SSE2__ for 32-bit gcc-alikes in the build farm, but I don't think that would tell us anything interesting, so we can just test for __x86_64__.
2. The intrinsics header
From Peter Cordes on StackOverflow [2]:
```
immintrin.h is portable across all compilers, and includes all Intel SIMD intrinsics, and some scalar extensions like BMI2 _pdep_u32. (For AMD SSE4a and XOP (Bulldozer-family only, dropped for Zen), you need to include a different header as well.)
The only reason I can think of for including <emmintrin.h> specifically would be if you're using MSVC and want to leave intrinsics undefined for ISA extensions you don't want to depend on.
```
It seems then that MSVC will compile intrinsics without prompting, so to be safe we'd need to take the latter advice and use <emmintrin.h>.
3. Support for SSE2 intrinsics
This seems to be well-nigh universal AFAICT and doesn't need to be tested for at configure time. A quick search doesn't turn up anything weird for Msys or Cygwin. From [2] again, gcc older than 4.4 can generate poor code, but there is no mention that correctness is a problem.
4. Helper functions
In a couple proposed patches, there has been some interest in abstracting some SIMD functionality into functions to hide implementation details away. I agree there are cases where that would help readability and avoid duplication.
Given all this, the anti-climax is: it seems we can start with something like src/include/port/simd.h with:
#if (defined(__x86_64__) || defined(_M_AMD64))
#include <emmintrin.h>
#define USE_SSE2
#endif
(plus a comment summarizing the above)
That we can include into other files, and would be the place to put helper functions. Thoughts?
[1] https://docs.microsoft.com/en-us/archive/blogs/reiley/macro-revisited
[2] https://stackoverflow.com/questions/56049110/including-the-correct-intrinsic-header
--
John Naylor
EDB: http://www.enterprisedb.com
On Tue, Aug 02, 2022 at 05:22:52PM +0700, John Naylor wrote: > Given all this, the anti-climax is: it seems we can start with something > like src/include/port/simd.h with: > > #if (defined(__x86_64__) || defined(_M_AMD64)) > #include <emmintrin.h> > #define USE_SSE2 > #endif > > (plus a comment summarizing the above) > > That we can include into other files, and would be the place to put helper > functions. Thoughts? +1 I did a bit of cross-checking, and AFAICT this is a reasonable starting point. emmintrin.h appears to be sufficient for one of my patches that makes use of SSE2 instructions. That being said, I imagine it'll be especially important to keep an eye on the buildfarm when this change is committed. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Tue, Aug 2, 2022 at 11:53 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> I did a bit of cross-checking, and AFAICT this is a reasonable starting
> point. emmintrin.h appears to be sufficient for one of my patches that
> makes use of SSE2 instructions. That being said, I imagine it'll be
> especially important to keep an eye on the buildfarm when this change is
> committed.
Thanks for checking! Here's a concrete patch for testing.
--
John Naylor
EDB: http://www.enterprisedb.com
Attachment
On Wed, Aug 03, 2022 at 12:00:39PM +0700, John Naylor wrote: > Thanks for checking! Here's a concrete patch for testing. LGTM -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On Wed, Aug 3, 2022 at 2:01 PM John Naylor <john.naylor@enterprisedb.com> wrote: > > > On Tue, Aug 2, 2022 at 11:53 PM Nathan Bossart <nathandbossart@gmail.com> wrote: > > I did a bit of cross-checking, and AFAICT this is a reasonable starting > > point. emmintrin.h appears to be sufficient for one of my patches that > > makes use of SSE2 instructions. That being said, I imagine it'll be > > especially important to keep an eye on the buildfarm when this change is > > committed. > > Thanks for checking! Here's a concrete patch for testing. I also think it's a good start. There is a typo in the commit message: s/hepler/helper/ The rest looks good to me. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
On Thu, Aug 4, 2022 at 12:38 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> I also think it's a good start. There is a typo in the commit message:
>
> s/hepler/helper/
>
> The rest looks good to me.
Fixed, and pushed, thanks to you both! I'll polish a small patch I have that actually uses this.
--
John Naylor
EDB: http://www.enterprisedb.com
>
> I also think it's a good start. There is a typo in the commit message:
>
> s/hepler/helper/
>
> The rest looks good to me.
Fixed, and pushed, thanks to you both! I'll polish a small patch I have that actually uses this.
--
John Naylor
EDB: http://www.enterprisedb.com