Thread: Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C
+ __attribute__((target("sse4.2"))) These need to be surrounded with #if defined(__has_attribute) && __has_attribute (target) so that we still attempt the check on compilers that don't support it (e.g., MSVC). # Check for Intel SSE 4.2 intrinsics to do CRC calculations. # -# First check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used -# with the default compiler flags. If not, check if adding the -msse4.2 -# flag helps. CFLAGS_CRC is set to -msse4.2 if that's required. +# Check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used +# with the __attribute__((target("sse4.2"))). PGAC_SSE42_CRC32_INTRINSICS([]) -if test x"$pgac_sse42_crc32_intrinsics" != x"yes"; then - PGAC_SSE42_CRC32_INTRINSICS([-msse4.2]) -fi IIUC this means we will never set USE_SSE42_CRC32C_WITH_RUNTIME_CHECK. To maintain the existing behavior, I think we need to still perform two configure tests (one with __attribute__((target(...))) and another without), and then we can choose whether to set USE_SSE42_CRC32C or USE_SSE42_CRC32C_WITH_RUNTIME_CHECK based on the results. + 'pg_crc32c_sse42_choose.c', + 'pg_crc32c_sse42.c', Can we combine these? -- nathan
> + __attribute__((target("sse4.2"))) > > These need to be surrounded with > > #if defined(__has_attribute) && __has_attribute (target) > > so that we still attempt the check on compilers that don't support it (e.g., MSVC). I assumed configure was only used on linux and ignored this check. Doesn't hurt to add this I suppose. > > # Check for Intel SSE 4.2 intrinsics to do CRC calculations. > # > -# First check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used -# > with the default compiler flags. If not, check if adding the -msse4.2 -# flag helps. > CFLAGS_CRC is set to -msse4.2 if that's required. > +# Check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used # > +with the __attribute__((target("sse4.2"))). > PGAC_SSE42_CRC32_INTRINSICS([]) > -if test x"$pgac_sse42_crc32_intrinsics" != x"yes"; then > - PGAC_SSE42_CRC32_INTRINSICS([-msse4.2]) > -fi > > IIUC this means we will never set USE_SSE42_CRC32C_WITH_RUNTIME_CHECK. I don't think so. USE_SSE42_CRC32C additionally requires SSE4_2_TARGETED to be true which will only happen when explicitlybuilt with -msse4.2. When this explicit compiler flag is missing, we set USE_SSE42_CRC32C_WITH_RUNTIME_CHECK totrue. This logic is further down in the configure.ac file: if test x"$pgac_sse42_crc32_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then USE_SSE42_CRC32C=1 else > To maintain the existing behavior, I think we need to still perform two configure > tests (one with __attribute__((target(...))) and another without), and then we can > choose whether to set USE_SSE42_CRC32C or > USE_SSE42_CRC32C_WITH_RUNTIME_CHECK based on the results. > > + 'pg_crc32c_sse42_choose.c', > + 'pg_crc32c_sse42.c', > > Can we combine these? Knowing the AVX-512 will come next, I think it makes sense to keep the runtime choose function separate. Otherwise this getspolluted with runtime choose function, sse42 algorithm and the avx512 algorithm in the next patch. Does that make sense? Raghuveer
On Thu, Nov 07, 2024 at 09:30:32PM +0000, Devulapalli, Raghuveer wrote: >> # Check for Intel SSE 4.2 intrinsics to do CRC calculations. >> # >> -# First check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used -# >> with the default compiler flags. If not, check if adding the -msse4.2 -# flag helps. >> CFLAGS_CRC is set to -msse4.2 if that's required. >> +# Check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used # >> +with the __attribute__((target("sse4.2"))). >> PGAC_SSE42_CRC32_INTRINSICS([]) >> -if test x"$pgac_sse42_crc32_intrinsics" != x"yes"; then >> - PGAC_SSE42_CRC32_INTRINSICS([-msse4.2]) >> -fi >> >> IIUC this means we will never set USE_SSE42_CRC32C_WITH_RUNTIME_CHECK. > > I don't think so. USE_SSE42_CRC32C additionally requires SSE4_2_TARGETED > to be true which will only happen when explicitly built with -msse4.2. > When this explicit compiler flag is missing, we set > USE_SSE42_CRC32C_WITH_RUNTIME_CHECK to true. This logic is further down > in the configure.ac file: > > if test x"$pgac_sse42_crc32_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then > USE_SSE42_CRC32C=1 > else Oh, you are right, sorry. >> + 'pg_crc32c_sse42_choose.c', >> + 'pg_crc32c_sse42.c', >> >> Can we combine these? > > Knowing the AVX-512 will come next, I think it makes sense to keep the > runtime choose function separate. Otherwise this gets polluted with > runtime choose function, sse42 algorithm and the avx512 algorithm in the > next patch. Does that make sense?? Is the idea that we will put both "choose" functions in one file and the actual CRC-32C code in another? I'm okay with that. -- nathan
- prog = ''' + sse42_crc_prog = ''' nitpick: Why does this need to be renamed? +#ifdef TEST_SSE42_WITH_ATTRIBUTE +#if defined(__has_attribute) && __has_attribute (target) +__attribute__((target("sse4.2"))) +#endif +#endif So, for meson builds, we do test with and without __attribute___((target(..."))), but for autoconf builds, we check for __SSE4_2__ to determine whether we need a runtime check. This difference isn't the fault of your patch, but it's a little odd. That being said, I'm not sure there's a problem with either approach. +if host_cpu == 'x86' or host_cpu == 'x86_64' + pgport_sources += files( + 'pg_crc32c_sse42_choose.c', + 'pg_crc32c_sse42.c', + ) +endi We could probably just build these unconditionally (i.e., put them in the first list of pgport_sources in this file) instead of keeping this complexity in the build scripts. +#if defined(USE_SSE42_CRC32C) || defined(USE_SSE42_CRC32C_WITH_RUNTIME_CHECK) Can we move this to just below #include "c.h"? The reason I didn't do this for the AVX-512 stuff is because TRY_POPCNT_FAST is defined in pg_bitutils.h, but looking again, I may be able to at least move the check for USE_AVX512_POPCNT_WITH_RUNTIME_CHECK up similarly. -- nathan
> - prog = ''' > + sse42_crc_prog = ''' > > nitpick: Why does this need to be renamed? Doesn't need to be renamed, but just a better name to describe that the code is meant for. It will be followed up with avx512_crc_progin the next patch. > > +#ifdef TEST_SSE42_WITH_ATTRIBUTE > +#if defined(__has_attribute) && __has_attribute (target) > +__attribute__((target("sse4.2"))) > +#endif > +#endif > > So, for meson builds, we do test with and without __attribute___((target(..."))), > but for autoconf builds, we check for __SSE4_2__ to determine whether we need > a runtime check. This difference isn't the fault of your patch, but it's a little odd. > That being said, I'm not sure there's a problem with either approach. I just realized, isn't this a problem on MSVC? When building with MSVC, USE_SSE42_CRC32C is always set to true (because MSVCdoesn't require a specific SSE42 flag to build with): see https://gcc.godbolt.org/z/eoc1Ec33j. This means it is alwaysrunning the SSE42 without a runtime check which can technically SEGILL on a really old CPU (SSE42 is nearly 18 yearsold though). This problem exists in the master branch too. > > +if host_cpu == 'x86' or host_cpu == 'x86_64' > + pgport_sources += files( > + 'pg_crc32c_sse42_choose.c', > + 'pg_crc32c_sse42.c', > + ) > +endi > > We could probably just build these unconditionally (i.e., put them in the first list of > pgport_sources in this file) instead of keeping this complexity in the build scripts. Sure. > +#if defined(USE_SSE42_CRC32C) || > +defined(USE_SSE42_CRC32C_WITH_RUNTIME_CHECK) > > Can we move this to just below #include "c.h"? Sounds good. Raghuveer
On Fri, Nov 08, 2024 at 05:52:23PM +0000, Devulapalli, Raghuveer wrote: >> So, for meson builds, we do test with and without __attribute___((target(..."))), >> but for autoconf builds, we check for __SSE4_2__ to determine whether we need >> a runtime check. This difference isn't the fault of your patch, but it's a little odd. >> That being said, I'm not sure there's a problem with either approach. > > I just realized, isn't this a problem on MSVC? When building with MSVC, > USE_SSE42_CRC32C is always set to true (because MSVC doesn't require a > specific SSE42 flag to build with): see > https://gcc.godbolt.org/z/eoc1Ec33j. This means it is always running the > SSE42 without a runtime check which can technically SEGILL on a really > old CPU (SSE42 is nearly 18 years old though). This problem exists in the > master branch too. I believe we expect MSVC builds to use meson at this point, which is probably why there's this extra check: if cc.get_id() == 'msvc' cdata.set('USE_SSE42_CRC32C', false) cdata.set('USE_SSE42_CRC32C_WITH_RUNTIME_CHECK', 1) have_optimized_crc = true There used to be special MSVC scripts (removed in commit 1301c80), and it looks like those just set USE_SSE42_CRC32C_WITH_RUNTIME_CHECK unconditionally, too. -- nathan
> I believe we expect MSVC builds to use meson at this point, which is probably > why there's this extra check: > > if cc.get_id() == 'msvc' > cdata.set('USE_SSE42_CRC32C', false) > cdata.set('USE_SSE42_CRC32C_WITH_RUNTIME_CHECK', 1) > have_optimized_crc = true > Ah, I missed this. This makes sense.
Anymore feedback on this patch? Hoping this is a straightforward one. Raghuveer > -----Original Message----- > From: Devulapalli, Raghuveer <raghuveer.devulapalli@intel.com> > Sent: Friday, November 8, 2024 11:05 AM > To: Devulapalli, Raghuveer <raghuveer.devulapalli@intel.com>; Nathan Bossart > <nathandbossart@gmail.com> > Cc: pgsql-hackers@lists.postgresql.org; Shankaran, Akash > <akash.shankaran@intel.com> > Subject: RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C > > V3: With the suggested changes. > > Raghuveer > > > -----Original Message----- > > From: Devulapalli, Raghuveer <raghuveer.devulapalli@intel.com> > > Sent: Friday, November 8, 2024 10:43 AM > > To: Nathan Bossart <nathandbossart@gmail.com> > > Cc: pgsql-hackers@lists.postgresql.org; Shankaran, Akash > > <akash.shankaran@intel.com> > > Subject: RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C > > > > > I believe we expect MSVC builds to use meson at this point, which is > > > probably why there's this extra check: > > > > > > if cc.get_id() == 'msvc' > > > cdata.set('USE_SSE42_CRC32C', false) > > > cdata.set('USE_SSE42_CRC32C_WITH_RUNTIME_CHECK', 1) > > > have_optimized_crc = true > > > > > > > Ah, I missed this. This makes sense. > >
On Wed, Nov 20, 2024 at 05:37:33PM +0000, Devulapalli, Raghuveer wrote: > Anymore feedback on this patch? Hoping this is a straightforward one. I think it is in pretty good shape. After a read-through, the only thing that stands out to me is the difference in configure tests between autoconf and meson. I think we should consider picking one approach, although I'm not sure I have a strong preference. -- nathan
On Thu, Nov 21, 2024 at 11:07:59PM +0000, Devulapalli, Raghuveer wrote: > Thanks for the review! We can actually leverage meson's built in option > to check for a macro: cc.get_define('__SSE4_2__') != ''. This should keep > the logic consistent across configure and meson. I think we should still use the test program even when __SSE4_2__ is defined, but we can use that macro to determine whether to use a runtime check. I think that would keep autoconf and meson consistent. -- nathan
On Fri, Nov 22, 2024 at 09:00:01PM +0000, Devulapalli, Raghuveer wrote: > Sure. Updated patch. Thanks. I think my only remaining feedback is that we should probably add a comment to explain why we aren't doing this for ARM yet [0]. [0] https://postgr.es/m/ZwXsE0KG_wh6_heU%40nathan -- nathan
> > Thanks. I think my only remaining feedback is that we should probably add a > comment to explain why we aren't doing this for ARM yet [0]. Sounds good. Where would you like me to add this comment? Meson.build and configure?
On Mon, Nov 25, 2024 at 11:28:52PM +0000, Devulapalli, Raghuveer wrote: >> Thanks. I think my only remaining feedback is that we should probably add a >> comment to explain why we aren't doing this for ARM yet [0]. > > Sounds good. Where would you like me to add this comment? Meson.build and > configure? I'll take care of it. -- nathan
> Here's what I have staged for commit (except for the commit message, which > needs some work). LGTM. Raghuveer
On Wed, Nov 27, 2024 at 05:09:17PM +0000, Devulapalli, Raghuveer wrote: > LGTM. Committed. -- nathan