Thread: Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
Nathan Bossart
Date:
+    __attribute__((target("sse4.2")))

These need to be surrounded with

    #if defined(__has_attribute) && __has_attribute (target)

so that we still attempt the check on compilers that don't support it
(e.g., MSVC).

 # Check for Intel SSE 4.2 intrinsics to do CRC calculations.
 #
-# First check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used
-# with the default compiler flags. If not, check if adding the -msse4.2
-# flag helps. CFLAGS_CRC is set to -msse4.2 if that's required.
+# Check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used
+# with the __attribute__((target("sse4.2"))).
 PGAC_SSE42_CRC32_INTRINSICS([])
-if test x"$pgac_sse42_crc32_intrinsics" != x"yes"; then
-  PGAC_SSE42_CRC32_INTRINSICS([-msse4.2])
-fi

IIUC this means we will never set USE_SSE42_CRC32C_WITH_RUNTIME_CHECK.  To
maintain the existing behavior, I think we need to still perform two
configure tests (one with __attribute__((target(...))) and another
without), and then we can choose whether to set USE_SSE42_CRC32C or
USE_SSE42_CRC32C_WITH_RUNTIME_CHECK based on the results.

+  'pg_crc32c_sse42_choose.c',
+  'pg_crc32c_sse42.c',

Can we combine these?

-- 
nathan



RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
"Devulapalli, Raghuveer"
Date:
> +    __attribute__((target("sse4.2")))
>
> These need to be surrounded with
>
>     #if defined(__has_attribute) && __has_attribute (target)
>
> so that we still attempt the check on compilers that don't support it (e.g., MSVC).

I assumed configure was only used on linux and ignored this check. Doesn't hurt to add this I suppose.

>
>  # Check for Intel SSE 4.2 intrinsics to do CRC calculations.
>  #
> -# First check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used -#
> with the default compiler flags. If not, check if adding the -msse4.2 -# flag helps.
> CFLAGS_CRC is set to -msse4.2 if that's required.
> +# Check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used #
> +with the __attribute__((target("sse4.2"))).
>  PGAC_SSE42_CRC32_INTRINSICS([])
> -if test x"$pgac_sse42_crc32_intrinsics" != x"yes"; then
> -  PGAC_SSE42_CRC32_INTRINSICS([-msse4.2])
> -fi
>
> IIUC this means we will never set USE_SSE42_CRC32C_WITH_RUNTIME_CHECK.

I don't think so. USE_SSE42_CRC32C additionally requires SSE4_2_TARGETED to be true which will only happen when
explicitlybuilt with -msse4.2. When this explicit compiler flag is missing, we set USE_SSE42_CRC32C_WITH_RUNTIME_CHECK
totrue. This logic is further down in  the configure.ac file: 

if test x"$pgac_sse42_crc32_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then
      USE_SSE42_CRC32C=1
    else

> To maintain the existing behavior, I think we need to still perform two configure
> tests (one with __attribute__((target(...))) and another without), and then we can
> choose whether to set USE_SSE42_CRC32C or
> USE_SSE42_CRC32C_WITH_RUNTIME_CHECK based on the results.
>
> +  'pg_crc32c_sse42_choose.c',
> +  'pg_crc32c_sse42.c',
>
> Can we combine these?

Knowing the AVX-512 will come next, I think it makes sense to keep the runtime choose function separate. Otherwise this
getspolluted with runtime choose function, sse42 algorithm and the avx512 algorithm in the next patch. Does that make
sense? 

Raghuveer



Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
Nathan Bossart
Date:
On Thu, Nov 07, 2024 at 09:30:32PM +0000, Devulapalli, Raghuveer wrote:
>>  # Check for Intel SSE 4.2 intrinsics to do CRC calculations.
>>  #
>> -# First check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used -#
>> with the default compiler flags. If not, check if adding the -msse4.2 -# flag helps.
>> CFLAGS_CRC is set to -msse4.2 if that's required.
>> +# Check if the _mm_crc32_u8 and _mm_crc32_u64 intrinsics can be used #
>> +with the __attribute__((target("sse4.2"))).
>>  PGAC_SSE42_CRC32_INTRINSICS([])
>> -if test x"$pgac_sse42_crc32_intrinsics" != x"yes"; then
>> -  PGAC_SSE42_CRC32_INTRINSICS([-msse4.2])
>> -fi
>> 
>> IIUC this means we will never set USE_SSE42_CRC32C_WITH_RUNTIME_CHECK.
> 
> I don't think so. USE_SSE42_CRC32C additionally requires SSE4_2_TARGETED
> to be true which will only happen when explicitly built with -msse4.2.
> When this explicit compiler flag is missing, we set
> USE_SSE42_CRC32C_WITH_RUNTIME_CHECK to true. This logic is further down
> in  the configure.ac file:
> 
> if test x"$pgac_sse42_crc32_intrinsics" = x"yes" && test x"$SSE4_2_TARGETED" = x"1" ; then
>       USE_SSE42_CRC32C=1
>     else

Oh, you are right, sorry.

>> +  'pg_crc32c_sse42_choose.c',
>> +  'pg_crc32c_sse42.c',
>> 
>> Can we combine these?
> 
> Knowing the AVX-512 will come next, I think it makes sense to keep the
> runtime choose function separate. Otherwise this gets polluted with
> runtime choose function, sse42 algorithm and the avx512 algorithm in the
> next patch. Does that make sense??

Is the idea that we will put both "choose" functions in one file and the
actual CRC-32C code in another?  I'm okay with that.

-- 
nathan



Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
Nathan Bossart
Date:
-    prog = '''
+    sse42_crc_prog = '''

nitpick: Why does this need to be renamed?

+#ifdef TEST_SSE42_WITH_ATTRIBUTE
+#if defined(__has_attribute) && __has_attribute (target)
+__attribute__((target("sse4.2")))
+#endif
+#endif

So, for meson builds, we do test with and without
__attribute___((target(..."))), but for autoconf builds, we check for
__SSE4_2__ to determine whether we need a runtime check.  This difference
isn't the fault of your patch, but it's a little odd.  That being said, I'm
not sure there's a problem with either approach.

+if host_cpu == 'x86' or host_cpu == 'x86_64'
+  pgport_sources += files(
+  'pg_crc32c_sse42_choose.c',
+  'pg_crc32c_sse42.c',
+    )
+endi

We could probably just build these unconditionally (i.e., put them in the
first list of pgport_sources in this file) instead of keeping this
complexity in the build scripts.

+#if defined(USE_SSE42_CRC32C) || defined(USE_SSE42_CRC32C_WITH_RUNTIME_CHECK)

Can we move this to just below #include "c.h"?  The reason I didn't do this
for the AVX-512 stuff is because TRY_POPCNT_FAST is defined in
pg_bitutils.h, but looking again, I may be able to at least move the check
for USE_AVX512_POPCNT_WITH_RUNTIME_CHECK up similarly.

-- 
nathan



RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
"Devulapalli, Raghuveer"
Date:
> -    prog = '''
> +    sse42_crc_prog = '''
>
> nitpick: Why does this need to be renamed?

Doesn't need to be renamed, but just a better name to describe that the code is meant for. It will be followed up with
avx512_crc_progin the next patch.  

>
> +#ifdef TEST_SSE42_WITH_ATTRIBUTE
> +#if defined(__has_attribute) && __has_attribute (target)
> +__attribute__((target("sse4.2")))
> +#endif
> +#endif
>
> So, for meson builds, we do test with and without __attribute___((target(..."))),
> but for autoconf builds, we check for __SSE4_2__ to determine whether we need
> a runtime check.  This difference isn't the fault of your patch, but it's a little odd.
> That being said, I'm not sure there's a problem with either approach.

I just realized, isn't this a problem on MSVC? When building with MSVC, USE_SSE42_CRC32C is always set to true (because
MSVCdoesn't require a specific SSE42 flag to build with): see https://gcc.godbolt.org/z/eoc1Ec33j. This means it is
alwaysrunning the SSE42 without a runtime check which can technically SEGILL on a really old CPU (SSE42 is nearly 18
yearsold though). This problem exists in the master branch too.  

>
> +if host_cpu == 'x86' or host_cpu == 'x86_64'
> +  pgport_sources += files(
> +  'pg_crc32c_sse42_choose.c',
> +  'pg_crc32c_sse42.c',
> +    )
> +endi
>
> We could probably just build these unconditionally (i.e., put them in the first list of
> pgport_sources in this file) instead of keeping this complexity in the build scripts.

Sure.

> +#if defined(USE_SSE42_CRC32C) ||
> +defined(USE_SSE42_CRC32C_WITH_RUNTIME_CHECK)
>
> Can we move this to just below #include "c.h"?

Sounds good.

Raghuveer



Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
Nathan Bossart
Date:
On Fri, Nov 08, 2024 at 05:52:23PM +0000, Devulapalli, Raghuveer wrote:
>> So, for meson builds, we do test with and without __attribute___((target(..."))),
>> but for autoconf builds, we check for __SSE4_2__ to determine whether we need
>> a runtime check.  This difference isn't the fault of your patch, but it's a little odd.
>> That being said, I'm not sure there's a problem with either approach.
> 
> I just realized, isn't this a problem on MSVC? When building with MSVC,
> USE_SSE42_CRC32C is always set to true (because MSVC doesn't require a
> specific SSE42 flag to build with): see
> https://gcc.godbolt.org/z/eoc1Ec33j. This means it is always running the
> SSE42 without a runtime check which can technically SEGILL on a really
> old CPU (SSE42 is nearly 18 years old though). This problem exists in the
> master branch too. 

I believe we expect MSVC builds to use meson at this point, which is
probably why there's this extra check:

  if cc.get_id() == 'msvc'
    cdata.set('USE_SSE42_CRC32C', false)
    cdata.set('USE_SSE42_CRC32C_WITH_RUNTIME_CHECK', 1)
    have_optimized_crc = true

There used to be special MSVC scripts (removed in commit 1301c80), and it
looks like those just set USE_SSE42_CRC32C_WITH_RUNTIME_CHECK
unconditionally, too.

-- 
nathan



RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
"Devulapalli, Raghuveer"
Date:
> I believe we expect MSVC builds to use meson at this point, which is probably
> why there's this extra check:
>
>   if cc.get_id() == 'msvc'
>     cdata.set('USE_SSE42_CRC32C', false)
>     cdata.set('USE_SSE42_CRC32C_WITH_RUNTIME_CHECK', 1)
>     have_optimized_crc = true
>

Ah, I missed this. This makes sense.



RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
"Devulapalli, Raghuveer"
Date:
Anymore feedback on this patch? Hoping this is a straightforward one.

Raghuveer

> -----Original Message-----
> From: Devulapalli, Raghuveer <raghuveer.devulapalli@intel.com>
> Sent: Friday, November 8, 2024 11:05 AM
> To: Devulapalli, Raghuveer <raghuveer.devulapalli@intel.com>; Nathan Bossart
> <nathandbossart@gmail.com>
> Cc: pgsql-hackers@lists.postgresql.org; Shankaran, Akash
> <akash.shankaran@intel.com>
> Subject: RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C
>
> V3: With the suggested changes.
>
> Raghuveer
>
> > -----Original Message-----
> > From: Devulapalli, Raghuveer <raghuveer.devulapalli@intel.com>
> > Sent: Friday, November 8, 2024 10:43 AM
> > To: Nathan Bossart <nathandbossart@gmail.com>
> > Cc: pgsql-hackers@lists.postgresql.org; Shankaran, Akash
> > <akash.shankaran@intel.com>
> > Subject: RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C
> >
> > > I believe we expect MSVC builds to use meson at this point, which is
> > > probably why there's this extra check:
> > >
> > >   if cc.get_id() == 'msvc'
> > >     cdata.set('USE_SSE42_CRC32C', false)
> > >     cdata.set('USE_SSE42_CRC32C_WITH_RUNTIME_CHECK', 1)
> > >     have_optimized_crc = true
> > >
> >
> > Ah, I missed this. This makes sense.
> >




Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
Nathan Bossart
Date:
On Wed, Nov 20, 2024 at 05:37:33PM +0000, Devulapalli, Raghuveer wrote:
> Anymore feedback on this patch? Hoping this is a straightforward one. 

I think it is in pretty good shape.  After a read-through, the only thing
that stands out to me is the difference in configure tests between autoconf
and meson.  I think we should consider picking one approach, although I'm
not sure I have a strong preference.

-- 
nathan



Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
Nathan Bossart
Date:
On Thu, Nov 21, 2024 at 11:07:59PM +0000, Devulapalli, Raghuveer wrote:
> Thanks for the review! We can actually leverage meson's built in option
> to check for a macro: cc.get_define('__SSE4_2__') != ''. This should keep
> the logic consistent across configure and meson. 

I think we should still use the test program even when __SSE4_2__ is
defined, but we can use that macro to determine whether to use a runtime
check.  I think that would keep autoconf and meson consistent.

-- 
nathan



Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
Nathan Bossart
Date:
On Fri, Nov 22, 2024 at 09:00:01PM +0000, Devulapalli, Raghuveer wrote:
> Sure. Updated patch.

Thanks.  I think my only remaining feedback is that we should probably add
a comment to explain why we aren't doing this for ARM yet [0].

[0] https://postgr.es/m/ZwXsE0KG_wh6_heU%40nathan

-- 
nathan



RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
"Devulapalli, Raghuveer"
Date:
>
> Thanks.  I think my only remaining feedback is that we should probably add a
> comment to explain why we aren't doing this for ARM yet [0].

Sounds good. Where would you like me to add this comment? Meson.build and configure?



Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
Nathan Bossart
Date:
On Mon, Nov 25, 2024 at 11:28:52PM +0000, Devulapalli, Raghuveer wrote:
>> Thanks.  I think my only remaining feedback is that we should probably add a
>> comment to explain why we aren't doing this for ARM yet [0].
> 
> Sounds good. Where would you like me to add this comment? Meson.build and
> configure?

I'll take care of it.

-- 
nathan



RE: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
"Devulapalli, Raghuveer"
Date:
> Here's what I have staged for commit (except for the commit message, which
> needs some work).

LGTM.

Raghuveer



Re: Use __attribute__((target(sse4.2))) for SSE42 CRC32C

From
Nathan Bossart
Date:
On Wed, Nov 27, 2024 at 05:09:17PM +0000, Devulapalli, Raghuveer wrote:
> LGTM. 

Committed.

-- 
nathan