Re: Popcount optimization using AVX512 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Popcount optimization using AVX512 |
Date | |
Msg-id | 20240209183450.snllkk7erharc73y@awork3.anarazel.de Whole thread Raw |
In response to | RE: Popcount optimization using AVX512 ("Amonson, Paul D" <paul.d.amonson@intel.com>) |
Responses |
RE: Popcount optimization using AVX512
|
List | pgsql-hackers |
Hi, On 2024-02-09 17:39:46 +0000, Amonson, Paul D wrote: > diff --git a/meson.build b/meson.build > index 8ed51b6aae..1e7a4dc942 100644 > --- a/meson.build > +++ b/meson.build > @@ -1773,6 +1773,45 @@ elif cc.links(''' > endif > > > +# XXX: The configure.ac check for __cpuidex() is broken, we don't copy that > +# here. To prevent problems due to two detection methods working, stop > +# checking after one. This seems like a bogus copy-paste. > +if cc.links(''' > + #include <cpuid.h> > + int main(int arg, char **argv) > + { > + unsigned int exx[4] = {0, 0, 0, 0}; > + __get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]); > + } > + ''', name: '__get_cpuid_count', > + args: test_c_args) > + cdata.set('HAVE__GET_CPUID_COUNT', 1) > +elif cc.links(''' > + #include <intrin.h> > + int main(int arg, char **argv) > + { > + unsigned int exx[4] = {0, 0, 0, 0}; > + __cpuidex(exx, 7, 0); > + } > + ''', name: '__cpuidex', > + args: test_c_args) > + cdata.set('HAVE__CPUIDEX', 1) > +endif > + > + > +# Check for header immintrin.h > +if cc.links(''' > + #include <immintrin.h> > + int main(int arg, char **argv) > + { > + return 1701; > + } > + ''', name: '__immintrin', > + args: test_c_args) > + cdata.set('HAVE__IMMINTRIN', 1) > +endif Do these all actually have to link? Invoking the linker is slow. I think you might be able to just use cc.has_header_symbol(). > +############################################################### > +# AVX 512 POPCNT Intrinsic check > +############################################################### > +have_avx512_popcnt = false > +cflags_avx512_popcnt = [] > +if host_cpu == 'x86_64' > + prog = ''' > + #include <immintrin.h> > + #include <stdint.h> > + void main(void) > + { > + __m512i tmp __attribute__((aligned(64))); > + __m512i input = _mm512_setzero_si512(); > + __m512i output = _mm512_popcnt_epi64(input); > + uint64_t cnt = 999; > + _mm512_store_si512(&tmp, output); > + cnt = _mm512_reduce_add_epi64(tmp); > + /* return computed value, to prevent the above being optimized away */ > + return cnt == 0; > + }''' Does this work with msvc? > + if cc.links(prog, name: '_mm512_setzero_si512, _mm512_popcnt_epi64, _mm512_store_si512, and _mm512_reduce_add_epi64with -mavx512vpopcntdq -mavx512f', That's a very long line in the output, how about using the avx feature name or something? > diff --git a/src/port/Makefile b/src/port/Makefile > index dcc8737e68..6a01a7d89a 100644 > --- a/src/port/Makefile > +++ b/src/port/Makefile > @@ -87,6 +87,11 @@ pg_crc32c_sse42.o: CFLAGS+=$(CFLAGS_CRC) > pg_crc32c_sse42_shlib.o: CFLAGS+=$(CFLAGS_CRC) > pg_crc32c_sse42_srv.o: CFLAGS+=$(CFLAGS_CRC) > > +# Newer Intel processors can use AVX-512 POPCNT Capabilities (01/30/2024) > +pg_bitutils.o: CFLAGS+=$(CFLAGS_AVX512_POPCNT) > +pg_bitutils_shlib.o: CFLAGS+=$(CFLAGS_AVX512_POPCNT) > +pg_bitutils_srv.o:CFLAGS+=$(CFLAGS_AVX512_POPCNT) > + > # all versions of pg_crc32c_armv8.o need CFLAGS_CRC > pg_crc32c_armv8.o: CFLAGS+=$(CFLAGS_CRC) > pg_crc32c_armv8_shlib.o: CFLAGS+=$(CFLAGS_CRC) > diff --git a/src/port/meson.build b/src/port/meson.build > index 69b30ab21b..1c48a3b07e 100644 > --- a/src/port/meson.build > +++ b/src/port/meson.build > @@ -184,6 +184,7 @@ foreach name, opts : pgport_variants > link_with: cflag_libs, > c_pch: pch_c_h, > kwargs: opts + { > + 'c_args': opts.get('c_args', []) + cflags_avx512_popcnt, > 'dependencies': opts['dependencies'] + [ssl], > } > ) This will build all of pgport with the avx flags, which wouldn't be correct, I think? The compiler might inject automatic uses of avx512 in places, which would cause problems, no? While you don't do the same for make, isn't even just using the avx512 for all of pg_bitutils.c broken for exactly that reson? That's why the existing code builds the files for various crc variants as their own file. Greetings, Andres Freund
pgsql-hackers by date: