Re: Popcount optimization using AVX512 - Mailing list pgsql-hackers
From | Nathan Bossart |
---|---|
Subject | Re: Popcount optimization using AVX512 |
Date | |
Msg-id | 20240301214457.GA3024365@nathanxps13 Whole thread Raw |
In response to | RE: Popcount optimization using AVX512 ("Amonson, Paul D" <paul.d.amonson@intel.com>) |
Responses |
RE: Popcount optimization using AVX512
|
List | pgsql-hackers |
Thanks for the new version of the patch. I didn't see a commitfest entry for this one, and unfortunately I think it's too late to add it for the March commitfest. I would encourage you to add it to July's commitfest [0] so that we can get some routine cfbot coverage. On Tue, Feb 27, 2024 at 08:46:06PM +0000, Amonson, Paul D wrote: > After consulting some Intel internal experts on MSVC the linking issue as > it stood was not resolved. Instead, I created a MSVC ONLY work-around. > This adds one extra functional call on the Windows builds (The linker > resolves a real function just fine but not a function pointer of the same > name). This extra latency does not exist on any of the other platforms. I > also believe I addressed all issues raised in the previous reviews. The > new pg_popcnt_x86_64_accel.c file is now the ONLY file compiled with the > AVX512 compiler flags. I added support for the MSVC compiler flag as > well. Both meson and autoconf are updated with the new refactor. > > I am attaching the new patch. I think this patch might be missing the new files. -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31)) IME this means that the autoconf you are using has been patched. A quick search on the mailing lists seems to indicate that it might be specific to Debian [1]. -static int pg_popcount32_slow(uint32 word); -static int pg_popcount64_slow(uint64 word); +int pg_popcount32_slow(uint32 word); +int pg_popcount64_slow(uint64 word); +uint64 pg_popcount_slow(const char *buf, int bytes); This patch appears to do a lot of refactoring. Would it be possible to break out the refactoring parts into a prerequisite patch that could be reviewed and committed independently from the AVX512 stuff? -#if SIZEOF_VOID_P >= 8 +#if SIZEOF_VOID_P == 8 /* Process in 64-bit chunks if the buffer is aligned. */ - if (buf == (const char *) TYPEALIGN(8, buf)) + if (buf == (const char *)TYPEALIGN(8, buf)) { - const uint64 *words = (const uint64 *) buf; + const uint64 *words = (const uint64 *)buf; while (bytes >= 8) { @@ -309,9 +213,9 @@ pg_popcount(const char *buf, int bytes) bytes -= 8; } - buf = (const char *) words; + buf = (const char *)words; } -#else +#elif SIZEOF_VOID_P == 4 /* Process in 32-bit chunks if the buffer is aligned. */ if (buf == (const char *) TYPEALIGN(4, buf)) { Most, if not all, of these changes seem extraneous. Do we actually need to more strictly check SIZEOF_VOID_P? [0] https://commitfest.postgresql.org/48/ [1] https://postgr.es/m/20230211020042.uthdgj72kp3xlqam%40awork3.anarazel.de -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: