RE: Popcount optimization using AVX512 - Mailing list pgsql-hackers
From | Amonson, Paul D |
---|---|
Subject | RE: Popcount optimization using AVX512 |
Date | |
Msg-id | BL1PR11MB5304BFE26BAC25624508CFD2DC232@BL1PR11MB5304.namprd11.prod.outlook.com Whole thread Raw |
In response to | Re: Popcount optimization using AVX512 (Nathan Bossart <nathandbossart@gmail.com>) |
Responses |
Re: Popcount optimization using AVX512
Re: Popcount optimization using AVX512 |
List | pgsql-hackers |
Hi, First, apologies on the patch. Find re-attached updated version. Now I have some questions.... #1 > -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) > +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) > +<< 31)) > > IME this means that the autoconf you are using has been patched. A quick search on the mailing lists seems to indicatethat it might be specific to Debian [1]. I am not sure what the ask is here? I made changes to the configure.ac and ran autoconf2.69 to get builds to succeed. Doyou have a separate feedback here? #2 As for the refactoring, this was done to satisfy previous review feedback about applying the AVX512 CFLAGS to the entirepg_bitutils.c file. Mainly to avoid segfault due to the AVX512 flags. If its ok, I would prefer to make a single commitas the change is pretty small and straight forward. #3 I am not sure I understand the comment about the SIZE_VOID_P checks. Aren't they necessary to choose which functions to callbased on 32 or 64 bit architectures? #4 Would this change qualify for Workflow A as described in [0] and can be picked up by a committer, given it has been reviewedby multiple committers so far? The scope of the change is pretty contained as well. [0] https://wiki.postgresql.org/wiki/Submitting_a_Patch Thanks, Paul -----Original Message----- From: Nathan Bossart <nathandbossart@gmail.com> Sent: Friday, March 1, 2024 1:45 PM To: Amonson, Paul D <paul.d.amonson@intel.com> Cc: Andres Freund <andres@anarazel.de>; Alvaro Herrera <alvherre@alvh.no-ip.org>; Shankaran, Akash <akash.shankaran@intel.com>;Noah Misch <noah@leadboat.com>; Tom Lane <tgl@sss.pgh.pa.us>; Matthias van de Meent <boekewurm+postgres@gmail.com>;pgsql-hackers@lists.postgresql.org Subject: Re: Popcount optimization using AVX512 Thanks for the new version of the patch. I didn't see a commitfest entry for this one, and unfortunately I think it's toolate to add it for the March commitfest. I would encourage you to add it to July's commitfest [0] so that we can getsome routine cfbot coverage. On Tue, Feb 27, 2024 at 08:46:06PM +0000, Amonson, Paul D wrote: > After consulting some Intel internal experts on MSVC the linking issue > as it stood was not resolved. Instead, I created a MSVC ONLY work-around. > This adds one extra functional call on the Windows builds (The linker > resolves a real function just fine but not a function pointer of the > same name). This extra latency does not exist on any of the other > platforms. I also believe I addressed all issues raised in the > previous reviews. The new pg_popcnt_x86_64_accel.c file is now the > ONLY file compiled with the > AVX512 compiler flags. I added support for the MSVC compiler flag as > well. Both meson and autoconf are updated with the new refactor. > > I am attaching the new patch. I think this patch might be missing the new files. -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62)) +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) +<< 31)) IME this means that the autoconf you are using has been patched. A quick search on the mailing lists seems to indicate thatit might be specific to Debian [1]. -static int pg_popcount32_slow(uint32 word); -static int pg_popcount64_slow(uint64 word); +int pg_popcount32_slow(uint32 word); +int pg_popcount64_slow(uint64 word); +uint64 pg_popcount_slow(const char *buf, int bytes); This patch appears to do a lot of refactoring. Would it be possible to break out the refactoring parts into a prerequisitepatch that could be reviewed and committed independently from the AVX512 stuff? -#if SIZEOF_VOID_P >= 8 +#if SIZEOF_VOID_P == 8 /* Process in 64-bit chunks if the buffer is aligned. */ - if (buf == (const char *) TYPEALIGN(8, buf)) + if (buf == (const char *)TYPEALIGN(8, buf)) { - const uint64 *words = (const uint64 *) buf; + const uint64 *words = (const uint64 *)buf; while (bytes >= 8) { @@ -309,9 +213,9 @@ pg_popcount(const char *buf, int bytes) bytes -= 8; } - buf = (const char *) words; + buf = (const char *)words; } -#else +#elif SIZEOF_VOID_P == 4 /* Process in 32-bit chunks if the buffer is aligned. */ if (buf == (const char *) TYPEALIGN(4, buf)) { Most, if not all, of these changes seem extraneous. Do we actually need to more strictly check SIZEOF_VOID_P? [0] https://commitfest.postgresql.org/48/ [1] https://postgr.es/m/20230211020042.uthdgj72kp3xlqam%40awork3.anarazel.de -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
pgsql-hackers by date: