Re: Popcount optimization using AVX512 - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: Popcount optimization using AVX512
Date
Msg-id 20240301214457.GA3024365@nathanxps13
Whole thread Raw
In response to RE: Popcount optimization using AVX512  ("Amonson, Paul D" <paul.d.amonson@intel.com>)
Responses RE: Popcount optimization using AVX512
List pgsql-hackers
Thanks for the new version of the patch.  I didn't see a commitfest entry
for this one, and unfortunately I think it's too late to add it for the
March commitfest.  I would encourage you to add it to July's commitfest [0]
so that we can get some routine cfbot coverage.

On Tue, Feb 27, 2024 at 08:46:06PM +0000, Amonson, Paul D wrote:
> After consulting some Intel internal experts on MSVC the linking issue as
> it stood was not resolved. Instead, I created a MSVC ONLY work-around.
> This adds one extra functional call on the Windows builds (The linker
> resolves a real function just fine but not a function pointer of the same
> name). This extra latency does not exist on any of the other platforms. I
> also believe I addressed all issues raised in the previous reviews. The
> new pg_popcnt_x86_64_accel.c file is now the ONLY file compiled with the
> AVX512 compiler flags. I added support for the MSVC compiler flag as
> well. Both meson and autoconf are updated with the new refactor.
> 
> I am attaching the new patch.

I think this patch might be missing the new files.

-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31) << 31))

IME this means that the autoconf you are using has been patched.  A quick
search on the mailing lists seems to indicate that it might be specific to
Debian [1].

-static int    pg_popcount32_slow(uint32 word);
-static int    pg_popcount64_slow(uint64 word);
+int    pg_popcount32_slow(uint32 word);
+int    pg_popcount64_slow(uint64 word);
+uint64 pg_popcount_slow(const char *buf, int bytes);

This patch appears to do a lot of refactoring.  Would it be possible to
break out the refactoring parts into a prerequisite patch that could be
reviewed and committed independently from the AVX512 stuff?

-#if SIZEOF_VOID_P >= 8
+#if SIZEOF_VOID_P == 8
     /* Process in 64-bit chunks if the buffer is aligned. */
-    if (buf == (const char *) TYPEALIGN(8, buf))
+    if (buf == (const char *)TYPEALIGN(8, buf))
     {
-        const uint64 *words = (const uint64 *) buf;
+        const uint64 *words = (const uint64 *)buf;
 
         while (bytes >= 8)
         {
@@ -309,9 +213,9 @@ pg_popcount(const char *buf, int bytes)
             bytes -= 8;
         }
 
-        buf = (const char *) words;
+        buf = (const char *)words;
     }
-#else
+#elif SIZEOF_VOID_P == 4
     /* Process in 32-bit chunks if the buffer is aligned. */
     if (buf == (const char *) TYPEALIGN(4, buf))
     {

Most, if not all, of these changes seem extraneous.  Do we actually need to
more strictly check SIZEOF_VOID_P?

[0] https://commitfest.postgresql.org/48/
[1] https://postgr.es/m/20230211020042.uthdgj72kp3xlqam%40awork3.anarazel.de

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: pread, pwrite, etc return ssize_t not int
Next
From: Jacob Champion
Date:
Subject: Re: Experiments with Postgres and SSL