RE: Popcount optimization using AVX512 - Mailing list pgsql-hackers

From Amonson, Paul D
Subject RE: Popcount optimization using AVX512
Date
Msg-id BL1PR11MB5304BFE26BAC25624508CFD2DC232@BL1PR11MB5304.namprd11.prod.outlook.com
Whole thread Raw
In response to Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
Responses Re: Popcount optimization using AVX512
Re: Popcount optimization using AVX512
List pgsql-hackers
Hi,

First, apologies on the patch. Find re-attached updated version.

Now I have some questions....
#1

> -#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
> +#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31)
> +<< 31))
>
> IME this means that the autoconf you are using has been patched.  A quick search on the mailing lists seems to
indicatethat it might be specific to Debian [1]. 

I am not sure what the ask is here?  I made changes to the configure.ac and ran autoconf2.69 to get builds to succeed.
Doyou have a separate feedback here?  

#2
As for the refactoring, this was done to satisfy previous review feedback about applying the AVX512 CFLAGS to the
entirepg_bitutils.c file. Mainly to avoid segfault due to the AVX512 flags. If its ok, I would prefer to make a single
commitas the change is pretty small and straight forward. 

#3
I am not sure I understand the comment about the SIZE_VOID_P checks. Aren't they necessary to choose which functions to
callbased on 32 or 64 bit architectures? 

#4
Would this change qualify for Workflow A as described in [0] and can be picked up by a committer, given it has been
reviewedby multiple committers so far? The scope of the change is pretty contained as well.  

[0] https://wiki.postgresql.org/wiki/Submitting_a_Patch

Thanks,
Paul


-----Original Message-----
From: Nathan Bossart <nathandbossart@gmail.com>
Sent: Friday, March 1, 2024 1:45 PM
To: Amonson, Paul D <paul.d.amonson@intel.com>
Cc: Andres Freund <andres@anarazel.de>; Alvaro Herrera <alvherre@alvh.no-ip.org>; Shankaran, Akash
<akash.shankaran@intel.com>;Noah Misch <noah@leadboat.com>; Tom Lane <tgl@sss.pgh.pa.us>; Matthias van de Meent
<boekewurm+postgres@gmail.com>;pgsql-hackers@lists.postgresql.org 
Subject: Re: Popcount optimization using AVX512

Thanks for the new version of the patch.  I didn't see a commitfest entry for this one, and unfortunately I think it's
toolate to add it for the March commitfest.  I would encourage you to add it to July's commitfest [0] so that we can
getsome routine cfbot coverage. 

On Tue, Feb 27, 2024 at 08:46:06PM +0000, Amonson, Paul D wrote:
> After consulting some Intel internal experts on MSVC the linking issue
> as it stood was not resolved. Instead, I created a MSVC ONLY work-around.
> This adds one extra functional call on the Windows builds (The linker
> resolves a real function just fine but not a function pointer of the
> same name). This extra latency does not exist on any of the other
> platforms. I also believe I addressed all issues raised in the
> previous reviews. The new pg_popcnt_x86_64_accel.c file is now the
> ONLY file compiled with the
> AVX512 compiler flags. I added support for the MSVC compiler flag as
> well. Both meson and autoconf are updated with the new refactor.
>
> I am attaching the new patch.

I think this patch might be missing the new files.

-#define LARGE_OFF_T (((off_t) 1 << 62) - 1 + ((off_t) 1 << 62))
+#define LARGE_OFF_T ((((off_t) 1 << 31) << 31) - 1 + (((off_t) 1 << 31)
+<< 31))

IME this means that the autoconf you are using has been patched.  A quick search on the mailing lists seems to indicate
thatit might be specific to Debian [1]. 

-static int    pg_popcount32_slow(uint32 word);
-static int    pg_popcount64_slow(uint64 word);
+int    pg_popcount32_slow(uint32 word);
+int    pg_popcount64_slow(uint64 word);
+uint64 pg_popcount_slow(const char *buf, int bytes);

This patch appears to do a lot of refactoring.  Would it be possible to break out the refactoring parts into a
prerequisitepatch that could be reviewed and committed independently from the AVX512 stuff? 

-#if SIZEOF_VOID_P >= 8
+#if SIZEOF_VOID_P == 8
     /* Process in 64-bit chunks if the buffer is aligned. */
-    if (buf == (const char *) TYPEALIGN(8, buf))
+    if (buf == (const char *)TYPEALIGN(8, buf))
     {
-        const uint64 *words = (const uint64 *) buf;
+        const uint64 *words = (const uint64 *)buf;

         while (bytes >= 8)
         {
@@ -309,9 +213,9 @@ pg_popcount(const char *buf, int bytes)
             bytes -= 8;
         }

-        buf = (const char *) words;
+        buf = (const char *)words;
     }
-#else
+#elif SIZEOF_VOID_P == 4
     /* Process in 32-bit chunks if the buffer is aligned. */
     if (buf == (const char *) TYPEALIGN(4, buf))
     {

Most, if not all, of these changes seem extraneous.  Do we actually need to more strictly check SIZEOF_VOID_P?

[0] https://commitfest.postgresql.org/48/
[1] https://postgr.es/m/20230211020042.uthdgj72kp3xlqam%40awork3.anarazel.de

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

pgsql-hackers by date:

Previous
From: Daniel Gustafsson
Date:
Subject: Re: Adding deprecation notices to pgcrypto documentation
Next
From: Nathan Bossart
Date:
Subject: Re: Improve the log message output of basic_archive when basic_archive.archive_directory parameter is not set