Re: [PATCH] SVE popcount support - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: [PATCH] SVE popcount support
Date
Msg-id Z5QII8eZnj_6P1Mh@nathan
Whole thread Raw
In response to Re: [PATCH] SVE popcount support  (Kirill Reshke <reshkekirill@gmail.com>)
List pgsql-hackers
The meson configure check seems to fail on my machine:

    error: too many arguments to function call, expected 0, have 1
       10 |     svuint64_t popcnt = svcntb(val);
          |                         ~~~~~~ ^~~

    error: returning '__SVInt64_t' from a function with incompatible result type 'int'
       12 |     return popcnt == 0;
          |            ^~~~~~~~~~~

The autoconf version seems to work okay, though.

+    pgac_save_CFLAGS=$CFLAGS
+    CFLAGS="$pgac_save_CFLAGS $1"

I don't see any extra compiler flag tests used, so we no longer need this,
right?

+  if test x"$pgac_cv_arm_sve_popcnt_intrinsics" = x"yes"; then
+    pgac_arm_sve_popcnt_intrinsics=yes
+  fi

I'm curious why this doesn't use Ac_cachevar like the examples above it
(e.g., PGAC_XSAVE_INTRINSICS).

+  prog = '''
+#include <arm_sve.h>
+
+#if defined(__has_attribute) && __has_attribute (target)
+    __attribute__((target("arch=armv8-a+sve")))
+#endif
+int main(void)
+{
+    const svuint64_t val = svdup_u64(0xFFFFFFFFFFFFFFFF);
+    svuint64_t popcnt = svcntb(val);
+    /* return computed value, to prevent the above being optimized away */
+    return popcnt == 0;
+}
+'''

This test looks quite different than the autoconf one.  Why is that?  I
would expect them to be the same.  And I think ideally the test would check
that all the intrinsics functions we need are available.

+/*
+ * Returns true if the CPU supports the instructions required for the SVE
+ * pg_popcount() implementation.
+ */
+bool
+pg_popcount_sve_available(void)
+{
+    return getauxval(AT_HWCAP) & HWCAP_SVE;
+}

pg_crc32c_armv8_available() (in pg_crc32c_armv8_choose.c) looks quite a bit
more complicated than this.  Are we missing something here?

+    /*
+     * For smaller inputs, aligning the buffer degrades the performance.
+     * Therefore, the buffers only when the input size is sufficiently large.
+     */

Is the inverse true, i.e., does aligning the buffer improve performance for
larger inputs?  I'm also curious what level of performance degradation you
were seeing.

+#include "c.h"
+#include "port/pg_bitutils.h"
+
+#ifdef USE_SVE_POPCNT_WITH_RUNTIME_CHECK

nitpick: The USE_SVE_POPCNT_WITH_RUNTIME_CHECK check can probably go above
the #include for pg_bitutils.h (but below the one for c.h).

-- 
nathan



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Eagerly scan all-visible pages to amortize aggressive vacuum
Next
From: Daniel Gustafsson
Date:
Subject: Re: Windows: openssl & gssapi dislike each other