Re: use ARM intrinsics in pg_lfind32() where available - Mailing list pgsql-hackers

From Tom Lane
Subject Re: use ARM intrinsics in pg_lfind32() where available
Msg-id 2494641.1661635114@sss.pgh.pa.us
In response to Re: use ARM intrinsics in pg_lfind32() where available  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: use ARM intrinsics in pg_lfind32() where available
List pgsql-hackers
I spent a bit more time researching the portability implications of
this patch.  I think that we should check __ARM_NEON before #including
<arm_neon.h>; there is authoritative documentation out there telling
you to, eg [1], and I can see no upside at all to not checking.
We cannot check *only* __ARM_NEON, though.  I found that clang 8.0.0
defines it in my Fedora 30 32-bit image, although that platform does
not provide all the instructions we want (I see "undefined function"
complaints for vmaxvq_u8 etc. if I try to make it use the patch).
Looking into that installation's <arm_neon.h>, those functions are
defined only when "__ARM_FP & 2" is true, which is kind of interesting:
per [1], that bit indicates support for 16-bit floating point, which
seems a mite unrelated.
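To see which of these macros a particular compiler defines for a given target, one can dump its predefined macros (with gcc or clang, "cc -dM -E - </dev/null | grep ARM"), or compile a tiny probe such as the following.  This is only a sketch: the probe_* and report_arm_macros names are hypothetical, and the probe reports what the preprocessor claims, not what the hardware actually supports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* True if the compiler advertises Neon intrinsics via __ARM_NEON. */
static bool probe_arm_neon(void)
{
#if defined(__ARM_NEON)
    return true;
#else
    return false;
#endif
}

/*
 * True if bit value 2 of __ARM_FP (half-precision FP) is set.  In the
 * clang 8 installation described above, <arm_neon.h> gates vmaxvq_u8
 * and friends on this bit.
 */
static bool probe_arm_fp16_bit(void)
{
#if defined(__ARM_FP) && (__ARM_FP & 2)
    return true;
#else
    return false;
#endif
}

/* True if the proposed rule (aarch64 plus __ARM_NEON) would select Neon. */
static bool probe_would_use_neon(void)
{
#if defined(__aarch64__) && defined(__ARM_NEON)
    return true;
#else
    return false;
#endif
}

/* Print a one-line summary of the probes for the current target. */
static void report_arm_macros(FILE *out)
{
    /* The selection rule can only fire when __ARM_NEON is defined. */
    assert(!probe_would_use_neon() || probe_arm_neon());
    fprintf(out, "__ARM_NEON=%d __ARM_FP&2=%d use_neon=%d\n",
            probe_arm_neon(), probe_arm_fp16_bit(), probe_would_use_neon());
}
```

Compiling this with the same flags on a 32-bit image and on an aarch64 machine should show the situation described above: __ARM_NEON can be set on both, but only the aarch64 build would select Neon.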

It appears from the info at [2] that at least some 32-bit ARM
platforms set that bit, implying (if the clang authors are well
informed) that they have the instructions we want.  But we could
not realistically make 32-bit builds that try to use those
instructions without a run-time test; such builds would fail on
too many machines.  I doubt that a run-time test is worth the
trouble, so I concur with the idea of selecting NEON on aarch64 only,
in the hope of thereby avoiding the need for a run-time test.

In short, I think the critical part of 0002 needs to look more like
this:

+#elif defined(__aarch64__) && defined(__ARM_NEON)
+/*
+ * We use the Neon instructions if the compiler provides access to them
+ * (as indicated by __ARM_NEON) and we are on aarch64.  While Neon support is
+ * technically optional for aarch64, it appears that all available 64-bit
+ * hardware does have it.  Neon exists in some 32-bit hardware too, but
+ * we could not realistically use it there without a run-time check,
+ * which seems not worth the trouble for now.
+ */
+#include <arm_neon.h>
+#define USE_NEON
...

Coding like this appears to work on both my Apple M1 and my Raspberry
Pi, with several different OSes checked on the latter.
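For illustration, here is a rough sketch of the kind of search loop such a guard enables.  It is not the patch's pg_lfind32(): the function name lfind32_sketch and the SKETCH_USE_NEON macro are hypothetical, and the loop simply compares four elements per iteration with vceqq_u32 and tests for any match with vmaxvq_u32.  On anything other than aarch64 with __ARM_NEON it falls back to a plain scalar loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same guard as proposed above: Neon only on aarch64 with __ARM_NEON. */
#if defined(__aarch64__) && defined(__ARM_NEON)
#include <arm_neon.h>
#define SKETCH_USE_NEON
#endif

/*
 * Hypothetical helper: return true if key occurs in base[0 .. nelem-1].
 * Not PostgreSQL's pg_lfind32(); only an illustration of the technique.
 */
static bool
lfind32_sketch(uint32_t key, const uint32_t *base, uint32_t nelem)
{
    uint32_t    i = 0;

    assert(base != NULL || nelem == 0);

#ifdef SKETCH_USE_NEON
    {
        /* Broadcast the key into all four 32-bit lanes. */
        const uint32x4_t keys = vdupq_n_u32(key);
        /* Largest multiple of 4 not exceeding nelem. */
        const uint32_t vec_end = nelem & ~(uint32_t) 3;

        for (; i < vec_end; i += 4)
        {
            uint32x4_t  vals = vld1q_u32(&base[i]);
            /* Each lane becomes all-ones where vals == key, else zero. */
            uint32x4_t  cmp = vceqq_u32(keys, vals);

            /* The horizontal max is nonzero iff any lane matched. */
            if (vmaxvq_u32(cmp) != 0)
                return true;
        }
    }
#endif

    /* Scalar loop: handles the tail, or everything on non-Neon builds. */
    for (; i < nelem; i++)
    {
        if (base[i] == key)
            return true;
    }
    return false;
}
```

Because the scalar loop finishes whatever the vector loop leaves, the function is correct for any nelem, and builds that fail the guard simply get the scalar path with no run-time check.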

            regards, tom lane

[1] https://developer.arm.com/documentation/101754/0618/armclang-Reference/Other-Compiler-specific-Features/Predefined-macros
[2] http://micro-os-plus.github.io/develop/predefined-macros/