Thread: Re: Detection of hadware feature => please do not use signal

Re: Detection of hadware feature => please do not use signal

From
Heikki Linnakangas
Date:
On 31/10/2024 17:41, Bastien Roucariès wrote:
> Hi,
> 
> On debian side we will like to avoid use of sigill to detect feature.
> 
> https://sources.debian.org/src/postgresql-17/17.0-1/src/port/pg_crc32c_armv8_choose.c/#L55
> 
> is really bad.

Why is it bad?

> Could you use the canonical way under linux
> 
> #include <sys/auxv.h>
> #include <asm/hwcap.h>
> 
>   #if defined(__aarch64__)
>     return !(getauxval(AT_HWCAP) &  HWCAP_CRC32);
>   #else
>     return !(getauxval(AT_HWCAP2) & HWCAP2_CRC32);
>   #endif

We used to do that, but it was changed in commit 1c72ec6f. Looking at 
the discussion that led to it, it's not clear to me why we switched [1]. 
Portablity to non-glibc systems maybe?

If there's a handy, portable way to check CPU capabilities, I agree it 
seem less hacky than catching SIGILL...

[1] 
https://www.postgresql.org/message-id/CAEepm%3D02Run-Pk3xyt%2BRV3p1N%2B7cKZxN95_MamaJw8Cnw%2BDwjQ%40mail.gmail.com

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 31/10/2024 17:41, Bastien Roucariès wrote:
>> Could you use the canonical way under linux

> We used to do that, but it was changed in commit 1c72ec6f. Looking at 
> the discussion that led to it, it's not clear to me why we switched [1]. 

The complaint was precisely that it would not work under non-linux.

Maybe we could do it one way on linux and the other way everywhere
else, but I do not find that attractive; it just makes it harder
to get full test coverage.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Heikki Linnakangas
Date:
On 31/10/2024 19:48, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On 31/10/2024 17:41, Bastien Roucariès wrote:
>>> Could you use the canonical way under linux
> 
>> We used to do that, but it was changed in commit 1c72ec6f. Looking at
>> the discussion that led to it, it's not clear to me why we switched [1].
> 
> The complaint was precisely that it would not work under non-linux.
> 
> Maybe we could do it one way on linux and the other way everywhere
> else, but I do not find that attractive; it just makes it harder
> to get full test coverage.

Bastien, would you happen to know how other portable libraries that need 
to run on other operating system do this? We added the SIGILL probe back 
in 2018, there might be better options available now.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 31/10/2024 19:48, Tom Lane wrote:
>> Maybe we could do it one way on linux and the other way everywhere
>> else, but I do not find that attractive; it just makes it harder
>> to get full test coverage.

> Bastien, would you happen to know how other portable libraries that need 
> to run on other operating system do this? We added the SIGILL probe back 
> in 2018, there might be better options available now.

It occurs to me to wonder whether the existing code works on Windows.
Windows-on-ARM wasn't a thing we thought about in 2018, but it's
a reasonable target now.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Bastien =?ISO-8859-1?Q?Roucari=E8s?= <rouca@debian.org> writes:
> Le jeudi 31 octobre 2024, 17:29:45 UTC Heikki Linnakangas a écrit :
>> Why is it bad?

> for a binary it is maybe safe for a library it is unsafe in Multi thread context to capture signal...

[ shrug... ] If we're concerned about that, we should hardly be less
concerned on other platforms than we are on Linux.

(AFAIK, we do not execute the code in question in any libraries.)

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Thomas Munro
Date:
On Fri, Nov 1, 2024 at 7:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It occurs to me to wonder whether the existing code works on Windows.
> Windows-on-ARM wasn't a thing we thought about in 2018, but it's
> a reasonable target now.

I looked into that[1] and decided that I was going to ignore it
completely, because:

* Windows defines a dummy SIGILL signal number as required by the C
standard, so stuff like this compiles, but
EXCEPTION_ILLEGAL_INSTRUCTION isn't connected up to it and you'd just
crash instead, but even if it were...
* we don't install a native signal handler anyway, just one of our
fake ones (that I would love to get rid of)
* there is also a native API to check CPU features, but I don't see
the point in even thinking about it for now, because...
* Windows 11 effectively requires ARMv8.1-A to boot, and that's what
comes installed on current machines you can buy (this is expressed
differently officially, by saying that LSE atomics are required and by
listing the supported CPUs by model not ISA, and the list of CPUs that
are no longer supported can also be found online, so it's fairly
clear; there are also fun investigative reports from the time when
RPI4s and other never-officially-supported systems could suddenly no
longer boot after some update, as they would croak on LSE
instructions)
* if someone is using Windows 10 for ARM (which I gather is getting
harder to obtain by now) on an old enough low power laptop that lacks
the instruction (and note that most ARMv8-A chips *did* have that
instruction as an optional extra anyway, they just didn't have LSE),
well then maybe it's an issue but I think for a new platform where we
are so short of developers that we haven't even got past other really
basic starter problems after a couple of years of talking about it, I
think we should focus only on current systems instead of allowing even
1 nanosecond to be wasted on retro-computing topics... and in any case
that hypothetical user is just about out of time because...
* Windows 10's announced EOL (no more updates/support, full screen
warning messages if past EOLs are anything to go by) coincides
approximately with PostgreSQL 18's expected release

(Huh, thinking about other synchronous signals, don't we also fail to
install a native SIGFPE handler on Windows, that would convert eg
div-by-zero in C code into an ereport on Unix?)

Anyway I don't mind getting rid of the SIGILL stuff as long as the new
coding is tidy and cross-platform enough, if people are insisting.
I'm not aware that there's anything technically wrong with it on any
Unix system, but I agree that it's not beautiful code.  FreeBSD was my
motivation at the time, but it has since gained elf_aux_info(), and
OpenBSD too, and I can help write/test that part if we go that way.
Or maybe we could just take some fragments of OpenSSL or libzlma or
whatever code if the licensing aspect is OK.

As for the other ARM platforms in our universe:

Unfortunately it looks like NetBSD doesn't put AT_HWCAP into its
auxv[], or even expose it to user space nicely, and those libraries
don't seem to know of another way.  Hopefully NetBSD will align with
those other systems for portability's sake.  The only other way I
could find in a quick googling session is /usr/sbin/cpuctl, root only,
no cigar (but a solid clue that the information is floating around
somewhere, it just depends where exactly the root-only fencing is
happening).  Even if a way can be found, it'd be better to be able to
do it the same way as other systems that we are actually testing if at
all possible.  Put that way, the build farm absence is a sort of vote
for just not even trying to detect it on NetBSD.  A NetBSD user who
really wants to access the feature could still compile with
-march=<new thing>, supply a build farm animal and a patch, or
preferrably help get a compatible elf_auxv_info(AT_HWCAP) merged into
NetBSD.  That'd be my vote if we switch to auxv probing on the other
systems.

Macs don't do ELF or auxv, but there's a sysctl.  I think it must
always report true in practice.  The first M1 was ARMv8.4-A.  (There
are old non-computer Apple devices that used ARMv8-A still in the
wild, which is interesting to me only because it explains the recent
invention of -moutline-atomics, about which more soon hopefully, to
get faster lwlocks etc on our -march=armv8-a builds as shipped by
Debian et al.)  So I think we could skip the palaver and just
hard-code the knowledge that Macs can do this stuff.

(Archeological note: the reason several systems have the same auxv[]
concept, ie a table of parameters that exec*() sets up alongside
argv[] and environ[] to communicate stuff about memory layout etc
primarily to libc, and the reason they even agree on some of the
parameter names, is that it came from SVR4 with ELF.  I think Sun's
original version got AT_HWCAP from the ELF binary after selecting from
several object variants that were compiled to match various CPU
feature sets, as a way of shipping fat binaries that go faster on the
right hardware, while these modern systems are just passing on
whatever the CPU reports under the same hijacked name; it's related,
but different...  Anyway they needed some way for the kernel to give
the features to user space on ARM, because its register that is
equivalent to x86's CPUID can't be accessed from user space's
privilege level and libc itself would like to be able to use some
fancy features.  Even for non-ARM architectures it's nice not to have
to break out the assembler in user programs that want to do the same
sorts of tricks.  Amazingly, illumos can apparently run on ARM now so
we might in theory encounter the old meaning of AT_HWCAP, but that's
quite a hypothetical unicorn so I'm filing the thought under
archeology and hiding it in parentheses.)

Note that Andres recently wondered out loud[2] if CRC32 might be
fundamentally the wrong tool for the job in a related thread, so
perhaps this will all become moot if someone does the research and
replaces it, but that's vapourware for now...

[1] https://www.postgresql.org/message-id/CA%2BhUKGJ2B5rAGUncAob%3DChutCT%3Dfx0Ot7kwvio5cB7NpOGKG1Q%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/flat/20240612193746.rjeiip4hcamjedgo%40awork3.anarazel.de#ab383730597411817c69516ec6c1a65c



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Fri, Nov 1, 2024 at 7:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It occurs to me to wonder whether the existing code works on Windows.
>> Windows-on-ARM wasn't a thing we thought about in 2018, but it's
>> a reasonable target now.

> I looked into that[1] and decided that I was going to ignore it
> completely, because:
> [ oodles o' details ]

Hmm.  So it seems like we could do this:

* Linux, FreeBSD, OpenBSD: do run-time test as recommended
* Windows, macOS: assume that supported ARM hardware can do this
* anything else: assume no hardware CRC

The only thing I'd be even a bit sad about there is not having
NetBSD support.  Maybe we just need to research that a bit more,
or maybe we have to wait for/lobby for them to do what the other
BSDen have done.  But in any case I'm not seeing NetBSD-on-ARM
as a platform that's important enough to block movement on this,
let alone anything else in the "anything else" category.

An alternative to "assume no hardware CRC" could be

* anything else: use the existing SIGILL test

But I do sympathize with Bastien's concerns about that.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
I wrote:
> Hmm.  So it seems like we could do this:
> * Linux, FreeBSD, OpenBSD: do run-time test as recommended
> * Windows, macOS: assume that supported ARM hardware can do this
> * anything else: assume no hardware CRC

Actually, after chewing on that second point awhile longer,
how about this modest proposal:

* Drop all code for run-time determination of ARM CRC support.
Assume it's there unless user builds with a -march option that
says it definitely isn't.

Realistically, exactly who is going to be running Postgres 18+
on ARM hardware that lacks CRC support?  I can think of lots of
projects that are more worthy of our time than this.

(Perhaps it's time to apply the same mindset to x86 CRC support
too?)

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Thomas Munro
Date:
On Fri, Nov 1, 2024 at 7:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Hmm.  So it seems like we could do this:
> > * Linux, FreeBSD, OpenBSD: do run-time test as recommended
> > * Windows, macOS: assume that supported ARM hardware can do this
> > * anything else: assume no hardware CRC
>
> Actually, after chewing on that second point awhile longer,
> how about this modest proposal:
>
> * Drop all code for run-time determination of ARM CRC support.
> Assume it's there unless user builds with a -march option that
> says it definitely isn't.

For CRC32, that would work (and already does via configure tests) for
macOS (I'd forgotten about that in my earlier message, Macs never even
compile the CRC32 "choose" code because the system compiler targets M1
by default), recent RHEL (see below), and presumably others, but it
would penalise Debian, FreeBSD (using the standard binary package
repo) and others if we didn't have the runtime check due to their
conservative choice of baseline arch (as Bastien just said about
Debian in a crossed email).  I've been wondering about what to do
about all this for a while...  There are also other features we aren't
using yet, but should, or single instructions that we are calling
through function pointers, preventing automatic vectorisation etc.

> Realistically, exactly who is going to be running Postgres 18+
> on ARM hardware that lacks CRC support?  I can think of lots of
> projects that are more worthy of our time than this.
>
> (Perhaps it's time to apply the same mindset to x86 CRC support
> too?)

I think it's quite interesting that RHEL has moved its baseline to
x86_86-v2, having recently worked with Intel and AMD to standardise
new levels "-vN" that group and "linearise" the feature sets, and
others were already talking about standarding on -v3 last time I
looked, while Debian still targets x86_86 which means the instruction
set of the first AMD K8 chip from 2003.  Debian has a nice wiki page
(see thread ↓) explaining various workaround strategies, ranging from
runtime tests like ours, per-sub-architecture library selection,
through to (last resort) depending on pseudo-packages like
"x86-64-v2-support" so that your package becomes uninstallable on
ancient hardware.  I'm not involved in packaging but I was wondering
out loud whether the PGDG repos might potentially consider making a
different choice than the official Debian repos, always or as an
option, or ... various other ideas proposed by others.  Perhaps we
could continue the topic on that thread:

https://www.postgresql.org/message-id/flat/CA%2BhUKGKS64zJezV9y9mPcB-J0i%2BfLGiv3FAdwSH_3SCaVdrjyQ%40mail.gmail.com

In the meantime I'll see about the Linux/*BSD patch for this thing,
more tomorrow hopefully.  And I might see if I can find a NetBSD
person to ask...



Re: Detection of hadware feature => please do not use signal

From
Nathan Bossart
Date:
On Sat, Nov 02, 2024 at 06:11:33PM +1300, Thomas Munro wrote:
> On Fri, Nov 1, 2024 at 7:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Actually, after chewing on that second point awhile longer,
>> how about this modest proposal:
>>
>> * Drop all code for run-time determination of ARM CRC support.
>> Assume it's there unless user builds with a -march option that
>> says it definitely isn't.
> 
> For CRC32, that would work (and already does via configure tests) for
> macOS (I'd forgotten about that in my earlier message, Macs never even
> compile the CRC32 "choose" code because the system compiler targets M1
> by default), recent RHEL (see below), and presumably others, but it
> would penalise Debian, FreeBSD (using the standard binary package
> repo) and others if we didn't have the runtime check due to their
> conservative choice of baseline arch (as Bastien just said about
> Debian in a crossed email).  I've been wondering about what to do
> about all this for a while...  There are also other features we aren't
> using yet, but should, or single instructions that we are calling
> through function pointers, preventing automatic vectorisation etc.

One small improvement in this area that I'm working on (and about to commit
for the AVX-512 stuff) is using __attribute__((target(...))) instead of
special -march options for specific files [0].  Among other things, this
simplifies the configure checks and eliminates the need to put chunks of
code in separate files.  However, I found that until recently, it wasn't
possible to include arm_acle.h without the -march flag [1], which means we
likely won't be able to make the switch for the ARM CRC stuff for a few
more years.

More to the point, until the baseline moves, I think we'll end up adding
checks in header files so that we only take the function pointer overhead
when it's likely to be worth it (e.g., pg_popcount() in pg_bitutils.h).  It
might be worth doing that for the CRC-32C code, too, since we are still
quite reliant on the runtime check.

[0] https://commitfest.postgresql.org/50/5305/
[1] https://postgr.es/m/ZwXsE0KG_wh6_heU%40nathan

-- 
nathan



Re: Detection of hadware feature => please do not use signal

From
Thomas Munro
Date:
On Fri, Nov 8, 2024 at 4:28 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> One small improvement in this area that I'm working on (and about to commit
> for the AVX-512 stuff) is using __attribute__((target(...))) instead of
> special -march options for specific files [0].

(Catching up with that stuff).  Neat.

I tried sprinkling __attribute__((target_clones("default,lse"))) in
front of some functions like LWLockAquire() just to see if this
multifunction stuff worked enough yet on ARM.  FreeBSD 15/clang 18:
yes, I got LSE CAS-based lwlocks with loader-time feature detection!
Debian 12/gcc 12: no, it fails with "error: target does not support
function version dispatcher".  I think maybe gcc 14 will do it (from
quick look at release notes)?  I realise that's yet a different issue
than the <arm_acle.h> visibility thing you mentioned for the CRC32
builtins.  Anyway, it's apparently still too soon to help on all
systems with this specific topic as you already said, so the explicit
auxv check still looks like the only reasonable way for now.



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> Unfortunately it looks like NetBSD doesn't put AT_HWCAP into its
> auxv[], or even expose it to user space nicely, and those libraries
> don't seem to know of another way.  Hopefully NetBSD will align with
> those other systems for portability's sake.  The only other way I
> could find in a quick googling session is /usr/sbin/cpuctl, root only,
> no cigar (but a solid clue that the information is floating around
> somewhere, it just depends where exactly the root-only fencing is
> happening).

I poked into this with NetBSD and found that cpuctl's "identify"
option works just fine without root.  So that motivated me to dig
into its source code, and I end up with the attached.  Sadly it only
works in 64-bit; there is presumably a way to detect this in 32-bit
as well, but cpuctl doesn't know it.  That doesn't bother me a whole
lot though, at least not nearly as much as the 64-bit case.  Anybody
running PG on 32-bit ARM shouldn't be expecting great performance.

I've checked this on NetBSD 9.2 and 10.0.  I don't have hardware
that should return false (and there may not be any such 64-bit
hardware anyway...), so there's not much more I can test.

            regards, tom lane

diff --git a/src/port/pg_crc32c_armv8_choose.c b/src/port/pg_crc32c_armv8_choose.c
index 306500154e..f035ac9855 100644
--- a/src/port/pg_crc32c_armv8_choose.c
+++ b/src/port/pg_crc32c_armv8_choose.c
@@ -31,6 +31,11 @@
 #endif
 #endif

+#if defined(__NetBSD__) && defined(__aarch64__)
+#include <sys/sysctl.h>
+#include <aarch64/armreg.h>
+#endif
+
 #include "port/pg_crc32c.h"

 static bool
@@ -52,6 +57,42 @@ pg_crc32c_armv8_available(void)
 #else
     return (getauxval(AT_HWCAP2) & HWCAP2_CRC32) != 0;
 #endif
+#elif defined(__NetBSD__) && defined(__aarch64__)
+    /*
+     * On NetBSD we can read AArch64 Instruction Set Attribute Register 0 via
+     * sysctl.  We could probably do something similar on arm32 as well, but
+     * it's undocumented.  Note we assume cpu0 is representative of all the
+     * machine's CPUs.
+     */
+    const char *path = "machdep.cpu0.cpu_id";
+    size_t        len;
+#define SYSCTL_CPU_ID_MAXSIZE    64
+    uint64        sysctlbuf[SYSCTL_CPU_ID_MAXSIZE];
+    struct aarch64_sysctl_cpu_id *id =
+        (struct aarch64_sysctl_cpu_id *) sysctlbuf;
+    uint64        reg;
+    uint64        fld;
+
+    len = sizeof(sysctlbuf);
+    memset(sysctlbuf, 0, len);
+    if (sysctlbyname(path, id, &len, NULL, 0) != 0)
+        return false;
+    if (len != sizeof(struct aarch64_sysctl_cpu_id))
+        return false;            /* kernel incompatibility? */
+
+#define ISAR0_CRC32_BITPOS 16
+#define ISAR0_CRC32_BITWIDTH 4
+#define WIDTHMASK(w)    ((((uint64) 1) << (w)) - 1)
+
+    reg = id->ac_aa64isar0;
+    fld = (reg >> ISAR0_CRC32_BITPOS) & WIDTHMASK(ISAR0_CRC32_BITWIDTH);
+
+    /*
+     * Available documentation defines only the field values 0 (No CRC32) and
+     * 1 (CRC32B/CRC32H/CRC32W/CRC32X/CRC32CB/CRC32CH/CRC32CW/CRC32CX). Assume
+     * that any future nonzero value will be a superset of 1.
+     */
+    return (fld != 0);
 #else
     return false;
 #endif

Re: Detection of hadware feature => please do not use signal

From
Thomas Munro
Date:
On Sat, Nov 23, 2024 at 2:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I poked into this with NetBSD and found that cpuctl's "identify"
> option works just fine without root.  So that motivated me to dig
> into its source code, and I end up with the attached.  Sadly it only
> works in 64-bit; there is presumably a way to detect this in 32-bit
> as well, but cpuctl doesn't know it.  That doesn't bother me a whole
> lot though, at least not nearly as much as the 64-bit case.  Anybody
> running PG on 32-bit ARM shouldn't be expecting great performance.

(Nerd sniped) No NetBSD here but think a 32 bit kernel should expose
sysctl machdep.id_isar, which should be an array of 6 ints, and bits
16-19 of id_isar[5] should give you the answer:

https://github.com/NetBSD/src/blob/312631d82871c81540b81abea3baa09313478bd3/sys/arch/arm/arm32/arm32_machdep.c#L142
https://github.com/NetBSD/src/blob/312631d82871c81540b81abea3baa09313478bd3/sys/arch/arm/arm32/cpu.c#L869

https://developer.arm.com/documentation/ddi0601/2024-09/AArch32-Registers/ID-ISAR5--Instruction-Set-Attribute-Register-5?lang=en

A real 32 bit armv7 or similar like RPI2 must say no.  An armv8+ like
RPI3+ running a 32 bit kernel (ie a 64 bit chip running in aarch32
mode) should say yes.

With a 64 bit kernel running a 32 bit process, I think your code
should work, except that it wouldn't be used because it's gated on
__aarch64__.

So there are combinations to consider.  However, beyond satisfying
curiosity about how it works, I agree that it's all a complete waste
of time and what you have is perfectly fine.  (I wonder why there
isn't a sysctl that gives you the id_aa64isar0_el1 register more
directly, instead of having to slurp it out of struct
aarch64_sysctl_cpu_id.  I mean, why expose similar "isar" registers
differently for aarch32 and aarch64 kernels... but OK.)

+#define ISAR0_CRC32_BITPOS 16
+#define ISAR0_CRC32_BITWIDTH 4
+#define WIDTHMASK(w)    ((((uint64) 1) << (w)) - 1)
+
+    reg = id->ac_aa64isar0;
+    fld = (reg >> ISAR0_CRC32_BITPOS) & WIDTHMASK(ISAR0_CRC32_BITWIDTH);
+
+    /*
+     * Available documentation defines only the field values 0 (No CRC32) and
+     * 1 (CRC32B/CRC32H/CRC32W/CRC32X/CRC32CB/CRC32CH/CRC32CW/CRC32CX). Assume
+     * that any future nonzero value will be a superset of 1.
+     */
+    return (fld != 0);

Since you included <aarch64/armreg.h>, and only care about non-zero,
couldn't you use the mask from the header, instead of all that
bitswizzling?

return (id->ac_aa64isar0 & ID_AA64ISAR0_EL1_CRC32) != 0;

> I've checked this on NetBSD 9.2 and 10.0.  I don't have hardware
> that should return false (and there may not be any such 64-bit
> hardware anyway...), so there's not much more I can test.

I wish qemu supported a cpu that didn't have that...



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> (Nerd sniped) No NetBSD here but think a 32 bit kernel should expose
> sysctl machdep.id_isar, which should be an array of 6 ints, and bits
> 16-19 of id_isar[5] should give you the answer:

Huh.  cpuctl's arm32 module knows of machdep.id_isar, but it does not
appear to know that that word exists:

http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.sbin/cpuctl/arch/arm.c?rev=1.6;content-type=text%2Fplain

But that developer.arm.com page seems pretty definitive.  Tomorrow
I'll have a go at spinning up a 32-bit NetBSD installation and test
it.  Thanks for the research!

> Since you included <aarch64/armreg.h>, and only care about non-zero,
> couldn't you use the mask from the header, instead of all that
> bitswizzling?

> return (id->ac_aa64isar0 & ID_AA64ISAR0_EL1_CRC32) != 0;

I didn't actually look into that header ;-) ... I was just basing this
on what cpuctl's code does.  That version does look cleaner, at
least for aarch64 --- but given that the field is defined to be
in the same place in the relevant register for each arch, I'm
a bit inclined to keep the bit-swizzling and just have two paths
for getting the register contents.  Unless maybe that header is
also installed in arm32 builds, in which case we could do it as
you show for both arches.  I'll find out mañana.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
I wrote:
> But that developer.arm.com page seems pretty definitive.  Tomorrow
> I'll have a go at spinning up a 32-bit NetBSD installation and test
> it.  Thanks for the research!

Conclusions:

1. Unsurprisingly, a 64-bit kernel exposes machdep.cpuN.cpu_id but
not machdep.id_isar; a 32-bit kernel the reverse.

2. 32-bit userland does not provide <aarch64/armreg.h>.  (So I did
not try to use the ID_AA64ISAR0_EL1_CRC32 macro: we'd have had to
hard-code the field position anyway for 32-bit.)

3. It doesn't look to me like NetBSD really supports 32-bit userland
under a 64-bit kernel.  Maybe it'd kind of work, but utilities like
cpuctl seem to be built for only one of the kernel APIs.

So I end up proposing the attached.  If we do somehow find ourselves
running 32-bit with a 64-bit kernel, this will report "no CRC"
because the sysctl call will fail.  I don't think it's worth
sweating over that case.

BTW, I found out that by default we'll build a software-CRC-only
implementation on 32-bit ARM NetBSD, at least in the "generic"
eabihf build:

checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=... no
checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=-march=armv8-a+crc... no
checking which CRC-32C implementation to use... slicing-by-8

It turns out that -march=armv8-a+crc would work, except:

configure:17624: checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=-march=armv8-a+crc
configure:17648: ccache gcc -o conftest -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Werror=vla-Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type
-Wshadow=compatible-local-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard
-Wno-format-truncation-Wno-stringop-truncation -g -O2 -march=armv8-a+crc    -Wl,-z,now -Wl,-z,relro  conftest.c -lz
-ledit-lcurses -lpthread -lexecinfo -lrt -lm  >&5 
cc1: error: '-mfloat-abi=hard': selected processor lacks an FPU

If you force it with a compatible -mfpu switch (I used -mfpu=neon)
then you get a build with runtime CRC check, and all seems to work.
I did not try very hard to determine what the default -mfpu is here
or why -march=armv8-a+crc overrides whatever the default is.  This
probably means that no end user is going to see any benefit from this
exercise, because they are not likely to try adding -mfpu by itself.
If they add "-march=armv8-a+crc -mfpu=whatever" to CFLAGS manually
then configure will deem that a runtime check is not required, so this
code still doesn't get exercised.

tl;dr: a lot of time spent on a probably-useless exercise.  Still,
now that we've got the code we may as well commit it.  The configure
issue was there all along.

            regards, tom lane

diff --git a/src/port/pg_crc32c_armv8_choose.c b/src/port/pg_crc32c_armv8_choose.c
index 306500154e..11e27978fd 100644
--- a/src/port/pg_crc32c_armv8_choose.c
+++ b/src/port/pg_crc32c_armv8_choose.c
@@ -31,6 +31,13 @@
 #endif
 #endif

+#if defined(__NetBSD__)
+#include <sys/sysctl.h>
+#if defined(__aarch64__)
+#include <aarch64/armreg.h>
+#endif
+#endif
+
 #include "port/pg_crc32c.h"

 static bool
@@ -52,6 +59,47 @@ pg_crc32c_armv8_available(void)
 #else
     return (getauxval(AT_HWCAP2) & HWCAP2_CRC32) != 0;
 #endif
+#elif defined(__NetBSD__)
+    /*
+     * On NetBSD we can read the Instruction Set Attribute Registers via
+     * sysctl.  For doubtless-historical reasons the sysctl interface is
+     * completely different on 64-bit than 32-bit, but the underlying
+     * registers contain the same fields.
+     */
+#define ISAR0_CRC32_BITPOS 16
+#define ISAR0_CRC32_BITWIDTH 4
+#define WIDTHMASK(w)    ((1 << (w)) - 1)
+#define SYSCTL_CPU_ID_MAXSIZE 64
+
+    size_t        len;
+    uint64        sysctlbuf[SYSCTL_CPU_ID_MAXSIZE];
+#if defined(__aarch64__)
+    /* We assume cpu0 is representative of all the machine's CPUs. */
+    const char *path = "machdep.cpu0.cpu_id";
+    size_t        expected_len = sizeof(struct aarch64_sysctl_cpu_id);
+#define ISAR0 ((struct aarch64_sysctl_cpu_id *) sysctlbuf)->ac_aa64isar0
+#else
+    const char *path = "machdep.id_isar";
+    size_t        expected_len = 6 * sizeof(int);
+#define ISAR0 ((int *) sysctlbuf)[5]
+#endif
+    uint64        fld;
+
+    len = sizeof(sysctlbuf);
+    memset(sysctlbuf, 0, len);
+    if (sysctlbyname(path, sysctlbuf, &len, NULL, 0) != 0)
+        return false;            /* perhaps kernel is 64-bit and we aren't? */
+    if (len != expected_len)
+        return false;            /* kernel API change? */
+
+    fld = (ISAR0 >> ISAR0_CRC32_BITPOS) & WIDTHMASK(ISAR0_CRC32_BITWIDTH);
+
+    /*
+     * Current documentation defines only the field values 0 (No CRC32) and 1
+     * (CRC32B/CRC32H/CRC32W/CRC32X/CRC32CB/CRC32CH/CRC32CW/CRC32CX).  Assume
+     * that any future nonzero value will be a superset of 1.
+     */
+    return (fld != 0);
 #else
     return false;
 #endif

Re: Detection of hadware feature => please do not use signal

From
Thomas Munro
Date:
On Sun, Nov 24, 2024 at 11:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> cc1: error: '-mfloat-abi=hard': selected processor lacks an FPU

Interesting.  Happens also on Linux starting with GCC 11, apparently.
Here's one with some clues:

https://github.com/checkpoint-restore/criu/issues/1653#issuecomment-986034428

We have a few Linux armv7 animals in the farm, but using GCC 10 and
older so not seeing it yet.



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Sun, Nov 24, 2024 at 11:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> cc1: error: '-mfloat-abi=hard': selected processor lacks an FPU

> Interesting.  Happens also on Linux starting with GCC 11, apparently.
> Here's one with some clues:
> https://github.com/checkpoint-restore/criu/issues/1653#issuecomment-986034428

Oh, very interesting!  Maybe we need to try -march=armv8-a+crc+fp
(or some spelling like that) if -march=armv8-a+crc doesn't work?

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Thomas Munro
Date:
On Sun, Nov 24, 2024 at 1:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Oh, very interesting!  Maybe we need to try -march=armv8-a+crc+fp
> (or some spelling like that) if -march=armv8-a+crc doesn't work?

Hmm, they were talking about armv7-a, not armv8-a, but it seems
related... does it work?

The GCC docs give only +nofp, not +fp as an option for 32 bit armv8-a
(that is, under "ARM options", as opposed to "aarch64 options").  But
I guess GCC is pretty configurable and NetBSD targets some weird
hardware... as for why it could be like that (if not a mistake), it
looks like it might be technically possible for armv8 (including -a)
to lack FP in aarch32 state (unlike aarch64 state), but I went looking
for real existing 32-bit only armv8 chips lacking FP and found only
stuff like Cortex M23, and they're using armv8-m (embedded profile,
lots of other stuff missing or optional).  The weakest 32 bit-only
armv8-a chip I could find in a quick search was the Cortex A32 but it
looks like it has FP.  I don't know enough about any of this stuff to
guess what's going on here.



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> The GCC docs give only +nofp, not +fp as an option for 32 bit armv8-a
> (that is, under "ARM options", as opposed to "aarch64 options").  But
> I guess GCC is pretty configurable and NetBSD targets some weird
> hardware... as for why it could be like that (if not a mistake), it
> looks like it might be technically possible for armv8 (including -a)
> to lack FP in aarch32 state (unlike aarch64 state), but I went looking
> for real existing 32-bit only armv8 chips lacking FP and found only
> stuff like Cortex M23, and they're using armv8-m (embedded profile,
> lots of other stuff missing or optional).  The weakest 32 bit-only
> armv8-a chip I could find in a quick search was the Cortex A32 but it
> looks like it has FP.  I don't know enough about any of this stuff to
> guess what's going on here.

I'm confused too, because the gcc documentation is quite definite
that -march=armv8-a assumes FP support by default.  It seems like
somebody broke that for no very good reason and didn't bother to
update the docs either.  Maybe it's actually a bug?

Mainline Linux distros like Fedora and Debian seem to have dropped
32-bit ARM altogether, but there's still a build available from
https://www.raspberrypi.com/software/operating-systems/
that says it is based on Debian bookworm.  I installed that on
my RPi4 and found that it has

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/12/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ... --with-arch=armv6+fp --with-float=hard ...
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (Raspbian 12.2.0-14+rpi1) 

So, just like NetBSD's "generic" ARM32 build, they are firmly in
the hard-float camp.  There may be builds out there that will
still work on soft-float ARM, but they've got to be scarce ...
and you'd have to wonder why anybody would be trying to run PG
on such hardware anyway.

Anyway, experimentation shows that this gcc version gives the same
"selected processor lacks an FPU" failure with "-march=armv8-a+crc".
It will not take "-march=armv8-a+crc+fp", but it will take
"-march=armv8-a+crc -mfpu=vfpv3"; the same holds for NetBSD 10.0's
gcc 10.5.0.  Possibly other -mfpu settings will work (I recall having
had success yesterday with -mfpu=neon on NetBSD), but I think this is
a pretty generic setting: per the gcc docs it corresponds to typical
armv7 hardware.

I've checked the attached patch on both the raspberrypi system and
NetBSD 10.0.

            regards, tom lane

diff --git a/configure b/configure
index 28719ed30c..d61a7e59dd 100755
--- a/configure
+++ b/configure
@@ -17578,8 +17578,10 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext
 # Check for ARMv8 CRC Extension intrinsics to do CRC calculations.
 #
 # First check if __crc32c* intrinsics can be used with the default compiler
-# flags. If not, check if adding -march=armv8-a+crc flag helps.
-# CFLAGS_CRC is set if the extra flag is required.
+# flags. If not, check if adding -march=armv8-a+crc flag helps. On systems
+# using hard-float ABI, it may be that "-march=armv8-a+crc" will be rejected
+# but "-march=armv8-a+crc -mfpu=vfpv3" will work, so next try that.
+# CFLAGS_CRC is set if extra flag(s) are required.
 { $as_echo "$as_me:${as_lineno-$LINENO}: checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=" >&5
 $as_echo_n "checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=... " >&6; }
 if ${pgac_cv_armv8_crc32c_intrinsics_+:} false; then :
@@ -17661,6 +17663,48 @@ if test x"$pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc" = x"yes"; then
   pgac_armv8_crc32c_intrinsics=yes
 fi

+  if test x"$pgac_armv8_crc32c_intrinsics" != x"yes"; then
+    { $as_echo "$as_me:${as_lineno-$LINENO}: checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with
CFLAGS=-march=armv8-a+crc-mfpu=vfpv3" >&5 
+$as_echo_n "checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=-march=armv8-a+crc -mfpu=vfpv3...
">&6; } 
+if ${pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  pgac_save_CFLAGS=$CFLAGS
+CFLAGS="$pgac_save_CFLAGS -march=armv8-a+crc -mfpu=vfpv3"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+#include <arm_acle.h>
+int
+main ()
+{
+unsigned int crc = 0;
+   crc = __crc32cb(crc, 0);
+   crc = __crc32ch(crc, 0);
+   crc = __crc32cw(crc, 0);
+   crc = __crc32cd(crc, 0);
+   /* return computed value, to prevent the above being optimized away */
+   return crc == 0;
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3=yes
+else
+  pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+    conftest$ac_exeext conftest.$ac_ext
+CFLAGS="$pgac_save_CFLAGS"
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3" >&5
+$as_echo "$pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3" >&6; }
+if test x"$pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3" = x"yes"; then
+  CFLAGS_CRC="-march=armv8-a+crc -mfpu=vfpv3"
+  pgac_armv8_crc32c_intrinsics=yes
+fi
+
+  fi
 fi

 # Check for LoongArch CRC intrinsics to do CRC calculations.
diff --git a/configure.ac b/configure.ac
index 533f4ab78a..7bed68c7de 100644
--- a/configure.ac
+++ b/configure.ac
@@ -2093,11 +2093,16 @@ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [
 # Check for ARMv8 CRC Extension intrinsics to do CRC calculations.
 #
 # First check if __crc32c* intrinsics can be used with the default compiler
-# flags. If not, check if adding -march=armv8-a+crc flag helps.
-# CFLAGS_CRC is set if the extra flag is required.
+# flags. If not, check if adding -march=armv8-a+crc flag helps. On systems
+# using hard-float ABI, it may be that "-march=armv8-a+crc" will be rejected
+# but "-march=armv8-a+crc -mfpu=vfpv3" will work, so next try that.
+# CFLAGS_CRC is set if extra flag(s) are required.
 PGAC_ARMV8_CRC32C_INTRINSICS([])
 if test x"$pgac_armv8_crc32c_intrinsics" != x"yes"; then
   PGAC_ARMV8_CRC32C_INTRINSICS([-march=armv8-a+crc])
+  if test x"$pgac_armv8_crc32c_intrinsics" != x"yes"; then
+    PGAC_ARMV8_CRC32C_INTRINSICS([-march=armv8-a+crc -mfpu=vfpv3])
+  fi
 fi

 # Check for LoongArch CRC intrinsics to do CRC calculations.
diff --git a/meson.build b/meson.build
index b64d253fe4..4bf730e40a 100644
--- a/meson.build
+++ b/meson.build
@@ -2294,6 +2294,13 @@ int main(void)
     cdata.set('USE_ARMV8_CRC32C', false)
     cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
     have_optimized_crc = true
+  elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc -mfpu=vfpv3',
+      args: test_c_args + ['-march=armv8-a+crc', '-mfpu=vfpv3'])
+    # Use ARM CRC Extension, with runtime check
+    cflags_crc += ['-march=armv8-a+crc', '-mfpu=vfpv3']
+    cdata.set('USE_ARMV8_CRC32C', false)
+    cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1)
+    have_optimized_crc = true
   endif

 elif host_cpu == 'loongarch64'

Re: Detection of hadware feature => please do not use signal

From
Thomas Munro
Date:
On Mon, Nov 25, 2024 at 9:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, experimentation shows that this gcc version gives the same
> "selected processor lacks an FPU" failure with "-march=armv8-a+crc".
> It will not take "-march=armv8-a+crc+fp", but it will take
> "-march=armv8-a+crc -mfpu=vfpv3"; the same holds for NetBSD 10.0's
> gcc 10.5.0.  Possibly other -mfpu settings will work (I recall having
> had success yesterday with -mfpu=neon on NetBSD), but I think this is
> a pretty generic setting: per the gcc docs it corresponds to typical
> armv7 hardware.

LGTM.

It might be slightly more logical to say vfpv4 if that also works,
since armv8 floats are based on vfpf4[1], but it hardly matters given
we're not actually using it...  [1] is also where you can read that
software implementation is still technically permitted in armv8
aarch32 state (but not aarch64), even though A profile chips without
it are apparently like unicorns.  Anyway, I can't find any evidence
that GCC knows that[2], or changed that deliberately any time
recently, or is out of sync with its own docs, or has local patches in
NetBSD's copy, so yeah, maybe it really is a bug.  There was some
recent-ish churn affecting -mfpu and -march interactions[3], maybe
relevant, or not, it beats me.

[1]
https://developer.arm.com/documentation/dui0801/l/Overview-of-the-Armv8-Architecture/Floating-point-hardware?lang=en
[2] https://github.com/gcc-mirror/gcc/blob/master/gcc/config/arm/arm-cpus.in#L583
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=28078



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, Nov 25, 2024 at 9:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Anyway, experimentation shows that this gcc version gives the same
>> "selected processor lacks an FPU" failure with "-march=armv8-a+crc".
>> It will not take "-march=armv8-a+crc+fp", but it will take
>> "-march=armv8-a+crc -mfpu=vfpv3"; the same holds for NetBSD 10.0's
>> gcc 10.5.0.  Possibly other -mfpu settings will work (I recall having
>> had success yesterday with -mfpu=neon on NetBSD), but I think this is
>> a pretty generic setting: per the gcc docs it corresponds to typical
>> armv7 hardware.

> LGTM.

> It might be slightly more logical to say vfpv4 if that also works,
> since armv8 floats are based on vfpf4[1], but it hardly matters given
> we're not actually using it...

Yeah, with no FP operations in the files that get compiled with these
flags, it probably doesn't matter.  But gcc 10.5.0 seems happy with
-mfpu=vfpv4, so I'll use that.

> [1] is also where you can read that
> software implementation is still technically permitted in armv8
> aarch32 state (but not aarch64), even though A profile chips without
> it are apparently like unicorns.  Anyway, I can't find any evidence
> that GCC knows that[2], or changed that deliberately any time
> recently, or is out of sync with its own docs, or has local patches in
> NetBSD's copy, so yeah, maybe it really is a bug.

It's very confusing, especially given this:

$ gcc -march=armv8-a+crc+fp testarm32.c
gcc: error: 'armv8-a' does not support feature 'fp'
gcc: note: valid feature names are: crc simd crypto nocrypto nofp sb predres; did you mean 'nofp'?

If they think that base armv8-a doesn't include FP, why is there a
+nofp option?  But whether it's a bug or somehow intentional, there's
an awful lot of copies of gcc with this behavior, so we need to cope.

I wonder whether "armv8-a" includes FP on aarch64 but not arm32, and
we only did the original testing of f044d71e3 on aarch64.  If so,
it might've been like this all along and we didn't notice.

> There was some
> recent-ish churn affecting -mfpu and -march interactions[3], maybe
> relevant, or not, it beats me.

That looks like incorrect assembly code being emitted.  Our
problem here is much further upstream, ie what compiler flags
are accepted.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
I wrote:
> It's very confusing, especially given this:

> $ gcc -march=armv8-a+crc+fp testarm32.c
> gcc: error: 'armv8-a' does not support feature 'fp'
> gcc: note: valid feature names are: crc simd crypto nocrypto nofp sb predres; did you mean 'nofp'?

> If they think that base armv8-a doesn't include FP, why is there a
> +nofp option?

Bug filed at

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117766

We'll see what they say, but it seems like we need a workaround
in any case.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Thomas Munro
Date:
On Mon, Nov 25, 2024 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wonder whether "armv8-a" includes FP on aarch64 but not arm32, and
> we only did the original testing of f044d71e3 on aarch64.  If so,
> it might've been like this all along and we didn't notice.

It does work on the "armv7" target BF animals, though.  I assume they
don't get any updates as they're all old/defunct distros (one of my
favourite features of our boneyard^Wbuildfarm), but one of them is GCC
"10.2.1 20210110", which is what made me suspicious of that -mfpu
stuff, beyond my comprehension without a lot more coffee, that got
backpatched in late 2021.



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> It does work on the "armv7" target BF animals, though.  I assume they
> don't get any updates as they're all old/defunct distros (one of my
> favourite features of our boneyard^Wbuildfarm), but one of them is GCC
> "10.2.1 20210110", which is what made me suspicious of that -mfpu
> stuff, beyond my comprehension without a lot more coffee, that got
> backpatched in late 2021.

Ah, you're looking at mereswine, which is hard-float according to

checking build system type... armv7l-unknown-linux-gnueabihf
checking host system type... armv7l-unknown-linux-gnueabihf

I also confirmed that our existing code works on Fedora 30 armv7l
(also hard-float) with gcc 9.3.1.  That's the only other ARM32
image I have at hand :-(

So yeah, it would seem they broke it between 10.2.1 and 10.5.0.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
I wrote:
> I also confirmed that our existing code works on Fedora 30 armv7l
> (also hard-float) with gcc 9.3.1.  That's the only other ARM32
> image I have at hand :-(

> So yeah, it would seem they broke it between 10.2.1 and 10.5.0.

After discussion with the gcc folk (see bug report linked upthread)
it emerges that there was indeed a change somewhere in that time
frame.  It used to be that if the platform defaults were
"-march=A -mfpu=X" then writing "-march=B" on the command line
would still leave you with "-mfpu=X", possibly a nonsensical
combination.  So now they ignore any build-time default, with
the effect that writing just "-march=armv8-a+crc" selects soft
float, which fails on platforms that require hard-float ABI.
(I didn't ask whether this was the same change prompted by the
other bug report you found, but perhaps it was.)

Also, there's no +fp option for "-march=armv8-a" because the
ARMv8 specs require chips to implement both or neither of FP
and SIMD.  So the idiomatic thing for us to write is not a
separate -mfpu spec but "-march=armv8-a+crc+simd".  I'll
adjust my patch to do it like that and push later today.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
I wrote:
> Also, there's no +fp option for "-march=armv8-a" because the
> ARMv8 specs require chips to implement both or neither of FP
> and SIMD.  So the idiomatic thing for us to write is not a
> separate -mfpu spec but "-march=armv8-a+crc+simd".  I'll
> adjust my patch to do it like that and push later today.

And done.  In the event I decided to try the +simd option first,
reasoning that that's correct on the majority of current ARM systems.
I'm interested to see what the older buildfarm ARM animals pick.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
I wrote:
> And done.  In the event I decided to try the +simd option first,
> reasoning that that's correct on the majority of current ARM systems.
> I'm interested to see what the older buildfarm ARM animals pick.

Early returns are that they're all picking the +simd option, and
seem to be just as happy with that as they were without it.
I wonder if we could just drop the "without" probe altogether.
It'd save some configure cycles for other platforms (since they'll
fruitlessly run through all of these tests).

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
I wrote:
> Early returns are that they're all picking the +simd option, and
> seem to be just as happy with that as they were without it.
> I wonder if we could just drop the "without" probe altogether.
> It'd save some configure cycles for other platforms (since they'll
> fruitlessly run through all of these tests).

Now that we have a nearly-complete census of the buildfarm,
the end result is that they are all happy with the +simd
spelling ... except for one.  "gull" rejects it:
  clang: error: unsupported argument 'armv8-a+crc+simd' to option '-march='
But without +simd it's okay, so removing the other spelling
would cause a performance regression on that machine.

I find this a bit surprising, because gull is using a reasonably
recent compiler (clang version 17.0.4).  We have much older
clang versions that accept +simd --- but they are all aarch64.
gull seems to be our only representative of clang on arm32.
So this seems like a gcc vs clang difference that we'll have
to put up with.

Anyway, I think this thread is now closed: nothing more to do
unless someone thinks of a platform we need more bespoke code
for.

            regards, tom lane