Thread: Re: Detection of hadware feature => please do not use signal
On 31/10/2024 17:41, Bastien Roucariès wrote: > Hi, > > On debian side we will like to avoid use of sigill to detect feature. > > https://sources.debian.org/src/postgresql-17/17.0-1/src/port/pg_crc32c_armv8_choose.c/#L55 > > is really bad. Why is it bad? > Could you use the canonical way under linux > > #include <sys/auxv.h> > #include <asm/hwcap.h> > > #if defined(__aarch64__) > return !(getauxval(AT_HWCAP) & HWCAP_CRC32); > #else > return !(getauxval(AT_HWCAP2) & HWCAP2_CRC32); > #endif We used to do that, but it was changed in commit 1c72ec6f. Looking at the discussion that led to it, it's not clear to me why we switched [1]. Portablity to non-glibc systems maybe? If there's a handy, portable way to check CPU capabilities, I agree it seem less hacky than catching SIGILL... [1] https://www.postgresql.org/message-id/CAEepm%3D02Run-Pk3xyt%2BRV3p1N%2B7cKZxN95_MamaJw8Cnw%2BDwjQ%40mail.gmail.com -- Heikki Linnakangas Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes: > On 31/10/2024 17:41, Bastien Roucariès wrote: >> Could you use the canonical way under linux > We used to do that, but it was changed in commit 1c72ec6f. Looking at > the discussion that led to it, it's not clear to me why we switched [1]. The complaint was precisely that it would not work under non-linux. Maybe we could do it one way on linux and the other way everywhere else, but I do not find that attractive; it just makes it harder to get full test coverage. regards, tom lane
On 31/10/2024 19:48, Tom Lane wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: >> On 31/10/2024 17:41, Bastien Roucariès wrote: >>> Could you use the canonical way under linux > >> We used to do that, but it was changed in commit 1c72ec6f. Looking at >> the discussion that led to it, it's not clear to me why we switched [1]. > > The complaint was precisely that it would not work under non-linux. > > Maybe we could do it one way on linux and the other way everywhere > else, but I do not find that attractive; it just makes it harder > to get full test coverage. Bastien, would you happen to know how other portable libraries that need to run on other operating system do this? We added the SIGILL probe back in 2018, there might be better options available now. -- Heikki Linnakangas Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes: > On 31/10/2024 19:48, Tom Lane wrote: >> Maybe we could do it one way on linux and the other way everywhere >> else, but I do not find that attractive; it just makes it harder >> to get full test coverage. > Bastien, would you happen to know how other portable libraries that need > to run on other operating system do this? We added the SIGILL probe back > in 2018, there might be better options available now. It occurs to me to wonder whether the existing code works on Windows. Windows-on-ARM wasn't a thing we thought about in 2018, but it's a reasonable target now. regards, tom lane
Bastien =?ISO-8859-1?Q?Roucari=E8s?= <rouca@debian.org> writes: > Le jeudi 31 octobre 2024, 17:29:45 UTC Heikki Linnakangas a écrit : >> Why is it bad? > for a binary it is maybe safe for a library it is unsafe in Multi thread context to capture signal... [ shrug... ] If we're concerned about that, we should hardly be less concerned on other platforms than we are on Linux. (AFAIK, we do not execute the code in question in any libraries.) regards, tom lane
On Fri, Nov 1, 2024 at 7:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > It occurs to me to wonder whether the existing code works on Windows. > Windows-on-ARM wasn't a thing we thought about in 2018, but it's > a reasonable target now. I looked into that[1] and decided that I was going to ignore it completely, because: * Windows defines a dummy SIGILL signal number as required by the C standard, so stuff like this compiles, but EXCEPTION_ILLEGAL_INSTRUCTION isn't connected up to it and you'd just crash instead, but even if it were... * we don't install a native signal handler anyway, just one of our fake ones (that I would love to get rid of) * there is also a native API to check CPU features, but I don't see the point in even thinking about it for now, because... * Windows 11 effectively requires ARMv8.1-A to boot, and that's what comes installed on current machines you can buy (this is expressed differently officially, by saying that LSE atomics are required and by listing the supported CPUs by model not ISA, and the list of CPUs that are no longer supported can also be found online, so it's fairly clear; there are also fun investigative reports from the time when RPI4s and other never-officially-supported systems could suddenly no longer boot after some update, as they would croak on LSE instructions) * if someone is using Windows 10 for ARM (which I gather is getting harder to obtain by now) on an old enough low power laptop that lacks the instruction (and note that most ARMv8-A chips *did* have that instruction as an optional extra anyway, they just didn't have LSE), well then maybe it's an issue but I think for a new platform where we are so short of developers that we haven't even got past other really basic starter problems after a couple of years of talking about it, I think we should focus only on current systems instead of allowing even 1 nanosecond to be wasted on retro-computing topics... and in any case that hypothetical user is just about out of time because... * Windows 10's announced EOL (no more updates/support, full screen warning messages if past EOLs are anything to go by) coincides approximately with PostgreSQL 18's expected release (Huh, thinking about other synchronous signals, don't we also fail to install a native SIGFPE handler on Windows, that would convert eg div-by-zero in C code into an ereport on Unix?) Anyway I don't mind getting rid of the SIGILL stuff as long as the new coding is tidy and cross-platform enough, if people are insisting. I'm not aware that there's anything technically wrong with it on any Unix system, but I agree that it's not beautiful code. FreeBSD was my motivation at the time, but it has since gained elf_aux_info(), and OpenBSD too, and I can help write/test that part if we go that way. Or maybe we could just take some fragments of OpenSSL or libzlma or whatever code if the licensing aspect is OK. As for the other ARM platforms in our universe: Unfortunately it looks like NetBSD doesn't put AT_HWCAP into its auxv[], or even expose it to user space nicely, and those libraries don't seem to know of another way. Hopefully NetBSD will align with those other systems for portability's sake. The only other way I could find in a quick googling session is /usr/sbin/cpuctl, root only, no cigar (but a solid clue that the information is floating around somewhere, it just depends where exactly the root-only fencing is happening). Even if a way can be found, it'd be better to be able to do it the same way as other systems that we are actually testing if at all possible. Put that way, the build farm absence is a sort of vote for just not even trying to detect it on NetBSD. A NetBSD user who really wants to access the feature could still compile with -march=<new thing>, supply a build farm animal and a patch, or preferrably help get a compatible elf_auxv_info(AT_HWCAP) merged into NetBSD. That'd be my vote if we switch to auxv probing on the other systems. Macs don't do ELF or auxv, but there's a sysctl. I think it must always report true in practice. The first M1 was ARMv8.4-A. (There are old non-computer Apple devices that used ARMv8-A still in the wild, which is interesting to me only because it explains the recent invention of -moutline-atomics, about which more soon hopefully, to get faster lwlocks etc on our -march=armv8-a builds as shipped by Debian et al.) So I think we could skip the palaver and just hard-code the knowledge that Macs can do this stuff. (Archeological note: the reason several systems have the same auxv[] concept, ie a table of parameters that exec*() sets up alongside argv[] and environ[] to communicate stuff about memory layout etc primarily to libc, and the reason they even agree on some of the parameter names, is that it came from SVR4 with ELF. I think Sun's original version got AT_HWCAP from the ELF binary after selecting from several object variants that were compiled to match various CPU feature sets, as a way of shipping fat binaries that go faster on the right hardware, while these modern systems are just passing on whatever the CPU reports under the same hijacked name; it's related, but different... Anyway they needed some way for the kernel to give the features to user space on ARM, because its register that is equivalent to x86's CPUID can't be accessed from user space's privilege level and libc itself would like to be able to use some fancy features. Even for non-ARM architectures it's nice not to have to break out the assembler in user programs that want to do the same sorts of tricks. Amazingly, illumos can apparently run on ARM now so we might in theory encounter the old meaning of AT_HWCAP, but that's quite a hypothetical unicorn so I'm filing the thought under archeology and hiding it in parentheses.) Note that Andres recently wondered out loud[2] if CRC32 might be fundamentally the wrong tool for the job in a related thread, so perhaps this will all become moot if someone does the research and replaces it, but that's vapourware for now... [1] https://www.postgresql.org/message-id/CA%2BhUKGJ2B5rAGUncAob%3DChutCT%3Dfx0Ot7kwvio5cB7NpOGKG1Q%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/20240612193746.rjeiip4hcamjedgo%40awork3.anarazel.de#ab383730597411817c69516ec6c1a65c
Thomas Munro <thomas.munro@gmail.com> writes: > On Fri, Nov 1, 2024 at 7:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It occurs to me to wonder whether the existing code works on Windows. >> Windows-on-ARM wasn't a thing we thought about in 2018, but it's >> a reasonable target now. > I looked into that[1] and decided that I was going to ignore it > completely, because: > [ oodles o' details ] Hmm. So it seems like we could do this: * Linux, FreeBSD, OpenBSD: do run-time test as recommended * Windows, macOS: assume that supported ARM hardware can do this * anything else: assume no hardware CRC The only thing I'd be even a bit sad about there is not having NetBSD support. Maybe we just need to research that a bit more, or maybe we have to wait for/lobby for them to do what the other BSDen have done. But in any case I'm not seeing NetBSD-on-ARM as a platform that's important enough to block movement on this, let alone anything else in the "anything else" category. An alternative to "assume no hardware CRC" could be * anything else: use the existing SIGILL test But I do sympathize with Bastien's concerns about that. regards, tom lane
I wrote: > Hmm. So it seems like we could do this: > * Linux, FreeBSD, OpenBSD: do run-time test as recommended > * Windows, macOS: assume that supported ARM hardware can do this > * anything else: assume no hardware CRC Actually, after chewing on that second point awhile longer, how about this modest proposal: * Drop all code for run-time determination of ARM CRC support. Assume it's there unless user builds with a -march option that says it definitely isn't. Realistically, exactly who is going to be running Postgres 18+ on ARM hardware that lacks CRC support? I can think of lots of projects that are more worthy of our time than this. (Perhaps it's time to apply the same mindset to x86 CRC support too?) regards, tom lane
On Fri, Nov 1, 2024 at 7:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: > > Hmm. So it seems like we could do this: > > * Linux, FreeBSD, OpenBSD: do run-time test as recommended > > * Windows, macOS: assume that supported ARM hardware can do this > > * anything else: assume no hardware CRC > > Actually, after chewing on that second point awhile longer, > how about this modest proposal: > > * Drop all code for run-time determination of ARM CRC support. > Assume it's there unless user builds with a -march option that > says it definitely isn't. For CRC32, that would work (and already does via configure tests) for macOS (I'd forgotten about that in my earlier message, Macs never even compile the CRC32 "choose" code because the system compiler targets M1 by default), recent RHEL (see below), and presumably others, but it would penalise Debian, FreeBSD (using the standard binary package repo) and others if we didn't have the runtime check due to their conservative choice of baseline arch (as Bastien just said about Debian in a crossed email). I've been wondering about what to do about all this for a while... There are also other features we aren't using yet, but should, or single instructions that we are calling through function pointers, preventing automatic vectorisation etc. > Realistically, exactly who is going to be running Postgres 18+ > on ARM hardware that lacks CRC support? I can think of lots of > projects that are more worthy of our time than this. > > (Perhaps it's time to apply the same mindset to x86 CRC support > too?) I think it's quite interesting that RHEL has moved its baseline to x86_86-v2, having recently worked with Intel and AMD to standardise new levels "-vN" that group and "linearise" the feature sets, and others were already talking about standarding on -v3 last time I looked, while Debian still targets x86_86 which means the instruction set of the first AMD K8 chip from 2003. Debian has a nice wiki page (see thread ↓) explaining various workaround strategies, ranging from runtime tests like ours, per-sub-architecture library selection, through to (last resort) depending on pseudo-packages like "x86-64-v2-support" so that your package becomes uninstallable on ancient hardware. I'm not involved in packaging but I was wondering out loud whether the PGDG repos might potentially consider making a different choice than the official Debian repos, always or as an option, or ... various other ideas proposed by others. Perhaps we could continue the topic on that thread: https://www.postgresql.org/message-id/flat/CA%2BhUKGKS64zJezV9y9mPcB-J0i%2BfLGiv3FAdwSH_3SCaVdrjyQ%40mail.gmail.com In the meantime I'll see about the Linux/*BSD patch for this thing, more tomorrow hopefully. And I might see if I can find a NetBSD person to ask...
On Sat, Nov 02, 2024 at 06:11:33PM +1300, Thomas Munro wrote: > On Fri, Nov 1, 2024 at 7:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Actually, after chewing on that second point awhile longer, >> how about this modest proposal: >> >> * Drop all code for run-time determination of ARM CRC support. >> Assume it's there unless user builds with a -march option that >> says it definitely isn't. > > For CRC32, that would work (and already does via configure tests) for > macOS (I'd forgotten about that in my earlier message, Macs never even > compile the CRC32 "choose" code because the system compiler targets M1 > by default), recent RHEL (see below), and presumably others, but it > would penalise Debian, FreeBSD (using the standard binary package > repo) and others if we didn't have the runtime check due to their > conservative choice of baseline arch (as Bastien just said about > Debian in a crossed email). I've been wondering about what to do > about all this for a while... There are also other features we aren't > using yet, but should, or single instructions that we are calling > through function pointers, preventing automatic vectorisation etc. One small improvement in this area that I'm working on (and about to commit for the AVX-512 stuff) is using __attribute__((target(...))) instead of special -march options for specific files [0]. Among other things, this simplifies the configure checks and eliminates the need to put chunks of code in separate files. However, I found that until recently, it wasn't possible to include arm_acle.h without the -march flag [1], which means we likely won't be able to make the switch for the ARM CRC stuff for a few more years. More to the point, until the baseline moves, I think we'll end up adding checks in header files so that we only take the function pointer overhead when it's likely to be worth it (e.g., pg_popcount() in pg_bitutils.h). It might be worth doing that for the CRC-32C code, too, since we are still quite reliant on the runtime check. [0] https://commitfest.postgresql.org/50/5305/ [1] https://postgr.es/m/ZwXsE0KG_wh6_heU%40nathan -- nathan
On Fri, Nov 8, 2024 at 4:28 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > One small improvement in this area that I'm working on (and about to commit > for the AVX-512 stuff) is using __attribute__((target(...))) instead of > special -march options for specific files [0]. (Catching up with that stuff). Neat. I tried sprinkling __attribute__((target_clones("default,lse"))) in front of some functions like LWLockAquire() just to see if this multifunction stuff worked enough yet on ARM. FreeBSD 15/clang 18: yes, I got LSE CAS-based lwlocks with loader-time feature detection! Debian 12/gcc 12: no, it fails with "error: target does not support function version dispatcher". I think maybe gcc 14 will do it (from quick look at release notes)? I realise that's yet a different issue than the <arm_acle.h> visibility thing you mentioned for the CRC32 builtins. Anyway, it's apparently still too soon to help on all systems with this specific topic as you already said, so the explicit auxv check still looks like the only reasonable way for now.
Thomas Munro <thomas.munro@gmail.com> writes: > Unfortunately it looks like NetBSD doesn't put AT_HWCAP into its > auxv[], or even expose it to user space nicely, and those libraries > don't seem to know of another way. Hopefully NetBSD will align with > those other systems for portability's sake. The only other way I > could find in a quick googling session is /usr/sbin/cpuctl, root only, > no cigar (but a solid clue that the information is floating around > somewhere, it just depends where exactly the root-only fencing is > happening). I poked into this with NetBSD and found that cpuctl's "identify" option works just fine without root. So that motivated me to dig into its source code, and I end up with the attached. Sadly it only works in 64-bit; there is presumably a way to detect this in 32-bit as well, but cpuctl doesn't know it. That doesn't bother me a whole lot though, at least not nearly as much as the 64-bit case. Anybody running PG on 32-bit ARM shouldn't be expecting great performance. I've checked this on NetBSD 9.2 and 10.0. I don't have hardware that should return false (and there may not be any such 64-bit hardware anyway...), so there's not much more I can test. regards, tom lane diff --git a/src/port/pg_crc32c_armv8_choose.c b/src/port/pg_crc32c_armv8_choose.c index 306500154e..f035ac9855 100644 --- a/src/port/pg_crc32c_armv8_choose.c +++ b/src/port/pg_crc32c_armv8_choose.c @@ -31,6 +31,11 @@ #endif #endif +#if defined(__NetBSD__) && defined(__aarch64__) +#include <sys/sysctl.h> +#include <aarch64/armreg.h> +#endif + #include "port/pg_crc32c.h" static bool @@ -52,6 +57,42 @@ pg_crc32c_armv8_available(void) #else return (getauxval(AT_HWCAP2) & HWCAP2_CRC32) != 0; #endif +#elif defined(__NetBSD__) && defined(__aarch64__) + /* + * On NetBSD we can read AArch64 Instruction Set Attribute Register 0 via + * sysctl. We could probably do something similar on arm32 as well, but + * it's undocumented. Note we assume cpu0 is representative of all the + * machine's CPUs. + */ + const char *path = "machdep.cpu0.cpu_id"; + size_t len; +#define SYSCTL_CPU_ID_MAXSIZE 64 + uint64 sysctlbuf[SYSCTL_CPU_ID_MAXSIZE]; + struct aarch64_sysctl_cpu_id *id = + (struct aarch64_sysctl_cpu_id *) sysctlbuf; + uint64 reg; + uint64 fld; + + len = sizeof(sysctlbuf); + memset(sysctlbuf, 0, len); + if (sysctlbyname(path, id, &len, NULL, 0) != 0) + return false; + if (len != sizeof(struct aarch64_sysctl_cpu_id)) + return false; /* kernel incompatibility? */ + +#define ISAR0_CRC32_BITPOS 16 +#define ISAR0_CRC32_BITWIDTH 4 +#define WIDTHMASK(w) ((((uint64) 1) << (w)) - 1) + + reg = id->ac_aa64isar0; + fld = (reg >> ISAR0_CRC32_BITPOS) & WIDTHMASK(ISAR0_CRC32_BITWIDTH); + + /* + * Available documentation defines only the field values 0 (No CRC32) and + * 1 (CRC32B/CRC32H/CRC32W/CRC32X/CRC32CB/CRC32CH/CRC32CW/CRC32CX). Assume + * that any future nonzero value will be a superset of 1. + */ + return (fld != 0); #else return false; #endif
On Sat, Nov 23, 2024 at 2:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I poked into this with NetBSD and found that cpuctl's "identify" > option works just fine without root. So that motivated me to dig > into its source code, and I end up with the attached. Sadly it only > works in 64-bit; there is presumably a way to detect this in 32-bit > as well, but cpuctl doesn't know it. That doesn't bother me a whole > lot though, at least not nearly as much as the 64-bit case. Anybody > running PG on 32-bit ARM shouldn't be expecting great performance. (Nerd sniped) No NetBSD here but think a 32 bit kernel should expose sysctl machdep.id_isar, which should be an array of 6 ints, and bits 16-19 of id_isar[5] should give you the answer: https://github.com/NetBSD/src/blob/312631d82871c81540b81abea3baa09313478bd3/sys/arch/arm/arm32/arm32_machdep.c#L142 https://github.com/NetBSD/src/blob/312631d82871c81540b81abea3baa09313478bd3/sys/arch/arm/arm32/cpu.c#L869 https://developer.arm.com/documentation/ddi0601/2024-09/AArch32-Registers/ID-ISAR5--Instruction-Set-Attribute-Register-5?lang=en A real 32 bit armv7 or similar like RPI2 must say no. An armv8+ like RPI3+ running a 32 bit kernel (ie a 64 bit chip running in aarch32 mode) should say yes. With a 64 bit kernel running a 32 bit process, I think your code should work, except that it wouldn't be used because it's gated on __aarch64__. So there are combinations to consider. However, beyond satisfying curiosity about how it works, I agree that it's all a complete waste of time and what you have is perfectly fine. (I wonder why there isn't a sysctl that gives you the id_aa64isar0_el1 register more directly, instead of having to slurp it out of struct aarch64_sysctl_cpu_id. I mean, why expose similar "isar" registers differently for aarch32 and aarch64 kernels... but OK.) +#define ISAR0_CRC32_BITPOS 16 +#define ISAR0_CRC32_BITWIDTH 4 +#define WIDTHMASK(w) ((((uint64) 1) << (w)) - 1) + + reg = id->ac_aa64isar0; + fld = (reg >> ISAR0_CRC32_BITPOS) & WIDTHMASK(ISAR0_CRC32_BITWIDTH); + + /* + * Available documentation defines only the field values 0 (No CRC32) and + * 1 (CRC32B/CRC32H/CRC32W/CRC32X/CRC32CB/CRC32CH/CRC32CW/CRC32CX). Assume + * that any future nonzero value will be a superset of 1. + */ + return (fld != 0); Since you included <aarch64/armreg.h>, and only care about non-zero, couldn't you use the mask from the header, instead of all that bitswizzling? return (id->ac_aa64isar0 & ID_AA64ISAR0_EL1_CRC32) != 0; > I've checked this on NetBSD 9.2 and 10.0. I don't have hardware > that should return false (and there may not be any such 64-bit > hardware anyway...), so there's not much more I can test. I wish qemu supported a cpu that didn't have that...
Thomas Munro <thomas.munro@gmail.com> writes: > (Nerd sniped) No NetBSD here but think a 32 bit kernel should expose > sysctl machdep.id_isar, which should be an array of 6 ints, and bits > 16-19 of id_isar[5] should give you the answer: Huh. cpuctl's arm32 module knows of machdep.id_isar, but it does not appear to know that that word exists: http://cvsweb.netbsd.org/bsdweb.cgi/src/usr.sbin/cpuctl/arch/arm.c?rev=1.6;content-type=text%2Fplain But that developer.arm.com page seems pretty definitive. Tomorrow I'll have a go at spinning up a 32-bit NetBSD installation and test it. Thanks for the research! > Since you included <aarch64/armreg.h>, and only care about non-zero, > couldn't you use the mask from the header, instead of all that > bitswizzling? > return (id->ac_aa64isar0 & ID_AA64ISAR0_EL1_CRC32) != 0; I didn't actually look into that header ;-) ... I was just basing this on what cpuctl's code does. That version does look cleaner, at least for aarch64 --- but given that the field is defined to be in the same place in the relevant register for each arch, I'm a bit inclined to keep the bit-swizzling and just have two paths for getting the register contents. Unless maybe that header is also installed in arm32 builds, in which case we could do it as you show for both arches. I'll find out mañana. regards, tom lane
I wrote: > But that developer.arm.com page seems pretty definitive. Tomorrow > I'll have a go at spinning up a 32-bit NetBSD installation and test > it. Thanks for the research! Conclusions: 1. Unsurprisingly, a 64-bit kernel exposes machdep.cpuN.cpu_id but not machdep.id_isar; a 32-bit kernel the reverse. 2. 32-bit userland does not provide <aarch64/armreg.h>. (So I did not try to use the ID_AA64ISAR0_EL1_CRC32 macro: we'd have had to hard-code the field position anyway for 32-bit.) 3. It doesn't look to me like NetBSD really supports 32-bit userland under a 64-bit kernel. Maybe it'd kind of work, but utilities like cpuctl seem to be built for only one of the kernel APIs. So I end up proposing the attached. If we do somehow find ourselves running 32-bit with a 64-bit kernel, this will report "no CRC" because the sysctl call will fail. I don't think it's worth sweating over that case. BTW, I found out that by default we'll build a software-CRC-only implementation on 32-bit ARM NetBSD, at least in the "generic" eabihf build: checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=... no checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=-march=armv8-a+crc... no checking which CRC-32C implementation to use... slicing-by-8 It turns out that -march=armv8-a+crc would work, except: configure:17624: checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=-march=armv8-a+crc configure:17648: ccache gcc -o conftest -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla-Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local-Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation-Wno-stringop-truncation -g -O2 -march=armv8-a+crc -Wl,-z,now -Wl,-z,relro conftest.c -lz -ledit-lcurses -lpthread -lexecinfo -lrt -lm >&5 cc1: error: '-mfloat-abi=hard': selected processor lacks an FPU If you force it with a compatible -mfpu switch (I used -mfpu=neon) then you get a build with runtime CRC check, and all seems to work. I did not try very hard to determine what the default -mfpu is here or why -march=armv8-a+crc overrides whatever the default is. This probably means that no end user is going to see any benefit from this exercise, because they are not likely to try adding -mfpu by itself. If they add "-march=armv8-a+crc -mfpu=whatever" to CFLAGS manually then configure will deem that a runtime check is not required, so this code still doesn't get exercised. tl;dr: a lot of time spent on a probably-useless exercise. Still, now that we've got the code we may as well commit it. The configure issue was there all along. regards, tom lane diff --git a/src/port/pg_crc32c_armv8_choose.c b/src/port/pg_crc32c_armv8_choose.c index 306500154e..11e27978fd 100644 --- a/src/port/pg_crc32c_armv8_choose.c +++ b/src/port/pg_crc32c_armv8_choose.c @@ -31,6 +31,13 @@ #endif #endif +#if defined(__NetBSD__) +#include <sys/sysctl.h> +#if defined(__aarch64__) +#include <aarch64/armreg.h> +#endif +#endif + #include "port/pg_crc32c.h" static bool @@ -52,6 +59,47 @@ pg_crc32c_armv8_available(void) #else return (getauxval(AT_HWCAP2) & HWCAP2_CRC32) != 0; #endif +#elif defined(__NetBSD__) + /* + * On NetBSD we can read the Instruction Set Attribute Registers via + * sysctl. For doubtless-historical reasons the sysctl interface is + * completely different on 64-bit than 32-bit, but the underlying + * registers contain the same fields. + */ +#define ISAR0_CRC32_BITPOS 16 +#define ISAR0_CRC32_BITWIDTH 4 +#define WIDTHMASK(w) ((1 << (w)) - 1) +#define SYSCTL_CPU_ID_MAXSIZE 64 + + size_t len; + uint64 sysctlbuf[SYSCTL_CPU_ID_MAXSIZE]; +#if defined(__aarch64__) + /* We assume cpu0 is representative of all the machine's CPUs. */ + const char *path = "machdep.cpu0.cpu_id"; + size_t expected_len = sizeof(struct aarch64_sysctl_cpu_id); +#define ISAR0 ((struct aarch64_sysctl_cpu_id *) sysctlbuf)->ac_aa64isar0 +#else + const char *path = "machdep.id_isar"; + size_t expected_len = 6 * sizeof(int); +#define ISAR0 ((int *) sysctlbuf)[5] +#endif + uint64 fld; + + len = sizeof(sysctlbuf); + memset(sysctlbuf, 0, len); + if (sysctlbyname(path, sysctlbuf, &len, NULL, 0) != 0) + return false; /* perhaps kernel is 64-bit and we aren't? */ + if (len != expected_len) + return false; /* kernel API change? */ + + fld = (ISAR0 >> ISAR0_CRC32_BITPOS) & WIDTHMASK(ISAR0_CRC32_BITWIDTH); + + /* + * Current documentation defines only the field values 0 (No CRC32) and 1 + * (CRC32B/CRC32H/CRC32W/CRC32X/CRC32CB/CRC32CH/CRC32CW/CRC32CX). Assume + * that any future nonzero value will be a superset of 1. + */ + return (fld != 0); #else return false; #endif
On Sun, Nov 24, 2024 at 11:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > cc1: error: '-mfloat-abi=hard': selected processor lacks an FPU Interesting. Happens also on Linux starting with GCC 11, apparently. Here's one with some clues: https://github.com/checkpoint-restore/criu/issues/1653#issuecomment-986034428 We have a few Linux armv7 animals in the farm, but using GCC 10 and older so not seeing it yet.
Thomas Munro <thomas.munro@gmail.com> writes: > On Sun, Nov 24, 2024 at 11:02 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> cc1: error: '-mfloat-abi=hard': selected processor lacks an FPU > Interesting. Happens also on Linux starting with GCC 11, apparently. > Here's one with some clues: > https://github.com/checkpoint-restore/criu/issues/1653#issuecomment-986034428 Oh, very interesting! Maybe we need to try -march=armv8-a+crc+fp (or some spelling like that) if -march=armv8-a+crc doesn't work? regards, tom lane
On Sun, Nov 24, 2024 at 1:09 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Oh, very interesting! Maybe we need to try -march=armv8-a+crc+fp > (or some spelling like that) if -march=armv8-a+crc doesn't work? Hmm, they were talking about armv7-a, not armv8-a, but it seems related... does it work? The GCC docs give only +nofp, not +fp as an option for 32 bit armv8-a (that is, under "ARM options", as opposed to "aarch64 options"). But I guess GCC is pretty configurable and NetBSD targets some weird hardware... as for why it could be like that (if not a mistake), it looks like it might be technically possible for armv8 (including -a) to lack FP in aarch32 state (unlike aarch64 state), but I went looking for real existing 32-bit only armv8 chips lacking FP and found only stuff like Cortex M23, and they're using armv8-m (embedded profile, lots of other stuff missing or optional). The weakest 32 bit-only armv8-a chip I could find in a quick search was the Cortex A32 but it looks like it has FP. I don't know enough about any of this stuff to guess what's going on here.
Thomas Munro <thomas.munro@gmail.com> writes: > The GCC docs give only +nofp, not +fp as an option for 32 bit armv8-a > (that is, under "ARM options", as opposed to "aarch64 options"). But > I guess GCC is pretty configurable and NetBSD targets some weird > hardware... as for why it could be like that (if not a mistake), it > looks like it might be technically possible for armv8 (including -a) > to lack FP in aarch32 state (unlike aarch64 state), but I went looking > for real existing 32-bit only armv8 chips lacking FP and found only > stuff like Cortex M23, and they're using armv8-m (embedded profile, > lots of other stuff missing or optional). The weakest 32 bit-only > armv8-a chip I could find in a quick search was the Cortex A32 but it > looks like it has FP. I don't know enough about any of this stuff to > guess what's going on here. I'm confused too, because the gcc documentation is quite definite that -march=armv8-a assumes FP support by default. It seems like somebody broke that for no very good reason and didn't bother to update the docs either. Maybe it's actually a bug? Mainline Linux distros like Fedora and Debian seem to have dropped 32-bit ARM altogether, but there's still a build available from https://www.raspberrypi.com/software/operating-systems/ that says it is based on Debian bookworm. I installed that on my RPi4 and found that it has $ gcc -v Using built-in specs. COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/12/lto-wrapper Target: arm-linux-gnueabihf Configured with: ... --with-arch=armv6+fp --with-float=hard ... Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 12.2.0 (Raspbian 12.2.0-14+rpi1) So, just like NetBSD's "generic" ARM32 build, they are firmly in the hard-float camp. There may be builds out there that will still work on soft-float ARM, but they've got to be scarce ... and you'd have to wonder why anybody would be trying to run PG on such hardware anyway. Anyway, experimentation shows that this gcc version gives the same "selected processor lacks an FPU" failure with "-march=armv8-a+crc". It will not take "-march=armv8-a+crc+fp", but it will take "-march=armv8-a+crc -mfpu=vfpv3"; the same holds for NetBSD 10.0's gcc 10.5.0. Possibly other -mfpu settings will work (I recall having had success yesterday with -mfpu=neon on NetBSD), but I think this is a pretty generic setting: per the gcc docs it corresponds to typical armv7 hardware. I've checked the attached patch on both the raspberrypi system and NetBSD 10.0. regards, tom lane diff --git a/configure b/configure index 28719ed30c..d61a7e59dd 100755 --- a/configure +++ b/configure @@ -17578,8 +17578,10 @@ rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext # Check for ARMv8 CRC Extension intrinsics to do CRC calculations. # # First check if __crc32c* intrinsics can be used with the default compiler -# flags. If not, check if adding -march=armv8-a+crc flag helps. -# CFLAGS_CRC is set if the extra flag is required. +# flags. If not, check if adding -march=armv8-a+crc flag helps. On systems +# using hard-float ABI, it may be that "-march=armv8-a+crc" will be rejected +# but "-march=armv8-a+crc -mfpu=vfpv3" will work, so next try that. +# CFLAGS_CRC is set if extra flag(s) are required. { $as_echo "$as_me:${as_lineno-$LINENO}: checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=" >&5 $as_echo_n "checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=... " >&6; } if ${pgac_cv_armv8_crc32c_intrinsics_+:} false; then : @@ -17661,6 +17663,48 @@ if test x"$pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc" = x"yes"; then pgac_armv8_crc32c_intrinsics=yes fi + if test x"$pgac_armv8_crc32c_intrinsics" != x"yes"; then + { $as_echo "$as_me:${as_lineno-$LINENO}: checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=-march=armv8-a+crc-mfpu=vfpv3" >&5 +$as_echo_n "checking for __crc32cb, __crc32ch, __crc32cw, and __crc32cd with CFLAGS=-march=armv8-a+crc -mfpu=vfpv3... ">&6; } +if ${pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3+:} false; then : + $as_echo_n "(cached) " >&6 +else + pgac_save_CFLAGS=$CFLAGS +CFLAGS="$pgac_save_CFLAGS -march=armv8-a+crc -mfpu=vfpv3" +cat confdefs.h - <<_ACEOF >conftest.$ac_ext +/* end confdefs.h. */ +#include <arm_acle.h> +int +main () +{ +unsigned int crc = 0; + crc = __crc32cb(crc, 0); + crc = __crc32ch(crc, 0); + crc = __crc32cw(crc, 0); + crc = __crc32cd(crc, 0); + /* return computed value, to prevent the above being optimized away */ + return crc == 0; + ; + return 0; +} +_ACEOF +if ac_fn_c_try_link "$LINENO"; then : + pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3=yes +else + pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3=no +fi +rm -f core conftest.err conftest.$ac_objext \ + conftest$ac_exeext conftest.$ac_ext +CFLAGS="$pgac_save_CFLAGS" +fi +{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3" >&5 +$as_echo "$pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3" >&6; } +if test x"$pgac_cv_armv8_crc32c_intrinsics__march_armv8_apcrc__mfpu_vfpv3" = x"yes"; then + CFLAGS_CRC="-march=armv8-a+crc -mfpu=vfpv3" + pgac_armv8_crc32c_intrinsics=yes +fi + + fi fi # Check for LoongArch CRC intrinsics to do CRC calculations. diff --git a/configure.ac b/configure.ac index 533f4ab78a..7bed68c7de 100644 --- a/configure.ac +++ b/configure.ac @@ -2093,11 +2093,16 @@ AC_COMPILE_IFELSE([AC_LANG_PROGRAM([], [ # Check for ARMv8 CRC Extension intrinsics to do CRC calculations. # # First check if __crc32c* intrinsics can be used with the default compiler -# flags. If not, check if adding -march=armv8-a+crc flag helps. -# CFLAGS_CRC is set if the extra flag is required. +# flags. If not, check if adding -march=armv8-a+crc flag helps. On systems +# using hard-float ABI, it may be that "-march=armv8-a+crc" will be rejected +# but "-march=armv8-a+crc -mfpu=vfpv3" will work, so next try that. +# CFLAGS_CRC is set if extra flag(s) are required. PGAC_ARMV8_CRC32C_INTRINSICS([]) if test x"$pgac_armv8_crc32c_intrinsics" != x"yes"; then PGAC_ARMV8_CRC32C_INTRINSICS([-march=armv8-a+crc]) + if test x"$pgac_armv8_crc32c_intrinsics" != x"yes"; then + PGAC_ARMV8_CRC32C_INTRINSICS([-march=armv8-a+crc -mfpu=vfpv3]) + fi fi # Check for LoongArch CRC intrinsics to do CRC calculations. diff --git a/meson.build b/meson.build index b64d253fe4..4bf730e40a 100644 --- a/meson.build +++ b/meson.build @@ -2294,6 +2294,13 @@ int main(void) cdata.set('USE_ARMV8_CRC32C', false) cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) have_optimized_crc = true + elif cc.links(prog, name: '__crc32cb, __crc32ch, __crc32cw, and __crc32cd with -march=armv8-a+crc -mfpu=vfpv3', + args: test_c_args + ['-march=armv8-a+crc', '-mfpu=vfpv3']) + # Use ARM CRC Extension, with runtime check + cflags_crc += ['-march=armv8-a+crc', '-mfpu=vfpv3'] + cdata.set('USE_ARMV8_CRC32C', false) + cdata.set('USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 1) + have_optimized_crc = true endif elif host_cpu == 'loongarch64'
On Mon, Nov 25, 2024 at 9:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Anyway, experimentation shows that this gcc version gives the same > "selected processor lacks an FPU" failure with "-march=armv8-a+crc". > It will not take "-march=armv8-a+crc+fp", but it will take > "-march=armv8-a+crc -mfpu=vfpv3"; the same holds for NetBSD 10.0's > gcc 10.5.0. Possibly other -mfpu settings will work (I recall having > had success yesterday with -mfpu=neon on NetBSD), but I think this is > a pretty generic setting: per the gcc docs it corresponds to typical > armv7 hardware. LGTM. It might be slightly more logical to say vfpv4 if that also works, since armv8 floats are based on vfpf4[1], but it hardly matters given we're not actually using it... [1] is also where you can read that software implementation is still technically permitted in armv8 aarch32 state (but not aarch64), even though A profile chips without it are apparently like unicorns. Anyway, I can't find any evidence that GCC knows that[2], or changed that deliberately any time recently, or is out of sync with its own docs, or has local patches in NetBSD's copy, so yeah, maybe it really is a bug. There was some recent-ish churn affecting -mfpu and -march interactions[3], maybe relevant, or not, it beats me. [1] https://developer.arm.com/documentation/dui0801/l/Overview-of-the-Armv8-Architecture/Floating-point-hardware?lang=en [2] https://github.com/gcc-mirror/gcc/blob/master/gcc/config/arm/arm-cpus.in#L583 [3] https://sourceware.org/bugzilla/show_bug.cgi?id=28078
Thomas Munro <thomas.munro@gmail.com> writes: > On Mon, Nov 25, 2024 at 9:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Anyway, experimentation shows that this gcc version gives the same >> "selected processor lacks an FPU" failure with "-march=armv8-a+crc". >> It will not take "-march=armv8-a+crc+fp", but it will take >> "-march=armv8-a+crc -mfpu=vfpv3"; the same holds for NetBSD 10.0's >> gcc 10.5.0. Possibly other -mfpu settings will work (I recall having >> had success yesterday with -mfpu=neon on NetBSD), but I think this is >> a pretty generic setting: per the gcc docs it corresponds to typical >> armv7 hardware. > LGTM. > It might be slightly more logical to say vfpv4 if that also works, > since armv8 floats are based on vfpf4[1], but it hardly matters given > we're not actually using it... Yeah, with no FP operations in the files that get compiled with these flags, it probably doesn't matter. But gcc 10.5.0 seems happy with -mfpu=vfpv4, so I'll use that. > [1] is also where you can read that > software implementation is still technically permitted in armv8 > aarch32 state (but not aarch64), even though A profile chips without > it are apparently like unicorns. Anyway, I can't find any evidence > that GCC knows that[2], or changed that deliberately any time > recently, or is out of sync with its own docs, or has local patches in > NetBSD's copy, so yeah, maybe it really is a bug. It's very confusing, especially given this: $ gcc -march=armv8-a+crc+fp testarm32.c gcc: error: 'armv8-a' does not support feature 'fp' gcc: note: valid feature names are: crc simd crypto nocrypto nofp sb predres; did you mean 'nofp'? If they think that base armv8-a doesn't include FP, why is there a +nofp option? But whether it's a bug or somehow intentional, there's an awful lot of copies of gcc with this behavior, so we need to cope. I wonder whether "armv8-a" includes FP on aarch64 but not arm32, and we only did the original testing of f044d71e3 on aarch64. If so, it might've been like this all along and we didn't notice. > There was some > recent-ish churn affecting -mfpu and -march interactions[3], maybe > relevant, or not, it beats me. That looks like incorrect assembly code being emitted. Our problem here is much further upstream, ie what compiler flags are accepted. regards, tom lane
I wrote: > It's very confusing, especially given this: > $ gcc -march=armv8-a+crc+fp testarm32.c > gcc: error: 'armv8-a' does not support feature 'fp' > gcc: note: valid feature names are: crc simd crypto nocrypto nofp sb predres; did you mean 'nofp'? > If they think that base armv8-a doesn't include FP, why is there a > +nofp option? Bug filed at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117766 We'll see what they say, but it seems like we need a workaround in any case. regards, tom lane
On Mon, Nov 25, 2024 at 12:34 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wonder whether "armv8-a" includes FP on aarch64 but not arm32, and > we only did the original testing of f044d71e3 on aarch64. If so, > it might've been like this all along and we didn't notice. It does work on the "armv7" target BF animals, though. I assume they don't get any updates as they're all old/defunct distros (one of my favourite features of our boneyard^Wbuildfarm), but one of them is GCC "10.2.1 20210110", which is what made me suspicious of that -mfpu stuff, beyond my comprehension without a lot more coffee, that got backpatched in late 2021.
Thomas Munro <thomas.munro@gmail.com> writes: > It does work on the "armv7" target BF animals, though. I assume they > don't get any updates as they're all old/defunct distros (one of my > favourite features of our boneyard^Wbuildfarm), but one of them is GCC > "10.2.1 20210110", which is what made me suspicious of that -mfpu > stuff, beyond my comprehension without a lot more coffee, that got > backpatched in late 2021. Ah, you're looking at mereswine, which is hard-float according to checking build system type... armv7l-unknown-linux-gnueabihf checking host system type... armv7l-unknown-linux-gnueabihf I also confirmed that our existing code works on Fedora 30 armv7l (also hard-float) with gcc 9.3.1. That's the only other ARM32 image I have at hand :-( So yeah, it would seem they broke it between 10.2.1 and 10.5.0. regards, tom lane
I wrote: > I also confirmed that our existing code works on Fedora 30 armv7l > (also hard-float) with gcc 9.3.1. That's the only other ARM32 > image I have at hand :-( > So yeah, it would seem they broke it between 10.2.1 and 10.5.0. After discussion with the gcc folk (see bug report linked upthread) it emerges that there was indeed a change somewhere in that time frame. It used to be that if the platform defaults were "-march=A -mfpu=X" then writing "-march=B" on the command line would still leave you with "-mfpu=X", possibly a nonsensical combination. So now they ignore any build-time default, with the effect that writing just "-march=armv8-a+crc" selects soft float, which fails on platforms that require hard-float ABI. (I didn't ask whether this was the same change prompted by the other bug report you found, but perhaps it was.) Also, there's no +fp option for "-march=armv8-a" because the ARMv8 specs require chips to implement both or neither of FP and SIMD. So the idiomatic thing for us to write is not a separate -mfpu spec but "-march=armv8-a+crc+simd". I'll adjust my patch to do it like that and push later today. regards, tom lane
I wrote: > Also, there's no +fp option for "-march=armv8-a" because the > ARMv8 specs require chips to implement both or neither of FP > and SIMD. So the idiomatic thing for us to write is not a > separate -mfpu spec but "-march=armv8-a+crc+simd". I'll > adjust my patch to do it like that and push later today. And done. In the event I decided to try the +simd option first, reasoning that that's correct on the majority of current ARM systems. I'm interested to see what the older buildfarm ARM animals pick. regards, tom lane
I wrote: > And done. In the event I decided to try the +simd option first, > reasoning that that's correct on the majority of current ARM systems. > I'm interested to see what the older buildfarm ARM animals pick. Early returns are that they're all picking the +simd option, and seem to be just as happy with that as they were without it. I wonder if we could just drop the "without" probe altogether. It'd save some configure cycles for other platforms (since they'll fruitlessly run through all of these tests). regards, tom lane
I wrote: > Early returns are that they're all picking the +simd option, and > seem to be just as happy with that as they were without it. > I wonder if we could just drop the "without" probe altogether. > It'd save some configure cycles for other platforms (since they'll > fruitlessly run through all of these tests). Now that we have a nearly-complete census of the buildfarm, the end result is that they are all happy with the +simd spelling ... except for one. "gull" rejects it: clang: error: unsupported argument 'armv8-a+crc+simd' to option '-march=' But without +simd it's okay, so removing the other spelling would cause a performance regression on that machine. I find this a bit surprising, because gull is using a reasonably recent compiler (clang version 17.0.4). We have much older clang versions that accept +simd --- but they are all aarch64. gull seems to be our only representative of clang on arm32. So this seems like a gcc vs clang difference that we'll have to put up with. Anyway, I think this thread is now closed: nothing more to do unless someone thinks of a platform we need more bespoke code for. regards, tom lane