Thread: Re: Detection of hadware feature => please do not use signal
On 31/10/2024 17:41, Bastien Roucariès wrote: > Hi, > > On debian side we will like to avoid use of sigill to detect feature. > > https://sources.debian.org/src/postgresql-17/17.0-1/src/port/pg_crc32c_armv8_choose.c/#L55 > > is really bad. Why is it bad? > Could you use the canonical way under linux > > #include <sys/auxv.h> > #include <asm/hwcap.h> > > #if defined(__aarch64__) > return !(getauxval(AT_HWCAP) & HWCAP_CRC32); > #else > return !(getauxval(AT_HWCAP2) & HWCAP2_CRC32); > #endif We used to do that, but it was changed in commit 1c72ec6f. Looking at the discussion that led to it, it's not clear to me why we switched [1]. Portablity to non-glibc systems maybe? If there's a handy, portable way to check CPU capabilities, I agree it seem less hacky than catching SIGILL... [1] https://www.postgresql.org/message-id/CAEepm%3D02Run-Pk3xyt%2BRV3p1N%2B7cKZxN95_MamaJw8Cnw%2BDwjQ%40mail.gmail.com -- Heikki Linnakangas Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes: > On 31/10/2024 17:41, Bastien Roucariès wrote: >> Could you use the canonical way under linux > We used to do that, but it was changed in commit 1c72ec6f. Looking at > the discussion that led to it, it's not clear to me why we switched [1]. The complaint was precisely that it would not work under non-linux. Maybe we could do it one way on linux and the other way everywhere else, but I do not find that attractive; it just makes it harder to get full test coverage. regards, tom lane
On 31/10/2024 19:48, Tom Lane wrote: > Heikki Linnakangas <hlinnaka@iki.fi> writes: >> On 31/10/2024 17:41, Bastien Roucariès wrote: >>> Could you use the canonical way under linux > >> We used to do that, but it was changed in commit 1c72ec6f. Looking at >> the discussion that led to it, it's not clear to me why we switched [1]. > > The complaint was precisely that it would not work under non-linux. > > Maybe we could do it one way on linux and the other way everywhere > else, but I do not find that attractive; it just makes it harder > to get full test coverage. Bastien, would you happen to know how other portable libraries that need to run on other operating system do this? We added the SIGILL probe back in 2018, there might be better options available now. -- Heikki Linnakangas Neon (https://neon.tech)
Heikki Linnakangas <hlinnaka@iki.fi> writes: > On 31/10/2024 19:48, Tom Lane wrote: >> Maybe we could do it one way on linux and the other way everywhere >> else, but I do not find that attractive; it just makes it harder >> to get full test coverage. > Bastien, would you happen to know how other portable libraries that need > to run on other operating system do this? We added the SIGILL probe back > in 2018, there might be better options available now. It occurs to me to wonder whether the existing code works on Windows. Windows-on-ARM wasn't a thing we thought about in 2018, but it's a reasonable target now. regards, tom lane
Bastien =?ISO-8859-1?Q?Roucari=E8s?= <rouca@debian.org> writes: > Le jeudi 31 octobre 2024, 17:29:45 UTC Heikki Linnakangas a écrit : >> Why is it bad? > for a binary it is maybe safe for a library it is unsafe in Multi thread context to capture signal... [ shrug... ] If we're concerned about that, we should hardly be less concerned on other platforms than we are on Linux. (AFAIK, we do not execute the code in question in any libraries.) regards, tom lane
On Fri, Nov 1, 2024 at 7:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > It occurs to me to wonder whether the existing code works on Windows. > Windows-on-ARM wasn't a thing we thought about in 2018, but it's > a reasonable target now. I looked into that[1] and decided that I was going to ignore it completely, because: * Windows defines a dummy SIGILL signal number as required by the C standard, so stuff like this compiles, but EXCEPTION_ILLEGAL_INSTRUCTION isn't connected up to it and you'd just crash instead, but even if it were... * we don't install a native signal handler anyway, just one of our fake ones (that I would love to get rid of) * there is also a native API to check CPU features, but I don't see the point in even thinking about it for now, because... * Windows 11 effectively requires ARMv8.1-A to boot, and that's what comes installed on current machines you can buy (this is expressed differently officially, by saying that LSE atomics are required and by listing the supported CPUs by model not ISA, and the list of CPUs that are no longer supported can also be found online, so it's fairly clear; there are also fun investigative reports from the time when RPI4s and other never-officially-supported systems could suddenly no longer boot after some update, as they would croak on LSE instructions) * if someone is using Windows 10 for ARM (which I gather is getting harder to obtain by now) on an old enough low power laptop that lacks the instruction (and note that most ARMv8-A chips *did* have that instruction as an optional extra anyway, they just didn't have LSE), well then maybe it's an issue but I think for a new platform where we are so short of developers that we haven't even got past other really basic starter problems after a couple of years of talking about it, I think we should focus only on current systems instead of allowing even 1 nanosecond to be wasted on retro-computing topics... and in any case that hypothetical user is just about out of time because... * Windows 10's announced EOL (no more updates/support, full screen warning messages if past EOLs are anything to go by) coincides approximately with PostgreSQL 18's expected release (Huh, thinking about other synchronous signals, don't we also fail to install a native SIGFPE handler on Windows, that would convert eg div-by-zero in C code into an ereport on Unix?) Anyway I don't mind getting rid of the SIGILL stuff as long as the new coding is tidy and cross-platform enough, if people are insisting. I'm not aware that there's anything technically wrong with it on any Unix system, but I agree that it's not beautiful code. FreeBSD was my motivation at the time, but it has since gained elf_aux_info(), and OpenBSD too, and I can help write/test that part if we go that way. Or maybe we could just take some fragments of OpenSSL or libzlma or whatever code if the licensing aspect is OK. As for the other ARM platforms in our universe: Unfortunately it looks like NetBSD doesn't put AT_HWCAP into its auxv[], or even expose it to user space nicely, and those libraries don't seem to know of another way. Hopefully NetBSD will align with those other systems for portability's sake. The only other way I could find in a quick googling session is /usr/sbin/cpuctl, root only, no cigar (but a solid clue that the information is floating around somewhere, it just depends where exactly the root-only fencing is happening). Even if a way can be found, it'd be better to be able to do it the same way as other systems that we are actually testing if at all possible. Put that way, the build farm absence is a sort of vote for just not even trying to detect it on NetBSD. A NetBSD user who really wants to access the feature could still compile with -march=<new thing>, supply a build farm animal and a patch, or preferrably help get a compatible elf_auxv_info(AT_HWCAP) merged into NetBSD. That'd be my vote if we switch to auxv probing on the other systems. Macs don't do ELF or auxv, but there's a sysctl. I think it must always report true in practice. The first M1 was ARMv8.4-A. (There are old non-computer Apple devices that used ARMv8-A still in the wild, which is interesting to me only because it explains the recent invention of -moutline-atomics, about which more soon hopefully, to get faster lwlocks etc on our -march=armv8-a builds as shipped by Debian et al.) So I think we could skip the palaver and just hard-code the knowledge that Macs can do this stuff. (Archeological note: the reason several systems have the same auxv[] concept, ie a table of parameters that exec*() sets up alongside argv[] and environ[] to communicate stuff about memory layout etc primarily to libc, and the reason they even agree on some of the parameter names, is that it came from SVR4 with ELF. I think Sun's original version got AT_HWCAP from the ELF binary after selecting from several object variants that were compiled to match various CPU feature sets, as a way of shipping fat binaries that go faster on the right hardware, while these modern systems are just passing on whatever the CPU reports under the same hijacked name; it's related, but different... Anyway they needed some way for the kernel to give the features to user space on ARM, because its register that is equivalent to x86's CPUID can't be accessed from user space's privilege level and libc itself would like to be able to use some fancy features. Even for non-ARM architectures it's nice not to have to break out the assembler in user programs that want to do the same sorts of tricks. Amazingly, illumos can apparently run on ARM now so we might in theory encounter the old meaning of AT_HWCAP, but that's quite a hypothetical unicorn so I'm filing the thought under archeology and hiding it in parentheses.) Note that Andres recently wondered out loud[2] if CRC32 might be fundamentally the wrong tool for the job in a related thread, so perhaps this will all become moot if someone does the research and replaces it, but that's vapourware for now... [1] https://www.postgresql.org/message-id/CA%2BhUKGJ2B5rAGUncAob%3DChutCT%3Dfx0Ot7kwvio5cB7NpOGKG1Q%40mail.gmail.com [2] https://www.postgresql.org/message-id/flat/20240612193746.rjeiip4hcamjedgo%40awork3.anarazel.de#ab383730597411817c69516ec6c1a65c
Thomas Munro <thomas.munro@gmail.com> writes: > On Fri, Nov 1, 2024 at 7:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> It occurs to me to wonder whether the existing code works on Windows. >> Windows-on-ARM wasn't a thing we thought about in 2018, but it's >> a reasonable target now. > I looked into that[1] and decided that I was going to ignore it > completely, because: > [ oodles o' details ] Hmm. So it seems like we could do this: * Linux, FreeBSD, OpenBSD: do run-time test as recommended * Windows, macOS: assume that supported ARM hardware can do this * anything else: assume no hardware CRC The only thing I'd be even a bit sad about there is not having NetBSD support. Maybe we just need to research that a bit more, or maybe we have to wait for/lobby for them to do what the other BSDen have done. But in any case I'm not seeing NetBSD-on-ARM as a platform that's important enough to block movement on this, let alone anything else in the "anything else" category. An alternative to "assume no hardware CRC" could be * anything else: use the existing SIGILL test But I do sympathize with Bastien's concerns about that. regards, tom lane
I wrote: > Hmm. So it seems like we could do this: > * Linux, FreeBSD, OpenBSD: do run-time test as recommended > * Windows, macOS: assume that supported ARM hardware can do this > * anything else: assume no hardware CRC Actually, after chewing on that second point awhile longer, how about this modest proposal: * Drop all code for run-time determination of ARM CRC support. Assume it's there unless user builds with a -march option that says it definitely isn't. Realistically, exactly who is going to be running Postgres 18+ on ARM hardware that lacks CRC support? I can think of lots of projects that are more worthy of our time than this. (Perhaps it's time to apply the same mindset to x86 CRC support too?) regards, tom lane
On Fri, Nov 1, 2024 at 7:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: > > Hmm. So it seems like we could do this: > > * Linux, FreeBSD, OpenBSD: do run-time test as recommended > > * Windows, macOS: assume that supported ARM hardware can do this > > * anything else: assume no hardware CRC > > Actually, after chewing on that second point awhile longer, > how about this modest proposal: > > * Drop all code for run-time determination of ARM CRC support. > Assume it's there unless user builds with a -march option that > says it definitely isn't. For CRC32, that would work (and already does via configure tests) for macOS (I'd forgotten about that in my earlier message, Macs never even compile the CRC32 "choose" code because the system compiler targets M1 by default), recent RHEL (see below), and presumably others, but it would penalise Debian, FreeBSD (using the standard binary package repo) and others if we didn't have the runtime check due to their conservative choice of baseline arch (as Bastien just said about Debian in a crossed email). I've been wondering about what to do about all this for a while... There are also other features we aren't using yet, but should, or single instructions that we are calling through function pointers, preventing automatic vectorisation etc. > Realistically, exactly who is going to be running Postgres 18+ > on ARM hardware that lacks CRC support? I can think of lots of > projects that are more worthy of our time than this. > > (Perhaps it's time to apply the same mindset to x86 CRC support > too?) I think it's quite interesting that RHEL has moved its baseline to x86_86-v2, having recently worked with Intel and AMD to standardise new levels "-vN" that group and "linearise" the feature sets, and others were already talking about standarding on -v3 last time I looked, while Debian still targets x86_86 which means the instruction set of the first AMD K8 chip from 2003. Debian has a nice wiki page (see thread ↓) explaining various workaround strategies, ranging from runtime tests like ours, per-sub-architecture library selection, through to (last resort) depending on pseudo-packages like "x86-64-v2-support" so that your package becomes uninstallable on ancient hardware. I'm not involved in packaging but I was wondering out loud whether the PGDG repos might potentially consider making a different choice than the official Debian repos, always or as an option, or ... various other ideas proposed by others. Perhaps we could continue the topic on that thread: https://www.postgresql.org/message-id/flat/CA%2BhUKGKS64zJezV9y9mPcB-J0i%2BfLGiv3FAdwSH_3SCaVdrjyQ%40mail.gmail.com In the meantime I'll see about the Linux/*BSD patch for this thing, more tomorrow hopefully. And I might see if I can find a NetBSD person to ask...
On Sat, Nov 02, 2024 at 06:11:33PM +1300, Thomas Munro wrote: > On Fri, Nov 1, 2024 at 7:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Actually, after chewing on that second point awhile longer, >> how about this modest proposal: >> >> * Drop all code for run-time determination of ARM CRC support. >> Assume it's there unless user builds with a -march option that >> says it definitely isn't. > > For CRC32, that would work (and already does via configure tests) for > macOS (I'd forgotten about that in my earlier message, Macs never even > compile the CRC32 "choose" code because the system compiler targets M1 > by default), recent RHEL (see below), and presumably others, but it > would penalise Debian, FreeBSD (using the standard binary package > repo) and others if we didn't have the runtime check due to their > conservative choice of baseline arch (as Bastien just said about > Debian in a crossed email). I've been wondering about what to do > about all this for a while... There are also other features we aren't > using yet, but should, or single instructions that we are calling > through function pointers, preventing automatic vectorisation etc. One small improvement in this area that I'm working on (and about to commit for the AVX-512 stuff) is using __attribute__((target(...))) instead of special -march options for specific files [0]. Among other things, this simplifies the configure checks and eliminates the need to put chunks of code in separate files. However, I found that until recently, it wasn't possible to include arm_acle.h without the -march flag [1], which means we likely won't be able to make the switch for the ARM CRC stuff for a few more years. More to the point, until the baseline moves, I think we'll end up adding checks in header files so that we only take the function pointer overhead when it's likely to be worth it (e.g., pg_popcount() in pg_bitutils.h). It might be worth doing that for the CRC-32C code, too, since we are still quite reliant on the runtime check. [0] https://commitfest.postgresql.org/50/5305/ [1] https://postgr.es/m/ZwXsE0KG_wh6_heU%40nathan -- nathan
On Fri, Nov 8, 2024 at 4:28 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > One small improvement in this area that I'm working on (and about to commit > for the AVX-512 stuff) is using __attribute__((target(...))) instead of > special -march options for specific files [0]. (Catching up with that stuff). Neat. I tried sprinkling __attribute__((target_clones("default,lse"))) in front of some functions like LWLockAquire() just to see if this multifunction stuff worked enough yet on ARM. FreeBSD 15/clang 18: yes, I got LSE CAS-based lwlocks with loader-time feature detection! Debian 12/gcc 12: no, it fails with "error: target does not support function version dispatcher". I think maybe gcc 14 will do it (from quick look at release notes)? I realise that's yet a different issue than the <arm_acle.h> visibility thing you mentioned for the CRC32 builtins. Anyway, it's apparently still too soon to help on all systems with this specific topic as you already said, so the explicit auxv check still looks like the only reasonable way for now.