Re: Detection of hadware feature => please do not use signal - Mailing list pgsql-bugs
From | Thomas Munro |
---|---|
Subject | Re: Detection of hadware feature => please do not use signal |
Date | |
Msg-id | CA+hUKGLRXq0-s0n2=dy_A5iY-Omg628osyJEJMjPQ=6m566UgA@mail.gmail.com |
In response to | Re: Detection of hadware feature => please do not use signal (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Detection of hadware feature => please do not use signal |
List | pgsql-bugs |
On Fri, Nov 1, 2024 at 7:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It occurs to me to wonder whether the existing code works on Windows.
> Windows-on-ARM wasn't a thing we thought about in 2018, but it's
> a reasonable target now.

I looked into that[1] and decided that I was going to ignore it completely, because:

* Windows defines a dummy SIGILL signal number as required by the C standard, so stuff like this compiles, but EXCEPTION_ILLEGAL_INSTRUCTION isn't connected up to it and you'd just crash instead, but even if it were...

* we don't install a native signal handler anyway, just one of our fake ones (that I would love to get rid of)

* there is also a native API to check CPU features, but I don't see the point in even thinking about it for now, because...

* Windows 11 effectively requires ARMv8.1-A to boot, and that's what comes installed on current machines you can buy (this is expressed differently officially, by saying that LSE atomics are required and by listing the supported CPUs by model not ISA, and the list of CPUs that are no longer supported can also be found online, so it's fairly clear; there are also fun investigative reports from the time when RPI4s and other never-officially-supported systems could suddenly no longer boot after some update, as they would croak on LSE instructions)

* if someone is using Windows 10 for ARM (which I gather is getting harder to obtain by now) on an old enough low power laptop that lacks the instruction (and note that most ARMv8-A chips *did* have that instruction as an optional extra anyway, they just didn't have LSE), well then maybe it's an issue, but for a new platform where we are so short of developers that we haven't even got past other really basic starter problems after a couple of years of talking about it, I think we should focus only on current systems instead of allowing even 1 nanosecond to be wasted on retro-computing topics... and in any case that hypothetical user is just about out of time because...

* Windows 10's announced EOL (no more updates/support, full screen warning messages if past EOLs are anything to go by) coincides approximately with PostgreSQL 18's expected release

(Huh, thinking about other synchronous signals, don't we also fail to install a native SIGFPE handler on Windows, the one that on Unix converts eg div-by-zero in C code into an ereport?)

Anyway I don't mind getting rid of the SIGILL stuff as long as the new coding is tidy and cross-platform enough, if people are insisting. I'm not aware that there's anything technically wrong with it on any Unix system, but I agree that it's not beautiful code. FreeBSD was my motivation at the time, but it has since gained elf_aux_info(), and OpenBSD too, and I can help write/test that part if we go that way. Or maybe we could just take some fragments of OpenSSL or liblzma or whatever code if the licensing aspect is OK.

As for the other ARM platforms in our universe:

Unfortunately it looks like NetBSD doesn't put AT_HWCAP into its auxv[], or even expose it to user space nicely, and those libraries don't seem to know of another way. Hopefully NetBSD will align with those other systems for portability's sake. The only other way I could find in a quick googling session is /usr/sbin/cpuctl, root only, no cigar (but a solid clue that the information is floating around somewhere, it just depends where exactly the root-only fencing is happening). Even if a way can be found, it'd be better to be able to do it the same way as other systems that we are actually testing if at all possible. Put that way, the build farm absence is a sort of vote for just not even trying to detect it on NetBSD. A NetBSD user who really wants to access the feature could still compile with -march=<new thing>, supply a build farm animal and a patch, or preferably help get a compatible elf_aux_info(AT_HWCAP) merged into NetBSD.
That'd be my vote if we switch to auxv probing on the other systems.

Macs don't do ELF or auxv, but there's a sysctl. I think it must always report true in practice. The first M1 was ARMv8.4-A. (There are old non-computer Apple devices that used ARMv8-A still in the wild, which is interesting to me only because it explains the recent invention of -moutline-atomics, about which more soon hopefully, to get faster lwlocks etc on our -march=armv8-a builds as shipped by Debian et al.) So I think we could skip the palaver and just hard-code the knowledge that Macs can do this stuff.

(Archeological note: the reason several systems have the same auxv[] concept, ie a table of parameters that exec*() sets up alongside argv[] and environ[] to communicate stuff about memory layout etc primarily to libc, and the reason they even agree on some of the parameter names, is that it came from SVR4 with ELF. I think Sun's original version got AT_HWCAP from the ELF binary after selecting from several object variants that were compiled to match various CPU feature sets, as a way of shipping fat binaries that go faster on the right hardware, while these modern systems are just passing on whatever the CPU reports under the same hijacked name; it's related, but different... Anyway they needed some way for the kernel to give the features to user space on ARM, because its register that is equivalent to x86's CPUID can't be accessed from user space's privilege level and libc itself would like to be able to use some fancy features. Even for non-ARM architectures it's nice not to have to break out the assembler in user programs that want to do the same sorts of tricks. Amazingly, illumos can apparently run on ARM now so we might in theory encounter the old meaning of AT_HWCAP, but that's quite a hypothetical unicorn so I'm filing the thought under archeology and hiding it in parentheses.)
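To make the auxv-probing idea above concrete, here is a rough sketch of what a signal-free check could look like, untested on most of these platforms. The function name sketch_armv8_crc32_available() is made up for illustration and is not existing PostgreSQL code. It assumes 64-bit ARM only (32-bit ARM reports CRC32 through a different auxv entry), assumes <sys/auxv.h> declares getauxval() on Linux and elf_aux_info() on FreeBSD/OpenBSD, and falls back to the Linux arm64 bit value for HWCAP_CRC32 if the headers don't supply one. The Mac branch simply hard-codes the answer, and everything else, NetBSD and Windows included, reports "not detected".

```c
#include <stdbool.h>

/* The auxv interfaces only matter here on 64-bit ARM with a known provider. */
#if defined(__aarch64__) && \
    (defined(__linux__) || defined(__FreeBSD__) || defined(__OpenBSD__))
#include <sys/auxv.h>       /* getauxval() or elf_aux_info(), plus AT_HWCAP */
#endif

/* Assumption: fall back to the Linux arm64 value if no header defined it. */
#ifndef HWCAP_CRC32
#define HWCAP_CRC32 (1 << 7)
#endif

/*
 * Hypothetical helper: report whether the ARMv8 CRC32 instructions can be
 * used, without executing one and trapping SIGILL.
 */
bool
sketch_armv8_crc32_available(void)
{
#if !defined(__aarch64__)
    /* Not 64-bit ARM: nothing to detect in this sketch. */
    return false;
#elif defined(__APPLE__)
    /* Hard-coded: Apple Silicon Macs are assumed to always have it. */
    return true;
#elif defined(__linux__)
    /* Linux: the kernel passes the CPU feature bits via auxv[]. */
    return (getauxval(AT_HWCAP) & HWCAP_CRC32) != 0;
#elif defined(__FreeBSD__) || defined(__OpenBSD__)
    /* FreeBSD/OpenBSD: same information, different accessor. */
    unsigned long hwcap = 0;

    if (elf_aux_info(AT_HWCAP, &hwcap, sizeof(hwcap)) != 0)
        return false;       /* lookup failed: treat as "not available" */
    return (hwcap & HWCAP_CRC32) != 0;
#else
    /* NetBSD and anything else: don't even try to detect it. */
    return false;
#endif
}
```

A caller would run this once at startup and pick the hardware or software CRC implementation accordingly. A build with a suitable -march setting wouldn't need the runtime check at all.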
Note that Andres recently wondered out loud[2] if CRC32 might be fundamentally the wrong tool for the job in a related thread, so perhaps this will all become moot if someone does the research and replaces it, but that's vapourware for now...

[1] https://www.postgresql.org/message-id/CA%2BhUKGJ2B5rAGUncAob%3DChutCT%3Dfx0Ot7kwvio5cB7NpOGKG1Q%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/20240612193746.rjeiip4hcamjedgo%40awork3.anarazel.de#ab383730597411817c69516ec6c1a65c