Thread: Re: Detection of hadware feature => please do not use signal

Re: Detection of hadware feature => please do not use signal

From
Heikki Linnakangas
Date:
On 31/10/2024 17:41, Bastien Roucariès wrote:
> Hi,
> 
> On debian side we will like to avoid use of sigill to detect feature.
> 
> https://sources.debian.org/src/postgresql-17/17.0-1/src/port/pg_crc32c_armv8_choose.c/#L55
> 
> is really bad.

Why is it bad?

> Could you use the canonical way under linux
> 
> #include <sys/auxv.h>
> #include <asm/hwcap.h>
> 
>   #if defined(__aarch64__)
>     return !(getauxval(AT_HWCAP) &  HWCAP_CRC32);
>   #else
>     return !(getauxval(AT_HWCAP2) & HWCAP2_CRC32);
>   #endif

We used to do that, but it was changed in commit 1c72ec6f. Looking at 
the discussion that led to it, it's not clear to me why we switched [1]. 
Portablity to non-glibc systems maybe?

If there's a handy, portable way to check CPU capabilities, I agree it 
seem less hacky than catching SIGILL...

[1] 
https://www.postgresql.org/message-id/CAEepm%3D02Run-Pk3xyt%2BRV3p1N%2B7cKZxN95_MamaJw8Cnw%2BDwjQ%40mail.gmail.com

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 31/10/2024 17:41, Bastien Roucariès wrote:
>> Could you use the canonical way under linux

> We used to do that, but it was changed in commit 1c72ec6f. Looking at 
> the discussion that led to it, it's not clear to me why we switched [1]. 

The complaint was precisely that it would not work under non-linux.

Maybe we could do it one way on linux and the other way everywhere
else, but I do not find that attractive; it just makes it harder
to get full test coverage.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Heikki Linnakangas
Date:
On 31/10/2024 19:48, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> On 31/10/2024 17:41, Bastien Roucariès wrote:
>>> Could you use the canonical way under linux
> 
>> We used to do that, but it was changed in commit 1c72ec6f. Looking at
>> the discussion that led to it, it's not clear to me why we switched [1].
> 
> The complaint was precisely that it would not work under non-linux.
> 
> Maybe we could do it one way on linux and the other way everywhere
> else, but I do not find that attractive; it just makes it harder
> to get full test coverage.

Bastien, would you happen to know how other portable libraries that need 
to run on other operating system do this? We added the SIGILL probe back 
in 2018, there might be better options available now.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 31/10/2024 19:48, Tom Lane wrote:
>> Maybe we could do it one way on linux and the other way everywhere
>> else, but I do not find that attractive; it just makes it harder
>> to get full test coverage.

> Bastien, would you happen to know how other portable libraries that need 
> to run on other operating system do this? We added the SIGILL probe back 
> in 2018, there might be better options available now.

It occurs to me to wonder whether the existing code works on Windows.
Windows-on-ARM wasn't a thing we thought about in 2018, but it's
a reasonable target now.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Bastien =?ISO-8859-1?Q?Roucari=E8s?= <rouca@debian.org> writes:
> Le jeudi 31 octobre 2024, 17:29:45 UTC Heikki Linnakangas a écrit :
>> Why is it bad?

> for a binary it is maybe safe for a library it is unsafe in Multi thread context to capture signal...

[ shrug... ] If we're concerned about that, we should hardly be less
concerned on other platforms than we are on Linux.

(AFAIK, we do not execute the code in question in any libraries.)

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Thomas Munro
Date:
On Fri, Nov 1, 2024 at 7:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It occurs to me to wonder whether the existing code works on Windows.
> Windows-on-ARM wasn't a thing we thought about in 2018, but it's
> a reasonable target now.

I looked into that[1] and decided that I was going to ignore it
completely, because:

* Windows defines a dummy SIGILL signal number as required by the C
standard, so stuff like this compiles, but
EXCEPTION_ILLEGAL_INSTRUCTION isn't connected up to it and you'd just
crash instead, but even if it were...
* we don't install a native signal handler anyway, just one of our
fake ones (that I would love to get rid of)
* there is also a native API to check CPU features, but I don't see
the point in even thinking about it for now, because...
* Windows 11 effectively requires ARMv8.1-A to boot, and that's what
comes installed on current machines you can buy (this is expressed
differently officially, by saying that LSE atomics are required and by
listing the supported CPUs by model not ISA, and the list of CPUs that
are no longer supported can also be found online, so it's fairly
clear; there are also fun investigative reports from the time when
RPI4s and other never-officially-supported systems could suddenly no
longer boot after some update, as they would croak on LSE
instructions)
* if someone is using Windows 10 for ARM (which I gather is getting
harder to obtain by now) on an old enough low power laptop that lacks
the instruction (and note that most ARMv8-A chips *did* have that
instruction as an optional extra anyway, they just didn't have LSE),
well then maybe it's an issue but I think for a new platform where we
are so short of developers that we haven't even got past other really
basic starter problems after a couple of years of talking about it, I
think we should focus only on current systems instead of allowing even
1 nanosecond to be wasted on retro-computing topics... and in any case
that hypothetical user is just about out of time because...
* Windows 10's announced EOL (no more updates/support, full screen
warning messages if past EOLs are anything to go by) coincides
approximately with PostgreSQL 18's expected release

(Huh, thinking about other synchronous signals, don't we also fail to
install a native SIGFPE handler on Windows, that would convert eg
div-by-zero in C code into an ereport on Unix?)

Anyway I don't mind getting rid of the SIGILL stuff as long as the new
coding is tidy and cross-platform enough, if people are insisting.
I'm not aware that there's anything technically wrong with it on any
Unix system, but I agree that it's not beautiful code.  FreeBSD was my
motivation at the time, but it has since gained elf_aux_info(), and
OpenBSD too, and I can help write/test that part if we go that way.
Or maybe we could just take some fragments of OpenSSL or libzlma or
whatever code if the licensing aspect is OK.

As for the other ARM platforms in our universe:

Unfortunately it looks like NetBSD doesn't put AT_HWCAP into its
auxv[], or even expose it to user space nicely, and those libraries
don't seem to know of another way.  Hopefully NetBSD will align with
those other systems for portability's sake.  The only other way I
could find in a quick googling session is /usr/sbin/cpuctl, root only,
no cigar (but a solid clue that the information is floating around
somewhere, it just depends where exactly the root-only fencing is
happening).  Even if a way can be found, it'd be better to be able to
do it the same way as other systems that we are actually testing if at
all possible.  Put that way, the build farm absence is a sort of vote
for just not even trying to detect it on NetBSD.  A NetBSD user who
really wants to access the feature could still compile with
-march=<new thing>, supply a build farm animal and a patch, or
preferrably help get a compatible elf_auxv_info(AT_HWCAP) merged into
NetBSD.  That'd be my vote if we switch to auxv probing on the other
systems.

Macs don't do ELF or auxv, but there's a sysctl.  I think it must
always report true in practice.  The first M1 was ARMv8.4-A.  (There
are old non-computer Apple devices that used ARMv8-A still in the
wild, which is interesting to me only because it explains the recent
invention of -moutline-atomics, about which more soon hopefully, to
get faster lwlocks etc on our -march=armv8-a builds as shipped by
Debian et al.)  So I think we could skip the palaver and just
hard-code the knowledge that Macs can do this stuff.

(Archeological note: the reason several systems have the same auxv[]
concept, ie a table of parameters that exec*() sets up alongside
argv[] and environ[] to communicate stuff about memory layout etc
primarily to libc, and the reason they even agree on some of the
parameter names, is that it came from SVR4 with ELF.  I think Sun's
original version got AT_HWCAP from the ELF binary after selecting from
several object variants that were compiled to match various CPU
feature sets, as a way of shipping fat binaries that go faster on the
right hardware, while these modern systems are just passing on
whatever the CPU reports under the same hijacked name; it's related,
but different...  Anyway they needed some way for the kernel to give
the features to user space on ARM, because its register that is
equivalent to x86's CPUID can't be accessed from user space's
privilege level and libc itself would like to be able to use some
fancy features.  Even for non-ARM architectures it's nice not to have
to break out the assembler in user programs that want to do the same
sorts of tricks.  Amazingly, illumos can apparently run on ARM now so
we might in theory encounter the old meaning of AT_HWCAP, but that's
quite a hypothetical unicorn so I'm filing the thought under
archeology and hiding it in parentheses.)

Note that Andres recently wondered out loud[2] if CRC32 might be
fundamentally the wrong tool for the job in a related thread, so
perhaps this will all become moot if someone does the research and
replaces it, but that's vapourware for now...

[1] https://www.postgresql.org/message-id/CA%2BhUKGJ2B5rAGUncAob%3DChutCT%3Dfx0Ot7kwvio5cB7NpOGKG1Q%40mail.gmail.com
[2]
https://www.postgresql.org/message-id/flat/20240612193746.rjeiip4hcamjedgo%40awork3.anarazel.de#ab383730597411817c69516ec6c1a65c



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Fri, Nov 1, 2024 at 7:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It occurs to me to wonder whether the existing code works on Windows.
>> Windows-on-ARM wasn't a thing we thought about in 2018, but it's
>> a reasonable target now.

> I looked into that[1] and decided that I was going to ignore it
> completely, because:
> [ oodles o' details ]

Hmm.  So it seems like we could do this:

* Linux, FreeBSD, OpenBSD: do run-time test as recommended
* Windows, macOS: assume that supported ARM hardware can do this
* anything else: assume no hardware CRC

The only thing I'd be even a bit sad about there is not having
NetBSD support.  Maybe we just need to research that a bit more,
or maybe we have to wait for/lobby for them to do what the other
BSDen have done.  But in any case I'm not seeing NetBSD-on-ARM
as a platform that's important enough to block movement on this,
let alone anything else in the "anything else" category.

An alternative to "assume no hardware CRC" could be

* anything else: use the existing SIGILL test

But I do sympathize with Bastien's concerns about that.

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Tom Lane
Date:
I wrote:
> Hmm.  So it seems like we could do this:
> * Linux, FreeBSD, OpenBSD: do run-time test as recommended
> * Windows, macOS: assume that supported ARM hardware can do this
> * anything else: assume no hardware CRC

Actually, after chewing on that second point awhile longer,
how about this modest proposal:

* Drop all code for run-time determination of ARM CRC support.
Assume it's there unless user builds with a -march option that
says it definitely isn't.

Realistically, exactly who is going to be running Postgres 18+
on ARM hardware that lacks CRC support?  I can think of lots of
projects that are more worthy of our time than this.

(Perhaps it's time to apply the same mindset to x86 CRC support
too?)

            regards, tom lane



Re: Detection of hadware feature => please do not use signal

From
Thomas Munro
Date:
On Fri, Nov 1, 2024 at 7:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
> > Hmm.  So it seems like we could do this:
> > * Linux, FreeBSD, OpenBSD: do run-time test as recommended
> > * Windows, macOS: assume that supported ARM hardware can do this
> > * anything else: assume no hardware CRC
>
> Actually, after chewing on that second point awhile longer,
> how about this modest proposal:
>
> * Drop all code for run-time determination of ARM CRC support.
> Assume it's there unless user builds with a -march option that
> says it definitely isn't.

For CRC32, that would work (and already does via configure tests) for
macOS (I'd forgotten about that in my earlier message, Macs never even
compile the CRC32 "choose" code because the system compiler targets M1
by default), recent RHEL (see below), and presumably others, but it
would penalise Debian, FreeBSD (using the standard binary package
repo) and others if we didn't have the runtime check due to their
conservative choice of baseline arch (as Bastien just said about
Debian in a crossed email).  I've been wondering about what to do
about all this for a while...  There are also other features we aren't
using yet, but should, or single instructions that we are calling
through function pointers, preventing automatic vectorisation etc.

> Realistically, exactly who is going to be running Postgres 18+
> on ARM hardware that lacks CRC support?  I can think of lots of
> projects that are more worthy of our time than this.
>
> (Perhaps it's time to apply the same mindset to x86 CRC support
> too?)

I think it's quite interesting that RHEL has moved its baseline to
x86_86-v2, having recently worked with Intel and AMD to standardise
new levels "-vN" that group and "linearise" the feature sets, and
others were already talking about standarding on -v3 last time I
looked, while Debian still targets x86_86 which means the instruction
set of the first AMD K8 chip from 2003.  Debian has a nice wiki page
(see thread ↓) explaining various workaround strategies, ranging from
runtime tests like ours, per-sub-architecture library selection,
through to (last resort) depending on pseudo-packages like
"x86-64-v2-support" so that your package becomes uninstallable on
ancient hardware.  I'm not involved in packaging but I was wondering
out loud whether the PGDG repos might potentially consider making a
different choice than the official Debian repos, always or as an
option, or ... various other ideas proposed by others.  Perhaps we
could continue the topic on that thread:

https://www.postgresql.org/message-id/flat/CA%2BhUKGKS64zJezV9y9mPcB-J0i%2BfLGiv3FAdwSH_3SCaVdrjyQ%40mail.gmail.com

In the meantime I'll see about the Linux/*BSD patch for this thing,
more tomorrow hopefully.  And I might see if I can find a NetBSD
person to ask...



Re: Detection of hadware feature => please do not use signal

From
Nathan Bossart
Date:
On Sat, Nov 02, 2024 at 06:11:33PM +1300, Thomas Munro wrote:
> On Fri, Nov 1, 2024 at 7:16 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Actually, after chewing on that second point awhile longer,
>> how about this modest proposal:
>>
>> * Drop all code for run-time determination of ARM CRC support.
>> Assume it's there unless user builds with a -march option that
>> says it definitely isn't.
> 
> For CRC32, that would work (and already does via configure tests) for
> macOS (I'd forgotten about that in my earlier message, Macs never even
> compile the CRC32 "choose" code because the system compiler targets M1
> by default), recent RHEL (see below), and presumably others, but it
> would penalise Debian, FreeBSD (using the standard binary package
> repo) and others if we didn't have the runtime check due to their
> conservative choice of baseline arch (as Bastien just said about
> Debian in a crossed email).  I've been wondering about what to do
> about all this for a while...  There are also other features we aren't
> using yet, but should, or single instructions that we are calling
> through function pointers, preventing automatic vectorisation etc.

One small improvement in this area that I'm working on (and about to commit
for the AVX-512 stuff) is using __attribute__((target(...))) instead of
special -march options for specific files [0].  Among other things, this
simplifies the configure checks and eliminates the need to put chunks of
code in separate files.  However, I found that until recently, it wasn't
possible to include arm_acle.h without the -march flag [1], which means we
likely won't be able to make the switch for the ARM CRC stuff for a few
more years.

More to the point, until the baseline moves, I think we'll end up adding
checks in header files so that we only take the function pointer overhead
when it's likely to be worth it (e.g., pg_popcount() in pg_bitutils.h).  It
might be worth doing that for the CRC-32C code, too, since we are still
quite reliant on the runtime check.

[0] https://commitfest.postgresql.org/50/5305/
[1] https://postgr.es/m/ZwXsE0KG_wh6_heU%40nathan

-- 
nathan



Re: Detection of hadware feature => please do not use signal

From
Thomas Munro
Date:
On Fri, Nov 8, 2024 at 4:28 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> One small improvement in this area that I'm working on (and about to commit
> for the AVX-512 stuff) is using __attribute__((target(...))) instead of
> special -march options for specific files [0].

(Catching up with that stuff).  Neat.

I tried sprinkling __attribute__((target_clones("default,lse"))) in
front of some functions like LWLockAquire() just to see if this
multifunction stuff worked enough yet on ARM.  FreeBSD 15/clang 18:
yes, I got LSE CAS-based lwlocks with loader-time feature detection!
Debian 12/gcc 12: no, it fails with "error: target does not support
function version dispatcher".  I think maybe gcc 14 will do it (from
quick look at release notes)?  I realise that's yet a different issue
than the <arm_acle.h> visibility thing you mentioned for the CRC32
builtins.  Anyway, it's apparently still too soon to help on all
systems with this specific topic as you already said, so the explicit
auxv check still looks like the only reasonable way for now.