Thread: BUG #18610: llvm error: __aarch64_swp4_acq_rel which could not be resolved

BUG #18610: llvm error: __aarch64_swp4_acq_rel which could not be resolved

From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      18610
Logged by:          Alexander Kozhemyakin
Email address:      a.kozhemyakin@postgrespro.ru
PostgreSQL version: 17rc1
Operating system:   ubuntu-24.04
Description:

Hi. On the master branch (3beb945d) built with LLVM, the following query
fails with an error:

CREATE SCHEMA addr_nsp;
CREATE FOREIGN DATA WRAPPER addr_fdw;
CREATE SERVER addr_fserv FOREIGN DATA WRAPPER addr_fdw;
CREATE TEXT SEARCH DICTIONARY addr_ts_dict (template=simple);
CREATE FOREIGN TABLE addr_nsp.genftable (a int) SERVER addr_fserv;
CREATE PUBLICATION addr_pub FOR TABLE addr_nsp.gentable;

select
    pg_last_wal_receive_lsn()
from
  pg_publication as ref_1
  inner join pg_foreign_table as ref_2
  inner join pg_aggregate as ref_4
  on ( (select partclass from pg_partitioned_table limit 1) < (select indcollation from pg_index limit 1))
  on ( (select 1 from pg_stat_bgwriter limit 1) <> (select total_time from pg_stat_xact_user_functions limit 1) )
  right join pg_aggregate as ref_5 on ( ref_5.aggtransspace <= (9764) );

ERROR:  relation "addr_nsp.gentable" does not exist
WARNING:  failed to resolve name __aarch64_swp4_acq_rel
FATAL:  fatal llvm error: Program used external function
'__aarch64_swp4_acq_rel' which could not be resolved!
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.


On Wed, Sep 11, 2024 at 7:58 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
> FATAL:  fatal llvm error: Program used external function
> '__aarch64_swp4_acq_rel' which could not be resolved!

Hmm, I think it is inlining spinlock code from
pg_last_wal_receive_lsn()'s call to GetWalRcvFlushRecPtr(), and then
failing to find fallbacks for pre-ARMv8.1 systems that didn't have the
LSE atomic instructions.  I wonder if there could be some mismatch in
the default -march for parts of your toolchain, so that it doesn't
link the library that has that stuff, but then the clang that built
walreceiverfuncs.bc expects it to be found.  I wonder if it goes away
if you add "-mno-outline-atomics" or "-march=armv8.1-a" or
"-march=armv8-a+lse" to BITCODE_CXXFLAGS (not saying that's a fix,
just trying to understand what's happening...).  Assuming you're using
GCC, maybe "gcc -Q --help=target | grep march" could show what Ubuntu
24 has set as the baseline, and "clang -### -c -x c /dev/null" might
show what clang is selecting... just ideas, I'm not sure, I don't have
such a system, I just noticed a few distros cranking up the baseline
instruction sets recently...
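
To make the failure mode concrete, here is a minimal sketch of the kind of 4-byte atomic exchange a test-and-set spinlock performs. The names demo_spinlock, demo_tas and demo_unlock are hypothetical, and this is not PostgreSQL's actual spinlock code. Assuming an AArch64 compiler that defaults to -moutline-atomics and targets a baseline without LSE, an exchange of this shape is expected to compile to a call to the __aarch64_swp4_acq_rel helper instead of an inline instruction, and that helper then has to be resolvable when the code is linked or JIT-loaded.

```c
#include <stdint.h>

/* Hypothetical stand-in for a spinlock; not PostgreSQL's slock_t. */
typedef struct
{
    uint32_t locked;            /* 0 = free, 1 = held */
} demo_spinlock;

/*
 * Atomically store 1 and return the previous value (0 means we got the lock).
 * This is a 4-byte exchange with acquire-release ordering, i.e. the
 * operation the "swp4_acq_rel" helper name describes.
 */
static uint32_t
demo_tas(demo_spinlock *lock)
{
    return __atomic_exchange_n(&lock->locked, 1u, __ATOMIC_ACQ_REL);
}

/* Release the lock with a plain release store. */
static void
demo_unlock(demo_spinlock *lock)
{
    __atomic_store_n(&lock->locked, 0u, __ATOMIC_RELEASE);
}
```

On an AArch64 machine, comparing the assembly from "clang -S -moutline-atomics" and "clang -S -mno-outline-atomics" for this file should show whether the helper call appears.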



Re: BUG #18610: llvm error: __aarch64_swp4_acq_rel which could not be resolved

From: "a.kozhemyakin"

Thank you. The error no longer occurs after adding -mno-outline-atomics or -march=armv8.1-a.

I also installed the packages on debian-12 (arm64) from https://apt.postgresql.org/pub/repos/apt, and the error reproduces there as well.


On Thu, Sep 12, 2024 at 2:30 PM a.kozhemyakin
<a.kozhemyakin@postgrespro.ru> wrote:
> Thank you. The error no longer occurs after adding -mno-outline-atomics or -march=armv8.1-a.
>
> I also installed the packages on debian-12 (arm64) from https://apt.postgresql.org/pub/repos/apt, and the error reproduces there as well.

Ahh, I see that we already have a thread proposing to add
-moutline-atomics to our main executable, over here:

https://www.postgresql.org/message-id/flat/099F69EE-51D3-4214-934A-1F28C0A1A7A7%40amazon.com

If clang has changed to assuming -moutline-atomics now (perhaps you
can see which version of clang started adding that switch by default
with the command I showed earlier, if you have a few versions of clang
around, perhaps from apt.llvm.org), then we either need to stop it
from doing that with -mno-outline-atomics, or compile our executable
(or at least llvmjit.so?) to use that too so the helper library that
defines those functions is linked in.  The people who developed that
stuff are, I think, interested in using the faster LSE stuff on modern
phones, while still being able to run apps on the older phones that
are still in circulation, but of course we have the same issue: we
want our stuff to use LSE on Ampere/Graviton etc server chips, while
still being able to run on Raspberry Pi 4 etc.  And that'll speed up
many things related to locking in the server itself, not only in this
fairly obscure JIT-inlined thing.
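
The "use LSE where available, fall back elsewhere" idea rests on a runtime capability check. A rough sketch of such a check on Linux follows, assuming getauxval() and the HWCAP_ATOMICS bit are how the platform reports LSE support. The function name demo_have_lse is hypothetical, and this is only an illustration of the idea, not the code of the actual helper library.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_ATOMICS
#define HWCAP_ATOMICS (1UL << 8)    /* LSE bit in AT_HWCAP on AArch64 Linux */
#endif
#endif

/*
 * Returns 1 if the CPU reports LSE atomics, 0 if it does not, and -1 when
 * this sketch cannot tell (anything other than AArch64 Linux).
 */
static int
demo_have_lse(void)
{
#if defined(__linux__) && defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) ? 1 : 0;
#else
    return -1;
#endif
}
```

On a Raspberry Pi 4 class machine this would be expected to report 0, and on the newer server chips 1, which is the distinction the outlined helpers need in order to pick a code path.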



On Fri, Nov 8, 2024 at 6:15 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> Here's a patch for approach #3.  This is low-urgency and a code freeze
> begins in a day or two, so I'm *not* proposing it for next week's
> release.

Hearing no objections, I pushed this.