Thread: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64

BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18503
Logged by:          Stefan Heine
Email address:      github.stheine@heine7.de
PostgreSQL version: 16.3
Operating system:   Ubuntu 24.04, Debian bookworm
Description:

This is a followup of
https://www.postgresql.org/message-id/flat/18471-4e01d7601cedf1b0%40postgresql.org
and maybe related to
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059476

The query described in
https://www.postgresql.org/message-id/flat/18471-4e01d7601cedf1b0%40postgresql.org
is causing a reproducible 'Segmentation fault'.
I have tried various versions of postgresql on different OS versions, trying
to find one that works fine, but this happens in 14.8, 14.12, 16.3 on Debian
bookworm.
It also happens in 16.3 on Ubuntu 24.04 when installing the standard
OS-provided version of postgresql.
I also tried installing the 16.3 on Ubuntu 24.04 from
https://wiki.postgresql.org/wiki/Apt, and it's still failing.

The issue is clearly related to jit, since it only reproduces if jit is
enabled and forced to kick in (jit_above_cost = 1, jit_inline_above_cost = 1,
jit_optimize_above_cost = 1). Disabling jit makes the query run fine.
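
For anyone trying to reproduce this, the settings named above can be written out as session-level SET statements. The following is only a sketch: the helper names and the file name crash.sql are hypothetical, and it assumes psql can connect using the usual PG* environment variables. The setting names themselves are the ones listed in the report, plus the standard jit switch.

```shell
#!/bin/sh
# Print session settings that force JIT compilation, inlining and
# optimization for every query, matching the values in the report.
jit_force_settings() {
    cat <<'EOF'
SET jit = on;
SET jit_above_cost = 1;
SET jit_inline_above_cost = 1;
SET jit_optimize_above_cost = 1;
EOF
}

# Print the control case: the same session with JIT switched off.
jit_off_settings() {
    echo "SET jit = off;"
}

# Usage (crash.sql is a hypothetical file holding the failing query):
#   { jit_force_settings; cat crash.sql; } | psql -X
#   { jit_off_settings;   cat crash.sql; } | psql -X
```

Running the same query once with each prefix would show whether the crash follows the jit settings, as described above.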

In https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1059476 there was a
similar issue that pointed to LLVM 14, but the postgresql version from
https://wiki.postgresql.org/wiki/Apt mentions `libllvm17t64`, so this seems
to include a newer LLVM version and still crashes.

That situation is clearly reproducible, so we can help troubleshooting in
case you want to look into details.


Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64

From
Thomas Munro
Date:
On Tue, Jun 11, 2024 at 10:08 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
> The query described in
> https://www.postgresql.org/message-id/flat/18471-4e01d7601cedf1b0%40postgresql.org
> is causing a reproducible 'Segmentation fault'.

Also on ARM?

That other report said that memory is leaking and implies (?) the OOM
killer, but you're talking about a segmentation fault, so it seems
like a different symptom (even if the root cause is same), right?

> That situation is clearly reproducible, so we can help troubleshooting in
> case you want to look into details.

Can you get a core file, and gdb backtrace?



Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64

From
Stefan Heine
Date:


On 2024-06-12 02:59, Thomas Munro wrote:
> On Tue, Jun 11, 2024 at 10:08 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> > The query described in
> > https://www.postgresql.org/message-id/flat/18471-4e01d7601cedf1b0%40postgresql.org
> > is causing a reproducible 'Segmentation fault'.
>
> Also on ARM?

yes, this is ARM (aarch64).

> That other report said that memory is leaking and implies (?) the OOM
> killer, but you're talking about a segmentation fault, so it seems
> like a different symptom (even if the root cause is same), right?

when jit is disabled, the query runs ok in < 2 seconds.
when the issue happens, it takes > 30 seconds and then results in a 6GB core.

> > That situation is clearly reproducible, so we can help troubleshooting in
> > case you want to look into details.
>
> Can you get a core file, and gdb backtrace?

find a core here:
https://my.hidrive.com/share/bz2zb2alkp

do you have instructions for the gdb backtrace?

Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64

From
Thomas Munro
Date:
On Thu, Jun 13, 2024 at 1:47 AM Stefan Heine <github.stheine@heine7.de> wrote:
> do you have instructions for the gdb backtrace?

gdb /path/to/executable -c /path/to/core
... loads stuff ...
(gdb) bt
... prints out function call stack ...

It will probably just show some library names and addresses, but so
far we don't even know if this is crashing in LLVM or in PostgreSQL
code so that'd be a clue.  Maybe function names would appear if you
set up DEBUGINFOD_URLS, depending on where you got your packages from:

https://wiki.debian.org/HowToGetABacktrace

Hoping to find time to repro this later on a cloud host.  If this is a
cloud host, can you tell me which cloud, instance type, memory size
etc?  I had already been trying on some local ARM hardware with no
luck (same versions but different OS, so going to try making more
things match your case)...
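
The interactive steps above can also be run non-interactively, which makes the output easy to attach to a reply. This is a sketch under stated assumptions: the function name print_backtrace and the GDB override variable are hypothetical, and the default DEBUGINFOD_URLS value assumes Debian's public debuginfod service, which may not carry symbols for packages obtained elsewhere.

```shell
#!/bin/sh
# Assumption: Debian's public debuginfod server. Packages from other
# sources may need a different URL or separately installed debug symbols.
: "${DEBUGINFOD_URLS:=https://debuginfod.debian.net}"
export DEBUGINFOD_URLS

# Print a backtrace from a core file without an interactive session.
# $1 = path to the executable, $2 = path to the core file.
# GDB can be set to use a different gdb binary (hypothetical knob).
print_backtrace() {
    # -iex runs before the files are loaded, so debuginfod is already
    # enabled when gdb starts looking for symbols; -ex runs afterwards.
    "${GDB:-gdb}" -batch \
        -iex 'set debuginfod enabled on' \
        -ex 'bt' \
        "$1" -c "$2"
}

# Usage:
#   print_backtrace /path/to/executable /path/to/core > backtrace.txt 2>&1
```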



Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64

From
Thomas Munro
Date:
On Thu, Jun 13, 2024 at 9:41 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> cloud host, can you tell me which cloud, instance type, memory size
> etc?

(I realise that the email from the other thread gives an AWS instance
type that I can try, but that report is about memory usage and yours
has a segfault so I'm curious to know what conditions are different
for you..)



Re: BUG #18503: Reproducible 'Segmentation fault' in 16.3 on ARM64

From
Stefan Heine
Date:
On 2024-06-12 23:41, Thomas Munro wrote:
> On Thu, Jun 13, 2024 at 1:47 AM Stefan Heine <github.stheine@heine7.de> wrote:
> > do you have instructions for the gdb backtrace?
>
> gdb /path/to/executable -c /path/to/core
> ... loads stuff ...
> (gdb) bt
> ... prints out function call stack ...
>
> It will probably just show some library names and addresses, but so
> far we don't even know if this is crashing in LLVM or in PostgreSQL
> code so that'd be a clue.  Maybe function names would appear if you
> set up DEBUGINFOD_URLS, depending on where you got your packages from:
>
> https://wiki.debian.org/HowToGetABacktrace

# gdb /usr/lib/postgresql/16/bin/postgres -c core.19  
GNU gdb (Debian 13.1-3) 13.1
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
   <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/postgresql/16/bin/postgres...
(No debugging symbols found in /usr/lib/postgresql/16/bin/postgres)
warning: Can't open file /dev/shm/PostgreSQL.384567174 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.2343312096 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.1247406204 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.50860586 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.4136010652 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.2304500154 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.817475720 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.526004662 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.1223723046 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.4190931822 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.3836724180 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.1707942452 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.4107375064 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.2885303254 during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.4136268764 during file-backed mapping note processing
warning: Can't open file /dev/zero (deleted) during file-backed mapping note processing
warning: Can't open file /dev/shm/PostgreSQL.3153232120 during file-backed mapping note processing
warning: Can't open file /SYSV03e40001 (deleted) during file-backed mapping note processing
[New LWP 19]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: sa postgres 164.99.242.100(57456) EXPLAIN                           '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000fffe0fb635b8 in ?? ()
(gdb) bt
#0  0x0000fffe0fb635b8 in ?? ()
#1  0x0000aaaaefd84330 in ?? ()
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
(gdb) quit
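
With both frames unresolved, one possible next step is to ask gdb which address ranges belong to which loaded library and what sits at the faulting address. This is only a sketch: the function name inspect_crash_site and the GDB override variable are hypothetical, and the gdb commands are standard ones, not steps requested in this thread. An address outside every listed library range might belong to JIT-emitted code, but that is an assumption to be checked, not a conclusion.

```shell
#!/bin/sh
# Dump extra context about the crash site from a core file.
# $1 = path to the executable, $2 = path to the core file.
# GDB can be set to use a different gdb binary (hypothetical knob).
inspect_crash_site() {
    # bt                  : the call stack, as in the session above
    # info sharedlibrary  : address ranges of the loaded shared libraries
    # info registers pc   : the program counter at the time of the fault
    # x/4i $pc            : the instructions at the program counter
    "${GDB:-gdb}" -batch \
        -ex 'bt' \
        -ex 'info sharedlibrary' \
        -ex 'info registers pc' \
        -ex 'x/4i $pc' \
        "$1" -c "$2"
}

# Usage, with the paths from the session above:
#   inspect_crash_site /usr/lib/postgresql/16/bin/postgres core.19
```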



> Hoping to find time to repro this later on a cloud host.  If this is a
> cloud host, can you tell me which cloud, instance type, memory size
> etc?  I had already been trying on some local ARM hardware with no
> luck (same versions but different OS, so going to try making more
> things match your case)...
>
> (I realise that the email from the other thread gives an AWS instance
> type that I can try, but that report is about memory usage and yours
> has a segfault so I'm curious to know what conditions are different
> for you..)

it's running on AWS, t4g.large, 8GB RAM. this server is running Ubuntu 22.04.3 LTS and hosting docker.
inside docker, there is a container running postgres, based on the official postgres:16.3 image (based on Debian bookworm) from https://hub.docker.com/_/postgres .