Thread: BUG #18471: Possible JIT memory leak resulting in signal 11: Segmentation fault on ARM

The following bug has been logged on the website:

Bug reference:      18471
Logged by:          Joachim Haecker-Becker
Email address:      joachim.haecker-becker@arcor.de
PostgreSQL version: 16.3
Operating system:   Debian Bookworm
Description:

We have a reproducible way to force a postgres process to consume more and
more RAM until it crashes on ARM.
The same works on X86 without any issue.
With jit=off it runs on ARM as well.

We run into this in a real-life database with a lot of joins and aggregate
functions.
The following code is just a mock to reproduce a similar situation without
needing access to our real data.
This issue blocks us from upgrading our ARM-hosted databases to anything
newer than 14.7.

Systems:

ARM:
postgres 16.3
debian bookworm
ARM64 (AWS, t4g.xlarge, Graviton2, 64bit)
latest docker image 16.3 from https://hub.docker.com/_/postgres

X86:
postgres 16.3
debian bookworm
x86_64 (AWS, t3.xlarge, 64bit)
latest docker image 16.3 from https://hub.docker.com/_/postgres

JIT settings:
jit_above_cost = 1
jit_inline_above_cost = 1
jit_optimize_above_cost = 1
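
For anyone reproducing this without editing postgresql.conf, the same
thresholds can presumably be applied per session. This is only a sketch: the
report does not say where the values were set, so the session-level SET is an
assumption, as is a server build with JIT support.

```sql
-- Session-level equivalent of the JIT settings listed above.
-- Assumption: applied per session instead of in postgresql.conf.
SET jit = on;
SET jit_above_cost = 1;
SET jit_inline_above_cost = 1;
SET jit_optimize_above_cost = 1;

-- Confirm what the session actually uses.
SHOW jit;
SHOW jit_above_cost;
```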

How to reproduce:

SELECT TO_CHAR(day, 'YYYY-MM-DD')::varchar(10) AS day_varchar,
       day::varchar AS day_t_varchar,
       day::date AS day_date
INTO public.generated_days
FROM generate_series(
       timestamp without time zone '2000-01-01',
       timestamp without time zone '3000-01-01',
       '1 day'
     ) AS gs(day);

SELECT
    min(days.day_varchar),
    max(days.day_varchar),
    min(days.day_t_varchar),
    max(days.day_t_varchar),
    min(days.day_date),
    max(days.day_date),

    min(days1.day_varchar),
    max(days1.day_varchar),
    min(days1.day_t_varchar),
    max(days1.day_t_varchar),
    min(days1.day_date),
    max(days1.day_date),

    min(days2.day_varchar),
    max(days2.day_varchar),
    min(days2.day_t_varchar),
    max(days2.day_t_varchar),
    min(days2.day_date),
    max(days2.day_date),

    min(days3.day_varchar),
    max(days3.day_varchar),
    min(days3.day_t_varchar),
    max(days3.day_t_varchar),
    min(days3.day_date),
    max(days3.day_date),

    min(days4.day_varchar)

FROM public.generated_days days
LEFT JOIN public.generated_days days1 ON days1.day_varchar = days.day_varchar
LEFT JOIN public.generated_days days2 ON days2.day_varchar = days.day_varchar
LEFT JOIN public.generated_days days3 ON days3.day_varchar = days.day_varchar
LEFT JOIN public.generated_days days4 ON days4.day_varchar = days.day_varchar;

With 24 selected columns it runs; with 25 it fails. It doesn't matter which
column is removed from the query.
EXPLAIN (ANALYZE) output on ARM with one of the columns left out:
"JIT:"
"  Functions: 104"
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 9.120 ms, Inlining 208.976 ms, Optimization 1171.179 ms, Emission 1096.749 ms, Total 2486.023 ms"

Let me know if you need more information.


> On Fri, May 17, 2024 at 01:13:06PM +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference:      18471
> Logged by:          Joachim Haecker-Becker
> Email address:      joachim.haecker-becker@arcor.de
> PostgreSQL version: 16.3
> Operating system:   Debian Bookworm
> Description:
>
> We have a reproducible way to force a postgres process to consume more and
> more RAM until it crashes on ARM.
> The same works on X86 without any issue.
> With jit=off it runs on ARM as well.
>
> We run into this situation in a real-life database situation with a lot of
> joins and aggregate functions.
> The following code is just a mock to reproduce a similar situation without
> needing access to our real data.
> This issue blocks us from upgrading our ARM-hosted databases to anything
> newer than 14.7.

I think it would be useful to know how much of a memory difference we are
talking about and, just to make everything clear, how exactly postgres
crashes (an OOM kill, I assume?). It's important to differentiate between
the case "ARM with jit crashes, ARM without jit doesn't" and "ARM with jit
crashes, ARM without jit crashes with even more columns" (the same goes
for x86).

I've tried to reproduce it on an arm64 VM (16.3 built with LLVM 17), and
although I could observe some difference in memory consumption between
JIT on and off, it wasn't huge (around 10% or so). Running it under
valgrind shows only complaints about memory allocated for bitcode
modules, which is expected -- as far as I recall, postgres is somewhat
wasteful when it comes to allocating memory for those modules, even more
so for parallel workers. That is the case here, where there is a growing
number of parallel hash workers. This would not explain any difference
from x86, of course, but there might be a different baseline memory
consumption for different architectures.
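
A rough sketch of how the cases above could be told apart on each
architecture. The choice of settings is an assumption on my side rather than
something taken from the report, and the memory itself would have to be
watched from the OS side (for example the RSS of the backend PID), since this
only prepares the session.

```sql
-- PID of this backend, to watch its memory from outside while the query runs.
SELECT pg_backend_pid();

-- Case 1: JIT on with the reporter's thresholds, parallel query at defaults.
SET jit = on;
SET jit_above_cost = 1;
SET jit_inline_above_cost = 1;
SET jit_optimize_above_cost = 1;
-- ... run the reproduction query here and note the peak memory ...

-- Case 2: JIT off, to see whether the crash goes away or only moves to
-- a larger number of selected columns.
SET jit = off;
-- ... run the reproduction query again ...

-- Case 3: JIT on but no parallel workers, to check how much of the growth
-- comes from per-worker allocations.
SET jit = on;
SET max_parallel_workers_per_gather = 0;
-- ... run the reproduction query again ...
```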