BUG #17793: Query with large number of joins crashes PostgreSQL - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17793: Query with large number of joins crashes PostgreSQL
Date
Msg-id 17793-21fa598adbc21118@postgresql.org
Whole thread Raw
Responses Re: BUG #17793: Query with large number of joins crashes PostgreSQL
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17793
Logged by:          Tom
Email address:      tom@quist.de
PostgreSQL version: 13.10
Operating system:   Debian 10.2.1-6 (Docker image postgres:13.10)
Description:

Hey,

we had a unit test that produced a similar query to the following one and
caused Postgres to crash and reinitialize with automatic recovery:
https://gist.github.com/tomquist/a58b62d1a15ce99a67657ed636129e9b

These are the client logs:
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

These are the server logs when the crash happens:
2023-02-14 09:17:31.440 UTC [1] LOG:  server process (PID 35) was terminated
by signal 9: Killed
2023-02-14 09:17:31.440 UTC [1] DETAIL:  Failed process was running: with
app_user as (select 'a' as id, 'organizationId' as organization_id,
'Africa/Abidjan' as timezone),
    employee as (select 'a' as id, 'organizationId' as organization_id, 'a' as
user_id, 'a' as manager_id)
    SELECT DISTINCT app_user.id FROM app_user LEFT OUTER JOIN employee AS
employee_1 ON 
    employee_1.user_id = app_user.id LEFT OUTER JOIN employee AS
manager_employee_1 ON (employee_1.manager_id = manager_employee_1.id AND
manager_employee_1.user_id IN (SELECT app_user.id FROM app_user WHERE
(app_user.timezone = 'Africa/Abidjan' AND app_user.organization_id =
'organizationId')))
     LEFT OUTER JOIN employee AS employee_2 ON 
    employee_2.user_id = app_user.id LEFT OUTER JOIN employee AS
manager_employee_2 ON (employee_2.manager_id = manager_employee_2.id AND
manager_employee_2.user_id IN (SELECT app_user.id FROM app_user WHERE
(app_user.timezone = 'Africa/Algiers' AND app_user.organization_id =
'organizationId')))
     LEFT OUTER JOIN employee AS employee_3 ON 
    employee_3.user_id = app_user.id LEFT OUTER JOIN employee
2023-02-14 09:17:31.442 UTC [1] LOG:  terminating any other active server
processes
2023-02-14 09:17:31.446 UTC [32] WARNING:  terminating connection because of
crash of another server process
2023-02-14 09:17:31.446 UTC [32] DETAIL:  The postmaster has commanded this
server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2023-02-14 09:17:31.446 UTC [32] HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2023-02-14 09:17:31.461 UTC [37] FATAL:  the database system is in recovery
mode
2023-02-14 09:17:31.522 UTC [1] LOG:  all server processes terminated;
reinitializing
2023-02-14 09:17:31.546 UTC [38] LOG:  database system was interrupted; last
known up at 2023-02-14 09:16:05 UTC
2023-02-14 09:17:31.779 UTC [38] LOG:  database system was not properly shut
down; automatic recovery in progress
2023-02-14 09:17:31.781 UTC [38] LOG:  redo starts at 0/15D7070
2023-02-14 09:17:31.782 UTC [38] LOG:  invalid record length at 0/15D70A8:
wanted 24, got 0
2023-02-14 09:17:31.782 UTC [38] LOG:  redo done at 0/15D7070
2023-02-14 09:17:31.791 UTC [1] LOG:  database system is ready to accept
connections

It looks like the OS is killing the process due to running OOM, which is not
very surprising when looking at the query. Is this expected, or should PG
have guards in place to prevent this from happening?

I could reproduce it using a fresh docker container of the image
"postgres:13.10".
SELECT version();
> PostgreSQL 13.10 (Debian 13.10-1.pgdg110+1) on aarch64-unknown-linux-gnu,
compiled by gcc (Debian 10.2.1-6) 10.2.1 20210110, 64-bit


pgsql-bugs by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: BUG #17789: process_pgfdw_appname() fails for autovacuum workers
Next
From: Alvaro Herrera
Date:
Subject: Re: BUG #17792: MERGE uses uninitialized pointer and crashes when target tuple is updated concurrently