Re: Server crash with parallel workers with Postgres 14.7 - Mailing list pgsql-bugs

From Jaime Casanova
Subject Re: Server crash with parallel workers with Postgres 14.7
Date
Msg-id CAJKUy5jCaxACx8hfaHmazeTKYhGjVErQzftgY_N4T30FRYRAzQ@mail.gmail.com
In response to Re: Server crash with parallel workers with Postgres 14.7  (José Lorenzo Urdaneta Rodriguez <lorenzo@kronor.io>)
Responses Re: Server crash with parallel workers with Postgres 14.7  (Michael Paquier <michael@paquier.xyz>)
List pgsql-bugs
On Mon, May 29, 2023 at 10:38 AM José Lorenzo Urdaneta Rodriguez
<lorenzo@kronor.io> wrote:
>
> I just wanted to confirm this was the right place to report the issue. Can anyone confirm, please?
>

yes, this is the right place to report it... only there is no guaranteed
SLA, and because this report is not that useful as it stands (read below
for details), a lot of people will not follow up on it

> On Fri, 19 May 2023 at 11:14, José Lorenzo Urdaneta Rodriguez <lorenzo@kronor.io> wrote:
>>
>> Hi,
>>
>> I've been having intermittent server crashes when executing certain queries. I have narrowed the cases to queries that scan large tables, and the most recent cases when the planner uses parallel workers.
>>

intermittent means it is not reproducible every time? I mean, just
executing this query does not always cause the crash?

>> I managed to collect a core dump of the crash, here's the result of `bt` using `gdb`:
>>
>> ```
>> Reading symbols from /usr/lib/postgresql/14/bin/postgres...
>> Reading symbols from /usr/lib/debug/.build-id/4a/4ff1b11a45a428e502b992679932bc188f92c1.debug...
>> [New LWP 3008897]
>> [Thread debugging using libthread_db enabled]
>> Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
>> Core was generated by `postgres: 14/kronor: parallel worker for PID 3008825               '.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x0000fffea2ac7a68 in ?? ()
>> (gdb) bt
>> #0  0x0000fffea2ac7a68 in ?? ()
>> #1  0x0000aaaabb378020 in ExecProcNode (node=0xaaaae311d068) at ./build/../src/include/executor/executor.h:257
>> #2  ExecAppend (pstate=0xaaaae30dd358) at ./build/../src/backend/executor/nodeAppend.c:360
>> #3  0x0000aaaabb378020 in ExecProcNode (node=0xaaaae30dd358) at ./build/../src/include/executor/executor.h:257
>> #4  ExecAppend (pstate=0xaaaae30bf258) at ./build/../src/backend/executor/nodeAppend.c:360
>> #5  0x0000000000000001 in ?? ()
>> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>> ```
>>

this backtrace doesn't have all the debug symbols (frame #0 shows only
"??"); did you install the postgresql-14-dbgsym package?
without the names of the functions we don't really know what is happening.

also, have you installed any extensions? you can execute "\dx" in psql
to see which extensions are installed (remember that extensions are
installed per database, so executing that command in only one database
is not enough).
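
a minimal sketch of that check, assuming a psql session with access to
every database (the database name "kronor" is only a placeholder taken
from the process title in the core dump, adjust as needed):

```sql
-- list the databases that accept connections
SELECT datname FROM pg_database WHERE datallowconn ORDER BY datname;

-- then, for each database in that list, connect and list its extensions
\c kronor
SELECT extname, extversion FROM pg_extension ORDER BY extname;
```

the query on pg_extension returns the same names and versions that "\dx"
shows, which is easier to paste into a reply.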

>> The query that was running was:

- a big query goes here - the query itself is not useful unless you also
provide the table structures and a minimal amount of (fake) data
to make the problem appear

>> The plan for this query was:
>>
- A typical plan for a partitioned table -

>>  JIT:
>>    Functions: 375
>>    Options: Inlining false, Optimization false, Expressions true, Deforming true
>> ```
>>

the backtrace says this is a segmentation fault, but anyway I would
suggest deactivating JIT before running the query: just "SET jit TO off;"
should be enough.
then try to cause the problem again. JIT is known to have a memory leak
problem (which is not consistent with a segmentation fault, but who
knows)
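
as a sketch, assuming the failing query is run by hand from psql, the
test could look like this (the last line is a placeholder for the real
query, which is not reproduced here):

```sql
-- disable JIT for the current session only
SET jit TO off;

-- confirm the setting took effect; this should report "off"
SHOW jit;

-- now re-run the failing query in this same session
-- <the query that crashed goes here>
```

if the crash no longer appears with jit off after several attempts, that
is worth reporting back together with the plan.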


>> Operating System: Ubuntu 20
>> Architecture: aarch64
>> Server version: 14.7
>>

try updating to v14.8, which includes some fixes
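
after the upgrade and a server restart, the running version can be
confirmed from any session, for example:

```sql
-- short version string of the running server
SHOW server_version;

-- full version string, including architecture and compiler
SELECT version();
```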



--
Jaime Casanova
Consultores de PostgreSQL
SYSTEMGUARDS S.A.


