BUG #18076: Consistently receiving Signal 7 and Signal 11 errors - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #18076: Consistently receiving Signal 7 and Signal 11 errors |
Date | |
Msg-id | 18076-acf41deb61815178@postgresql.org Whole thread Raw |
Responses |
Re: BUG #18076: Consistently receiving Signal 7 and Signal 11 errors
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 18076 Logged by: Robert Sanford Email address: wobbet@wobbet.com PostgreSQL version: 15.4 Operating system: Linux 5.15.0-1042-azure #49~20.04.1-Ubuntu aarch64 Description: Installed v15 via Docker on my local dev machine (Intel x64). Created the database schema, ran a bunch of tests. All very good. pg_dump the database and then imported into a new instance running on ARM64 VM in Azure (8 cores, 64 GB RAM, Ubuntu 20.04.1). This is a full VM, not a docker instance. ARM64 CPUs are, according to Azure, physical CPUs and not virtual. Logs from the Azure machine show Signal 7 and Signal 11 errors consistently in the same place of the processing every time (log entries below). It's executing a specific function. Server restarts and has to catch up from the WAL. None of the other functions show this behavior. Running the stored procedure from DBeaver does not cause the same issues. My ETL app is a .NET Core 6 console application running on my local machine and connecting to either the DB on my local machine or the VM in Azure. It connects via NPGSQL. It doesn't seem to be horribly expensive in terms of CPU usage and generally finishes in < 5s. That being said, there can be as many as 32 simultaneous connections running the specific stored procedure. After reading other bug reports of Signal 7 I set huge_pages=off and disabled extensions such as autoanalyze. Other non-default config entries include: - max_connections = 100 - shared_buffers = 8GB - work_mem = 8GB - maintenance_work_mem = 1GB - max_wal_size = 4GB - min_wal_size = 500MB I am hopeful that someone will be able to help me out with this. Let me know if there's any other details that I need to provide. Thanks! rjsjr Log entries... 2023-08-29 01:50:57.225 UTC [432307] LOG: server process (PID 433701) was terminated by signal 7: Bus error 2023-08-29 01:50:57.225 UTC [432307] DETAIL: Failed process was running: select * from process_route_compliance($1, $2, $3) 2023-08-29 01:50:57.225 UTC [432307] LOG: terminating any other active server processes 2023-08-29 01:50:57.231 UTC [432307] LOG: all server processes terminated; reinitializing 2023-08-29 01:50:57.441 UTC [438947] LOG: database system was interrupted; last known up at 2023-08-29 01:44:22 UTC 2023-08-29 01:50:57.547 UTC [438947] LOG: database system was not properly shut down; automatic recovery in progress 2023-08-29 01:50:57.553 UTC [438947] LOG: redo starts at F/3CD6C068 2023-08-29 01:50:57.896 UTC [438950] FATAL: the database system is not yet accepting connections 2023-08-29 01:50:57.896 UTC [438950] DETAIL: Consistent recovery state has not been yet reached. 2023-08-29 01:50:58.612 UTC [438947] LOG: unexpected pageaddr E/99864000 in log segment 000000010000000F00000050, offset 8798208 2023-08-29 01:50:58.612 UTC [438947] LOG: redo done at F/50863FB0 system usage: CPU: user: 0.66 s, system: 0.32 s, elapsed: 1.05 s 2023-08-29 01:50:58.629 UTC [438948] LOG: checkpoint starting: end-of-recovery immediate wait 2023-08-29 01:51:07.996 UTC [438948] LOG: checkpoint complete: wrote 35042 buffers (3.3%); 0 WAL file(s) added, 20 removed, 0 recycled; write=8.989 s, sync=0.197 s, total=9.372 s; sync files=254, longest=0.193 s, average=0.001 s; distance=322528 kB, estimate=322528 kB 2023-08-29 01:51:08.027 UTC [432307] LOG: database system is ready to accept connections 2023-08-29 01:52:07.472 UTC [432307] LOG: server process (PID 440110) was terminated by signal 7: Bus error 2023-08-29 01:52:07.472 UTC [432307] DETAIL: Failed process was running: select * from process_route_compliance($1, $2, $3) 2023-08-29 01:52:07.472 UTC [432307] LOG: terminating any other active server processes 2023-08-29 01:52:07.500 UTC [432307] LOG: all server processes terminated; reinitializing 2023-08-29 01:52:07.689 UTC [440184] LOG: database system was interrupted; last known up at 2023-08-29 01:51:07 UTC 2023-08-29 01:52:07.797 UTC [440184] LOG: database system was not properly shut down; automatic recovery in progress 2023-08-29 01:52:07.803 UTC [440184] LOG: redo starts at F/508640C0 2023-08-29 01:52:07.892 UTC [440184] LOG: unexpected pageaddr E/C8C98000 in log segment 000000010000000F00000051, offset 13205504 2023-08-29 01:52:07.892 UTC [440184] LOG: redo done at F/51C97F28 system usage: CPU: user: 0.06 s, system: 0.02 s, elapsed: 0.08 s 2023-08-29 01:52:07.908 UTC [440185] LOG: checkpoint starting: end-of-recovery immediate wait 2023-08-29 01:52:08.085 UTC [440185] LOG: checkpoint complete: wrote 1216 buffers (0.1%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.140 s, sync=0.011 s, total=0.181 s; sync files=39, longest=0.005 s, average=0.001 s; distance=20688 kB, estimate=20688 kB 2023-08-29 01:52:08.104 UTC [432307] LOG: database system is ready to accept connections 2023-08-29 01:52:08.153 UTC [440187] FATAL: the database system is not yet accepting connections 2023-08-29 01:52:08.153 UTC [440187] DETAIL: Consistent recovery state has not been yet reached. 2023-08-29 01:52:09.716 UTC [432307] LOG: server process (PID 440192) was terminated by signal 11: Segmentation fault 2023-08-29 01:52:09.716 UTC [432307] DETAIL: Failed process was running: select * from process_route_compliance($1, $2, $3) 2023-08-29 01:52:09.716 UTC [432307] LOG: terminating any other active server processes 2023-08-29 01:52:09.720 UTC [432307] LOG: all server processes terminated; reinitializing 2023-08-29 01:52:09.873 UTC [440196] LOG: database system was interrupted; last known up at 2023-08-29 01:52:08 UTC 2023-08-29 01:52:09.984 UTC [440196] LOG: database system was not properly shut down; automatic recovery in progress 2023-08-29 01:52:09.998 UTC [440196] LOG: redo starts at F/51C980C0 2023-08-29 01:52:10.001 UTC [440196] LOG: unexpected pageaddr E/C8CEA000 in log segment 000000010000000F00000051, offset 13541376 2023-08-29 01:52:10.001 UTC [440196] LOG: redo done at F/51CE9AB0 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2023-08-29 01:52:10.024 UTC [440197] LOG: checkpoint starting: end-of-recovery immediate wait 2023-08-29 01:52:10.132 UTC [440197] LOG: checkpoint complete: wrote 58 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.077 s, sync=0.014 s, total=0.113 s; sync files=26, longest=0.005 s, average=0.001 s; distance=328 kB, estimate=328 kB 2023-08-29 01:52:10.141 UTC [432307] LOG: database system is ready to accept connections 2023-08-29 01:52:11.501 UTC [432307] LOG: server process (PID 440272) was terminated by signal 11: Segmentation fault 2023-08-29 01:52:11.501 UTC [432307] DETAIL: Failed process was running: select * from process_route_compliance($1, $2, $3) 2023-08-29 01:52:11.501 UTC [432307] LOG: terminating any other active server processes 2023-08-29 01:52:11.504 UTC [432307] LOG: all server processes terminated; reinitializing 2023-08-29 01:52:11.648 UTC [440279] LOG: database system was interrupted; last known up at 2023-08-29 01:52:10 UTC 2023-08-29 01:52:11.748 UTC [440279] LOG: database system was not properly shut down; automatic recovery in progress 2023-08-29 01:52:11.754 UTC [440279] LOG: redo starts at F/51CEA0C0 2023-08-29 01:52:11.759 UTC [440279] LOG: unexpected pageaddr E/C8D86000 in log segment 000000010000000F00000051, offset 14180352 2023-08-29 01:52:11.759 UTC [440279] LOG: redo done at F/51D85F90 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s 2023-08-29 01:52:11.778 UTC [440280] LOG: checkpoint starting: end-of-recovery immediate wait 2023-08-29 01:52:11.874 UTC [440280] LOG: checkpoint complete: wrote 89 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.062 s, sync=0.016 s, total=0.103 s; sync files=31, longest=0.008 s, average=0.001 s; distance=624 kB, estimate=624 kB 2023-08-29 01:52:11.883 UTC [432307] LOG: database system is ready to accept connections 2023-08-29 01:53:16.516 UTC [432307] LOG: server process (PID 440285) was terminated by signal 11: Segmentation fault 2023-08-29 01:53:16.516 UTC [432307] DETAIL: Failed process was running: select * from process_route_compliance($1, $2, $3)
pgsql-bugs by date: