BUG #16990: Random PANIC in qemu user context - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #16990: Random PANIC in qemu user context
Date
Msg-id 16990-10b586bc699fd234@postgresql.org
Responses Re: BUG #16990: Random PANIC in qemu user context  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16990
Logged by:          Paul Guyot
Email address:      pguyot@kallisys.net
PostgreSQL version: 11.11
Operating system:   qemu-arm-static chrooted raspios inside ubuntu
Description:

Within a GitHub Actions workflow, a qemu chrooted environment is created from
a RaspiOS Lite image, in which the latest available PostgreSQL is installed
from apt (PostgreSQL 11.11).
Then tests of embedded software are executed, which include creating a
PostgreSQL database and performing a few benign operations (as far as
PostgreSQL is concerned). The tests run perfectly fine in a desktop-like
environment as well as on real devices.

Within this qemu context, PostgreSQL PANICs randomly yet quite frequently.
The latest log was the following:
2021-05-02 09:22:21.591 BST [15024] PANIC:  stuck spinlock detected at LWLockWaitListLock, /build/postgresql-11-rRyn74/postgresql-11-11.11/build/../src/backend/storage/lmgr/lwlock.c:832
qemu: uncaught target signal 6 (Aborted) - core dumped
2021-05-02 09:22:21.597 BST [15022] PANIC:  stuck spinlock detected at LWLockWaitListLock, /build/postgresql-11-rRyn74/postgresql-11-11.11/build/../src/backend/storage/lmgr/lwlock.c:832
qemu: uncaught target signal 6 (Aborted) - core dumped
2021-05-02 09:22:21.762 BST [15423] pynab@test_pynab PANIC:  stuck spinlock detected at LWLockWaitListLock, /build/postgresql-11-rRyn74/postgresql-11-11.11/build/../src/backend/storage/lmgr/lwlock.c:832
2021-05-02 09:22:21.762 BST [15423] pynab@test_pynab STATEMENT:  SELECT "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "django_content_type" WHERE "django_content_type"."app_label" = 'auth'
qemu: uncaught target signal 6 (Aborted) - core dumped
2021-05-02 09:22:24.481 BST [15011] LOG:  server process (PID 15423) was terminated by signal 6: Aborted
2021-05-02 09:22:24.481 BST [15011] DETAIL:  Failed process was running: SELECT "django_content_type"."id", "django_content_type"."app_label", "django_content_type"."model" FROM "django_content_type" WHERE "django_content_type"."app_label" = 'auth'
2021-05-02 09:22:24.481 BST [15011] LOG:  terminating any other active server processes
2021-05-02 09:22:24.567 BST [15011] LOG:  all server processes terminated; reinitializing
2021-05-02 09:22:24.601 BST [15512] LOG:  database system was interrupted; last known up at 2021-05-02 09:18:11 BST
2021-05-02 09:22:24.692 BST [15512] LOG:  database system was not properly shut down; automatic recovery in progress
2021-05-02 09:22:24.699 BST [15512] LOG:  redo starts at 0/171E170
2021-05-02 09:22:25.045 BST [15512] LOG:  invalid record length at 0/1957948: wanted 24, got 0
2021-05-02 09:22:25.046 BST [15512] LOG:  redo done at 0/1957910
2021-05-02 09:22:25.048 BST [15512] LOG:  last completed transaction was at log time 2021-05-02 09:20:04.917746+01
2021-05-02 09:22:25.096 BST [15011] LOG:  database system is ready to accept connections

The log is publicly available here:
https://github.com/pguyot/pynab/runs/2485660214?check_suite_focus=true

Notice how sluggish the test is compared to runs in the same environment
where PostgreSQL doesn't PANIC. For example, this run completed successfully
in under 20 minutes:
https://github.com/pguyot/pynab/runs/2483559259?check_suite_focus=true

I tried updating the CI script to upload the full Raspbian image whenever a
panic occurs, so that I can get my hands on the core dump, but the run is so
sluggish that I'm not sure it won't eventually time out. I wonder whether
this sluggishness could be a cause of the PANIC. Could you please advise on
how to investigate this crash further?
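
The sluggishness hypothesis can be made concrete with a toy model. PostgreSQL's spinlock wait loop sleeps between retries and, after a bounded number of delays, gives up with "stuck spinlock detected". If those sleeps are fixed in wall-clock time while emulation stretches the lock holder's work, a long enough slowdown would exhaust the waiter's budget. The following is a minimal Python sketch under stated assumptions only: the doubling backoff, the constants, and the names total_wait_budget and would_report_stuck are hypothetical illustrations, not PostgreSQL's actual s_lock implementation, and the model does not show that this is what happened in the CI run.

```python
# Hypothetical model of a spin-with-backoff lock waiter. The doubling
# backoff and all constants are illustrative assumptions, not
# PostgreSQL's real spinlock parameters.

def total_wait_budget(num_delays: int = 1000,
                      min_delay_s: float = 0.001,
                      max_delay_s: float = 1.0) -> float:
    """Wall-clock seconds a waiter sleeps in total before giving up.

    Each delay doubles, starting at min_delay_s and capped at max_delay_s.
    """
    total, delay = 0.0, min_delay_s
    for _ in range(num_delays):
        total += delay
        delay = min(delay * 2, max_delay_s)
    return total


def would_report_stuck(hold_time_s: float, slowdown: float = 1.0,
                       **budget_kwargs) -> bool:
    """True if the holder keeps the lock longer than the waiter's budget.

    hold_time_s is the native time the holder needs; slowdown stretches it
    (e.g. emulation overhead or the holder being descheduled), while the
    waiter's sleeps are assumed to stay fixed in wall-clock time.
    """
    return hold_time_s * slowdown > total_wait_budget(**budget_kwargs)


if __name__ == "__main__":
    print(round(total_wait_budget(), 3))          # 991.023
    print(would_report_stuck(0.05))                # False
    print(would_report_stuck(0.05, slowdown=1e5))  # True
```

In this model only the holder's time is scaled; a real emulator also slows the waiter's own spinning, so the sketch exaggerates the effect and is meant only to show why a fixed wall-clock budget is sensitive to slowdown.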
