BUG #18031: Segmentation fault after deadlock within VACUUM's parallel worker - Mailing list pgsql-bugs
| From | PG Bug reporting form |
|---|---|
| Subject | BUG #18031: Segmentation fault after deadlock within VACUUM's parallel worker |
| Date | |
| Msg-id | 18031-ebe2d08cb405f6cc@postgresql.org |
| Responses | Re: BUG #18031: Segmentation fault after deadlock within VACUUM's parallel worker |
| List | pgsql-bugs |
The following bug has been logged on the website:

Bug reference: 18031
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 16beta2
Operating system: Ubuntu 22.04

Description:

The following script:

```shell
numclients=100
for ((c=1;c<=$numclients;c++)); do
  createdb db$c
done
for ((i=1;i<=50;i++)); do
  echo "ITERATION $i"
  for ((c=1;c<=$numclients;c++)); do
    echo "SELECT format('CREATE TABLE t1_%s_$i (t TEXT);', g) FROM generate_series(1,10) g \\gexec" | psql db$c >psql1_$c.out 2>&1 &
    echo "SELECT format('CREATE TABLE t2_%s_$i (t TEXT);', g) FROM generate_series(1,10) g \\gexec" | psql db$c >psql2_$c.out 2>&1 &
    echo "SELECT 'VACUUM FULL pg_class' FROM generate_series(1,10) g \\gexec" | psql db$c >psql3_$c.out 2>&1 &
  done
  wait
  grep -C1 'signal 11' server.log && break
done
```

when executed with the custom settings:

```
fsync = off
max_connections = 1000
deadlock_timeout = 100ms
min_parallel_table_scan_size = 1kB
```

leads to a server crash:

```
...
ITERATION 15
2023-07-19 17:01:46.542 MSK [1449673] STATEMENT:  VACUUM FULL pg_class
2023-07-19 17:01:46.617 MSK [1434014] LOG:  background worker "parallel worker" (PID 1450094) was terminated by signal 11: Segmentation fault
2023-07-19 17:01:46.617 MSK [1434014] DETAIL:  Failed process was running: VACUUM FULL pg_class
```

with the following stack trace:

```
Core was generated by `postgres: parallel worker for PID 1449148 waiting '.
Program terminated with signal SIGSEGV, Segmentation fault.
warning: Section `.reg-xstate/1450094' in core file too small.
#0  RemoveFromWaitQueue (proc=0x7fbae3910400, hashcode=1106083722) at lock.c:1874
1874            LOCKMETHODID lockmethodid = LOCK_LOCKMETHOD(*waitLock);
(gdb) bt
#0  RemoveFromWaitQueue (proc=0x7fbae3910400, hashcode=1106083722) at lock.c:1874
#1  0x000055dcfc60f595 in LockErrorCleanup () at proc.c:735
#2  0x000055dcfc618757 in ProcessInterrupts () at postgres.c:3207
#3  0x000055dcfc60fbe1 in ProcSleep (locallock=locallock@entry=0x55dcfe964380, lockMethodTable=lockMethodTable@entry=0x55dcfca92800 <default_lockmethod>) at proc.c:1295
#4  0x000055dcfc5ffe0b in WaitOnLock (locallock=locallock@entry=0x55dcfe964380, owner=owner@entry=0x55dcfe978ed8) at lock.c:1818
#5  0x000055dcfc601ba5 in LockAcquireExtended (locktag=locktag@entry=0x7ffd32e28110, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false, reportMemoryError=reportMemoryError@entry=true, locallockp=locallockp@entry=0x0) at lock.c:1082
#6  0x000055dcfc601c67 in LockAcquire (locktag=locktag@entry=0x7ffd32e28110, lockmode=lockmode@entry=5, sessionLock=sessionLock@entry=false, dontWait=dontWait@entry=false) at lock.c:740
#7  0x000055dcfc5fe85f in XactLockTableWait (xid=xid@entry=43588, rel=rel@entry=0x7fbad92f4348, ctid=ctid@entry=0x55dcfe9f74b4, oper=oper@entry=XLTW_InsertIndexUnique) at lmgr.c:702
#8  0x000055dcfc2720d2 in heapam_index_build_range_scan (heapRelation=0x7fbad92f4348, indexRelation=0x7fbad92f90c8, indexInfo=0x55dcfe9f7138, allow_sync=<optimized out>, anyvisible=false, progress=false, start_blockno=0, numblocks=4294967295, callback=0x55dcfc29aba6 <_bt_build_callback>, callback_state=0x7ffd32e28610, scan=0x55dcfe9f7458) at heapam_handler.c:1496
#9  0x000055dcfc29af07 in table_index_build_scan (scan=<optimized out>, callback_state=0x7ffd32e28610, callback=0x55dcfc29aba6 <_bt_build_callback>, progress=false, allow_sync=true, index_info=0x55dcfe9f7138, index_rel=<optimized out>, table_rel=<optimized out>) at ../../../../src/include/access/tableam.h:1781
#10 _bt_parallel_scan_and_sort (btspool=btspool@entry=0x55dcfe9f7d08, btspool2=btspool2@entry=0x0, btshared=btshared@entry=0x7fbad93122a0, sharedsort=sharedsort@entry=0x7fbad9312240, sharedsort2=sharedsort2@entry=0x0, sortmem=<optimized out>, progress=false) at nbtsort.c:1985
#11 0x000055dcfc29d343 in _bt_parallel_build_main (seg=0x55dcfe979608, toc=0x7fbad9312000) at nbtsort.c:1888
#12 0x000055dcfc2c64a6 in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1520
#13 0x000055dcfc56faf0 in StartBackgroundWorker () at bgworker.c:861
#14 0x000055dcfc57631a in do_start_bgworker (rw=rw@entry=0x55dcfe969ad0) at postmaster.c:5762
#15 0x000055dcfc576496 in maybe_start_bgworkers () at postmaster.c:5986
#16 0x000055dcfc576dbf in process_pm_pmsignal () at postmaster.c:5149
#17 0x000055dcfc57736b in ServerLoop () at postmaster.c:1770
#18 0x000055dcfc5788ef in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x55dcfe938620) at postmaster.c:1463
#19 0x000055dcfc4964f5 in main (argc=3, argv=0x55dcfe938620) at main.c:198
```

(I also got another stack trace:

```
(gdb) bt
#0  RemoveFromWaitQueue (proc=0x7f763dde44c0, hashcode=1514899366) at lock.c:1874
#1  0x0000559984e4e595 in LockErrorCleanup () at proc.c:735
#2  0x0000559984b125cc in AbortTransaction () at xact.c:2735
#3  0x0000559984b1307d in AbortCurrentTransaction () at xact.c:3338
#4  0x0000559984e5b68a in PostgresMain (dbname=<optimized out>, username=0x55998640c6d8 "law") at postgres.c:4354
#5  0x0000559984db3183 in BackendRun (port=port@entry=0x559986445a60) at postmaster.c:4461
#6  0x0000559984db623a in BackendStartup (port=port@entry=0x559986445a60) at postmaster.c:4189
#7  0x0000559984db63d8 in ServerLoop () at postmaster.c:1779
#8  0x0000559984db78ef in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x55998640a620) at postmaster.c:1463
#9  0x0000559984cd54f5 in main (argc=3, argv=0x55998640a620) at main.c:198
```
)

server.log also contains:

```
2023-07-19 17:01:46.542 MSK [1449148] ERROR:  deadlock detected
2023-07-19 17:01:46.542 MSK [1449148] DETAIL:  Process 1449148 waits for ShareLock on transaction 43588; blocked by process 1449150.
        Process 1449150 waits for RowExclusiveLock on relation 1259 of database 16385; blocked by process 1449148.
        Process 1449148: VACUUM FULL pg_class
        Process 1449150: CREATE TABLE t2_2_15 (t TEXT);
2023-07-19 17:01:46.542 MSK [1449148] HINT:  See server log for query details.
2023-07-19 17:01:46.542 MSK [1449148] CONTEXT:  while checking uniqueness of tuple (28,39) in relation "pg_class"
2023-07-19 17:01:46.542 MSK [1449148] STATEMENT:  VACUUM FULL pg_class
```

It looks like in LockErrorCleanup():

```c
if (!dlist_node_is_detached(&MyProc->links))
{
    /* We could not have been granted the lock yet */
    RemoveFromWaitQueue(MyProc, lockAwaited->hashcode);
}
```

MyProc->links happened to be not detached, but MyProc->waitLock is NULL.

`git bisect` for this anomaly points at 5764f611e.