Thread: BUG #18236: Backend processing a parallel query terminates badly when postmaster killed with SIGKILL
From: PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      18236
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16.1
Operating system:   Ubuntu 22.04
Description:

The following script, which starts a backend with parallel workers and then
kills postmaster:

cat << 'EOF' | psql &
CREATE TABLE t (a int) WITH (parallel_workers = 2);
INSERT INTO t SELECT g FROM generate_series(1, 10000) g;

CREATE FUNCTION f(i int) RETURNS int PARALLEL SAFE LANGUAGE plpgsql AS $$
BEGIN
  PERFORM pg_sleep(0.001);
  RETURN i;
END;
$$;

SET parallel_setup_cost = 0;
SET parallel_tuple_cost = 0;
SELECT avg(f(a)) FROM t;
EOF
sleep 1
kill -9 $(head -1 "$PGDATA/postmaster.pid")

causes an assertion failure:
TRAP: failed Assert("!IsTransactionOrTransactionBlock()"), File: "pgstat.c", Line: 591, PID: 2893946
(discovered while testing [1])

with the following call stack:
...
#5 0x00005593007db894 in ExceptionalCondition (conditionName=0x5593009cdbe0 "!IsTransactionOrTransactionBlock()", fileName=0x5593009cda87 "pgstat.c", lineNumber=591) at assert.c:66
#6 0x000055930061b581 in pgstat_report_stat (force=true) at pgstat.c:591
#7 0x000055930061b499 in pgstat_shutdown_hook (code=1, arg=0) at pgstat.c:520
#8 0x00005593005b6fdf in shmem_exit (code=1) at ipc.c:243
#9 0x00005593005b6e83 in proc_exit_prepare (code=1) at ipc.c:198
#10 0x00005593005b6dc7 in proc_exit (code=1) at ipc.c:111
#11 0x00005593007dc8e2 in errfinish (filename=0x559300882a7e "parallel.c", lineno=908, funcname=0x559300882e70 <__func__.8> "WaitForParallelWorkersToExit") at elog.c:591
#12 0x000055930012fb03 in WaitForParallelWorkersToExit (pcxt=0x55930229ca28) at parallel.c:908
#13 0x000055930012fccb in DestroyParallelContext (pcxt=0x55930229ca28) at parallel.c:981
#14 0x00005593001304cf in AtEOXact_Parallel (isCommit=false) at parallel.c:1254
#15 0x000055930013ee49 in AbortTransaction () at xact.c:2792
#16 0x00005593001419a8 in AbortOutOfAnyTransaction () at xact.c:4755
#17 0x00005593007f6109 in ShutdownPostgres (code=1, arg=0) at postinit.c:1349
#18 0x00005593005b6fdf in shmem_exit (code=1) at ipc.c:243
#19 0x00005593005b6e83 in proc_exit_prepare (code=1) at ipc.c:198
#20 0x00005593005b6dc7 in proc_exit (code=1) at ipc.c:111
#21 0x00005593005b92b5 in WaitEventSetWaitBlock (set=0x559302263d08, cur_timeout=2, occurred_events=0x7ffdf264c7d0, nevents=1) at latch.c:1600
#22 0x00005593005b9025 in WaitEventSetWait (set=0x559302263d08, timeout=2, occurred_events=0x7ffdf264c7d0, nevents=1, wait_event_info=150994946) at latch.c:1475
#23 0x00005593005b82c2 in WaitLatch (latch=0x7f94b3362184, wakeEvents=41, timeout=2, wait_event_info=150994946) at latch.c:513
#24 0x00005593006d8505 in pg_sleep (fcinfo=0x559302399b30) at misc.c:406
#25 0x000055930032f3ee in ExecInterpExpr (state=0x559302399a58, econtext=0x559302399780, isnull=0x7ffdf264caff) at execExprInterp.c:758
...

With that Assert in pgstat.c removed, another failure can be seen:
WARNING: buffer refcount leak: [1810] (rel=base/16384/16385, blockNum=6, flags=0x93800000, refcount=1 2)

accompanied by an assertion failure:
...
#5 0x00005619b49fb86d in ExceptionalCondition (conditionName=0x5619b4bda3be "RefCountErrors == 0", fileName=0x5619b4bd9ba8 "bufmgr.c", lineNumber=3224) at assert.c:66
#6 0x00005619b47c1156 in CheckForBufferLeaks () at bufmgr.c:3224
#7 0x00005619b47c1075 in AtProcExit_Buffers (code=1, arg=0) at bufmgr.c:3178
#8 0x00005619b47d7097 in shmem_exit (code=1) at ipc.c:276
#9 0x00005619b47d6e83 in proc_exit_prepare (code=1) at ipc.c:198
#10 0x00005619b47d6dc7 in proc_exit (code=1) at ipc.c:111
#11 0x00005619b49fc8bb in errfinish (filename=0x5619b4aa2a7e "parallel.c", lineno=908, funcname=0x5619b4aa2e70 <__func__.8> "WaitForParallelWorkersToExit") at elog.c:591
#12 0x00005619b434fb03 in WaitForParallelWorkersToExit (pcxt=0x5619b570fa28) at parallel.c:908
#13 0x00005619b434fccb in DestroyParallelContext (pcxt=0x5619b570fa28) at parallel.c:981
#14 0x00005619b43504cf in AtEOXact_Parallel (isCommit=false) at parallel.c:1254
#15 0x00005619b435ee49 in AbortTransaction () at xact.c:2792
#16 0x00005619b43619a8 in AbortOutOfAnyTransaction () at xact.c:4755
#17 0x00005619b4a160e2 in ShutdownPostgres (code=1, arg=0) at postinit.c:1349
#18 0x00005619b47d6fdf in shmem_exit (code=1) at ipc.c:243
#19 0x00005619b47d6e83 in proc_exit_prepare (code=1) at ipc.c:198
#20 0x00005619b47d6dc7 in proc_exit (code=1) at ipc.c:111
#21 0x00005619b47d92b5 in WaitEventSetWaitBlock (set=0x5619b56d6d08, cur_timeout=2, occurred_events=0x7ffda437c590, nevents=1) at latch.c:1600
#22 0x00005619b47d9025 in WaitEventSetWait (set=0x5619b56d6d08, timeout=2, occurred_events=0x7ffda437c590, nevents=1, wait_event_info=150994946) at latch.c:1475
#23 0x00005619b47d82c2 in WaitLatch (latch=0x7f2f0fc49184, wakeEvents=41, timeout=2, wait_event_info=150994946) at latch.c:513
#24 0x00005619b48f84de in pg_sleep (fcinfo=0x5619b580cb30) at misc.c:406
#25 0x00005619b454f3ee in ExecInterpExpr (state=0x5619b580ca58, econtext=0x5619b580c780, isnull=0x7ffda437c8bf) at execExprInterp.c:758
...

Or with the last query in a transaction:
BEGIN;
INSERT INTO t VALUES(0);
SELECT avg(f(a)) FROM t;
END;

...
#5 0x0000562f2850886d in ExceptionalCondition (conditionName=0x562f286eb178 "!TransactionIdIsValid(ProcGlobal->xids[myoff])", fileName=0x562f286eb030 "procarray.c", lineNumber=606) at assert.c:66
#6 0x0000562f282e7ec9 in ProcArrayRemove (proc=0x7f0bcad62160, latestXid=0) at procarray.c:606
#7 0x0000562f283127ab in RemoveProcFromArray (code=1, arg=0) at proc.c:794
#8 0x0000562f282e4097 in shmem_exit (code=1) at ipc.c:276
#9 0x0000562f282e3e83 in proc_exit_prepare (code=1) at ipc.c:198
#10 0x0000562f282e3dc7 in proc_exit (code=1) at ipc.c:111
#11 0x0000562f285098bb in errfinish (filename=0x562f285afa7e "parallel.c", lineno=908, funcname=0x562f285afe70 <__func__.8> "WaitForParallelWorkersToExit") at elog.c:591
#12 0x0000562f27e5cb03 in WaitForParallelWorkersToExit (pcxt=0x562f29861b08) at parallel.c:908
#13 0x0000562f27e5cccb in DestroyParallelContext (pcxt=0x562f29861b08) at parallel.c:981
#14 0x0000562f27e5d4cf in AtEOXact_Parallel (isCommit=false) at parallel.c:1254
#15 0x0000562f27e6be49 in AbortTransaction () at xact.c:2792
#16 0x0000562f27e6e9a8 in AbortOutOfAnyTransaction () at xact.c:4755
#17 0x0000562f285230e2 in ShutdownPostgres (code=1, arg=0) at postinit.c:1349
#18 0x0000562f282e3fdf in shmem_exit (code=1) at ipc.c:243
#19 0x0000562f282e3e83 in proc_exit_prepare (code=1) at ipc.c:198
#20 0x0000562f282e3dc7 in proc_exit (code=1) at ipc.c:111
#21 0x0000562f282e62b5 in WaitEventSetWaitBlock (set=0x562f29828d08, cur_timeout=2, occurred_events=0x7ffe231d0af0, nevents=1) at latch.c:1600
#22 0x0000562f282e6025 in WaitEventSetWait (set=0x562f29828d08, timeout=2, occurred_events=0x7ffe231d0af0, nevents=1, wait_event_info=150994946) at latch.c:1475
#23 0x0000562f282e52c2 in WaitLatch (latch=0x7f0bcad62184, wakeEvents=41, timeout=2, wait_event_info=150994946) at latch.c:513
#24 0x0000562f284054de in pg_sleep (fcinfo=0x562f2995cb20) at misc.c:406
#25 0x0000562f2805c3ee in ExecInterpExpr (state=0x562f2995ca48, econtext=0x562f2995c770, isnull=0x7ffe231d0e1f) at execExprInterp.c:758
...

The backend terminates cleanly if WARNING is used instead of FATAL here:
    if (status == BGWH_POSTMASTER_DIED)
        ereport(FATAL,
                (errcode(ERRCODE_ADMIN_SHUTDOWN),
                 errmsg("postmaster exited during a parallel transaction")));

[1] https://www.postgresql.org/message-id/5e976369-2925-e0cc-b5a1-e9e356264596%40gmail.com
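
All three call stacks in this report show the same shape: proc_exit() runs an exit callback (ShutdownPostgres), the transaction abort inside it raises FATAL, and errfinish() enters proc_exit() a second time, so the remaining callbacks run while the abort is still unfinished. The toy model below is only a sketch of that re-entrancy pattern, not PostgreSQL code. Every name in it (toy_proc_exit, stats_hook, abort_transaction_hook, simulate_exit) is hypothetical, and it assumes that pending callbacks run in last-registered-first order and that a nested exit resumes with whatever callbacks are still pending.

```c
#include <assert.h>
#include <string.h>

#define MAX_CALLBACKS 8

typedef void (*exit_callback)(void);

static exit_callback callbacks[MAX_CALLBACKS];
static int  ncallbacks = 0;
static int  in_transaction = 0;
static int  wait_for_workers_fails = 0;
static char trace[256];

/* Append one event to the trace so the call order can be inspected. */
static void note(const char *msg)
{
    strcat(trace, msg);
    strcat(trace, ";");
}

static void register_exit_callback(exit_callback fn)
{
    assert(ncallbacks < MAX_CALLBACKS);
    callbacks[ncallbacks++] = fn;
}

/*
 * Hypothetical stand-in for proc_exit(): run pending callbacks in LIFO
 * order. The index is decremented before each call, so a nested
 * invocation picks up with the callbacks that have not run yet.
 */
static void toy_proc_exit(void)
{
    while (--ncallbacks >= 0)
        callbacks[ncallbacks]();
    ncallbacks = 0;
}

/* Plays the role of a hook that expects no transaction to be open. */
static void stats_hook(void)
{
    note(in_transaction ? "stats:in-transaction" : "stats:ok");
}

/* Plays the role of the callback that aborts the open transaction. */
static void abort_transaction_hook(void)
{
    note("abort:start");
    if (wait_for_workers_fails)
    {
        toy_proc_exit();    /* models a FATAL error re-entering exit */
        return;             /* a real process would never get back here */
    }
    in_transaction = 0;
    note("abort:done");
}

/* Run one simulated process exit and return the recorded event order. */
static const char *simulate_exit(int fail)
{
    trace[0] = '\0';
    ncallbacks = 0;
    in_transaction = 1;
    wait_for_workers_fails = fail;
    register_exit_callback(stats_hook);
    register_exit_callback(abort_transaction_hook);
    toy_proc_exit();
    return trace;
}
```

In this model, simulate_exit(0) lets the abort finish before stats_hook runs, while simulate_exit(1) makes stats_hook observe in_transaction still set, which is the situation the first Assert in the report complains about.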