BUG #17821: Assertion failed in heap_update() due to heap pruning - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #17821: Assertion failed in heap_update() due to heap pruning |
Date | |
Msg-id | 17821-dd8c334263399284@postgresql.org Whole thread Raw |
Responses |
Re: BUG #17821: Assertion failed in heap_update() due to heap pruning
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 17821 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 15.2 Operating system: Ubuntu 22.04 Description: When performing multi-threaded tests, I've stumbled upon a failed assertion with the following stack trace: Core was generated by `postgres: postgres regress003 127.0.0.1(35610) GRANT '. Program terminated with signal SIGABRT, Aborted. #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140650166969280) at ./nptl/pthread_kill.c:44 44 ./nptl/pthread_kill.c: No such file or directory. (gdb) bt #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140650166969280) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=140650166969280) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=140650166969280, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00007febad28e476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x00007febad2747f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x0000556725c44981 in ExceptionalCondition (conditionName=conditionName@entry=0x556725dc6b00 "ItemIdIsNormal(lp)", fileName=fileName@entry=0x556725dc48e0 "heapam.c", lineNumber=lineNumber@entry=3024) at assert.c:66 #6 0x00005567249f3aa7 in heap_update (relation=relation@entry=0x7feb61671cd8, otid=otid@entry=0x61900002c3fc, newtup=newtup@entry=0x61900002c3f8, cid=<optimized out>, crosscheck=crosscheck@entry=0x0, wait=wait@entry=true, tmfd=<optimized out>, lockmode=<optimized out>) at heapam.c:3024 #7 0x00005567249f8d46 in simple_heap_update (relation=relation@entry=0x7feb61671cd8, otid=otid@entry=0x61900002c3fc, tup=tup@entry=0x61900002c3f8) at heapam.c:3969 #8 0x0000556724c367ba in CatalogTupleUpdate (heapRel=heapRel@entry=0x7feb61671cd8, otid=0x61900002c3fc, tup=0x61900002c3f8) at indexing.c:309 #9 0x0000556724bfc30c in ExecGrant_Relation (istmt=istmt@entry=0x7ffdeb8ca1e0) at aclchk.c:2041 #10 0x0000556724bffee3 in ExecGrantStmt_oids (istmt=istmt@entry=0x7ffdeb8ca1e0) at aclchk.c:598 #11 0x0000556724c00ab0 in ExecuteGrantStmt (stmt=stmt@entry=0x625000005df0) at aclchk.c:583 #12 0x00005567257ccac7 in ProcessUtilitySlow (pstate=pstate@entry=0x61900002c298, pstmt=pstmt@entry=0x625000005ec0, queryString=queryString@entry=0x625000005218 "GRANT SELECT ON pg_proc TO CURRENT_USER;", context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=<optimized out>, qc=<optimized out>) at utility.c:1802 #13 0x00005567257c92a5 in standard_ProcessUtility (pstmt=pstmt@entry=0x625000005ec0, queryString=queryString@entry=0x625000005218 "GRANT SELECT ON pg_proc TO CURRENT_USER;", readOnlyTree=readOnlyTree@entry=false, context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0, queryEnv=queryEnv@entry=0x0, dest=<optimized out>, qc=<optimized out>) at utility.c:973 #14 0x00007feba7e91d97 in pgss_ProcessUtility (pstmt=<optimized out>, queryString=<optimized out>, readOnlyTree=<optimized out>, context=<optimized out>, params=<optimized out>, queryEnv=<optimized out>, dest=<optimized out>, qc=<optimized out>) at pg_stat_statements.c:1143 #15 0x00005567257c9906 in ProcessUtility (pstmt=pstmt@entry=0x625000005ec0, queryString=<optimized out>, readOnlyTree=<optimized out>, context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>, queryEnv=<optimized out>, dest=0x6250000066f0, qc=0x7ffdeb8cae50) at utility.c:526 #16 0x00005567257c2488 in PortalRunUtility (portal=portal@entry=0x62500002fa18, pstmt=pstmt@entry=0x625000005ec0, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x6250000066f0, qc=qc@entry=0x7ffdeb8cae50) at pquery.c:1158 #17 0x00005567257c3010 in PortalRunMulti (portal=portal@entry=0x62500002fa18, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x6250000066f0, altdest=altdest@entry=0x6250000066f0, qc=qc@entry=0x7ffdeb8cae50) at pquery.c:1315 #18 0x00005567257c3d82 in PortalRun (portal=portal@entry=0x62500002fa18, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x6250000066f0, altdest=altdest@entry=0x6250000066f0, qc=<optimized out>) at pquery.c:791 #19 0x00005567257b7f59 in exec_simple_query (query_string=query_string@entry=0x625000005218 "GRANT SELECT ON pg_proc TO CURRENT_USER;") at postgres.c:1240 #20 0x00005567257bcead in PostgresMain (dbname=dbname@entry=0x62900001b268 "regress003", username=username@entry=0x62900001b248 "postgres") at postgres.c:4572 #21 0x000055672555f72b in BackendRun (port=port@entry=0x614000230640) at postmaster.c:4461 #22 0x0000556725567fc7 in BackendStartup (port=port@entry=0x614000230640) at postmaster.c:4189 #23 0x00005567255685c4 in ServerLoop () at postmaster.c:1779 #24 0x000055672556acbb in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x603000000370) at postmaster.c:1463 #25 0x00005567251c4499 in main (argc=3, argv=0x603000000370) at main.c:200 After series of experiments I came to the following simple reproduction: numclients=2 for ((i=1;i<=300;i++)); do echo "iteration $i" for ((j=1;j<=numclients;j++)); do for ((k=1;k<=300;k++)); do echo "GRANT ALL ON pg_class TO public;"; done | psql >psql-$j.log 2>&1 & done wait grep 'TRAP:' server.log && { res=0; break; } psql -c "VACUUM FULL pg_class" >psql-vacuum.log done pg_ctl -w -t 10 -D "$PGDB" stop ... iteration 3 TRAP: failed Assert("ItemIdIsNormal(lp)"), File: "heapam.c", Line: 3024, PID: 1817582 waiting for server to shut down............. failed To ease the reproduction I did: sed 's/minfree = Max(minfree, BLCKSZ \/ 10)/minfree = Max(minfree, BLCKSZ \/ 1)/' -i src/backend/access/heap/pruneheap.c (I believe that that doesn't change modus vivendi of the page pruning too much.) With some debugging messages added, I see: 2023-03-02 15:22:13.027 MSK|[unknown]|[unknown]|640094f5.1ba5ab|LOG: connection received: host=[local] 2023-03-02 15:22:13.027 MSK|law|test1|640094f5.1ba5ab|LOG: connection authorized: user=law database=test1 application_name=psql 2023-03-02 15:22:13.027 MSK|[unknown]|[unknown]|640094f5.1ba5ac|LOG: connection received: host=[local] 2023-03-02 15:22:13.028 MSK|law|test1|640094f5.1ba5ac|LOG: connection authorized: user=law database=test1 application_name=psql 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG: statement: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG: ExecuteGrantStmt(): before objectNamesToOids() 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG: ExecuteGrantStmt(): after objectNamesToOids() 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG: ExecGrant_Relation() after SearchSysCache1(); ItemPointerGetOffsetNumber(&tuple->t_self): 24 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG: ExecGrant_Relationi(): before CatalogTupleUpdate(); ItemPointerGetOffsetNumber(&newtuple->t_self): 24 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG: heap_update(): after ReadBuffer(); buffer: 2012, ItemPointerGetOffsetNumber(otid): 24 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG: heap_update(): after PageGetItemId(); ItemPointerGetOffsetNumber(otid): 24, lp->lp_flags: 1 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG: statement: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG: ExecuteGrantStmt(): before objectNamesToOids() 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG: heap_page_prune_execute() before ItemIdSetUnused(); buffer: 2012, off: 24 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG: ExecuteGrantStmt(): after objectNamesToOids() 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG: ExecGrant_Relation() after SearchSysCache1(); ItemPointerGetOffsetNumber(&tuple->t_self): 24 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG: ExecGrant_Relationi(): before CatalogTupleUpdate(); ItemPointerGetOffsetNumber(&newtuple->t_self): 24 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG: heap_update(): after ReadBuffer(); buffer: 2012, ItemPointerGetOffsetNumber(otid): 24 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT: GRANT ALL ON pg_class TO public; 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG: heap_update(): after PageGetItemId(); ItemPointerGetOffsetNumber(otid): 24, lp->lp_flags: 0 2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT: GRANT ALL ON pg_class TO public; TRAP: failed Assert("ItemIdIsNormal(lp)"), File: "heapam.c", Line: 3025, PID: 1811884 So it looks like the session 1ba5ac made linepointer 24 unused while pruning in the guts of objectNamesToOids(), then got the same offset (24) from the SearchSysCache1() and finally was surprised inside heap_update() to see an unused line pointer. This raises two questions for me. 1) Was it legal to prune that linepointer (what if ExecGrant do not perform heap_modify_tuple() in this case, will an unused line pointer be left behind?)? 2) Should page pruning be reflected in SysCache somehow?
pgsql-bugs by date: