BUG #17821: Assertion failed in heap_update() due to heap pruning - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17821: Assertion failed in heap_update() due to heap pruning
Date
Msg-id 17821-dd8c334263399284@postgresql.org
Whole thread Raw
Responses Re: BUG #17821: Assertion failed in heap_update() due to heap pruning
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17821
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 15.2
Operating system:   Ubuntu 22.04
Description:

When performing multi-threaded tests, I've stumbled upon a failed
assertion with the following stack trace:

Core was generated by `postgres: postgres regress003 127.0.0.1(35610) GRANT
                        '.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6,
threadid=140650166969280) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6,
threadid=140650166969280) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140650166969280) at
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140650166969280, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3  0x00007febad28e476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4  0x00007febad2747f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000556725c44981 in ExceptionalCondition
(conditionName=conditionName@entry=0x556725dc6b00 "ItemIdIsNormal(lp)",
fileName=fileName@entry=0x556725dc48e0 "heapam.c",
lineNumber=lineNumber@entry=3024) at assert.c:66
#6  0x00005567249f3aa7 in heap_update
(relation=relation@entry=0x7feb61671cd8, otid=otid@entry=0x61900002c3fc,
newtup=newtup@entry=0x61900002c3f8, cid=<optimized out>,
crosscheck=crosscheck@entry=0x0, wait=wait@entry=true, 
    tmfd=<optimized out>, lockmode=<optimized out>) at heapam.c:3024
#7  0x00005567249f8d46 in simple_heap_update
(relation=relation@entry=0x7feb61671cd8, otid=otid@entry=0x61900002c3fc,
tup=tup@entry=0x61900002c3f8) at heapam.c:3969
#8  0x0000556724c367ba in CatalogTupleUpdate
(heapRel=heapRel@entry=0x7feb61671cd8, otid=0x61900002c3fc,
tup=0x61900002c3f8) at indexing.c:309
#9  0x0000556724bfc30c in ExecGrant_Relation
(istmt=istmt@entry=0x7ffdeb8ca1e0) at aclchk.c:2041
#10 0x0000556724bffee3 in ExecGrantStmt_oids
(istmt=istmt@entry=0x7ffdeb8ca1e0) at aclchk.c:598
#11 0x0000556724c00ab0 in ExecuteGrantStmt (stmt=stmt@entry=0x625000005df0)
at aclchk.c:583
#12 0x00005567257ccac7 in ProcessUtilitySlow
(pstate=pstate@entry=0x61900002c298, pstmt=pstmt@entry=0x625000005ec0,
queryString=queryString@entry=0x625000005218 "GRANT SELECT ON pg_proc TO
CURRENT_USER;", 
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0,
queryEnv=queryEnv@entry=0x0, dest=<optimized out>, qc=<optimized out>) at
utility.c:1802
#13 0x00005567257c92a5 in standard_ProcessUtility
(pstmt=pstmt@entry=0x625000005ec0,
queryString=queryString@entry=0x625000005218 "GRANT SELECT ON pg_proc TO
CURRENT_USER;", readOnlyTree=readOnlyTree@entry=false, 
    context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=params@entry=0x0,
queryEnv=queryEnv@entry=0x0, dest=<optimized out>, qc=<optimized out>) at
utility.c:973
#14 0x00007feba7e91d97 in pgss_ProcessUtility (pstmt=<optimized out>,
queryString=<optimized out>, readOnlyTree=<optimized out>,
context=<optimized out>, params=<optimized out>, queryEnv=<optimized out>,
dest=<optimized out>, 
    qc=<optimized out>) at pg_stat_statements.c:1143
#15 0x00005567257c9906 in ProcessUtility (pstmt=pstmt@entry=0x625000005ec0,
queryString=<optimized out>, readOnlyTree=<optimized out>,
context=context@entry=PROCESS_UTILITY_TOPLEVEL, params=<optimized out>,
queryEnv=<optimized out>, 
    dest=0x6250000066f0, qc=0x7ffdeb8cae50) at utility.c:526
#16 0x00005567257c2488 in PortalRunUtility
(portal=portal@entry=0x62500002fa18, pstmt=pstmt@entry=0x625000005ec0,
isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x6250000066f0,

    qc=qc@entry=0x7ffdeb8cae50) at pquery.c:1158
#17 0x00005567257c3010 in PortalRunMulti
(portal=portal@entry=0x62500002fa18, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x6250000066f0,
altdest=altdest@entry=0x6250000066f0, 
    qc=qc@entry=0x7ffdeb8cae50) at pquery.c:1315
#18 0x00005567257c3d82 in PortalRun (portal=portal@entry=0x62500002fa18,
count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true,
run_once=run_once@entry=true, dest=dest@entry=0x6250000066f0, 
    altdest=altdest@entry=0x6250000066f0, qc=<optimized out>) at
pquery.c:791
#19 0x00005567257b7f59 in exec_simple_query
(query_string=query_string@entry=0x625000005218 "GRANT SELECT ON pg_proc TO
CURRENT_USER;") at postgres.c:1240
#20 0x00005567257bcead in PostgresMain (dbname=dbname@entry=0x62900001b268
"regress003", username=username@entry=0x62900001b248 "postgres") at
postgres.c:4572
#21 0x000055672555f72b in BackendRun (port=port@entry=0x614000230640) at
postmaster.c:4461
#22 0x0000556725567fc7 in BackendStartup (port=port@entry=0x614000230640) at
postmaster.c:4189
#23 0x00005567255685c4 in ServerLoop () at postmaster.c:1779
#24 0x000055672556acbb in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x603000000370) at postmaster.c:1463
#25 0x00005567251c4499 in main (argc=3, argv=0x603000000370) at main.c:200

After series of experiments I came to the following simple reproduction:
numclients=2
for ((i=1;i<=300;i++)); do
  echo "iteration $i"
  for ((j=1;j<=numclients;j++)); do
    for ((k=1;k<=300;k++)); do echo "GRANT ALL ON pg_class TO public;"; done
| psql >psql-$j.log 2>&1 &
  done
  wait
  grep 'TRAP:' server.log && { res=0; break; }
  psql -c "VACUUM FULL pg_class" >psql-vacuum.log
done
pg_ctl -w -t 10 -D "$PGDB" stop

...
iteration 3
TRAP: failed Assert("ItemIdIsNormal(lp)"), File: "heapam.c", Line: 3024,
PID: 1817582
waiting for server to shut down............. failed

To ease the reproduction I did:
sed 's/minfree = Max(minfree, BLCKSZ \/ 10)/minfree = Max(minfree, BLCKSZ \/
1)/' -i src/backend/access/heap/pruneheap.c
(I believe that that doesn't change modus vivendi of the page pruning
too much.)
With some debugging messages added, I see:

2023-03-02 15:22:13.027 MSK|[unknown]|[unknown]|640094f5.1ba5ab|LOG:
connection received: host=[local]
2023-03-02 15:22:13.027 MSK|law|test1|640094f5.1ba5ab|LOG:  connection
authorized: user=law database=test1 application_name=psql
2023-03-02 15:22:13.027 MSK|[unknown]|[unknown]|640094f5.1ba5ac|LOG:
connection received: host=[local]
2023-03-02 15:22:13.028 MSK|law|test1|640094f5.1ba5ac|LOG:  connection
authorized: user=law database=test1 application_name=psql
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG:  statement: GRANT
ALL ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG:
ExecuteGrantStmt(): before objectNamesToOids()
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG:
ExecuteGrantStmt(): after objectNamesToOids()
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG:
ExecGrant_Relation() after SearchSysCache1();
ItemPointerGetOffsetNumber(&tuple->t_self): 24
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG:
ExecGrant_Relationi(): before CatalogTupleUpdate();
ItemPointerGetOffsetNumber(&newtuple->t_self): 24
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG:  heap_update():
after ReadBuffer(); buffer: 2012, ItemPointerGetOffsetNumber(otid): 24
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|LOG:  heap_update():
after PageGetItemId(); ItemPointerGetOffsetNumber(otid): 24, lp->lp_flags:
1
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ab|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG:  statement: GRANT
ALL ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG:
ExecuteGrantStmt(): before objectNamesToOids()
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG:
heap_page_prune_execute() before ItemIdSetUnused(); buffer: 2012, off: 24
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG:
ExecuteGrantStmt(): after objectNamesToOids()
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG:
ExecGrant_Relation() after SearchSysCache1();
ItemPointerGetOffsetNumber(&tuple->t_self): 24
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG:
ExecGrant_Relationi(): before CatalogTupleUpdate();
ItemPointerGetOffsetNumber(&newtuple->t_self): 24
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG:  heap_update():
after ReadBuffer(); buffer: 2012, ItemPointerGetOffsetNumber(otid): 24
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT:  GRANT ALL
ON pg_class TO public;
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|LOG:  heap_update():
after PageGetItemId(); ItemPointerGetOffsetNumber(otid): 24, lp->lp_flags:
0
2023-03-02 15:22:13.031 MSK|law|test1|640094f5.1ba5ac|STATEMENT:  GRANT ALL
ON pg_class TO public;
TRAP: failed Assert("ItemIdIsNormal(lp)"), File: "heapam.c", Line: 3025,
PID: 1811884

So it looks like the session 1ba5ac made linepointer 24 unused while
pruning in the guts of objectNamesToOids(), then got the same
offset (24) from the SearchSysCache1() and finally was surprised
inside heap_update() to see an unused line pointer.
This raises two questions for me.
1) Was it legal to prune that linepointer (what if ExecGrant do not
perform
heap_modify_tuple() in this case, will an unused line pointer be left
behind?)?
2) Should page pruning be reflected in SysCache somehow?


pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #17819: The assert failed #17334 has not been repaired
Next
From: "David G. Johnston"
Date:
Subject: BUG #17818: CTE is not returning the right data.