BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm() - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Date
Msg-id 17994-5c7100b51b4790e9@postgresql.org
Whole thread Raw
Responses Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17994
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 16beta1
Operating system:   Ubuntu 22.04
Description:

The following script:
for ((n=1;n<=10;n++)); do
echo "ITERATION $n"

createdb test
numclients=50
for ((c=1;c<=$numclients;c++)); do
  cat << EOF | psql test -o /dev/null &
CREATE PUBLICATION testpub_$c FOR ALL TABLES;
CREATE TYPE rec_$c AS (f1 text, f2 text);
CREATE TEMP TABLE tbl_$c (f rec_$c);
INSERT INTO tbl_$c VALUES ((repeat('1234567890', 100000), 'a'));

UPDATE tbl_$c SET f.f2 = 'b';
EOF
done
wait
dropdb test

grep 'TRAP:' server.log && break;
done

invokes errors or a server crash:
ITERATION 1
ITERATION 2
ERROR:  unsupported byval length: 32639
ITERATION 3
ITERATION 4
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
connection to server was lost
WARNING:  terminating connection because of crash of another server
process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
dropdb: error: database removal failed: server closed the connection
unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
TRAP: failed Assert("(thisatt->attalign) == TYPALIGN_SHORT"), File:
"heaptuple.c", Line: 1312, PID: 1737623

The stack trace of the crash:
Core was generated by `postgres: law test [local] UPDATE
                        '.
Program terminated with signal SIGABRT, Aborted.

warning: Section `.reg-xstate/1737623' in core file too small.
#0  __pthread_kill_implementation (no_tid=0, signo=6,
threadid=140071061268416) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6,
threadid=140071061268416) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140071061268416) at
./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140071061268416, signo=signo@entry=6) at
./nptl/pthread_kill.c:89
#3  0x00007f64d7c27476 in __GI_raise (sig=sig@entry=6) at
../sysdeps/posix/raise.c:26
#4  0x00007f64d7c0d7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000555ede8dc3b3 in ExceptionalCondition (
    conditionName=0x555edeb49700 "(thisatt->attalign) == TYPALIGN_SHORT", 
    fileName=0x555edeb49020 "heaptuple.c", lineNumber=1312) at assert.c:66
#6  0x0000555edc7b76d7 in heap_deform_tuple (tuple=0x7ffe49cfe570,
tupleDesc=0x7f64ca4562d8, 
    values=0x62500005ad40, isnull=0x62500005b130) at heaptuple.c:1312
#7  0x0000555edd389d0b in ExecEvalFieldStoreDeForm (state=0x62500005b230,
op=0x62500005b338, 
    econtext=0x62500005ac20) at execExprInterp.c:3210
#8  0x0000555edd3729b4 in ExecInterpExpr (state=0x62500005b230,
econtext=0x62500005ac20, isnull=0x7ffe49cfe900)
    at execExprInterp.c:1446
#9  0x0000555edd3791f2 in ExecInterpExprStillValid (state=0x62500005b230,
econtext=0x62500005ac20, 
    isNull=0x7ffe49cfe900) at execExprInterp.c:1870
#10 0x0000555edd406e84 in ExecEvalExprSwitchContext (state=0x62500005b230,
econtext=0x62500005ac20, 
    isNull=0x7ffe49cfe900) at ../../../src/include/executor/executor.h:355
#11 0x0000555edd4070ab in ExecProject (projInfo=0x62500005b228) at
../../../src/include/executor/executor.h:389
#12 0x0000555edd40883e in ExecScan (node=0x62500005ab10,
accessMtd=0x555edd5843cf <SeqNext>, 
    recheckMtd=0x555edd5847cd <SeqRecheck>) at execScan.c:237
#13 0x0000555edd58484e in ExecSeqScan (pstate=0x62500005ab10) at
nodeSeqscan.c:112
#14 0x0000555edd3f5a71 in ExecProcNodeFirst (node=0x62500005ab10) at
execProcnode.c:464
#15 0x0000555edd5495a5 in ExecProcNode (node=0x62500005ab10) at
../../../src/include/executor/executor.h:273
#16 0x0000555edd5674b3 in ExecModifyTable (pstate=0x62500005a490) at
nodeModifyTable.c:3616
#17 0x0000555edd3f5a71 in ExecProcNodeFirst (node=0x62500005a490) at
execProcnode.c:464
#18 0x0000555edd3af500 in ExecProcNode (node=0x62500005a490) at
../../../src/include/executor/executor.h:273
#19 0x0000555edd3be25a in ExecutePlan (estate=0x62500005a218,
planstate=0x62500005a490, 
    use_parallel_mode=false, operation=CMD_UPDATE, sendTuples=false,
numberTuples=0, 
    direction=ForwardScanDirection, dest=0x625000072770, execute_once=true)
at execMain.c:1670
#20 0x0000555edd3b17f2 in standard_ExecutorRun (queryDesc=0x619000001598,
direction=ForwardScanDirection, 
    count=0, execute_once=true) at execMain.c:365
#21 0x0000555edd3b0e37 in ExecutorRun (queryDesc=0x619000001598,
direction=ForwardScanDirection, count=0, 
    execute_once=true) at execMain.c:309
#22 0x0000555ede0d688a in ProcessQuery (plan=0x625000006c28, 
    sourceText=0x625000005218 "UPDATE tbl_14 SET f.f2 = 'b';", params=0x0,
queryEnv=0x0, dest=0x625000072770, 
    qc=0x7ffe49cff1c0) at pquery.c:160
#23 0x0000555ede0deb02 in PortalRunMulti (portal=0x625000025a18,
isTopLevel=true, setHoldSnapshot=false, 
    dest=0x625000072770, altdest=0x625000072770, qc=0x7ffe49cff1c0) at
pquery.c:1277
#24 0x0000555ede0dba90 in PortalRun (portal=0x625000025a18,
count=9223372036854775807, isTopLevel=true, 
    run_once=true, dest=0x625000072770, altdest=0x625000072770,
qc=0x7ffe49cff1c0) at pquery.c:791
#25 0x0000555ede0c592d in exec_simple_query (query_string=0x625000005218
"UPDATE tbl_14 SET f.f2 = 'b';")
    at postgres.c:1274
#26 0x0000555ede0d347a in PostgresMain (dbname=0x62900001b358 "test",
username=0x6250000020f8 "law")
    at postgres.c:4632
#27 0x0000555eddcf401f in BackendRun (port=0x614000016040) at
postmaster.c:4461
#28 0x0000555eddcf2723 in BackendStartup (port=0x614000016040) at
postmaster.c:4189
#29 0x0000555eddce84dd in ServerLoop () at postmaster.c:1779
#30 0x0000555eddce69fe in PostmasterMain (argc=3, argv=0x6030000006a0) at
postmaster.c:1463
#31 0x0000555edd67850e in main (argc=3, argv=0x6030000006a0) at main.c:198

(gdb) frame 6
#6  0x0000555edc7b76d7 in heap_deform_tuple (tuple=0x7ffe49cfe570,
tupleDesc=0x7f64ca4562d8, 
    values=0x62500005ad40, isnull=0x62500005b130) at heaptuple.c:1312
1312                            off = att_align_nominal(off,
thisatt->attalign);
(gdb) p thisatt->attalign
$1 = 0 '\000'

(gdb) frame 7
#7  0x0000555edd389d0b in ExecEvalFieldStoreDeForm (state=0x62500005b230,
op=0x62500005b338, 
    econtext=0x62500005ac20) at execExprInterp.c:3210
3210                    heap_deform_tuple(&tmptup, tupDesc,
(gdb) p *tupDesc
$2 = {natts = 18406, tdtypeid = 1952409456, tdtypmod = 1953718639,
tdrefcount = 859320671, constr = 0x3338, 
  attrs = 0x7f64ca4562f0}

With some debug logging added I see (note the natts value):
2023-06-24 15:23:12.488 MSK [1741232] LOG:  get_cached_rowtype() after
lookup_type_cache| type_id: 16389, typentry->tupDesc: 0x7ff12f045640,
typentry->tupDesc->natts: 2
2023-06-24 15:23:12.488 MSK [1741232] STATEMENT:  UPDATE tbl_5 SET f.f2 =
'b';
2023-06-24 15:23:12.488 MSK [1741293] LOG:  statement: CREATE TYPE rec_23 AS
(f1 text, f2 text);
2023-06-24 15:23:12.488 MSK [1741232] LOG:  ExecEvalFieldStoreDeForm()
before heap_deform_tuple| tupDesc: 0x7ff12f045640, tupDesc->natts: 16427
2023-06-24 15:23:12.488 MSK [1741232] STATEMENT:  UPDATE tbl_5 SET f.f2 =
'b';
TRAP: failed Assert("(thisatt->attalign) == TYPALIGN_SHORT"), File:
"heaptuple.c", Line: 1312, PID: 1741232

So it looks like *tupDesc memory was overridden due to relcache invalidation
during ExecEvalFieldStoreDeForm() execution after the get_cached_rowtype()
call.

The pg_usleep(1000) call added inside ExecEvalFieldStoreDeForm() after
get_cached_rowtype() greatly increases the probability of the crash.

Reproduced on REL_11_STABLE (since 96e38fa5e) .. master.


pgsql-bugs by date:

Previous
From: PG Bug reporting form
Date:
Subject: BUG #17993: FK issue on logical replication table
Next
From: Dmitry Dolgov
Date:
Subject: Re: BUG #17949: Adding an index introduces serialisation anomalies.