The following bug has been logged on the website:
Bug reference: 17284
Logged by: Alexander Lakhin
Email address: exclusion@gmail.com
PostgreSQL version: 14.1
Operating system: Ubuntu 20.04
Description:
When running concurrent installchecks (x100) for src/test/isolation, I've
observed the following server crash:
Core was generated by `postgres: postgres regress30 127.0.0.1(55840) SEL'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fd1b69d5535 in __GI_abort () at abort.c:79
#2 0x000055d441cc3a02 in ExceptionalCondition (
conditionName=conditionName@entry=0x55d441e34cd2
"TransactionIdIsValid(tailXid)",
errorType=errorType@entry=0x55d441d1c01d "FailedAssertion",
fileName=fileName@entry=0x55d441e34bc9 "predicate.c",
lineNumber=lineNumber@entry=928) at assert.c:67
#3 0x000055d441b8e0aa in SerialAdd
(minConflictCommitSeqNo=18446744073709551615, xid=95552353) at
predicate.c:928
#4 SummarizeOldestCommittedSxact () at predicate.c:1540
#5 GetSerializableTransactionSnapshotInt (snapshot=0x55d441fad980
<CurrentSnapshotData>,
sourcevxid=sourcevxid@entry=0x0, sourcepid=sourcepid@entry=-1) at
predicate.c:1816
#6 0x000055d441b90bc6 in GetSerializableTransactionSnapshot
(snapshot=<optimized out>) at predicate.c:1710
#7 0x000055d441d036f5 in GetTransactionSnapshot () at snapmgr.c:347
#8 GetTransactionSnapshot () at snapmgr.c:306
#9 0x000055d441b9da4d in exec_simple_query (query_string=0x55d443545990
"select id from D2;") at postgres.c:1128
#10 0x000055d441b9f0a9 in PostgresMain (argc=<optimized out>,
argv=argv@entry=0x55d4435793d8, dbname=<optimized out>,
username=<optimized out>) at postgres.c:4339
#11 0x000055d441b1236f in BackendRun (port=0x55d443574c30,
port=0x55d443574c30) at postmaster.c:4526
#12 BackendStartup (port=0x55d443574c30) at postmaster.c:4210
#13 ServerLoop () at postmaster.c:1739
#14 0x000055d441b1336c in PostmasterMain (argc=3, argv=0x55d443540330) at
postmaster.c:1412
#15 0x000055d441824f5b in main (argc=3, argv=0x55d443540330) at main.c:210
test two-ids ... FAILED (test process exited with exit code 1) 4193 ms
It can be easily reproduced on a build made with
CPPFLAGS="-O0 -DTEST_SUMMARIZE_SERIAL" using the following script (based on
the simplified script by Thomas Munro from bug #17116):
concurrency=2
output=/tmp/junk
rm -rf $output
printf "$(printf "test: two-ids\n%.0s" `seq 100`)" >/tmp/isolation_schedule
for c in `seq $concurrency`; do
  mkdir -p $output/results_${c}
  (EXTRA_REGRESS_OPTS="--dbname=regress_${c} --outputdir=$output/results_${c} --schedule=/tmp/isolation_schedule" make -C src/test/isolation installcheck) \
    > $output/out.${c}.log 2>&1 &
done
wait
coredumpctl --no-pager
(Setting "fsync=off" makes the bug easier to reproduce for me.)
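For reference, a minimal sketch of that setting as a postgresql.conf fragment. The report does not say how the parameter was applied, so the file-based form here is an assumption. It turns off forced flushes to disk, so it is only suitable for a throwaway test cluster.

```
# postgresql.conf (throwaway test cluster only; placement is an assumption)
fsync = off    # do not force WAL and data writes out to disk
```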
Reproduced on REL_13_STABLE..master.
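
The schedule-generation line in the script above may look odd at first. It appears to rely on two printf behaviours: the format string is reused once per argument, and "%.0s" consumes an argument while printing nothing. A minimal sketch of the same idiom follows; make_schedule is a hypothetical helper name, not something from the original script. Unlike the original, it does not wrap the output in a second printf, so the final newline is kept.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original script): print COUNT copies
# of the line "test: two-ids", one per line.
make_schedule() {
    count=$1
    # printf repeats its format for every argument supplied. "%.0s" takes one
    # argument and prints zero characters of it, so each number produced by
    # seq results in exactly one "test: two-ids" line.
    printf 'test: two-ids\n%.0s' $(seq "$count")
}

# Example: a three-entry schedule written to stdout.
make_schedule 3
```

Redirecting the output to a file, as the script does with /tmp/isolation_schedule, yields a schedule that runs the same test repeatedly.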