Re: BUG #18658: Assert in SerialAdd() due to race condition - Mailing list pgsql-bugs
From | Andrew Bille |
---|---|
Subject | Re: BUG #18658: Assert in SerialAdd() due to race condition |
Date | |
Msg-id | CAJnzarxBO18Ha==H3hX28yz2zd7cVuasXv1wxyJ7ERd9DzJf=Q@mail.gmail.com Whole thread Raw |
In response to | BUG #18658: Assert in SerialAdd() due to race condition (PG Bug reporting form <noreply@postgresql.org>) |
List | pgsql-bugs |
Hello again!
Using a build with no asserts enabled and without that define, I can also see the following errors using the script with increased concurrency (Initially we got a report from our client, who encountered these errors in production):
2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|PANIC: could not access status of transaction 18204
2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|DETAIL: Could not open file "pg_serial/0000": No such file or directory.
I changed ERROR to PANIC to get a core dump.
echo "CREATE TABLE t(f1 int);" | psql
res=0
for i in `seq 200`; do
echo "I $i"
for l in `seq 80`; do
echo "
SET default_transaction_isolation to 'SERIALIZABLE';
SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
\gexec
" | psql -q &
done
wait
grep 'PANIC: could not access status' server.log && { res=1; break; }
grep 'TRAP: ' server.log && { res=1; break; }
done
#5 0x00005f824729a10a in errfinish (filename=0x5f824732df83 "slru.c", lineno=1060, funcname=0x5f824732e220 <__func__.3> "SlruReportIOError") at elog.c:599
#6 0x00005f8246c16c26 in SlruReportIOError (ctl=0x5f8247686c80 <SerialSlruCtlData>, pageno=17, xid=18204) at slru.c:1060
#7 0x00005f8246c15cea in SimpleLruReadPage (ctl=0x5f8247686c80 <SerialSlruCtlData>, pageno=17, write_ok=true, xid=18204) at slru.c:579
#8 0x00005f82470988c2 in SerialAdd (xid=18204, minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
#9 0x00005f8247099854 in SummarizeOldestCommittedSxact () at predicate.c:1520
#10 0x00005f8247099ddd in GetSerializableTransactionSnapshotInt (snapshot=0x5f8247691740 <CurrentSnapshotData>, sourcevxid=0x0, sourcepid=-1) at predicate.c:1798
#11 0x00005f8247099c72 in GetSerializableTransactionSnapshot (snapshot=0x5f8247691740 <CurrentSnapshotData>) at predicate.c:1692
#12 0x00005f82472ee3b1 in GetTransactionSnapshot () at snapmgr.c:257
#13 0x00005f82470adfe8 in exec_simple_query (query_string=0x5f8248e06a28 "INSERT INTO t VALUES(42);") at postgres.c:1173
#14 0x00005f82470b3453 in PostgresMain (dbname=0x5f8248e030f0 "regression", username=0x5f8248e3fe08 "test") at postgres.c:4793
#15 0x00005f82470aa5ac in BackendMain (startup_data=0x7fffdf93a054 "", startup_data_len=4) at backend_startup.c:107
#16 0x00005f8246fd6a6a in postmaster_child_launch (child_type=B_BACKEND, startup_data=0x7fffdf93a054 "", startup_data_len=4, client_sock=0x7fffdf93a0a0) at launch_backend.c:274
#17 0x00005f8246fdc4ed in BackendStartup (client_sock=0x7fffdf93a0a0) at postmaster.c:3420
#18 0x00005f8246fd9dd0 in ServerLoop () at postmaster.c:1653
#19 0x00005f8246fd96d0 in PostmasterMain (argc=1, argv=0x5f8248e01090) at postmaster.c:1351
#20 0x00005f8246e9729e in main (argc=1, argv=0x5f8248e01090) at main.c:197
Assert would not be triggered in this case, because of:
(gdb) fr 8
#8 0x00005f82470988c2 in SerialAdd (xid=18204, minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
928 slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid);
(gdb) p *serialControl
$1 = {headPage = 18, headXid = 18654, tailXid = 18557}
I'm not sure if this is the same problem.
Thanks to Alexandr Lakhin for help with reproducing the issue!
Using a build with no asserts enabled and without that define, I can also see the following errors using the script with increased concurrency (Initially we got a report from our client, who encountered these errors in production):
2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|PANIC: could not access status of transaction 18204
2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|DETAIL: Could not open file "pg_serial/0000": No such file or directory.
I changed ERROR to PANIC to get a core dump.
echo "CREATE TABLE t(f1 int);" | psql
res=0
for i in `seq 200`; do
echo "I $i"
for l in `seq 80`; do
echo "
SET default_transaction_isolation to 'SERIALIZABLE';
SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
\gexec
" | psql -q &
done
wait
grep 'PANIC: could not access status' server.log && { res=1; break; }
grep 'TRAP: ' server.log && { res=1; break; }
done
#5 0x00005f824729a10a in errfinish (filename=0x5f824732df83 "slru.c", lineno=1060, funcname=0x5f824732e220 <__func__.3> "SlruReportIOError") at elog.c:599
#6 0x00005f8246c16c26 in SlruReportIOError (ctl=0x5f8247686c80 <SerialSlruCtlData>, pageno=17, xid=18204) at slru.c:1060
#7 0x00005f8246c15cea in SimpleLruReadPage (ctl=0x5f8247686c80 <SerialSlruCtlData>, pageno=17, write_ok=true, xid=18204) at slru.c:579
#8 0x00005f82470988c2 in SerialAdd (xid=18204, minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
#9 0x00005f8247099854 in SummarizeOldestCommittedSxact () at predicate.c:1520
#10 0x00005f8247099ddd in GetSerializableTransactionSnapshotInt (snapshot=0x5f8247691740 <CurrentSnapshotData>, sourcevxid=0x0, sourcepid=-1) at predicate.c:1798
#11 0x00005f8247099c72 in GetSerializableTransactionSnapshot (snapshot=0x5f8247691740 <CurrentSnapshotData>) at predicate.c:1692
#12 0x00005f82472ee3b1 in GetTransactionSnapshot () at snapmgr.c:257
#13 0x00005f82470adfe8 in exec_simple_query (query_string=0x5f8248e06a28 "INSERT INTO t VALUES(42);") at postgres.c:1173
#14 0x00005f82470b3453 in PostgresMain (dbname=0x5f8248e030f0 "regression", username=0x5f8248e3fe08 "test") at postgres.c:4793
#15 0x00005f82470aa5ac in BackendMain (startup_data=0x7fffdf93a054 "", startup_data_len=4) at backend_startup.c:107
#16 0x00005f8246fd6a6a in postmaster_child_launch (child_type=B_BACKEND, startup_data=0x7fffdf93a054 "", startup_data_len=4, client_sock=0x7fffdf93a0a0) at launch_backend.c:274
#17 0x00005f8246fdc4ed in BackendStartup (client_sock=0x7fffdf93a0a0) at postmaster.c:3420
#18 0x00005f8246fd9dd0 in ServerLoop () at postmaster.c:1653
#19 0x00005f8246fd96d0 in PostmasterMain (argc=1, argv=0x5f8248e01090) at postmaster.c:1351
#20 0x00005f8246e9729e in main (argc=1, argv=0x5f8248e01090) at main.c:197
Assert would not be triggered in this case, because of:
(gdb) fr 8
#8 0x00005f82470988c2 in SerialAdd (xid=18204, minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
928 slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid);
(gdb) p *serialControl
$1 = {headPage = 18, headXid = 18654, tailXid = 18557}
I'm not sure if this is the same problem.
Thanks to Alexandr Lakhin for help with reproducing the issue!
On Wed, Oct 16, 2024 at 10:33 PM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 18658
Logged by: Andrew Bille
Email address: andrewbille@gmail.com
PostgreSQL version: 17.0
Operating system: Ubuntu 20.04
Description:
Hello!
The following script against a server (master c259b1578) compiled with
-DTEST_SUMMARIZE_SERIAL (without the define it's harder to reproduce...):
echo "CREATE TABLE t(f1 int);" | psql test
for i in seq 100; do
echo "
SET default_transaction_isolation to 'SERIALIZABLE';
SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
\gexec
" | psql -q test &
echo "
SET default_transaction_isolation to 'SERIALIZABLE';
SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
\gexec
" | psql -q test &
wait
grep 'TRAP:' server.log && break;
done
fails for me with the following stack trace:
#5 0x000055e6aaf3fef5 in ExceptionalCondition
(conditionName=conditionName@entry=0x55e6ab1232d1
"TransactionIdIsValid(tailXid)", fileName=fileName@entry=0x55e6ab1231d9
"predicate.c", lineNumber=lineNumber@entry=886) at assert.c:66
#6 0x000055e6aadcde64 in SerialAdd (xid=976743,
minConflictCommitSeqNo=18446744073709551615) at predicate.c:886
#7 0x000055e6aadd0877 in SummarizeOldestCommittedSxact () at
predicate.c:1521
#8 0x000055e6aadd094f in GetSerializableTransactionSnapshotInt
(snapshot=snapshot@entry=0x55e6ab32f3e0 <CurrentSnapshotData>,
sourcevxid=sourcevxid@entry=0x0, sourcepid=sourcepid@entry=-1) at
predicate.c:1789
#9 0x000055e6aadd2a51 in GetSerializableTransactionSnapshot
(snapshot=snapshot@entry=0x55e6ab32f3e0 <CurrentSnapshotData>) at
predicate.c:1693
#10 0x000055e6aaf868c4 in GetTransactionSnapshot () at snapmgr.c:257
#11 0x000055e6aade2d9a in exec_simple_query
(query_string=query_string@entry=0x55e6ab433fe0 "SELECT 'INSERT INTO t
VALUES(42);' FROM generate_series(1, 100) g\n ") at postgres.c:1173
#12 0x000055e6aade516f in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at postgres.c:4793
#13 0x000055e6aadde301 in BackendMain (startup_data=<optimized out>,
startup_data_len=<optimized out>) at backend_startup.c:107
#14 0x000055e6aad2cbf8 in postmaster_child_launch
(child_type=child_type@entry=B_BACKEND,
startup_data=startup_data@entry=0x7ffc1ca820e4 "",
startup_data_len=startup_data_len@entry=4,
client_sock=client_sock@entry=0x7ffc1ca82120)
at launch_backend.c:274
#15 0x000055e6aad30e72 in BackendStartup
(client_sock=client_sock@entry=0x7ffc1ca82120) at postmaster.c:3420
#16 0x000055e6aad31092 in ServerLoop () at postmaster.c:1653
#17 0x000055e6aad324e6 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x55e6ab42e390) at postmaster.c:1351
#18 0x000055e6aac48bd7 in main (argc=3, argv=0x55e6ab42e390) at main.c:197
This is also reproduced on dafaa3efb
Implement genuine serializable isolation level.
...
Thanks, Andrew
pgsql-bugs by date: