Re: BUG #18658: Assert in SerialAdd() due to race condition - Mailing list pgsql-bugs

From Andrew Bille
Subject Re: BUG #18658: Assert in SerialAdd() due to race condition
Date
Msg-id CAJnzarxBO18Ha==H3hX28yz2zd7cVuasXv1wxyJ7ERd9DzJf=Q@mail.gmail.com
Whole thread Raw
In response to BUG #18658: Assert in SerialAdd() due to race condition  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
Hello again!

Using a build with no asserts enabled and without that define, I can also see the following errors using the script with increased concurrency (Initially we got a report from our client, who encountered these errors in production):

2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|PANIC:  could not access status of transaction 18204
2024-10-17 04:58:29.774 UTC|test|regression|67109975.ed069|DETAIL:  Could not open file "pg_serial/0000": No such file or directory.

I changed ERROR to PANIC to get a core dump.

echo "CREATE TABLE t(f1 int);" | psql

res=0

for i in `seq 200`; do
    echo "I $i"
    for l in `seq 80`; do
        echo "
SET default_transaction_isolation to 'SERIALIZABLE';
SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
\gexec
" | psql -q &
    done
    wait

    grep 'PANIC:  could not access status' server.log && { res=1; break; }
    grep 'TRAP: ' server.log && { res=1; break; }

done

#5  0x00005f824729a10a in errfinish (filename=0x5f824732df83 "slru.c", lineno=1060, funcname=0x5f824732e220 <__func__.3> "SlruReportIOError") at elog.c:599
#6  0x00005f8246c16c26 in SlruReportIOError (ctl=0x5f8247686c80 <SerialSlruCtlData>, pageno=17, xid=18204) at slru.c:1060
#7  0x00005f8246c15cea in SimpleLruReadPage (ctl=0x5f8247686c80 <SerialSlruCtlData>, pageno=17, write_ok=true, xid=18204) at slru.c:579
#8  0x00005f82470988c2 in SerialAdd (xid=18204, minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
#9  0x00005f8247099854 in SummarizeOldestCommittedSxact () at predicate.c:1520
#10 0x00005f8247099ddd in GetSerializableTransactionSnapshotInt (snapshot=0x5f8247691740 <CurrentSnapshotData>, sourcevxid=0x0, sourcepid=-1) at predicate.c:1798
#11 0x00005f8247099c72 in GetSerializableTransactionSnapshot (snapshot=0x5f8247691740 <CurrentSnapshotData>) at predicate.c:1692
#12 0x00005f82472ee3b1 in GetTransactionSnapshot () at snapmgr.c:257
#13 0x00005f82470adfe8 in exec_simple_query (query_string=0x5f8248e06a28 "INSERT INTO t VALUES(42);") at postgres.c:1173
#14 0x00005f82470b3453 in PostgresMain (dbname=0x5f8248e030f0 "regression", username=0x5f8248e3fe08 "test") at postgres.c:4793
#15 0x00005f82470aa5ac in BackendMain (startup_data=0x7fffdf93a054 "", startup_data_len=4) at backend_startup.c:107
#16 0x00005f8246fd6a6a in postmaster_child_launch (child_type=B_BACKEND, startup_data=0x7fffdf93a054 "", startup_data_len=4, client_sock=0x7fffdf93a0a0) at launch_backend.c:274
#17 0x00005f8246fdc4ed in BackendStartup (client_sock=0x7fffdf93a0a0) at postmaster.c:3420
#18 0x00005f8246fd9dd0 in ServerLoop () at postmaster.c:1653
#19 0x00005f8246fd96d0 in PostmasterMain (argc=1, argv=0x5f8248e01090) at postmaster.c:1351
#20 0x00005f8246e9729e in main (argc=1, argv=0x5f8248e01090) at main.c:197

Assert would not be triggered in this case, because of:

(gdb) fr 8
#8  0x00005f82470988c2 in SerialAdd (xid=18204, minConflictCommitSeqNo=18446744073709551615) at predicate.c:928
928                     slotno = SimpleLruReadPage(SerialSlruCtl, targetPage, true, xid);
(gdb)  p *serialControl
$1 = {headPage = 18, headXid = 18654, tailXid = 18557}

I'm not sure if this is the same problem.

Thanks to Alexandr Lakhin for help with reproducing the issue!


On Wed, Oct 16, 2024 at 10:33 PM PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      18658
Logged by:          Andrew Bille
Email address:      andrewbille@gmail.com
PostgreSQL version: 17.0
Operating system:   Ubuntu 20.04
Description:       

Hello!

The following script against a server (master c259b1578) compiled with
-DTEST_SUMMARIZE_SERIAL (without the define it's harder to reproduce...):

echo "CREATE TABLE t(f1 int);" | psql test

for i in seq 100; do
    echo "
    SET default_transaction_isolation to 'SERIALIZABLE';
    SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
    \gexec
    " | psql -q test &

    echo "
    SET default_transaction_isolation to 'SERIALIZABLE';
    SELECT 'INSERT INTO t VALUES(42);' FROM generate_series(1, 100) g
    \gexec
    " | psql -q test &

    wait

    grep 'TRAP:' server.log && break;
done

fails for me with the following stack trace:
#5  0x000055e6aaf3fef5 in ExceptionalCondition
(conditionName=conditionName@entry=0x55e6ab1232d1
"TransactionIdIsValid(tailXid)", fileName=fileName@entry=0x55e6ab1231d9
"predicate.c", lineNumber=lineNumber@entry=886) at assert.c:66
#6  0x000055e6aadcde64 in SerialAdd (xid=976743,
minConflictCommitSeqNo=18446744073709551615) at predicate.c:886
#7  0x000055e6aadd0877 in SummarizeOldestCommittedSxact () at
predicate.c:1521
#8  0x000055e6aadd094f in GetSerializableTransactionSnapshotInt
(snapshot=snapshot@entry=0x55e6ab32f3e0 <CurrentSnapshotData>,
sourcevxid=sourcevxid@entry=0x0, sourcepid=sourcepid@entry=-1) at
predicate.c:1789
#9  0x000055e6aadd2a51 in GetSerializableTransactionSnapshot
(snapshot=snapshot@entry=0x55e6ab32f3e0 <CurrentSnapshotData>) at
predicate.c:1693
#10 0x000055e6aaf868c4 in GetTransactionSnapshot () at snapmgr.c:257
#11 0x000055e6aade2d9a in exec_simple_query
(query_string=query_string@entry=0x55e6ab433fe0 "SELECT 'INSERT INTO t
VALUES(42);' FROM generate_series(1, 100) g\n        ") at postgres.c:1173
#12 0x000055e6aade516f in PostgresMain (dbname=<optimized out>,
username=<optimized out>) at postgres.c:4793
#13 0x000055e6aadde301 in BackendMain (startup_data=<optimized out>,
startup_data_len=<optimized out>) at backend_startup.c:107
#14 0x000055e6aad2cbf8 in postmaster_child_launch
(child_type=child_type@entry=B_BACKEND,
startup_data=startup_data@entry=0x7ffc1ca820e4 "",
startup_data_len=startup_data_len@entry=4,
client_sock=client_sock@entry=0x7ffc1ca82120)
    at launch_backend.c:274
#15 0x000055e6aad30e72 in BackendStartup
(client_sock=client_sock@entry=0x7ffc1ca82120) at postmaster.c:3420
#16 0x000055e6aad31092 in ServerLoop () at postmaster.c:1653
#17 0x000055e6aad324e6 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x55e6ab42e390) at postmaster.c:1351
#18 0x000055e6aac48bd7 in main (argc=3, argv=0x55e6ab42e390) at main.c:197

This is also reproduced on dafaa3efb
    Implement genuine serializable isolation level.

...

Thanks, Andrew

pgsql-bugs by date:

Previous
From: Tender Wang
Date:
Subject: Re: BUG #18657: Using JSON_OBJECTAGG with volatile function leads to segfault
Next
From: Ba Jinsheng
Date:
Subject: Re: Performance Issue on Query 18 of TPC-H Benchmark