Thread: BUG #17284: Assert failed in SerialAdd() when the summarize_serial mode is engaged
BUG #17284: Assert failed in SerialAdd() when the summarize_serial mode is engaged
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 17284 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 14.1 Operating system: Ubuntu 20.04 Description: When running concurrent installchecks (x100) for src/test/isolation, I've observed the following server crash: Core was generated by `postgres: postgres regress30 127.0.0.1(55840) SEL'. Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007fd1b69d5535 in __GI_abort () at abort.c:79 #2 0x000055d441cc3a02 in ExceptionalCondition ( conditionName=conditionName@entry=0x55d441e34cd2 "TransactionIdIsValid(tailXid)", errorType=errorType@entry=0x55d441d1c01d "FailedAssertion", fileName=fileName@entry=0x55d441e34bc9 "predicate.c", lineNumber=lineNumber@entry=928) at assert.c:67 #3 0x000055d441b8e0aa in SerialAdd (minConflictCommitSeqNo=18446744073709551615, xid=95552353) at predicate.c:928 #4 SummarizeOldestCommittedSxact () at predicate.c:1540 #5 GetSerializableTransactionSnapshotInt (snapshot=0x55d441fad980 <CurrentSnapshotData>, sourcevxid=sourcevxid@entry=0x0, sourcepid=sourcepid@entry=-1) at predicate.c:1816 #6 0x000055d441b90bc6 in GetSerializableTransactionSnapshot (snapshot=<optimized out>) at predicate.c:1710 #7 0x000055d441d036f5 in GetTransactionSnapshot () at snapmgr.c:347 #8 GetTransactionSnapshot () at snapmgr.c:306 #9 0x000055d441b9da4d in exec_simple_query (query_string=0x55d443545990 "select id from D2;") at postgres.c:1128 #10 0x000055d441b9f0a9 in PostgresMain (argc=<optimized out>, argv=argv@entry=0x55d4435793d8, dbname=<optimized out>, username=<optimized out>) at postgres.c:4339 #11 0x000055d441b1236f in BackendRun (port=0x55d443574c30, port=0x55d443574c30) at postmaster.c:4526 #12 BackendStartup (port=0x55d443574c30) at postmaster.c:4210 #13 ServerLoop () at postmaster.c:1739 #14 0x000055d441b1336c in PostmasterMain (argc=3, argv=0x55d443540330) at postmaster.c:1412 #15 0x000055d441824f5b in main (argc=3, argv=0x55d443540330) at main.c:210 test two-ids ... FAILED (test process exited with exit code 1) 4193 ms It can be easily reproduced on the build made with 'CPPFLAGS="-O0 -DTEST_SUMMARIZE_SERIAL"' by the following script (based on the simplified script by Thomas Munro from bug#17116): concurrency=2 output=/tmp/junk rm -rf $output printf "$(printf "test: two-ids\n%.0s" `seq 100`)" >/tmp/isolation_schedule for c in `seq $concurrency`; do mkdir -p $output/results_${c} (EXTRA_REGRESS_OPTS="--dbname=regress_${c} --outputdir=$output/results_${c} --schedule=/tmp/isolation_schedule" make -C src/test/isolation installcheck) \ > $output/out.${c}.log 2>&1 & done wait coredumpctl --no-pager ("fsync=off" facilitates the bug reproduction for me.) Reproduced on REL_13_STABLE..master.
Re: BUG #17284: Assert failed in SerialAdd() when the summarize_serial mode is engaged
From
Thomas Munro
Date:
On Sun, Nov 14, 2021 at 1:35 AM PG Bug reporting form <noreply@postgresql.org> wrote: > It can be easily reproduced on the build made with 'CPPFLAGS="-O0 > -DTEST_SUMMARIZE_SERIAL"' by the following script (based on the simplified > script by Thomas Munro from bug#17116): I assume from the separate bug report that this fails even if you apply the patches from that other thread? https://www.postgresql.org/message-id/flat/17116-d6ca217acc180e30%40postgresql.org
Re: BUG #17284: Assert failed in SerialAdd() when the summarize_serial mode is engaged
From
Alexander Lakhin
Date:
Hello Thomas, 13.11.2021 21:23, Thomas Munro wrote: > On Sun, Nov 14, 2021 at 1:35 AM PG Bug reporting form > <noreply@postgresql.org> wrote: >> It can be easily reproduced on the build made with 'CPPFLAGS="-O0 >> -DTEST_SUMMARIZE_SERIAL"' by the following script (based on the simplified >> script by Thomas Munro from bug#17116): > I assume from the separate bug report that this fails even if you > apply the patches from that other thread? > > https://www.postgresql.org/message-id/flat/17116-d6ca217acc180e30%40postgresql.org Yes, applying patches v4-0001-Fix-assert-failures-in-parallel-SERIALIZABLE-READ and v4-0002-Fix-race-in-SERIALIZABLE-READ-ONLY from that thread doesn't affect this assert (tested on the master branch). Best regards, Alexander
Re: BUG #17284: Assert failed in SerialAdd() when the summarize_serial mode is engaged
From
Alexander Lakhin
Date:
Hello hackers, 13.11.2021 15:00, PG Bug reporting form wrote: > When running concurrent installchecks (x100) for src/test/isolation, I've > observed the following server crash: > Core was generated by `postgres: postgres regress30 127.0.0.1(55840) SEL'. > Program terminated with signal SIGABRT, Aborted. > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 > 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. > (gdb) bt > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 > #1 0x00007fd1b69d5535 in __GI_abort () at abort.c:79 > #2 0x000055d441cc3a02 in ExceptionalCondition ( > conditionName=conditionName@entry=0x55d441e34cd2 > "TransactionIdIsValid(tailXid)", > errorType=errorType@entry=0x55d441d1c01d "FailedAssertion", > fileName=fileName@entry=0x55d441e34bc9 "predicate.c", > lineNumber=lineNumber@entry=928) at assert.c:67 > #3 0x000055d441b8e0aa in SerialAdd > (minConflictCommitSeqNo=18446744073709551615, xid=95552353) at > predicate.c:928 The issue is reproduced (on REL_14_STABLE) with the following command sequence (can be replayed with pgreplay): 2021-12-29 18:00:15.000 MSK|postgres|regression|61cc77ff.18af00|LOG: statement: create table D1 (id int not null); create table D2 (id int not null); insert into D1 values (1); insert into D2 values (1); 2021-12-29 18:00:15.100 MSK|postgres|regression|61cc77ff.18af00|LOG: disconnection: session time: 0:00:01.129 user=postgres database=regression host=127.0.0.1 port=46730 2021-12-29 18:00:15.200 MSK|postgres|regression|61cc77ff.18af01|LOG: statement: begin isolation level serializable; 2021-12-29 18:00:15.300 MSK|postgres|regression|61cc77ff.18af02|LOG: statement: begin isolation level serializable; 2021-12-29 18:00:15.400 MSK|postgres|regression|61cc77ff.18af02|LOG: statement: update D2 set id = id + 1; 2021-12-29 18:00:15.500 MSK|postgres|regression|61cc77ff.18af02|LOG: statement: commit; 2021-12-29 18:00:15.600 MSK|postgres|regression|61cc77ff.18af03|LOG: statement: begin isolation level serializable; 2021-12-29 18:00:15.700 MSK|postgres|regression|61cc77ff.18af01|LOG: statement: update D1 set id = id + 1; 2021-12-29 18:00:15.800 MSK|postgres|regression|61cc77ff.18af03|LOG: statement: select id from D1; 2021-12-29 18:00:15.800 MSK|postgres|regression|61cc77ff.18af01|LOG: disconnection: session time: 0:00:01.000 user=postgres database=regression host=127.0.0.1 port=46731 2021-12-29 18:00:15.900 MSK|postgres|regression|61cc77ff.18af02|LOG: statement: select pg_sleep(1); To ease a reproduction, the following delays can be added to predicate.c: @@ -1502,6 +1507,7 @@ SummarizeOldestCommittedSxact(void) { SERIALIZABLEXACT *sxact; + pg_usleep(1000000); LWLockAcquire(SerializableFinishedListLock, LW_EXCLUSIVE); /* @@ -3745,6 +3760,7 @@ ClearOldPredicateLocks(void) SERIALIZABLEXACT *finishedSxact; PREDICATELOCK *predlock; + pg_usleep(2000000); /* * Loop through finished transactions. They are in commit order, so we can * stop as soon as we find one that's still interesting. With these delays and CPPFLAGS="-DTEST_SUMMARIZE_SERIAL" the assertion failure can be observed just on `make check`. As I can see, on disconnect of session 18af01 the ReleasePredicateLocks() function calls SetNewSxactGlobalXmin() where SerialSetActiveSerXmin(called with PredXact->SxactGlobalXmin == InvalidTransactionId) sets serialControl->tailXid to InvalidTransactionId. Then session 18af03 calls SummarizeOldestCommittedSxact(), that in case !SHMQueueEmpty(FinishedSerializableTransactions) calls SerialAdd() where the check TransactionIdIsValid(serialControl->tailXid) fails. But if ReleasePredicateLocks() in session 18af01 happens to reach ReleasePredicateLocks(), the FinishedSerializableTransactions queue gets empty, and it would prevent calling SerialAdd() in session 18af03. So there is a small window (that probably can be widened by LWLockAcquire()) between SetNewSxactGlobalXmin(); needToClear = true; and if (needToClear) ClearOldPredicateLocks(); where a concurrent transaction can fail in SummarizeOldestCommittedSxact(). Best regards, Alexander
Re: BUG #17284: Assert failed in SerialAdd() when the summarize_serial mode is engaged
From
Heikki Linnakangas
Date:
This fell through the cracks at the time, but just to close the loop on this thread: This was reported again at https://www.postgresql.org/message-id/18658-7dab125ec688c70b%40postgresql.org, and fixed in commit 1a43de5e0a. -- Heikki Linnakangas Neon (https://neon.tech)