Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed - Mailing list pgsql-bugs

From r.zharkov@postgrespro.ru
Subject Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed
Date
Msg-id b5d89910ec01a657530d1a49ff0e05dc@postgrespro.ru
Whole thread Raw
In response to Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed  (Andres Freund <andres@anarazel.de>)
Responses Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed  (Andres Freund <andres@anarazel.de>)
Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On 2019-04-02 23:13, Andres Freund wrote:
> Hi,
> 
> On 2019-04-02 12:51:08 -0300, Alvaro Herrera wrote:
>> On 2019-Apr-02, Tom Lane wrote:
>> 
>> > Andres Freund <andres@anarazel.de> writes:
>> > >> 2019-04-01 15:27:38.829 +07 [7524] STATEMENT:  UPDATE pgbench_accounts SET
>> > >> abalance = 1 WHERE aid = 1;
>> > >> 2019-04-01 15:27:38.829 +07 [7524] PANIC:  cannot abort transaction
>> > >> 400048276, it was already committed
>> >
>> > > But that's probably a separate issue.
>> >
>> > What that seems to indicate is that the "unexpected table_lock_tuple
>> > status" error was thrown during commit, which seems pretty odd.
>> 
>> AFAICS this error can only come from ExecDelete(), because the value 1
>> is TM_Invisible and the other callsites where the "unexpected
>> table_lock_tuple" error appears use different wording for that one.
> 
> Hm? Why couldn't it be the ExecUpdate() case?
> 
> 
>> Maybe it's the result of a deferred constraint being checked at that
>> time ... maybe it's trying to honor an "on cascade delete" setting for
>> an FK, and the affected tuple has already been updated or deleted?
> 
> Then it ought to get TM_Deleted, no? We ought to wait till that
> transaction commits, and then roll back. So there's something odd
> happening here.  I suspect there has to be some additional log output 
> or
> such to explain this.
> 
> Greetings,
> 
> Andres Freund

I made the new core file:

[New LWP 5967]
[Thread debugging using libthread_db enabled]
Using host libthread_db library 
"/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: zharkov test 127.0.0.1(35888) UPDATE    
     '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fe194769428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007fe194769428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fe19476b02a in __GI_abort () at abort.c:89
#2  0x0000000000a9c0ac in errfinish (dummy=0) at elog.c:552
#3  0x0000000000a9e653 in elog_finish (elevel=22, fmt=0xb695a0 "cannot 
abort transaction %u, it was already committed") at elog.c:1365
#4  0x0000000000562c87 in RecordTransactionAbort (isSubXact=false) at 
xact.c:1644
#5  0x0000000000563c93 in AbortTransaction () at xact.c:2670
#6  0x0000000000564516 in AbortCurrentTransaction () at xact.c:3191
#7  0x00000000009145b9 in PostgresMain (argc=1, argv=0x255d970, 
dbname=0x255d950 "test", username=0x252dd88 "zharkov") at 
postgres.c:4033
#8  0x000000000086406c in BackendRun (port=0x2556020) at 
postmaster.c:4399
#9  0x00000000008637cd in BackendStartup (port=0x2556020) at 
postmaster.c:4090
#10 0x000000000085facf in ServerLoop () at postmaster.c:1703
#11 0x000000000085f2dd in PostmasterMain (argc=3, argv=0x252bcf0) at 
postmaster.c:1376
#12 0x000000000077aeba in main (argc=3, argv=0x252bcf0) at main.c:228

Log:
2019-04-02 21:51:19.639 +07 [5967] STATEMENT:  UPDATE pgbench_accounts 
SET abalance = 1 WHERE aid = 1;
2019-04-02 21:51:19.639 +07 [5967] PANIC:  cannot abort transaction 
200011850, it was already committed
2019-04-02 21:53:57.924 +07 [2473] LOG:  server process (PID 5967) was 
terminated by signal 6: Aborted
2019-04-02 21:53:57.924 +07 [2473] DETAIL:  Failed process was running: 
UPDATE pgbench_accounts SET abalance = 1 WHERE aid = 1;
2019-04-02 21:53:57.924 +07 [2473] LOG:  terminating any other active 
server processes

We run pgbench with two custom scripts.

First:
\set i random_zipfian(1, 100000 * :scale, 1.01)
SELECT abalance FROM pgbench_accounts WHERE aid = :i;

Second:
\set i random_zipfian(1, 100000 * :scale, 1.01)
UPDATE pgbench_accounts SET abalance = 1 WHERE aid = :i;

pgbench connects to the database through the connection pooller called 
Odyssey.
We use pool of 16 or 32 backends.
We run four series of pgbench tests with different connections count ( 
100, 150, 200, ... , 1000 )
Three series with pool size 16 runs normally.
But fourth series fails with 200-300 connections. It took about five 
hours to reproduce the error this time.

Here is the little piece of the pgbench log:

Bench start
Tue Apr  2 14:43:12 UTC 2019
Timestamp 1554216192
transaction type: multiple scripts
scaling factor: 1
query mode: simple
number of clients: 200
number of threads: 32
duration: 300 s
number of transactions actually processed: 8062981
latency average = 7.442 ms
tps = 26875.375100 (including connections establishing)
tps = 26875.694564 (excluding connections establishing)
SQL script 1: ./sql/ycsb_read_zipf.sql
  - weight: 1 (targets 50.0% of total)
  - 4019635 transactions (49.9% of total, tps = 13398.171023)
  - latency average = 6.955 ms
  - latency stddev = 8.366 ms
SQL script 2: ./sql/ycsb_update_zipf.sql
  - weight: 1 (targets 50.0% of total)
  - 4021508 transactions (49.9% of total, tps = 13404.414071)
  - latency average = 7.909 ms
  - latency stddev = 8.478 ms
Bench end
Tue Apr  2 14:48:12 UTC 2019
Timestamp 1554216492
Bench start
Tue Apr  2 14:48:12 UTC 2019
Timestamp 1554216492
client 220 script 1 aborted in command 1 query 0: PANIC:  cannot abort 
transaction 200011850, it was already committed
WARNING:  terminating connection because of crash of another server 
process

-- 
regards,

Roman Zharkov



pgsql-bugs by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed
Next
From: Jeff Janes
Date:
Subject: Re: BUG #15722: 9000x performance regression on query startingPostgreSQL 10 due to bad query plan