Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed - Mailing list pgsql-bugs

From r.zharkov@postgrespro.ru
Subject Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed
Date
Msg-id 994bff111c549f45041cd1dc7473548e@postgrespro.ru
Whole thread Raw
In response to Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed  (Alvaro Herrera <alvherre@2ndquadrant.com>)
Responses Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-bugs
On 2019-04-03 19:20, Alvaro Herrera wrote:
> On 2019-Apr-03, r.zharkov@postgrespro.ru wrote:
> 
>> #0  0x00007f5e0dc7b428 in __GI_raise (sig=sig@entry=6) at
>> ../sysdeps/unix/sysv/linux/raise.c:54
>> #1  0x00007f5e0dc7d02a in __GI_abort () at abort.c:89
>> #2  0x0000000000a9b7ce in ExceptionalCondition (
>>     conditionName=0xb3de08 "!((((tuple->t_data->t_infomask) & 0x0080) 
>> ||
>> (((tuple->t_data->t_infomask) & (0x1000 | ((0x0040 | 0x0010) | 0x0040 
>> |
>> 0x0010))) == 0x0040)))",
>>     errorType=0xb39a69 "FailedAssertion", fileName=0xb39ae8 
>> "heapam.c",
>> lineNumber=4286) at assert.c:54
>> #3  0x00000000004ea024 in heap_lock_tuple (relation=0x7f5e0f252d80,
>> tuple=0x30f4dd0, cid=0, mode=LockTupleNoKeyExclusive,
>> wait_policy=LockWaitBlock, follow_updates=false,
>>     buffer=0x7ffed968cbb4, tmfd=0x7ffed968cd80) at heapam.c:4286
> 
> Uhm ... this assertion failure is doesn't seem related to the other
> problem you were reporting.  Here, it's failing because it thinks the
> tuple should be marked as only locked when the current transaction is
> the only Xmax for the tuple.
> 
>             TransactionIdIsCurrentTransactionId(xwait))
>         {
>             /* ... but if the xmax changed in the meantime, start over */
>             LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE);
>             if (xmax_infomask_changed(tuple->t_data->t_infomask, infomask) ||
>                 !TransactionIdEquals(HeapTupleHeaderGetRawXmax(tuple->t_data),
>                                      xwait))
>                 goto l3;
>             Assert(HEAP_XMAX_IS_LOCKED_ONLY(tuple->t_data->t_infomask));    //
> <--- failed assertion
>             require_sleep = false;
>         }
> 
> I'm not quite sure what's up with that.

Today we reproduced the error without the pooller. All server settings 
are the same.

pgbench test -j 32 -T 36000 -f ycsb_read_zipf.sql -f 
ycsb_update_zipf.sql -c 32 -P 60

Backtrace:

[New LWP 29710]
[Thread debugging using libthread_db enabled]
Using host libthread_db library 
"/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `postgres: zharkov test [local] UPDATE             
     '.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f0db56c3428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
54      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007f0db56c3428 in __GI_raise (sig=sig@entry=6) at 
../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f0db56c502a in __GI_abort () at abort.c:89
#2  0x0000000000a9c0ac in errfinish (dummy=0) at elog.c:552
#3  0x0000000000a9e653 in elog_finish (elevel=22, fmt=0xc7c7c0 
"unexpected table_lock_tuple status: %u") at elog.c:1365
#4  0x00000000007422c1 in ExecUpdate (mtstate=0x2e02370, 
tupleid=0x7fff6a5c1610, oldtuple=0x0, slot=0x2e03db8, 
planSlot=0x2e03c40, epqstate=0x2e02468, estate=0x2e01fc8, 
canSetTag=true) at nodeModifyTable.c:1272
#5  0x0000000000743756 in ExecModifyTable (pstate=0x2e02370) at 
nodeModifyTable.c:2070
#6  0x0000000000712671 in ExecProcNodeFirst (node=0x2e02370) at 
execProcnode.c:445
#7  0x000000000070748c in ExecProcNode (node=0x2e02370) at 
../../../src/include/executor/executor.h:239
#8  0x0000000000709d11 in ExecutePlan (estate=0x2e01fc8, 
planstate=0x2e02370, use_parallel_mode=false, operation=CMD_UPDATE, 
sendTuples=false, numberTuples=0, direction=ForwardScanDirection, 
dest=0x2e050f8,
     execute_once=true) at execMain.c:1643
#9  0x0000000000707ac9 in standard_ExecutorRun (queryDesc=0x2d56a18, 
direction=ForwardScanDirection, count=0, execute_once=true) at 
execMain.c:362
#10 0x0000000000707905 in ExecutorRun (queryDesc=0x2d56a18, 
direction=ForwardScanDirection, count=0, execute_once=true) at 
execMain.c:306
#11 0x0000000000915950 in ProcessQuery (plan=0x2df5d10, 
sourceText=0x2d2d1d8 "UPDATE pgbench_accounts SET abalance = 1 WHERE aid 
= 1;", params=0x0, queryEnv=0x0, dest=0x2e050f8, 
completionTag=0x7fff6a5c1a90 "")
     at pquery.c:161
#12 0x000000000091739c in PortalRunMulti (portal=0x2daf078, 
isTopLevel=true, setHoldSnapshot=false, dest=0x2e050f8, 
altdest=0x2e050f8, completionTag=0x7fff6a5c1a90 "") at pquery.c:1283
#13 0x00000000009168e8 in PortalRun (portal=0x2daf078, 
count=9223372036854775807, isTopLevel=true, run_once=true, 
dest=0x2e050f8, altdest=0x2e050f8, completionTag=0x7fff6a5c1a90 "") at 
pquery.c:796
#14 0x000000000091027a in exec_simple_query (query_string=0x2d2d1d8 
"UPDATE pgbench_accounts SET abalance = 1 WHERE aid = 1;") at 
postgres.c:1215
#15 0x00000000009148ff in PostgresMain (argc=1, argv=0x2d59a80, 
dbname=0x2d59960 "test", username=0x2d29d98 "zharkov") at 
postgres.c:4247
#16 0x000000000086406c in BackendRun (port=0x2d52030) at 
postmaster.c:4399
#17 0x00000000008637cd in BackendStartup (port=0x2d52030) at 
postmaster.c:4090
#18 0x000000000085facf in ServerLoop () at postmaster.c:1703
#19 0x000000000085f2dd in PostmasterMain (argc=3, argv=0x2d27d00) at 
postmaster.c:1376
#20 0x000000000077aeba in main (argc=3, argv=0x2d27d00) at main.c:228

-------------------------------------------

Some information from first core dump ( thanks to Konstantin Knizhnik ):

(gdb) p *tuple->t_data
$6 = {t_choice = {t_heap = {t_xmin = 200004728, t_xmax = 200004747, 
t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 200004728,
       datum_typmod = 200004747, datum_typeid = 0}}, t_ctid = {ip_blkid = 
{bi_hi = 0, bi_lo = 2337}, ip_posid = 61}, t_infomask2 = 49156, 
t_infomask = 9474,
   t_hoff = 24 '\030', t_bits = 0x7f59f9ea1c17 ""}
(gdb) p xwait
$4 = 200004747
(gdb) p  /x infomask
$7 = 0x2502
(gdb) p result
$8 = TM_Updated
(gdb) p mode
$9 = LockTupleNoKeyExclusive
(gdb) p *CurrentTransactionState
$11 = {fullTransactionId = {value = 200004747}, subTransactionId = 1, 
name = 0x0, savepointLevel = 0, state = TRANS_INPROGRESS, blockState = 
TBLOCK_STARTED,
   nestingLevel = 1, gucNestLevel = 1, curTransactionContext = 0x303cda0, 
curTransactionOwner = 0x3054660, childXids = 0x0, nChildXids = 0, 
maxChildXids = 0,
   prevUser = 10, prevSecContext = 0, prevXactReadOnly = false, 
startedInRecovery = false, didLogXid = false, parallelModeLevel = 0, 
chain = false, parent = 0x0}



-- 
regards

Roman Zharkov



pgsql-bugs by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: BUG #15727: PANIC: cannot abort transaction 295144144, it wasalready committed
Next
From: Michał Iwańczuk
Date:
Subject: Bug report - incorrect value displayed in jsonb column for large numbers