Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed
Msg-id 20190406171705.dogsasgftooz5rf5@alap3.anarazel.de
In response to Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed  (r.zharkov@postgrespro.ru)
Responses Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed  (r.zharkov@postgrespro.ru)
List pgsql-bugs
Hi,

On 2019-04-07 00:09:15 +0700, r.zharkov@postgrespro.ru wrote:
> On 2019-04-06 23:28, Andres Freund wrote:
> > Hi,
> > 
> > Let me have a look at the testcase - I'd been running Roman's testcase
> > for quite a few hours without being able to reproduce. But your testcase
> > seems to trigger this reliably, so I hope I can make some quick
> > progress.
> > 
> > - Andres
> 
> Hello,
> I'm trying to find the bad commit using bisect, but it takes a very
> long time.

I'd be very surprised if it weren't

commit 5db6df0c0117ff2a4e0cd87594d2db408cd5022f
Author: Andres Freund <andres@anarazel.de>
Date:   2019-03-23 19:55:57 -0700

    tableam: Add tuple_{insert, delete, update, lock} and use.


I just sent a fix for the issue Tom just reported, but I don't quite see
how it applies to your case, given that there is - as far as I
understand - only a single statement per transaction, no triggers
including foreign keys, no CTEs etc.  But it'd sure be interesting if my
fix made his error trigger on TM_SelfModified rather than on
TM_Invisible.

I'm kinda wondering if your / Roman's case is exposing a race condition
somewhere (like wrong order of clog / procarray checks or such) that
previously wasn't user visible.

I think we probably should expand the error messages for the unexpected
cases to include the TID of the failed tuple (both the original one and
the one followed to) - then we could at least look through the heap and
WAL to get a better understanding.


> The error reproduces with the default config using 24 clients (the server
> has 24 CPUs):
> pgbench test -j 12 -T 36000 -f ycsb_read_zipf.sql -f ycsb_update_zipf.sql -c 24 -P 60
> It does not reproduce when updating only a single record.

I ran it for like 9 hours overnight, without triggering the error. On a
computer with fewer CPUs, though.

Greetings,

Andres Freund


