Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed
Date
Msg-id 20190406171025.x7mbhp6kct75oqny@alap3.anarazel.de
In response to Re: BUG #15727: PANIC: cannot abort transaction 295144144, it was already committed (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
Hi,

On 2019-04-06 09:28:46 -0700, Andres Freund wrote:
> On 2019-04-06 12:23:06 -0400, Tom Lane wrote:
> > It seems that there may be some connection between this problem and
> > EPQ.  I was working on committing Amit's fix for bug #15677, which
> > demonstrated that EPQ doesn't work for partitioned-table target rels.
> > It seemed like there really needed to be regression test coverage for
> > that, so I tried to convert his crasher example into an isolation test.
> > It does indeed crash without Amit's fix ... but with it, lookee what
> > I get:
> > 
> > +error in steps c1 complexpartupdate: ERROR:  unexpected table_lock_tuple status: 1
> > 
> > That seems fully reproducible in this test.  I haven't looked into
> > exactly what's causing that, but now that we have a reproducible
> > example, somebody should.
> > 
> > I'm not quite sure if I should commit this as-is or wait till the
> > other problem is fixed.  A crash is probably worse than a bogus
> > error, but I don't like committing obviously-wrong "expected" output.
> > Thoughts?
> 
> Let me have a look at the testcase - I'd been running Roman's testcase
> for quite a few hours without being able to reproduce. But your testcase
> seems to trigger this reliably, so I hope I can make some quick
> progress.

Hm. I see what's wrong here - the new code assumed that we couldn't get
a SelfModified because the first version of the to-be-(deleted|updated)
tuple was visible. To properly discern that from the TM_Deleted case,
I had to change/fix heapam_lock_tuple's follow-the-update-chain logic to
return SelfModified rather than Invisible in this case, which is a more
accurate return code anyway. (I don't think we want to allow Invisible
here - we'd have to have waited for the earlier tuple version.)

I still don't understand how that'd be possible in Roman's
case. Given the workload, there should never be any self-updating going
on?

Heavily-WIP patch attached.


I noticed that we say
+                                ereport(ERROR,
+                                        (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
+                                         errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
 

in the ExecDelete() case (that's not new), which seems odd.

I think my fix would need a non-partition reproducer. I'll work on that,
and on polishing the fix, after having a coffee.

Greetings,

Andres Freund
