Hi,
On 2019-04-06 09:28:46 -0700, Andres Freund wrote:
> On 2019-04-06 12:23:06 -0400, Tom Lane wrote:
> > It seems that there may be some connection between this problem and
> > EPQ. I was working on committing Amit's fix for bug #15677, which
> > demonstrated that EPQ doesn't work for partitioned-table target rels.
> > It seemed like there really needed to be regression test coverage for
> > that, so I tried to convert his crasher example into an isolation test.
> > It does indeed crash without Amit's fix ... but with it, lookee what
> > I get:
> >
> > +error in steps c1 complexpartupdate: ERROR: unexpected table_lock_tuple status: 1
> >
> > That seems fully reproducible in this test. I haven't looked into
> > exactly what's causing that, but now that we have a reproducible
> > example, somebody should.
> >
> > I'm not quite sure if I should commit this as-is or wait till the
> > other problem is fixed. A crash is probably worse than a bogus
> > error, but I don't like committing obviously-wrong "expected" output.
> > Thoughts?
>
> Let me have a look at the testcase - I'd been running Roman's testcase
> for quite a few hours without being able to reproduce. But your testcase
> seems to trigger this reliably, so I hope I can make some quick
> progress.
Hm. I see what's wrong here: the new code assumed that we couldn't get
TM_SelfModified, because the first version of the to-be-(deleted|updated)
tuple was visible. To properly distinguish that from the TM_Deleted case,
I had to change heapam_lock_tuple's follow-the-update-chain logic to return
TM_SelfModified rather than TM_Invisible in this case, which is a more
accurate return code anyway. (I don't think we want to allow TM_Invisible
here; we'd have to have waited for the earlier tuple version.)
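For context: table_lock_tuple() returns a TM_Result, and in the PostgreSQL 12 enum order the "status: 1" from the failing test is TM_Invisible. Below is a minimal sketch, assuming a simplified caller modeled loosely on the EPQ path in ExecUpdate()/ExecDelete(), of why returning TM_SelfModified lets the caller tell "already modified by this transaction" apart from "concurrently deleted" instead of falling into the error branch. The enum is an illustrative copy rather than the real header, and LockAction, decide_after_lock() and the CommandId stand-in are hypothetical names, not part of the attached patch.

```c
#include <assert.h>

/* Illustrative copy of the TM_Result codes (PostgreSQL 12 tableam.h order);
 * not the real header. */
typedef enum TM_Result
{
    TM_Ok,
    TM_Invisible,       /* == 1: the "status: 1" in the failing test */
    TM_SelfModified,
    TM_Updated,
    TM_Deleted,
    TM_BeingModified,
    TM_WouldBlock
} TM_Result;

_Static_assert(TM_Invisible == 1, "status 1 must be TM_Invisible");

/* Hypothetical: what the caller does after locking the latest row version. */
typedef enum LockAction
{
    ACT_RECHECK,          /* run the EPQ recheck on the locked version */
    ACT_SKIP,             /* row is gone, or this command already handled it */
    ACT_ERROR_TRIGGERED,  /* changed by another command of this transaction */
    ACT_ERROR_UNEXPECTED  /* "unexpected table_lock_tuple status" */
} LockAction;

typedef unsigned int CommandId;   /* stand-in for PostgreSQL's CommandId */

/*
 * res:         result of locking while following the update chain
 * cmax:        command id that modified the tuple (used for TM_SelfModified)
 * current_cid: command id of the running statement
 */
static LockAction
decide_after_lock(TM_Result res, CommandId cmax, CommandId current_cid)
{
    assert(res <= TM_WouldBlock);

    switch (res)
    {
        case TM_Ok:
            return ACT_RECHECK;
        case TM_Deleted:
            return ACT_SKIP;
        case TM_SelfModified:
            /* Reached a version already modified in this transaction while
             * following the chain: ignore it if this very command did it,
             * otherwise report the triggered-data-change error. */
            return cmax == current_cid ? ACT_SKIP : ACT_ERROR_TRIGGERED;
        default:
            /* Includes TM_Invisible, which is what produced status 1. */
            return ACT_ERROR_UNEXPECTED;
    }
}
```

With TM_Invisible the sketch can only error out, whereas TM_SelfModified carries enough information (together with cmax) to either skip the redundant change or raise a specific error.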
I still don't understand how that could happen in Roman's case, though.
Given that workload, there should never be any self-updating going on,
should there?
Heavily-WIP patch attached.
I noticed that we say
+ ereport(ERROR,
+ (errcode(ERRCODE_TRIGGERED_DATA_CHANGE_VIOLATION),
+ errmsg("tuple to be updated was already modified by an operation triggered by the current command"),
in the ExecDelete() case (that's not new), which seems odd, given that
ExecDelete() deletes rather than updates the tuple.
I think my fix needs a non-partitioned reproducer. I'll work on that,
and on polishing the patch, after having a coffee.
Greetings,
Andres Freund