Hi,

On 2019-04-07 00:09:15 +0700, r.zharkov@postgrespro.ru wrote:
> On 2019-04-06 23:28, Andres Freund wrote:
> > Hi,
> >
> > Let me have a look at the testcase - I'd been running Roman's testcase
> > for quite a few hours without being able to reproduce. But your testcase
> > seems to trigger this reliably, so I hope I can make some quick
> > progress.
> >
> > - Andres
>
> Hello,
> I try to find the bad commit using bisect. But it takes very long
> time.
I'd be very surprised if it weren't

commit 5db6df0c0117ff2a4e0cd87594d2db408cd5022f
Author: Andres Freund <andres@anarazel.de>
Date:   2019-03-23 19:55:57 -0700

    tableam: Add tuple_{insert, delete, update, lock} and use.

I just sent a fix for the issue Tom just reported, but I don't quite see
how it applies to your case, given that there is - as far as I
understand - only a single statement per transaction, no triggers
including foreign keys, no CTEs etc. But it'd sure be interesting if my
fix changes his error into triggering on TM_SelfModified rather than
TM_Invisible.

I'm kinda wondering if your / Roman's case is exposing a race condition
somewhere (like wrong order of clog / procarray checks or such) that
previously wasn't user visible.

I think we probably should expand the error messages for the unexpected
cases to include the tid of the failed tuple (both original and
followed) - then we could at least look through the heap and WAL to get
more understanding.
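
To make that concrete, here is a rough standalone sketch of the kind of
message I have in mind. It is not the actual patch and does not use the
backend's error reporting: DemoTid, demo_format_unexpected_result() and
the message wording are all made up for illustration, and the tid is
printed in the usual (block,offset) form.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tuple identifier: (block number, offset). */
typedef struct DemoTid
{
    uint32_t    block;
    uint16_t    offset;
} DemoTid;

/*
 * Format a diagnostic naming the unexpected result code together with the
 * tid the operation started from and the tid reached after following the
 * update chain.  The wording is an assumption, not an existing message.
 *
 * Returns what snprintf() returns: the length the full message needs,
 * excluding the terminating NUL.
 */
int
demo_format_unexpected_result(char *buf, size_t buflen,
                              const char *result_name,
                              DemoTid original, DemoTid followed)
{
    assert(buf != NULL && buflen > 0);
    assert(result_name != NULL && strlen(result_name) > 0);

    return snprintf(buf, buflen,
                    "unexpected %s for tuple (%u,%u), followed to (%u,%u)",
                    result_name,
                    (unsigned) original.block, (unsigned) original.offset,
                    (unsigned) followed.block, (unsigned) followed.offset);
}
```

With both tids in the message, one could look up the named blocks in the
heap and search the WAL for records touching them.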
> The error reproduces with the default config using 24 clients ( server has
> 24 CPUs )
> pgbench test -j 12 -T 36000 -f ycsb_read_zipf.sql -f ycsb_update_zipf.sql -c
> 24 -P 60
> It does not reproduce when updating the only one record.
I ran it for like 9 hours overnight, without triggering the error. On a
computer with fewer CPUs though.

Greetings,

Andres Freund