On 2019-Jun-16, Oleksii Kliukin wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> > On 2019-Jun-14, Alvaro Herrera wrote:
> >
> >> I think there are worse problems here. I tried the attached isolation
> >> spec. Note that the only difference in the two permutations is that s0
> >> finishes earlier in one than the other; yet the first one works fine and
> >> the second one hangs until killed by the 180s timeout. (s3 isn't
> >> released for a reason I'm not sure I understand.)
> >
> > Actually, those behaviors both seem correct to me now that I look
> > closer. So this was a false alarm. In the code before de87a084c0, the
> > first permutation deadlocks, and the second permutation hangs. The only
> > behavior change is that the first one no longer deadlocks, which is the
> > desired change.
> >
> > I'm still trying to form a case to exercise the case of skip_tuple_lock
> > having the wrong lifetime.
>
> Hm… I think it was an oversight from my part not to give skip_lock_tuple the
> same lifetime as have_tuple_lock or first_time (also initializing it to
> false at the same time). Even if now it might not break anything in an
> obvious way, a backward jump to l3 label will leave skip_lock_tuple
> uninitialized, making it very dangerous for any future code that will rely
> on its value.
But that's not the danger ... with the current coding, it's initialized
to false every time through that block; that means the tuple lock will
never be skipped if we jump back to l3. So the danger is that the first
iteration sets the variable, then jumps back; second iteration
initializes the variable again, so instead of skipping the lock, it
takes it, causing a spurious deadlock.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services