Re: regression, deadlock in high frequency single-row UPDATE - Mailing list pgsql-bugs

From Alvaro Herrera
Subject Re: regression, deadlock in high frequency single-row UPDATE
Date
Msg-id 20141210193314.GZ1768@alvh.no-ip.org
Whole thread Raw
In response to Re: regression, deadlock in high frequency single-row UPDATE  (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Responses Re: regression, deadlock in high frequency single-row UPDATE
List pgsql-bugs
Mark Kirkwood wrote:

> Not so much advantages - just seeing if I could still reproduce the issue
> with a simpler test case i.e ensuring you were not doing anything 'odd' -
> and you are not :-)
>
> The next step would be trying to figure out what commit introduced this
> behaviour - but depesz has already beaten me to that (nice work)!

So I traced through the problem using the simplified test case I posted.
It goes like this:

* tuple on the referenced table has been updated a bunch of times
  already, and keeps being updated, so there's a large update chain to
  follow.

* two or more transactions (T1 and T2, say) have snapshots in which
  some version of the tuple (not the last one) is visible.

* transaction T3 has a ForKeyShare lock on the latest version of the
  tuple, which is grabbed because of the INSERT in the referencing
  table.  Note that these locks are propagated upwards in an update
  chain, so even if T3 locked an old version, all future versions are
  also locked by T3.

* each of T1 and T2 tries to update the version they have visible; this
  fails because it's not the last version, so they need to go through
  EvalPlanQual, which walks through the update chain.

At this point, T1 passes through heap_lock_tuple, gets the HW tuple
lock, marks itself as locker in the tuple's Xmax.  T1 is scheduled out;
time for T2 to run.  T2 goes through heap_lock_tuple, grabs HW lock;
examines Xmax, sees that T1 is the locker, goes to sleep until T1
awakes.  T1 is scheduled to run again, runs heap_update.  The first
thing it needs is to do LockTuple ...  but T2 is holding that one, so T1
goes to sleep until T2 awakes -- now they are both sleeping on each
other.

Kaboom.

I'm not seeing a solution here.  There is something wrong in having to
release the HW lock then grab it again, in that EvalPlanQual dance.  It
sounds like we need to hold on to the tuple's HW lock until after
heap_update has run, but it's not at all clear how this would work --
the interface through EvalPlanQual is messy enough as it is without
entertaining the idea that a "please keep HW lock until later" flag
needs to be passed up and down all over the place.

One simple idea that occurs to me is have some global state in heapam.c;
when the EPQ code is invoked from ExecUpdate, we tell heapam to keep the
HW lock, and it's only released once heap_update is finished.  That
would keep T2 away from the tuple.  We'd need a PG_TRY block in
ExecUpdate to reset the global state once the update is done.  (It's not
entirely clear that we need all this for deletes too.  The main
difference is that after a delete completes the tuple is gone, which is
obviously not the case in an update.  Also, deletes don't create update
chains.)

One thing I haven't thought too much about is why doesn't this happen in
9.2.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-bugs by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: BUG #11986: psql uses pager inside Emacs shell buffer (not a terminal)
Next
From: rmoyers@rfdinc.com
Date:
Subject: BUG #12188: Output of XML Is Truncated