Home > mailing lists

Re: regression, deadlock in high frequency single-row UPDATE - Mailing list pgsql-bugs

From	Alvaro Herrera
Subject	Re: regression, deadlock in high frequency single-row UPDATE
Date	December 10, 2014 22:34:49
Msg-id	20141210193314.GZ1768@alvh.no-ip.org Whole thread Raw
In response to	Re: regression, deadlock in high frequency single-row UPDATE (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Responses	Re: regression, deadlock in high frequency single-row UPDATE
List	pgsql-bugs

Tree view

Mark Kirkwood wrote:

> Not so much advantages - just seeing if I could still reproduce the issue
> with a simpler test case i.e ensuring you were not doing anything 'odd' -
> and you are not :-)
>
> The next step would be trying to figure out what commit introduced this
> behaviour - but depesz has already beaten me to that (nice work)!

So I traced through the problem using the simplified test case I posted.
It goes like this:

* tuple on the referenced table has been updated a bunch of times
  already, and keeps being updated, so there's a large update chain to
  follow.

* two or more transactions (T1 and T2, say) have snapshots in which
  some version of the tuple (not the last one) is visible.

* transaction T3 has a ForKeyShare lock on the latest version of the
  tuple, which is grabbed because of the INSERT in the referencing
  table.  Note that these locks are propagated upwards in an update
  chain, so even if T3 locked an old version, all future versions are
  also locked by T3.

* each of T1 and T2 tries to update the version they have visible; this
  fails because it's not the last version, so they need to go through
  EvalPlanQual, which walks through the update chain.

At this point, T1 passes through heap_lock_tuple, gets the HW tuple
lock, marks itself as locker in the tuple's Xmax.  T1 is scheduled out;
time for T2 to run.  T2 goes through heap_lock_tuple, grabs HW lock;
examines Xmax, sees that T1 is the locker, goes to sleep until T1
awakes.  T1 is scheduled to run again, runs heap_update.  The first
thing it needs is to do LockTuple ...  but T2 is holding that one, so T1
goes to sleep until T2 awakes -- now they are both sleeping on each
other.

Kaboom.

I'm not seeing a solution here.  There is something wrong in having to
release the HW lock then grab it again, in that EvalPlanQual dance.  It
sounds like we need to hold on to the tuple's HW lock until after
heap_update has run, but it's not at all clear how this would work --
the interface through EvalPlanQual is messy enough as it is without
entertaining the idea that a "please keep HW lock until later" flag
needs to be passed up and down all over the place.

One simple idea that occurs to me is have some global state in heapam.c;
when the EPQ code is invoked from ExecUpdate, we tell heapam to keep the
HW lock, and it's only released once heap_update is finished.  That
would keep T2 away from the tuple.  We'd need a PG_TRY block in
ExecUpdate to reset the global state once the update is done.  (It's not
entirely clear that we need all this for deletes too.  The main
difference is that after a delete completes the tuple is gone, which is
obviously not the case in an update.  Also, deletes don't create update
chains.)

One thing I haven't thought too much about is why doesn't this happen in
9.2.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-bugs by date:

From: Alvaro Herrera
Date: 10 December 2014, 21:45:59
Subject: Re: BUG #11986: psql uses pager inside Emacs shell buffer (not a terminal)

From: rmoyers@rfdinc.com
Date: 11 December 2014, 00:28:36
Subject: BUG #12188: Output of XML Is Truncated

Re: regression, deadlock in high frequency single-row UPDATE - Mailing list pgsql-bugs

Previous

Next