Re: Reducing likelihood of deadlocks (was referential Integrity and SHARE locks) - Mailing list pgsql-hackers

From Alex Hayward
Subject Re: Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)
Date
Msg-id Pine.LNX.4.58.0702141920120.5426@sphinx.mythic-beasts.com
Whole thread Raw
In response to Re: Reducing likelihood of deadlocks (was referential Integrity and SHARE locks)  (Marc Munro <marc@bloodnok.com>)
List pgsql-hackers
On Tue, 13 Feb 2007, Marc Munro wrote:

> On Mon, 2007-12-02 at 00:10 -0500, Tom Lane wrote:
> > Marc Munro <marc@bloodnok.com> writes:
> > > Consider a table C containing 2 child records C1 and C2, of parent P.
> > > If transaction T1 updates C1 and C2, the locking order of the the
> > > records will be C1, P, C2.  Another transaction, T2, that attempts to
> > > update only C2, will lock the records in order C2, P.
> >
> > > The locks on C2 and P are taken in different orders by the two
> > > transactions, leading to the possibility of deadlock.
> >
> > But the lock on P is shared, hence no deadlock.
>
> Doh!  Yes, you are right.  It is not that simple.
>
> For deadlock to occur, we need a transaction that takes an exclusive
> lock on P as well as on one of the children.  Let us replace T2 with a
> new transaction, T3, which is going to update P and only one of its
> children.
>
> If T3 is going to update P and C1 without the possibility of deadlock
> against T1, then it must take out the locks in the order C1, P.  If, on
> the other hand, it is going to update P and C2, then the locks must be
> taken in the order P, C2.
>
> This means that there is no single strategy we can apply to T3 that will
> guarantee to avoid deadlocks with transactions that update only C (ie
> transactions, which to a developers point of view do nothing to P, and
> so should be unable to deadlock with T3).

This scenario would do it, too:
 Table X has rows representing an object of some kind. These objects contain other objects, which are represented by
rowsin table Y. Suppose X stores a count of the Y objects it contains in a particular status (because your application
needsto get this quickly) and suppose the count is updated by a trigger. The Y objects hold references to the
containingX, checked by FK constraints.
 
 A query which updates the status of one or more Ys can deadlock with another instance of itself. It first locks a Y
row,then shared-locks an X row, then updates the X row (when the trigger runs). Two transactions could get to the
shared-lockstage simultaneously, then deadlock.
 

I've come across some a bit like this in my own applications. I'm sure
there are many, many, others.

> From an application developer's standpoint there are few options, none
> of them ideal:
>
> 1) Insist on a locking policy that requires updates to first lock their
> parent records.
>
> This is horrible for so many reasons.  It should be unnecessary; it
> causes exclusive locking on parent records, thereby eliminating the
> gains made by introducing row share locks in 8.1; it is onerous on the
> developers; it is error-prone; etc

I once tried to define a locking order for rows in a database. It doesn't
work (though this was at a time when FK constraint checking used FOR
UPDATE locks, which, of course, made things much worse). This wasn't
specifically for FK checks, but they were an important cause of deadlocks.

Firstly, you have no idea of the order in which rows locked by a statement
will be locked. UPDATE d SET counter=counter+1 WHERE d.a=1 could deadlock
with UPDATE d SET counter=counter+1 WHERE d.b=1. Secondly, even if you
could defining a usable locking order across all of the rows in your
database (not just the tables) is nearly impossible. I suppose you could
base it on, say, the order (tablename, id) but you'd have to go to extreme
lengths to follow this. Imagine having to determine the id of every row
you want to update and the ID and table name of every row they'll lock
because of FK constraints and then sort a big set of 'SELECT ..  FOR
SHARE' and 'UPDATE' statements.

That's a lot of queries - and huge variety of useful queries you can't use
any more.

And once you've done all that you might find your application has race
conditions - there are sometimes other reasons for performing queries in a
certain order.

> 2) Remove FK constraints to eliminate the possibility of RI-triggered
> deadlocks.
>
> Ugh.

Deadlocks aren't the only problem FK constraints' locks are going to cause
you.  It's quite possible you have, somewhere, a small number of rows
referenced via FKs by a huge number of rows. Think about the amount of
locking (not just locks on rows, but internal locks on bits of cache) and
cache coherency logic going on in a production SMP machine.

It'll depend on your load and schema, of course, but I've found the
constraints to be mostly impractical in my production systems. Some I can
keep, but only if I know that the checks aren't going to be triggered too
often.

> 3) Encapsulate all transactions in some form of retry mechanism that
> traps deadlocks and retries those transactions.
>
> This may not be practicable, and incurs all of the overhead of
> encountering and trapping deadlocks in the first place.  Also, as each
> deadlock occurs, a number of locks will be left active before deadlock
> detection kicks in, increasing the window for further deadlocks.  On a
> busy system, the first deadlock may well trigger a cascade of further
> deadlocks.

It's essential to do this (actually, I always presumed that not checking
for deadlocks was a typical database-newbie mistake and that everyone knew
they needed to; I don't seem to see it mentioned on here much, though).
Unless you have a very simple dataset and simple queries you are not going
to be able to guarantee you won't get deadlocks, from FK checks or
otherwise. You could check every transaction against every other
transaction, I suppose...but you might not know what those are at compile
time, and it'll be a very painful task that'll have to be repeated for
every new query, constraint or trigger you add. You might not even have
control over them. It's also dependent on the inner workings of the
database and bits of schema you aren't really meant to know (consider the
type of lock taken by a FK check, the time it's taken, the order in which
the FK checks are performed and the order of triggers generally).



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Plan for compressed varlena headers
Next
From: Magnus Hagander
Date:
Subject: Re: integer datetimes