Re: group locking: incomplete patch, just for discussion - Mailing list pgsql-hackers

From Robert Haas
Subject Re: group locking: incomplete patch, just for discussion
Date
Msg-id CA+Tgmoa_+u-qCcFZ9zAVZFk+TGXbqQB1+0ktPZRGgPJ5PvrJVw@mail.gmail.com
Whole thread Raw
In response to Re: group locking: incomplete patch, just for discussion  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: group locking: incomplete patch, just for discussion  (Amit Kapila <amit.kapila16@gmail.com>)
Re: group locking: incomplete patch, just for discussion  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers
On Tue, Nov 18, 2014 at 4:40 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> To reiterate the basic problem here, if we do nothing at all about the
>> lock manager, a parallel backend can stall trying to grab an
>> AccessExclusiveLock that the user backend alread holds, either because
>> the user backend holds an AccessExclusiveLock as well, or because some
>> other process is waiting for one, we'll deadlock and not notice.
>
> My feeling is that we should keep the concept of a group and group
> leader from your patch, and improve the deadlock detector to make use of
> that information (looking at the code, it looks doable but not trivial).
> But unless I am missing something, we should separate out the lock
> sharing, and defer that until later.

Doing as you propose solves this problem:

1. Backend A-1 acquires AccessShareLock on relation R.
2. Backend B waits for AccessExclusiveLock on relation R.
3. Backend A-2 waits for AccessShareLock on relation R.

Knowing that A-1 and A-2 are related, the deadlock detector can move
A-2 ahead of B and grant the lock.  All is well.

But your proposal does not solve this problem:

1. Backend A-1 acquires AccessExclusiveLock on relation R.
2. Backend A-2 waits for AccessShareLock on relation R.

The good news is that the deadlock detector should realize that since
A-1 and A-2 are in the same group, this is a deadlock.  And it can
abort either A-1 or A-2, which will eventually abort them both.  The
bad news, to borrow a phrase from Peter Geoghegan, is that it's an
unprincipled deadlock; a user confronted with the news that her
parallel scan has self-deadlocked will be justifiably dismayed.  It's
been proposed by Andres, and maybe a few other people, that we can
detect this situation and just not use parallelism in these cases, but
that's actually quite hard to do.

Of course, it's pretty easy for a backend A-1 contemplating parallel
scan of relation R to notice that it holds a lock that conflicts with
the AccessShareLock it expects a prospective parallel backend A-2 to
attempt to acquire.  I don't think there's a lock manager method for
that today, but we could easily add one.  However, A-1 might
incidentally hold other locks (from earlier statements in the
transaction) that conflict with locks that will be sought by A-2, and
as discussed *extensively* upthread, it is not easy to predict all of
the locks a parellel backend might try to take: even seemingly-trivial
code like the equality operators for built-in data types can sometimes
take relation locks, and we have no infrastructure for predicting
which ones.

Even if you could somehow solve that problem, there's also the issue
of plan caching.  If we make a parallel plan for an SQL statement
inside a PL/pgsql procedure because our backend holds no strong locks,
and then we acquire a strong lock on a relevant relation, we have to
invalidate that plan.  I think that will add complexity inside the
plan cache machinery.  It seems to me that it would be better to
handle these issues inside the lock manager rather than letting them
creep into other parts of the system.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Abhijit Menon-Sen
Date:
Subject: Re: What exactly is our CRC algorithm?
Next
From: Simon Riggs
Date:
Subject: Re: Add shutdown_at_recovery_target option to recovery.conf