Re: group locking: incomplete patch, just for discussion - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: group locking: incomplete patch, just for discussion
Date
Msg-id 1415867935.2998.110.camel@jeff-desktop
In response to Re: group locking: incomplete patch, just for discussion  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: group locking: incomplete patch, just for discussion  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Great discussion.

Robert, I think you make a lot of very valid points, but philosophically
I am in pretty strong agreement with Andres here.

On Fri, 2014-10-31 at 14:29 -0400, Robert Haas wrote:
> > So just refusing parallelism as soon as anything has taken an access
> > exclusive lock doesn't sound too bad to me.
> 
> That restriction seems onerous to me; for example, it would prevent a
> parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL.
> Those seem like worthy targets for parallelism, and maybe even early
> targets, since I expect a lot of the real complexity here to be in the
> query planner, and you can avoid most of that complexity when planning
> DDL.

If two backends both have an exclusive lock on the relation for a join
operation, that implies that they need to do their own synchronization,
because obviously the lock manager is not doing it for them.

In turn, that implies that we need parallel versions of these
operations, going through each one and parallelizing it individually. I
don't think that's a good long-term strategy: it changes our simple
executor (tuples in, tuples out) into something quite different, where
each node needs to know a lot about whatever is underneath it in order
to divide the data up.

> process and both requests would have been granted.  Parallelism is
> generally not about making different things happen; it's about
> spreading the stuff that would happen anyway across multiple backends,

This point would be much stronger for declarative queries. When the user
issues LOCK TABLE, they are getting their hands dirty with the
implementation to some degree. If they really want serialization, they
can serialize on the lock of an object that they don't also need to
read in parallel.
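For instance (a sketch; the key value is an arbitrary application choice), a user who wants explicit serialization can take an advisory lock instead of a LOCK TABLE on a relation that parallel workers also need to scan:

```sql
-- Both sessions serialize on an advisory key rather than taking
-- LOCK TABLE on a relation they also read in parallel.
BEGIN;
SELECT pg_advisory_xact_lock(42);  -- 42 is a hypothetical app-chosen key
-- ... do the work that must be serialized ...
COMMIT;                            -- advisory lock released at commit
```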

You're right that this would be surprising to users, but hopefully we
can find a way to disable parallelism in that case, or at least issue a
WARNING that the user is probably doing the wrong thing.

> "cluster pg_collate"... the only solution is to
> move the parallel workers ahead of the AccessExclusiveLock request,
> but the deadlock detector won't do that unless it knows about the
> relationship among the parallel workers.

As discussed later in this thread, I also consider failure a
solution ;-) So we don't need to move anything.

If we have to tell our users that exclusive-locking a catalog can cause
deadlocks for parallel query, I think they will find that reasonable.
An admin could turn parallelism down to one for non-superusers before
starting such an operation, and restore it afterward.
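As a sketch of that workaround (the setting and role names here are assumptions; no such GUC existed when this was written, so take the spelling as illustrative only):

```sql
-- Temporarily disable parallelism for an ordinary role before taking
-- exclusive locks on catalogs; restore the default afterward.
ALTER ROLE app_user SET max_parallel_workers_per_gather = 0;  -- hypothetical role
-- ... perform the catalog maintenance ...
ALTER ROLE app_user RESET max_parallel_workers_per_gather;
```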


> patch that does parallel anything has got to worry about what
> undetected deadlocks might result, and then argue about which of them
> are obscure enough that we shouldn't care and which are not.  That
> doesn't sound like a fun way to spend my time.

We're in agreement that deadlocks can't go undetected.

> But, really, the first case is the one I'm more worried about: a bunch
> of parallel backends all grab the same locks, but slightly separated
> in time.  A strong locker joins the queue midway through and we're
> hosed.
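The scenario above can be sketched as a timeline (relation and session names hypothetical); the key point is that lock-queue fairness makes the second worker queue behind the strong locker even though its request would not conflict with anything currently granted:

```sql
-- t1: parallel worker W1 acquires AccessShareLock on rel (granted).
-- t2: an unrelated session runs:
LOCK TABLE rel IN ACCESS EXCLUSIVE MODE;  -- queues behind W1's lock
-- t3: parallel worker W2 requests AccessShareLock on rel; ordinarily it
--     would be granted immediately, but it queues behind the pending
--     AccessExclusiveLock to avoid starving the strong locker.
-- Result: W1 can't finish without W2, W2 waits on the strong locker,
-- and the strong locker waits on W1 -- a deadlock the detector can't
-- see unless it knows W1 and W2 belong to the same parallel group.
```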

I agree the situation sounds bad in the abstract, but none of the
examples seem very compelling. Who locks random catalogs exclusively?
And if they do, why is it the lock manager's job to try to reconcile the
resulting mess without canceling someone? (OK, don't answer that. It
obviously is the lock manager's job to some degree, but we can only
demand so much.)

> Even if you think that all of these particular issues are somehow
> avoidable or that they don't matter, I think it would be unwise to bet
> on them being the only issues.  The lock manager looks like it's old,
> ill-maintained code, but that's in no small part because it solved the
> problem it aims at so thoroughly that few people have felt the need to
> modify it very much in the last decade.  I think that's part of why
> there's so much skepticism on this thread - we don't think about the
> problems that the lock manager solves as being important problems not
> because they truly aren't important, but because they are so
> thoroughly solved.  Then every once in a while we complain about the
> performance.

Guilty as charged.

I won't forever oppose changes to the lock manager, but I resist making
them a precondition of parallelism (aside from making deadlock detection
work, of course). Users taking out exclusive locks recklessly can result
in deadlocks today, and will result in more when we do parallelism
(though perhaps in more surprising situations).

I remember when doing Exclusion Constraints, I went to a lot of effort
to make deadlocks impossible, and was quite proud. When Tom saw it, he
told me not to bother, and to do it the simple way instead, because
deadlocks can happen even with UNIQUE constraints (which I didn't even
know at the time).

We should use the same strategy here. When we see deadlocks becoming a
problem for any reasonable workload, we make a series of tweaks (perhaps
some to the lock manager itself) to reduce them.

Regards,
Jeff Davis




