Re: alternative model for handling locking in parallel groups - Mailing list pgsql-hackers

From: Amit Kapila
Subject: Re: alternative model for handling locking in parallel groups
Msg-id: CAA4eK1+ryoYon0wC-EU9S4sdMQV6M73sObcwWjq-uYSGYQYLcg@mail.gmail.com
In response to: alternative model for handling locking in parallel groups (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: alternative model for handling locking in parallel groups (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

On Fri, Nov 14, 2014 at 2:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> Discussion of my incomplete group locking patch seems to have
> converged around two points: (1) Everybody agrees that undetected
> deadlocks are unacceptable.  (2) Nobody agrees with my proposal to
> treat locks held by group members as mutually non-conflicting.  As was
> probably evident from the emails on the other thread, it was not
> initially clear to me why you'd EVER want heavyweight locks held by
> different group members to mutually conflict, but after thinking it
> over for a while, I started to think of cases where you would
> definitely want that:
>
> 1. Suppose two or more parallel workers are doing a parallel COPY.
> Each time the relation needs to be extended, one backend or the other
> will need to take the relation extension lock in Exclusive mode.
> Clearly, taking this lock needs to exclude both workers in the same
> group and also unrelated processes.
>
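
[For reference, the mutual exclusion case 1 needs is what the existing
relation extension lock already provides; a simplified sketch of that
path (see LockRelationForExtension() in src/backend/storage/lmgr/lmgr.c;
the real code in hio.c is more involved):

    LockRelationForExtension(rel, ExclusiveLock);
    blkno = RelationGetNumberOfBlocks(rel); /* pick the new block safely */
    /* ... extend the relation by one page and write it ... */
    UnlockRelationForExtension(rel, ExclusiveLock);

Whatever the group-locking rules end up being, this lock must keep
conflicting even between members of the same parallel group.]
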
> 2. Suppose two or more parallel workers are doing a parallel
> sequential scan, with a filter condition of myfunc(a.somecol), and
> that myfunc(a.somecol) updates a tuple in some other table.  Access to
> update that tuple will be serialized using tuple locks, and it's no
> safer for two parallel workers to do this at the same time than it
> would be for two unrelated processes to do it at the same time.
>

Won't this be handled anyway, because both updates issued from myfunc()
are considered separate commands, so with respect to locking it should
behave like two different updates in the same transaction?  I think there
may be more things needed, apart from tuple locks, to make updates via
parallel workers possible.
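
[For reference, the tuple-level serialization being discussed here is the
heavyweight tuple lock; in simplified form (the real update path in
heapam.c is considerably more involved):

    LockTuple(relation, &tuple->t_self, ExclusiveLock);
    /* ... recheck the row version and apply the update ... */
    UnlockTuple(relation, &tuple->t_self, ExclusiveLock);

As Robert says, two workers in one group are no safer here than two
unrelated backends, so this lock must also keep conflicting within a
group.]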

> On the other hand, I think there are also some cases where you pretty
> clearly DO want the locks among group members to be mutually
> non-conflicting, such as:
>
> 3. Parallel VACUUM.  VACUUM takes ShareUpdateExclusiveLock, so that
> only one process can be vacuuming a relation at the same time.  Now,
> if you've got several processes in a group that are collaborating to
> vacuum that relation, they clearly need to avoid excluding each other,
> but they still need to exclude other people.  And in particular,
> nobody else should get to start vacuuming that relation until the last
> member of the group exits.  So what you want is a
> ShareUpdateExclusiveLock that is, in effect, shared across the whole
> group, so that it's only released when the last process exits.
>
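
[To make case 3 concrete: today each vacuum takes this lock individually,
so two members of the same group would block each other.  What is wanted
is something like the hypothetical variant below; LockRelationOid() is
the existing API, while GroupLockRelationOid() and my_lock_group do not
exist and are only meant to illustrate the desired semantics:

    /* Existing behavior: conflicts with everyone, including group members. */
    LockRelationOid(relid, ShareUpdateExclusiveLock);

    /*
     * Desired behavior (hypothetical API): conflicts with backends outside
     * the group, is shared within it, and is fully released only when the
     * last group member exits.
     */
    GroupLockRelationOid(relid, ShareUpdateExclusiveLock, my_lock_group);
]
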
> 4. Parallel query on a locked relation.  Parallel query should work on
> a table created in the current transaction, or one explicitly locked
> by user action.  It's not acceptable for that to just randomly
> deadlock, and skipping parallelism altogether, while it'd probably be
> acceptable for a first version, is not going to be a good long-term
> solution.  It also sounds buggy and fragile for the query planner to
> try to guess whether the lock requests in the parallel workers will
> succeed or fail when issued.  Figuring such details out is the job of
> the lock manager or the parallelism infrastructure, not the query
> planner.
>
> After thinking about these cases for a bit, I came up with a new
> possible approach to this problem.  Suppose that, at the beginning of
> parallelism, when we decide to start up workers, we grant all of the
> locks already held by the master to each worker (ignoring the normal
> rules for lock conflicts).  Thereafter, we do everything the same as
> now, with no changes to the deadlock detector.  That allows the lock
> conflicts to happen normally in the first two cases above, while
> preventing the unwanted lock conflicts in the second two cases.
>
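
If we went this way, the worker-side half might look something like the
sketch below (all of these names are hypothetical; today there is no
LockAcquire() variant that bypasses the conflict check):

    typedef struct SerializedLock
    {
        LOCKTAG     tag;        /* which object is locked */
        LOCKMODE    mode;       /* AccessShareLock .. AccessExclusiveLock */
    } SerializedLock;

    /*
     * Hypothetical: run during worker startup, before any query execution,
     * to re-acquire every lock the master already holds while ignoring
     * the normal conflict rules.
     */
    static void
    ImportMasterLocks(const SerializedLock *locks, int nlocks)
    {
        int     i;

        for (i = 0; i < nlocks; i++)
            LockAcquireSkipConflicts(&locks[i].tag, locks[i].mode);
    }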

Here I think we have to consider how to pass the information about all
the locks held by the master to the worker backends.  Also, even assuming
such information is available, it will still be considerable work to
grant the locks, given the number of locks we acquire [1] (based on
Simon's analysis) and the additional memory they require.  Finally, the
deadlock detector's work might also increase, as there will now be more
procs to visit.
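
For the first part, the master could in principle enumerate its held
locks the way pg_locks does and ship them to the workers through shared
memory; a rough sketch using the existing GetLockStatusData() (the
SerializedLock struct is the one from the sketch above, and filtering by
PID here is only illustrative):

    #include "postgres.h"
    #include "miscadmin.h"          /* MyProcPid */
    #include "storage/lock.h"       /* GetLockStatusData(), LOCKBIT_ON() */

    static int
    CollectMasterLocks(SerializedLock *out, int maxlocks)
    {
        LockData   *data = GetLockStatusData();
        int         n = 0;
        int         i;

        for (i = 0; i < data->nelements; i++)
        {
            LockInstanceData *inst = &data->locks[i];
            LOCKMODE    mode;

            if (inst->pid != MyProcPid || inst->holdMask == 0)
                continue;       /* someone else's lock, or merely awaited */

            /* one backend can hold several modes on the same tag */
            for (mode = 1; mode <= MaxLockMode; mode++)
            {
                if ((inst->holdMask & LOCKBIT_ON(mode)) && n < maxlocks)
                {
                    out[n].tag = inst->locktag;
                    out[n].mode = mode;
                    n++;
                }
            }
        }
        return n;
    }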

In general, I think this scheme will work, but I am not sure it is worth
the effort at this stage (considering the initial goal is for parallel
workers to be used for read operations).
