On Wed, 2014-11-12 at 14:16 -0500, Robert Haas wrote:
> Detected deadlocks are fine. Improving the deadlock detector is the
> heart of what needs to be done here.
OK, great.
> As you say, the lock requests
> we're talking about will rarely wait, so deadlocks won't be frequent.
> The issue is making sure that, if they do happen, we get a better
> behavior than "your parallel query hangs forever; good luck figuring
> out why".
Right. We can still use this patch's notion of a lock group in the
deadlock detector, but we don't need it to actually affect the way a
lock is granted. That should eliminate concerns about subtle bugs.
Later, after we understand how this is actually used, and if we see
deadlock problems, we can look for ways to solve/mitigate them.
This seems to be what Andres was saying, here:
http://www.postgresql.org/message-id/20141031130727.GF13584@awork2.anarazel.de
So I'll follow up in that thread, because it's an interesting
discussion.
> More generally, I think there's some misunderstanding about the
> overall goal of the parallelism infrastructure that I'm trying to
> create. ... But my goal is in some ways
> the opposite: I'm trying to make it possible to run as much existing
> PostgreSQL backend code as possible inside a parallel worker without
> any modification.
Thank you for clarifying; I think this is a good approach.
Back to the patch:
If I understand correctly, the _primary_ goal of this patch is to make
it safe to take out heavyweight locks in worker processes, even when a
resulting deadlock involves LWLocks/latches synchronizing among the
processes within a lock group.
For example, say processes A1 and A2 are in the same lock group, and B
is in a different lock group. A1 is holding heavyweight lock H1 and
waiting on LWLock L1; A2 is holding L1 and waiting on heavyweight
lock H2; and B is holding H2 and waiting on H1.
The current deadlock detector only knows about heavyweight lock waits,
so it cannot see A1 waiting on A2 and would see a dependency graph like:
A2 -> B -> A1
But with lock groups, it would see:
(A1 A2) -> B -> (A1 A2)
which is a cycle, and can be detected regardless of the synchronization
method used between A1 and A2. There are some details to work out to
avoid false positives, of course.
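
To make the example concrete, here is a toy sketch of the idea in Python.
It is not how the real deadlock detector is structured; the names
has_cycle, collapse, and group_of are made up for illustration, and
dropping edges inside a group is an assumption of the sketch. It only
shows that the heavyweight-lock edges alone form no cycle per process,
but do form one once A1 and A2 are treated as a single node.

```python
def has_cycle(edges):
    """Return True if the (waiter, holder) edges contain a directed cycle."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
        graph.setdefault(holder, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True  # back edge: we reached a node still on the stack
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)


def collapse(edges, group_of):
    """Map each process to its lock group and drop edges inside a group."""
    collapsed = set()
    for waiter, holder in edges:
        gw = group_of.get(waiter, waiter)
        gh = group_of.get(holder, holder)
        if gw != gh:
            collapsed.add((gw, gh))
    return collapsed


# Heavyweight-lock waits only; A1's LWLock wait on A2 is invisible here.
heavyweight_edges = [("A2", "B"), ("B", "A1")]
# Hypothetical group assignment: A1 and A2 share lock group "A".
group_of = {"A1": "A", "A2": "A", "B": "B"}

per_process = has_cycle(heavyweight_edges)                    # A2 -> B -> A1
per_group = has_cycle(collapse(heavyweight_edges, group_of))  # A -> B -> A
print(per_process, per_group)
```

Note that the collapsed graph reports a cycle whether or not A1 is
really blocked on A2, which is exactly where the false positives would
come from.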
Is that about right?
Regards,
Jeff Davis