Re: group locking: incomplete patch, just for discussion - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: group locking: incomplete patch, just for discussion |
Date | |
Msg-id | CA+TgmoZ8eTu95HSAUXn_qG9J75FLBAYfehvd8H5cezQ6y5mPiw@mail.gmail.com Whole thread Raw |
In response to | Re: group locking: incomplete patch, just for discussion (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: group locking: incomplete patch, just for discussion
|
List | pgsql-hackers |
On Wed, Oct 15, 2014 at 10:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 15 October 2014 14:46, Robert Haas <robertmhaas@gmail.com> wrote: >>> When my family goes to a restaurant, any member of the party may ask >>> for a table and the request is granted for the whole family. But the >>> lock is released only when I pay the bill. Once we have the table, any >>> stragglers know we have locked the table and they just come sit at the >>> table without needing to make their own lock request to the Maitre D', >>> though they clearly cache the knowledge that we have the table locked. > >> Hmm, interesting idea. Suppose, though, that the child process >> requests a lock that can't immediately be granted, because the catalog >> it's trying to access is locked in AccessExclusiveLock mode by an >> unrelated transaction. The unrelated transaction, in turn, is blocked >> trying to acquire some resource, which the top level parallelism >> process. Assuming the top level parallelism process is waiting for >> the child (or will eventually wait), this is a deadlock, but without >> some modification to the deadlock detector, it can't see one of the >> edges. > > Family disputes are fairly easily resolved ;-) > > The first and basic point is that in most cases the parent should > already hold the required locks. This can only happen for briefly held > locks and/or more complex stuff. In the first case, getting > parallelism to work without that complex stuff would be useful. I'd be > happy if the first version simply throws an error if a child can't > acquire a lock immediately. Don't overthink the first version. Knowing > you'll disagree, lets take a further step... Well, I'm fervently in agreement with you on one point: the first version of all this needs to be as simple as possible, or the time to get to the first version will be longer than we can afford to wait. I think what we're discussing here is which things are important enough that it makes sense to have them in the first version, and which things can wait. I also think we are in agreement that at least SOME thought about lock management is needed; the question we're trying to hash out is whether what I'm proposing to try to do here is significantly more ambitious than what's really necessary for V1. Which is a good question. > Second point, the relationship between parent and children is clear. > If we do a deadlock detection, we should be able to search for that as > a special case, since we will know that we are a child and that such a > situation might occur. So just add in an edge so the rest of the > deadlock code works fine. > > If that doesn't work, use a heurisic. If parent is waiting when child > does deadlock test, assume its a deadlock and abort the child > speculatively just in case. You can work out how to do that better in > the future, since it won't happen that often. Well, the deadlock checker needs to work not only when one of the parallel processes invokes it, but also when somebody else invokes it. For example, suppose we have parallel processes A and B. A holds lock 1 and awaits lock 2, while B holds awaits lock 3. No problem! Now process X, which is holding lock 2 tries to grab lock 1. X must be able to detect the deadlock. Now, that's not necessarily a problem with what you said: it just means that the parent-child relationship has to be clear from the contents of shared memory. Which is pretty much the direction I'm aiming with the incomplete patch I posted. For the sake of simplicity, I want to assume that every process in a locking group waits for every other process in the same locking group, even though that might not be technically true in every case. If the parent is waiting for a lock and the guy who has that lock is waiting for a parallel worker in the same group, my plan (which I think matches your suggestion) is to call that a deadlock. There are a few sticky points here: sometimes, the deadlock checker doesn't actually abort transactions, but just rearranges the wait queue, so something at least somewhat sensible needs to happen in that case. The thing that I'm aiming to do in the patch which I think you are suggesting might not be necessary is to make it possible for the child go ahead and request AccessShareLock on the scan relation even though the parent might already hold some other lock (perhaps even AccessExclusiveLock). I want to make the lock manager smart enough to recognize that those locks are mutually non-conflicting because of the fact that the two processes are in close cooperation. Clearly, we could get by without that, but it adds complexity in other places: the parent has to never release its lock (even if killed) until the child is done with the relation; and the scan code itself needs to be conditional, passing NoLock from children and some other mode in the parent. That's all manageable, but it looks to me like doing the necessary surgery on the lock manager isn't actually going to be that hard; most of the necessary logic is present in draft form in the patch already, and it's not that complex. In any design, the hard part, at least as far as I can see, is making the deadlock detector work reliably. I agree with you that it's OK if, in some unlikely situation, it occasionally diagnoses a deadlock between parallel processes where perhaps there isn't one. But it's not OK, at least in my opinion, if it ever fails to detect a real deadlock. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: