Thread: group locking: incomplete patch, just for discussion
For parallelism, I think we need a concept of group locking. That is, suppose we have a user backend and N worker backends collaborating to execute some query. For the sake of argument, let's say it's a parallel CLUSTER, requiring a full table lock. We need all of the backends to be able to lock the table even though at least one of them holds AccessExclusiveLock. This suggests that the backends should all be members of a locking group, and that locks within the same locking group should be regarded as mutually non-conflicting.

At least to me, that simple scenario is clear-cut[1], but what do we do in more complicated situations? For example, suppose backends A and B are members of the same locking group. A locks a relation with AccessShareLock, an unrelated process X queues up waiting for an AccessExclusiveLock, and then B also requests AccessShareLock. The normal behavior here is that B should wait for X to go first, but here that's a problem. If A is the user backend and B is a worker backend, A will eventually wait for B, which is waiting for X, which is waiting for A: deadlock. If it's the other way around, we might survive, but at the very best it's undesirable to have "parallelism" where A and B are actually completely serialized, with X taking its turn in the middle.

Now, in practical cases, we probably expect that if A and B lock the same relation, they're going to do so in very quick succession, so the actual behavior of the lock manager shouldn't matter much in terms of fairness. But since we can't predict what other processes may do in the window between those lock acquisitions, a sound theoretical approach is needed to avoid deadlocks. After some thought I've come to the conclusion that we should assume that every process in a locking group will eventually wait for every other process in that locking group.
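[Editorial note: the core rule being proposed can be sketched as follows. This is a toy model with invented names, not PostgreSQL's actual lock manager; it only shows a conflict check that exempts holders in the requester's own locking group.]

```python
# Toy model of group locking: a requested lock conflicts only with holders
# from a *different* locking group. The modes and conflict table are a
# simplified subset of the real ones.

ACCESS_SHARE = "AccessShareLock"
ACCESS_EXCLUSIVE = "AccessExclusiveLock"

# True where the two modes conflict.
CONFLICTS = {
    (ACCESS_SHARE, ACCESS_SHARE): False,
    (ACCESS_SHARE, ACCESS_EXCLUSIVE): True,
    (ACCESS_EXCLUSIVE, ACCESS_SHARE): True,
    (ACCESS_EXCLUSIVE, ACCESS_EXCLUSIVE): True,
}

def lock_conflicts(req_group, req_mode, holders):
    """holders: iterable of (group_id, held_mode) for current lock holders.
    Locks held by members of req_group are treated as non-conflicting."""
    for group, held_mode in holders:
        if group == req_group:
            continue  # same locking group: never a conflict
        if CONFLICTS[(req_mode, held_mode)]:
            return True
    return False
```

Under this rule a worker can take AccessShareLock on a table whose AccessExclusiveLock is held by its own group, while an unrelated backend still blocks.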
It follows that once any single process in a locking group obtains any lock on an object, any future lock request by another lock group member on that object should, if non-conflicting, be granted immediately. It also follows that, when processes in a locking group are waiting for a lock on an object, none should be awoken until all of the group's pending lock requests can be granted.

For example, suppose process X holds AccessExclusiveLock on a relation. First, process Y waits for AccessShareLock. Then, processes A and B, members of the same locking group, queue up respectively for AccessShareLock and AccessExclusiveLock, in that order. Next, process X releases its lock. At this point, we should wake Y *but not A*; A should continue to wait until we can grant locks to both A and B. The reason is that a new process Q might come along and, for purposes of deadlock avoidance, we might need to reorder the lock queue. Putting Q ahead of both A and B is fine; but putting it between A and B is bad for the reasons discussed in the previous paragraph.

Attached is an incomplete draft patch. It modifies LockAcquireExtended() and ProcSleep() but not ProcWakeup() or the deadlock detector. Aside from being incomplete, which is a problem in and of itself, I don't have a very good idea how to comprehensively test something like this. I can of course cobble together some test code for particular scenarios, but it's not at all clear to me what a comprehensive test plan would look like. Any ideas on that point would be particularly appreciated, as would feedback on the overall design. A lot more work is clearly needed here, but I've been known to advocate soliciting feedback on proposed patches early, so here I am taking my own advice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] You could argue that one process instead should take locks on behalf of the group. But I think this is a terrible approach. First, it means that whichever process is holding the locks, presumably the user backend, had better not try to do any real computation of its own, so that it can timely service lock requests, which seems like a huge loss of efficiency; if the optimum degree of parallelism is 2, we'll need 50% more processes and the whole result set will have to be copied an extra time. Yuck. Second, it means that the lock-holding process must refuse to die until all of the processes on behalf of which it is holding locks don't need them any more. And I hate backend processes that refuse to die with a fiery passion that cannot be quenched.
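[Editorial note: the wake-up rule from the example above -- wake Y, but not A, until both A's and B's requests can be granted -- can be sketched like this. It is a deliberately simplified toy model with invented names; it ignores the real wait-queue ordering rules and only shows the group-as-a-unit behavior.]

```python
# Toy model: after a lock release, decide which waiters to wake. Queued
# members of the same locking group are treated as a unit -- none is woken
# unless every queued member's request could now be granted.

def procs_to_wake(queue, grantable):
    """queue: list of (proc, group_or_None) in wait order.
    grantable: dict proc -> bool, whether that request could be granted now."""
    woken = []
    for proc, group in queue:
        if group is None:
            if grantable[proc]:
                woken.append(proc)
        else:
            members = [p for p, g in queue if g == group]
            if all(grantable[p] for p in members):
                woken.append(proc)
    return woken

# The scenario from the example: X has just released AccessExclusiveLock.
# Y's AccessShareLock is now grantable; A's would be too, but B's
# AccessExclusiveLock is not, so neither A nor B is woken.
queue = [("Y", None), ("A", "g1"), ("B", "g1")]
grantable = {"Y": True, "A": True, "B": False}
```

Here `procs_to_wake(queue, grantable)` returns `["Y"]`: Y is woken, and A stays queued next to B, so nothing can later slip between them.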
Robert Haas <robertmhaas@gmail.com> writes:
> For parallelism, I think we need a concept of group locking. That is, suppose we have a user backend and N worker backends collaborating to execute some query. For the sake of argument, let's say it's a parallel CLUSTER, requiring a full table lock. We need all of the backends to be able to lock the table even though at least one of them holds AccessExclusiveLock. This suggests that the backends should all be members of a locking group, and that locks within the same locking group should be regarded as mutually non-conflicting.

In the background worker case, I imagined that the foreground process would hold a lock and the background processes would just assume they could access the table without holding locks of their own. Aren't you building a mountain where a molehill would do?

> [1] You could argue that one process instead should take locks on behalf of the group. But I think this is a terrible approach. First, it means that whichever process is holding the locks, presumably the user backend, had better not try to do any real computation of its own, so that it can timely service lock requests, which seems like a huge loss of efficiency;

What is "timely service lock requests"? You got the lock before firing off the background workers, you hold it till they're done.

This is a truly horrible excuse for the amount of complexity (and, without doubt, bugs and performance penalties) you propose to load onto the lock mechanism.

			regards, tom lane
On Wed, Oct 15, 2014 at 12:13 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What is "timely service lock requests"? You got the lock before firing off the background workers, you hold it till they're done.

If you want to run any non-trivial (read: useful) code in the background workers, it rapidly gets very hard to predict which relation locks they might need, and infeasible to hold them for the lifetime of the computation.

For example, suppose we restrict background workers to running only operators found in btree opclasses. That's a far more draconian restriction than we'd like to have, but perhaps liveable in a pinch. But bttextcmp() uses PG_GETARG_TEXT_PP(), which means it may (or may not) need to lock the TOAST table; and it can indirectly call lookup_collation_cache() which does SearchSysCache1(COLLOID, ...) which may result in scanning pg_collation. And enum_cmp_internal() will do SearchSysCache1(ENUMOID, ...) which may result in scanning pg_enum.

There's obviously a balance to be struck, here. It will never be safe to run just anything in a background worker, and we'll go crazy if we try to make that work. On the other hand, if there's essentially no code that can run in a background worker without extensive modification, we might as well not bother implementing parallelism in the first place. I admit that getting group locking right is far from simple, but I also don't believe it's as complicated as previous projects like SSI, logical decoding, or LATERAL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 15 October 2014 05:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> For parallelism, I think we need a concept of group locking. That is, suppose we have a user backend and N worker backends collaborating to execute some query. For the sake of argument, let's say it's a parallel CLUSTER, requiring a full table lock. We need all of the backends to be able to lock the table even though at least one of them holds AccessExclusiveLock. This suggests that the backends should all be members of a locking group, and that locks within the same locking group should be regarded as mutually non-conflicting.
>
> In the background worker case, I imagined that the foreground process would hold a lock and the background processes would just assume they could access the table without holding locks of their own. Aren't you building a mountain where a molehill would do?

Yeh. Locks should be made in the name of the main transaction and released by it.

When my family goes to a restaurant, any member of the party may ask for a table and the request is granted for the whole family. But the lock is released only when I pay the bill. Once we have the table, any stragglers know we have locked the table and they just come sit at the table without needing to make their own lock request to the Maitre D', though they clearly cache the knowledge that we have the table locked.

So all lock requests held until EOX should be made in the name of the top level process. Any child process wanting a lock should request it, but on discovering it is already held at parent level should just update the local lock table. Transient locks, like catalog locks, can be made and released locally; I think there is more detail there but it shouldn't affect the generalisation.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 15, 2014 at 4:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 15 October 2014 05:13, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> For parallelism, I think we need a concept of group locking. That is, suppose we have a user backend and N worker backends collaborating to execute some query. For the sake of argument, let's say it's a parallel CLUSTER, requiring a full table lock. We need all of the backends to be able to lock the table even though at least one of them holds AccessExclusiveLock. This suggests that the backends should all be members of a locking group, and that locks within the same locking group should be regarded as mutually non-conflicting.
>>
>> In the background worker case, I imagined that the foreground process would hold a lock and the background processes would just assume they could access the table without holding locks of their own. Aren't you building a mountain where a molehill would do?
>
> Yeh. Locks should be made in the name of the main transaction and released by it.
>
> When my family goes to a restaurant, any member of the party may ask for a table and the request is granted for the whole family. But the lock is released only when I pay the bill. Once we have the table, any stragglers know we have locked the table and they just come sit at the table without needing to make their own lock request to the Maitre D', though they clearly cache the knowledge that we have the table locked.
>
> So all lock requests held until EOX should be made in the name of the top level process. Any child process wanting a lock should request it, but on discovering it is already held at parent level should just update the local lock table. Transient locks, like catalog locks, can be made and released locally; I think there is more detail there but it shouldn't affect the generalisation.

Hmm, interesting idea.

Suppose, though, that the child process requests a lock that can't immediately be granted, because the catalog it's trying to access is locked in AccessExclusiveLock mode by an unrelated transaction. The unrelated transaction, in turn, is blocked trying to acquire some resource which is held by the top level parallelism process. Assuming the top level parallelism process is waiting for the child (or will eventually wait), this is a deadlock, but without some modification to the deadlock detector, it can't see one of the edges.

Figuring out what to do about that is really the heart of this project, I think, and there are a variety of designs possible. One of the early ideas that I had was to have the parallel workers directly twaddle the main process's PGPROC and lock table state. In other words, instead of taking locks using their own PGPROCs, everybody uses a single PGPROC. I made several attempts at getting designs along these lines off the ground, but it got complicated and invasive:

(1) The processes need to coordinate to make sure that you don't have two people twaddling the lock state at the same time;

(2) The existing data structures won't support more than one process waiting at a time, but there's no reason why one parallel worker couldn't be trying to lock one catalog while another one is trying to lock a different catalog;

(3) On a related note, when a lock wait ends, you can't just wake up the process that owns the PGPROC; you have to wake up whichever process is actually waiting;

(4) The LWLockReleaseAll algorithm just falls apart in this environment, as far as I can see.

The alternative design which I've been experimenting with is to have each process use its own PGPROC and PROCLOCK structures, but to tag each PROCLOCK with not only the owning PGPROC but also the group leader's PGPROC. This has not been entirely smooth sailing, but it seems to break much less code than trying to have everybody use one PGPROC. Most of the changes that seem to be needed to make things work are pretty well-isolated; rather than totally rearranging the lock manager, you're just adding extra code that runs only in the parallel case.

I'm definitely open to the idea that there's a better, simpler design out there, but I haven't been able to think of one that doesn't break deadlock detection.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
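[Editorial note: the alternative design described above can be sketched in miniature. This is purely illustrative, with invented names -- the real PROCLOCK is a C struct in shared memory -- but it shows the key move: each holder keeps its own entry, every entry carries the group leader's identity, and the conflict check skips entries that share a leader with the requester.]

```python
# Toy model of PROCLOCK entries tagged with a group leader. An ungrouped
# process is its own leader, so the same-leader test degenerates to
# "same process" in the non-parallel case.

class ProcLock:
    def __init__(self, proc, leader, mode):
        self.proc = proc      # process owning this entry
        self.leader = leader  # group leader's process (== proc if ungrouped)
        self.mode = mode      # "shared" or "exclusive", to keep it simple

def request_conflicts(request, held_entries):
    """True if `request` conflicts with any held entry from another group."""
    for entry in held_entries:
        if entry.leader == request.leader:
            continue  # same group leader: treat as non-conflicting
        if request.mode == "exclusive" or entry.mode == "exclusive":
            return True
    return False
```

With leader "A" holding an exclusive lock, a worker request tagged with leader "A" goes through, while an unrelated backend's shared request still conflicts.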
On 15 October 2014 14:46, Robert Haas <robertmhaas@gmail.com> wrote:
>> When my family goes to a restaurant, any member of the party may ask for a table and the request is granted for the whole family. But the lock is released only when I pay the bill. Once we have the table, any stragglers know we have locked the table and they just come sit at the table without needing to make their own lock request to the Maitre D', though they clearly cache the knowledge that we have the table locked.
>
> Hmm, interesting idea. Suppose, though, that the child process requests a lock that can't immediately be granted, because the catalog it's trying to access is locked in AccessExclusiveLock mode by an unrelated transaction. The unrelated transaction, in turn, is blocked trying to acquire some resource which is held by the top level parallelism process. Assuming the top level parallelism process is waiting for the child (or will eventually wait), this is a deadlock, but without some modification to the deadlock detector, it can't see one of the edges.

Family disputes are fairly easily resolved ;-)

The first and basic point is that in most cases the parent should already hold the required locks. This can only happen for briefly held locks and/or more complex stuff. In the first case, getting parallelism to work without that complex stuff would be useful. I'd be happy if the first version simply throws an error if a child can't acquire a lock immediately. Don't overthink the first version. Knowing you'll disagree, let's take a further step...

Second point, the relationship between parent and children is clear. If we do a deadlock detection, we should be able to search for that as a special case, since we will know that we are a child and that such a situation might occur. So just add in an edge so the rest of the deadlock code works fine.

If that doesn't work, use a heuristic: if the parent is waiting when the child does its deadlock test, assume it's a deadlock and abort the child speculatively, just in case. You can work out how to do that better in the future, since it won't happen that often.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 15, 2014 at 10:12 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 15 October 2014 14:46, Robert Haas <robertmhaas@gmail.com> wrote:
>>> When my family goes to a restaurant, any member of the party may ask for a table and the request is granted for the whole family. But the lock is released only when I pay the bill. Once we have the table, any stragglers know we have locked the table and they just come sit at the table without needing to make their own lock request to the Maitre D', though they clearly cache the knowledge that we have the table locked.
>>
>> Hmm, interesting idea. Suppose, though, that the child process requests a lock that can't immediately be granted, because the catalog it's trying to access is locked in AccessExclusiveLock mode by an unrelated transaction. The unrelated transaction, in turn, is blocked trying to acquire some resource which is held by the top level parallelism process. Assuming the top level parallelism process is waiting for the child (or will eventually wait), this is a deadlock, but without some modification to the deadlock detector, it can't see one of the edges.
>
> Family disputes are fairly easily resolved ;-)
>
> The first and basic point is that in most cases the parent should already hold the required locks. This can only happen for briefly held locks and/or more complex stuff. In the first case, getting parallelism to work without that complex stuff would be useful. I'd be happy if the first version simply throws an error if a child can't acquire a lock immediately. Don't overthink the first version. Knowing you'll disagree, let's take a further step...

Well, I'm fervently in agreement with you on one point: the first version of all this needs to be as simple as possible, or the time to get to the first version will be longer than we can afford to wait. I think what we're discussing here is which things are important enough that it makes sense to have them in the first version, and which things can wait. I also think we are in agreement that at least SOME thought about lock management is needed; the question we're trying to hash out is whether what I'm proposing to try to do here is significantly more ambitious than what's really necessary for V1. Which is a good question.

> Second point, the relationship between parent and children is clear. If we do a deadlock detection, we should be able to search for that as a special case, since we will know that we are a child and that such a situation might occur. So just add in an edge so the rest of the deadlock code works fine.
>
> If that doesn't work, use a heuristic. If parent is waiting when child does deadlock test, assume it's a deadlock and abort the child speculatively just in case. You can work out how to do that better in the future, since it won't happen that often.

Well, the deadlock checker needs to work not only when one of the parallel processes invokes it, but also when somebody else invokes it. For example, suppose we have parallel processes A and B. A holds lock 1 and awaits lock 2, while B holds lock 3. No problem! Now process X, which is holding lock 2, tries to grab lock 1. X must be able to detect the deadlock. Now, that's not necessarily a problem with what you said: it just means that the parent-child relationship has to be clear from the contents of shared memory. Which is pretty much the direction I'm aiming with the incomplete patch I posted.

For the sake of simplicity, I want to assume that every process in a locking group waits for every other process in the same locking group, even though that might not be technically true in every case. If the parent is waiting for a lock and the guy who has that lock is waiting for a parallel worker in the same group, my plan (which I think matches your suggestion) is to call that a deadlock. There are a few sticky points here: sometimes, the deadlock checker doesn't actually abort transactions, but just rearranges the wait queue, so something at least somewhat sensible needs to happen in that case.

The thing that I'm aiming to do in the patch, which I think you are suggesting might not be necessary, is to make it possible for the child to go ahead and request AccessShareLock on the scan relation even though the parent might already hold some other lock (perhaps even AccessExclusiveLock). I want to make the lock manager smart enough to recognize that those locks are mutually non-conflicting because the two processes are in close cooperation. Clearly, we could get by without that, but it adds complexity in other places: the parent has to never release its lock (even if killed) until the child is done with the relation; and the scan code itself needs to be conditional, passing NoLock from children and some other mode in the parent.

That's all manageable, but it looks to me like doing the necessary surgery on the lock manager isn't actually going to be that hard; most of the necessary logic is present in draft form in the patch already, and it's not that complex. In any design, the hard part, at least as far as I can see, is making the deadlock detector work reliably. I agree with you that it's OK if, in some unlikely situation, it occasionally diagnoses a deadlock between parallel processes where perhaps there isn't one. But it's not OK, at least in my opinion, if it ever fails to detect a real deadlock.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
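[Editorial note: one way to picture the "every group member waits for every other" assumption -- again a toy model with invented names, not the real deadlock checker -- is to collapse each locking group to a single node in the wait-for graph and then look for a cycle. Any process, including an outsider like X above, can run the same check.]

```python
# Toy wait-for-graph deadlock check. Each locking group is collapsed to a
# single node, encoding the assumption that its members all wait for one
# another; waits *within* a group are ignored, since group locks don't
# conflict with each other.

def find_deadlock(waits, groups):
    """waits: dict proc -> set of procs it is waiting for (real lock waits).
    groups: list of sets of procs forming locking groups."""
    rep = {}
    for group in groups:
        leader = min(group)  # arbitrary but stable representative
        for p in group:
            rep[p] = leader

    def node(p):
        return rep.get(p, p)

    edges = {}
    for p, targets in waits.items():
        for t in targets:
            if node(p) != node(t):  # skip intra-group waits
                edges.setdefault(node(p), set()).add(node(t))

    # Depth-first search for a cycle in the condensed graph.
    def cyclic(n, on_stack, seen):
        if n in on_stack:
            return True
        if n in seen:
            return False
        seen.add(n)
        on_stack.add(n)
        for m in edges.get(n, ()):
            if cyclic(m, on_stack, seen):
                return True
        on_stack.discard(n)
        return False

    seen = set()
    return any(cyclic(n, set(), seen) for n in list(edges))
```

In the scenario upthread -- child C waits on an unrelated transaction U's catalog lock, U waits on the parent P -- collapsing P and C into one node makes the cycle visible, which neither edge alone would reveal.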
On 15 October 2014 17:03, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, I'm fervently in agreement with you on one point: the first version of all this needs to be as simple as possible, or the time to get to the first version will be longer than we can afford to wait. I think what we're discussing here is which things are important enough that it makes sense to have them in the first version, and which things can wait.

I'm guessing we might differ slightly on what constitutes "as simple as possible".

Something usable, with severe restrictions, is actually better than we have now. I understand the journey this work represents, so don't be embarrassed by submitting things with heuristics and good-enoughs in it. Our mentor, Mr. Lane, achieved much by spreading work over many releases, leaving others to join in the task.

Might I gently enquire what "something usable" we are going to see in this release? I'm not up on current plans.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 15, 2014 at 2:55 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 15 October 2014 17:03, Robert Haas <robertmhaas@gmail.com> wrote:
>> Well, I'm fervently in agreement with you on one point: the first version of all this needs to be as simple as possible, or the time to get to the first version will be longer than we can afford to wait. I think what we're discussing here is which things are important enough that it makes sense to have them in the first version, and which things can wait.
>
> I'm guessing we might differ slightly on what constitutes "as simple as possible".

Yes, I believe there have been occasions in the past when that has happened, so definitely possible. :-)

> Something usable, with severe restrictions, is actually better than we have now. I understand the journey this work represents, so don't be embarrassed by submitting things with heuristics and good-enoughs in it. Our mentor, Mr. Lane, achieved much by spreading work over many releases, leaving others to join in the task.
>
> Might I gently enquire what "something usable" we are going to see in this release? I'm not up on current plans.

I don't know how far I'm going to get for this release yet. I think pg_background is a pretty good milestone, and useful in its own right. I would like to get something that's truly parallel working sooner rather than later, but this group locking issue is one of 2 or 3 significant hurdles that I need to climb over first.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 16 October 2014 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
>> Might I gently enquire what "something usable" we are going to see in this release? I'm not up on current plans.
>
> I don't know how far I'm going to get for this release yet. I think pg_background is a pretty good milestone, and useful in its own right. I would like to get something that's truly parallel working sooner rather than later, but this group locking issue is one of 2 or 3 significant hurdles that I need to climb over first.

pg_background is very cute, but it's not really a step forward, or at least not very far. It's sounding like you've already decided that is as far as we're going to get this release, which I'm disappointed about.

Given your description of pg_background it looks an awful lot like infrastructure to make Autonomous Transactions work, but it doesn't even do that. I guess it could, with a very small additional patch, so maybe it is useful for something.

You asked for my help, but I'd like to see some concrete steps towards an interim feature so I can see some benefit in a clear direction.

Can we please have the first step we discussed? Parallel CREATE INDEX? (Note the please)

-- 
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 10/28/14, 3:48 PM, Simon Riggs wrote:
> Given your description of pg_background it looks an awful lot like infrastructure to make Autonomous Transactions work, but it doesn't even do that. I guess it could, with a very small additional patch, so maybe it is useful for something.

What do you see as being missing for autonomous transactions?

BTW, what I think would make this feature VERY useful is if it provided the ability to fire something up in another backend and leave it running in the background. I think you can do that with FDW, but then you have the authentication PITA to deal with (and perhaps pg_background is a more efficient way to move data between backends than FDW, but that's just a guess...)

-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
On Tue, Oct 28, 2014 at 4:48 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 16 October 2014 16:22, Robert Haas <robertmhaas@gmail.com> wrote:
>>> Might I gently enquire what "something usable" we are going to see in this release? I'm not up on current plans.
>>
>> I don't know how far I'm going to get for this release yet. I think pg_background is a pretty good milestone, and useful in its own right. I would like to get something that's truly parallel working sooner rather than later, but this group locking issue is one of 2 or 3 significant hurdles that I need to climb over first.
>
> pg_background is very cute, but it's not really a step forward, or at least not very far. It's sounding like you've already decided that is as far as we're going to get this release, which I'm disappointed about.
>
> Given your description of pg_background it looks an awful lot like infrastructure to make Autonomous Transactions work, but it doesn't even do that. I guess it could, with a very small additional patch, so maybe it is useful for something.
>
> You asked for my help, but I'd like to see some concrete steps towards an interim feature so I can see some benefit in a clear direction.
>
> Can we please have the first step we discussed? Parallel CREATE INDEX? (Note the please)

What I've been thinking about trying to work towards is parallel sequential scan. I think that it would actually be pretty easy to code up a mostly-working version using the existing infrastructure, but the patch would be rejected with a bazooka, because the non-working parts would include things like:

1. The cooperating backends might not all be using the same snapshot, because that requires sharing the snapshot, combo CID hash, and transaction state.

2. The quals that got pushed down to the workers might not return the same answers that they would have produced with a single backend, because we have no mechanism for assessing pushdown-safety.

3. Deadlock detection would be to some greater or lesser degree broken, the details depending on the implementation choices you made.

There is a bit of a chicken-and-egg problem here. If I submit a patch for parallel sequential scan, it'll (justifiably) get rejected because it doesn't solve those problems. So I'm trying to solve those above-enumerated problems first, with working and at least somewhat-useful examples that show how the incremental bits of infrastructure can be used to do stuff. But that leads to your (understandable) complaint that this isn't very real yet.

Why am I now thinking about parallel sequential scan instead of parallel CREATE INDEX? You may remember that I posted a patch for a new memory allocator some time ago, and it came in for a fair amount of criticism and not much approbation. Some of that criticism was certainly justified, and perhaps I was as hard on myself as anyone else was. However you want to look at it, I see the trade-off between parallel sort and parallel seq-scan this way: parallel seq-scan requires dealing with the planner (ouch!) but parallel sort requires dealing with memory allocation in dynamic shared memory segments (ouch!). Both of them require solving the three problems listed above.

And maybe a few others, but I think those are the big ones - and I think proper deadlock detection is the hardest of them. A colleague of mine has drafted patches for sharing snapshots and combo CIDs between processes, and as you might expect that's pretty easy. Sharing the transaction state (so we can test whether a transaction ID is "our" transaction ID inside the worker) is a bit trickier, but I think not too hard. Assessing pushdown-safety will probably boil down to adding some equivalent of proisparallel. Maybe not the most elegant, but defensible, and if you're looking for the shortest path to something usable, that's probably it. But deadlock detection ... well, I don't see any simpler solution than what I'm trying to build here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Oct 28, 2014 at 7:22 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 10/28/14, 3:48 PM, Simon Riggs wrote:
>> Given your description of pg_background it looks an awful lot like infrastructure to make Autonomous Transactions work, but it doesn't even do that. I guess it could, with a very small additional patch, so maybe it is useful for something.
>
> What do you see as being missing for autonomous transactions?

Personally, I don't see this patch set as having much to do with real autonomous transactions.

> BTW, what I think would make this feature VERY useful is if it provided the ability to fire something up in another backend and leave it running in the background.

You *can* do that. I mean, long-running transactions will have their usual problems, but if you want to kick off a long-running (or short-running) query in the background and forget about it, this patch lets you do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Tue, Oct 28, 2014 at 7:26 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Oct 28, 2014 at 7:22 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
>> On 10/28/14, 3:48 PM, Simon Riggs wrote:
>>> Given your description of pg_background it looks an awful lot like infrastructure to make Autonomous Transactions work, but it doesn't even do that. I guess it could, with a very small additional patch, so maybe it is useful for something.
>>
>> What do you see as being missing for autonomous transactions?
>
> Personally, I don't see this patch set as having much to do with real autonomous transactions.
>
>> BTW, what I think would make this feature VERY useful is if it provided the ability to fire something up in another backend and leave it running in the background.
>
> You *can* do that. I mean, long-running transactions will have their usual problems, but if you want to kick off a long-running (or short-running) query in the background and forget about it, this patch lets you do that.

Err, sorry. pg_background lets you do that, not this patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 28 October 2014 23:24, Robert Haas <robertmhaas@gmail.com> wrote: >> You asked for my help, but I'd like to see some concrete steps towards >> an interim feature so I can see some benefit in a clear direction. >> >> Can we please have the first step we discussed? Parallel CREATE INDEX? >> (Note the please) > > What I've been thinking about trying to work towards is parallel > sequential scan. I think that it would actually be pretty easy to > code up a mostly-working version using the existing infrastructure, > but the patch would be rejected with a bazooka, because the > non-working parts would include things like: > > 1. The cooperating backends might not all be using the same snapshot, > because that requires sharing the snapshot, combo CID hash, and > transaction state. > 2. The quals that got pushed down to the workers might not return the > same answers that they would have produced with a single backend, > because we have no mechanism for assessing pushdown-safety. > 3. Deadlock detection would be to some greater or lesser degree > broken, the details depending on the implementation choices you made. > > There is a bit of a chicken-and-egg problem here. If I submit a patch > for parallel sequential scan, it'll (justifiably) get rejected because > it doesn't solve those problems. So I'm trying to solve those > above-enumerated problems first, with working and at least > somewhat-useful examples that show how the incremental bits of > infrastructure can be used to do stuff. But that leads to your > (understandable) complaint that this isn't very real yet. > > Why am I now thinking about parallel sequential scan instead of > parallel CREATE INDEX? You may remember that I posted a patch for a > new memory allocator some time ago, and it came in for a fair amount > of criticism and not much approbation. Some of that criticism was > certainly justified, and perhaps I was as hard on myself as anyone > else was. 
However you want to look at it, I see the trade-off between > parallel sort and parallel seq-scan this way: parallel seq-scan > requires dealing with the planner (ouch!) but parallel sort requires > dealing with memory allocation in dynamic shared memory segments > (ouch!). Both of them require solving the three problems listed > above. > > And maybe a few others, but I think those are the big ones - and I > think proper deadlock detection is the hardest of them. A colleague > of mine has drafted patches for sharing snapshots and combo CIDs > between processes, and as you might expect that's pretty easy. > Sharing the transaction state (so we can test whether a transaction ID > is "our" transaction ID inside the worker) is a bit trickier, but I > think not too hard. Assessing pushdown-safety will probably boil down > to adding some equivalent of proisparallel. Maybe not the most > elegant, but defensible, and if you're looking for the shortest path > to something usable, that's probably it. But deadlock detection ... > well, I don't see any simpler solution than what I'm trying to build > here. OK, I see where you are headed, and perhaps why. I said I would help and I mean to do so. Here are my thoughts: Solving your 3 problems requires significant research, coding and agreement that you're unlikely to get in 9.5 because there are no solutions happening at the end of it; it's all theoretical design, which becomes more arguable. And even if you do, Parallel (||) Seq Scan isn't interesting except for marketing purposes, since its use in real queries is limited because most queries need to do something else after the SeqScan, which opens up a fuller discussion of parallel query which we've never had. My feeling is that releasing only || Seq Scan would backfire since we don't yet do enough other stuff to be a full solution. If you do wish to pursue || Seq Scan, then a working prototype would help.
It allows us to see that there is an open source solution we are working to solve the problems for. People can benchmark it, understand the benefits and issues it raises and that would help focus attention on the problems you are trying to solve in infrastructure. People may have suggestions on how to solve or avoid those that you hadn't thought of. Personally, IMHO, it's quite late in the cycle to be releasing such a prototype since we have only 6-7 weeks until the major patch submission deadline. Hence my request: can we get *something* good for users into 9.5, since, with respect, infrastructure alone isn't useful. (If you take such comments negatively, we might discuss similar issues with BDR - but we are working to make simple features available in each release on that project also). My proposal is we do a parallel index build scan... just as we discussed earlier, so you would be following the direction set by Dev Meeting, not just a proposal of mine. As I mentioned previously when you started discussing shared memory segments, parallel sort does NOT require shared memory. The only thing you need to share are files. Split the problem into N pieces, sort them to produce N files and then merge the files using existing code. That only applies to large sorts, but then those are the ones you cared about doing in parallel anyway. CREATE INDEX has a lock on the table. Worker tasks can be simply banned from acquiring new locks and doing many other tasks. We can prevent transaction chains, so both normal and concurrent versions have no complex combocid or transaction state issues. So your 3 problems come down to nothing that needs to be solved immediately. Pushdown can be assumed for the functions we use for existing index types: we can verify this from our internal code, so we are not at risk there. So my proposal is we do a parallel index build scan, a limited subset of parallel seq scan, we put the data from that into a private sort node, then output a series of files.
The plan for this is the following...

  Parallel Sort Merge
     Local Sort
        Local IndexBuildScanSegment

Local IndexBuildScanSegment scans a portion of the table being indexed, a "segment" - this is the heart of the division of labour within the parallel scan. Local Sort is just a normal sort node except it outputs a file, not a stream of sorted tuples. Parallel Sort Merge waits for all Local Sorts to finish and then merges the inputs using existing sort merge logic. Now I'm sure you can see there are ways to improve that plan using shmem. But the above is a workable solution for 9.5 within the time available. We'll learn much making this happen and it will fuel everyone for more work in the following release. For the next release, I suggest that || hash join is a more useful goal than || seq scan since it allows typical queries on big tables to be optimized, but we can discuss that in more detail another time. Happy to have a more detailed phone call or meeting to discuss. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
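To make the shape of Simon's files-based scheme concrete, here is a minimal sketch (Python, with in-memory lists standing in for the run files; the function name and structure are illustrative, not anything in PostgreSQL): split the input into N segments, sort each segment independently as a worker would, then k-way merge the finished runs.

```python
import heapq

def parallel_sort(rows, n_workers):
    # Split the input into N roughly equal segments
    # (the "Local IndexBuildScanSegment" step).
    chunk = max(1, len(rows) // n_workers)
    segments = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    # Each worker sorts its segment independently ("Local Sort");
    # in the real scheme each sorted run would be written to a file.
    runs = [sorted(seg) for seg in segments]
    # The leader merges the finished runs ("Parallel Sort Merge")
    # using an ordinary k-way merge over the sorted runs.
    return list(heapq.merge(*runs))
```

Note the workers never share memory here: each produces an independent sorted run, and only the merge step consumes all of them, which is exactly the property Simon is relying on.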
On Wed, Oct 29, 2014 at 2:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>
> My proposal is we do a parallel index build scan... just as we
> discussed earlier, so you would be following the direction set by Dev
> Meeting, not just a proposal of mine.
>
> As I mentioned previously when you started discussing shared memory
> segments, parallel sort does NOT require shared memory. The only thing
> you need to share are files. Split the problem into N pieces, sort
> them to produce N files and then merge the files using existing code.
> That only applies to large sorts, but then those are the ones you
> cared about doing in parallel anyway.
>
> CREATE INDEX has a lock on the table. Worker tasks can be simply
> banned from acquiring new locks and doing many other tasks.
Banning worker tasks from taking locks is only possible if the
worker backend doesn't need to acquire any lock which is not
already acquired by the main backend, but can we safely assume
that? The example Robert gave upthread about
bttextcmp() can break that assumption.
On Wed, Oct 29, 2014 at 4:48 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > If you do wish to pursue || Seq Scan, then a working prototype would > help. It allows us to see that there is an open source solution we are > working to solve the problems for. People can benchmark it, understand > the benefits and issues it raises and that would help focus attention > on the problems you are trying to solve in infrastructure. People may > have suggestions on how to solve or avoid those that you hadn't > thought of. I've mulled that over a bit and it might be worth pursuing further. Of course there's always the trade-off: doing that means not doing something else. > As I mentioned previously when you started discussing shared memory > segments, parallel sort does NOT require shared memory. The only thing > you need to share are files. Split the problem into N pieces, sort > them to produce N files and then merge the files using existing code. > That only applies to large sorts, but then those are the ones you > cared about doing in parallel anyway. A simple implementation of this would work only for simple pass-by-value types, like integers. Pass-by-reference types require the comparator to de-TOAST, and some other types require catalog lookups. I don't think that's very useful: Noah previously did some analysis of this problem and concluded (with apologies if I'm remembering the details incorrectly here) that the comparator for strings was something like 1000x as expensive as the comparator for integers, and that you basically couldn't get the latter to take enough time to be worth parallelizing. I care much more about getting the general infrastructure in place to make parallel programming feasible in PostgreSQL than I do about getting one particular case working. And more than feasible: I want it to be relatively straightforward. That's not simple, but the potential rewards are great.
Let's face it: there are people here who are much better at hacking on the planner and especially the executor than I am. Why haven't any of those people implemented parallel anything? I think it's because, right now, it's just too darn hard. I'm trying to reduce that to something approaching the difficulty of writing normal PostgreSQL backend code, and I think I'm 6-12 patches away from that. This is one of them and, yeah, it's not done, and, yeah, we might not get to parallel anything this release and, yeah, things would be going faster if I could work on parallelism full time. But I think that the progress we are making is meaningful and the goal is within sight. I appreciate that you'd probably attack this problem from a different direction than I'm attacking it from, but I still think that what I'm trying to do is a legitimate direction of attack which, by the way, does not preclude anybody else from attacking it from a different direction and, indeed, such a development would be most welcome. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 29 October 2014 12:08, Amit Kapila <amit.kapila16@gmail.com> wrote: > On Wed, Oct 29, 2014 at 2:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote: >> >> My proposal is we do a parallel index build scan... just as we >> discussed earlier, so you would be following the direction set by Dev >> Meeting, not just a proposal of mine. >> >> As I mentioned previously when you started discussing shared memory >> segments, parallel sort does NOT require shared memory. The only thing >> you need to share are files. Split the problem into N pieces, sort >> them to produce N files and then merge the files using existing code. >> That only applies to large sorts, but then those are the ones you >> cared about doing in parallel anyway. >> >> CREATE INDEX has a lock on the table. Worker tasks can be simply >> banned from acquiring new locks and doing many other tasks. > > Banning worker tasks from taking locks is only possible if the > worker backend doesn't need to acquire any lock which is not > already acquired by main backend, but can we safely assume > the same? One example as taken by Robert upthread about > bttextcmp() can break that assumption. I think it's reasonable to imagine some prep work will be required in the main task before workers begin. Locking the toast table of any main tables we access seems easily done. Though perhaps we should make weak locking of the toast table presumed. Do we have cases where the toast table can be accessed when the main table is not also strong locked first? As for locking the enums table or collation table, that's simple stuff also. They're just AccessShareLocks. I can't imagine we'll be able to presume that all functions of every kind are parallel-safe. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 29 October 2014 12:28, Robert Haas <robertmhaas@gmail.com> wrote: > I care much more about getting the general infrastructure in place to > make parallel programming feasible in PostgreSQL than I do about > getting one particular case working. And more than feasible: I want > it to be relatively straightforward. That's not simple, but the > potential rewards are great. Let's face it: there are people here who > are much better at hacking on the planner and especially the > executor than I am. Why haven't any of those people implemented > parallel anything? I think it's because, right now, it's just too > darn hard. I'm trying to reduce that to something approaching the > difficulty of writing normal PostgreSQL backend code, and I think I'm > 6-12 patches away from that. This is one of them and, yeah, it's not > done, and, yeah, we might not get to parallel anything this release > and, yeah, things would be going faster if I could work on parallelism > full time. But I think that the progress we are making is meaningful > and the goal is within sight. The role of an immediate working solution is to prove the infrastructure so far is genuinely useful, helping also to build interest and understanding. There is a real danger that your "ta-dah" moment sometime in the future contains flaws which need to be addressed, but we now have piles of questionable infrastructure lying around. If you have similar doubts about anything I'm doing, please just ask. If we could see the 6-12 patch descriptions and understand where you are going it would help. > I appreciate that you'd probably attack this problem from a different > direction than I'm attacking it from, but I still think that what I'm > trying to do is a legitimate direction of attack which, by the way, > does not preclude anybody else from attacking it from a different > direction and, indeed, such a development would be most welcome.
Well, I don't have time; been a little busy these last 10 years and for at least one year more yet before projects open up again. If I did, I don't think it would be that useful to interfere. Cooperation seems a better use of my time, if possible. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 29, 2014 at 10:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > There is a real danger that your "ta-dah" moment sometime in the > future contains flaws which need to be addressed, but we now have > piles of questionable infrastructure lying around. If you have > similar doubts about anything I'm doing, please just ask. I think the implication that the infrastructure created to date is of questionable utility for parallelism is false and unjustified. Every infrastructure patch thus far committed, and every patch proposed other than this one, has been accompanied by code that demonstrates what it does and how that's useful. Most of them have had enough interest from extension authors to generate bug reports from other people - most notably Petr, who has obviously been playing with all of the existing modules enough to notice bugs and come to conclusions about the shortcomings of the existing technology similar to my own. Interest in dynamic background workers seems to go considerably further, to the point where it made 9.4's list of major features. > If we could see the 6-12 patch descriptions and understand where you > are going it would help. OK, here's what I think we need (roughly): 1. The remaining pg_background patches. The main things in there are (a) the ability to propagate errors from a background worker back to the user backend and (b) the ability to synchronize the background worker's GUC state with the user backend. 2. Group locking. 3. Patches to synchronize the (a) snapshot, (b) combo CID hash, and (c) transaction state from the user backend to a background worker. Draft patches for (a) and (b) are already complete, but not published because I didn't think anyone would be very excited about them just yet. 4. A patch to add "parallel mode" in which a whole bunch of things are prohibited in the user backend and even more things are prohibited in the background worker - specifically, things that won't be safe in those contexts.
Noah and Amit Kapila worked on this and we have draft patches for this as well, also not published, for the same reasons. 5. A patch to add a "proisparallel" flag or similar so that we can label parallel-safe functions. 6. A patch to add a way for a background worker that hits an error to kick the user backend, probably via the ProcSignal mechanism, so that the user backend doesn't have to continuously poll the shm_mq structures it's using to communicate with those workers for errors (or alternatively sit idle while those background workers do all the work). This might be something that can be skipped for early versions, but eventually I think we'll need it. 7. Eventually, a dynamic shared memory allocator. A lot of things can be done without this, but some can't. I think that's most of it. No doubt we'll discover a few other needs along the way, but if we had that infrastructure, then I think it would be quite feasible for an experienced hacker to parallelize most of what we want to parallelize without having to invent too much that's really fundamental. > If I did, I don't think it would be that useful to interfere. > Cooperation seems better use of my time, if possible. I am not asking anyone to interfere, in the sense of getting in the way, but I certainly think it would be useful for people to contribute code and/or reviews. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
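Item 6 above is the classic "publish the result, then signal" pattern: the worker records its error somewhere the leader can read it and then wakes the leader, so the leader can sleep rather than busy-poll. A sketch with threads standing in for backend processes (all names here are illustrative; this is not the actual shm_mq/ProcSignal API):

```python
import threading
import queue

def worker(errq, wakeup):
    # A worker that hits an error publishes it and "kicks" the leader,
    # analogous to writing to a shm_mq and sending a ProcSignal.
    try:
        raise RuntimeError("worker failed")  # simulated error in the worker
    except RuntimeError as e:
        errq.put(str(e))   # publish the error for the leader to rethrow
        wakeup.set()       # wake the leader instead of waiting to be polled

def run_leader():
    errq, wakeup = queue.Queue(), threading.Event()
    t = threading.Thread(target=worker, args=(errq, wakeup))
    t.start()
    wakeup.wait()          # sleep until kicked; no continuous polling
    t.join()
    return errq.get_nowait()
```

The point of the pattern is the `wakeup.wait()` call: without the kick, the leader's only options are to poll the queue in a loop or to block on the queue and miss other events.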
On 29 October 2014 15:43, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Oct 29, 2014 at 10:21 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> There is a real danger that your "ta-dah" moment sometime in the >> future contains flaws which need to be addressed, but we now have >> piles of questionable infrastructure lieing around. If you have >> similar doubts about anything I'm doing, please just ask. > > I think the implication that the infrastructure created to date is of > questionable utility for parallelism is false and unjustifed. Huh? I haven't implied that. Thanks for your wider explanations. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Oct 29, 2014 at 7:38 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 29 October 2014 12:08, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Wed, Oct 29, 2014 at 2:18 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >>
> >> CREATE INDEX has a lock on the table. Worker tasks can be simply
> >> banned from acquiring new locks and doing many other tasks.
> >
> > Banning worker tasks from taking locks is only possible if the
> > worker backend doesn't need to acquire any lock which is not
> > already acquired by main backend, but can we safely assume
> > the same? One example as taken by Robert upthread about
> > bttextcmp() can break that assumption.
>
> I think its reasonable to imagine some prep work will be required in
> the main task before workers begin.
>
> Locking the toast table of any main tables we access seems easily
> done. Though perhaps we should make weak locking of the toast table
> presumed. Do we have cases where the toast table can be accessed when
> the main table is not also strong locked first?
I think it is possible to have a strong lock on the toast table before
the main table.
Cluster pg_toast.<toast_table_name> using <toast_index>;
Vacuum Full pg_toast.<toast_table_name>;
Reindex Index pg_toast.<toast_index_name>;
..
Now if we take the lock on the toast table in the main task, it will block
some of these operations before they actually need to be blocked.
> As for locking the enums table or collation table, that's simple stuff
> also. They're just AccessShareLocks.
Yeah, taking AccessShareLocks is not a problem; however, the
problem is that they block any other operation on those tables which
requires AccessExclusiveLock or any other strong lock, before it is
actually necessary to block any such operation.
> I can't imagine we'll be able to presume that all functions of every
> kind are parallel-safe.
Yeah, that's right, so we need to either block such functions from being
executed in a parallel worker or make arrangements so that they can be
safely executed; I think for the first version blocking might be better.
Now, I think we can find ways to avoid all such things by hacking code such
that conflicting operations can be banned/blocked, however eventually we
need to have some kind of group locking to avoid deadlocks which can arise
due to operations in parallel workers. So it seems to me it is not a bad idea
to have a strong infrastructure first for the things (like group locking) which
we think are required for any non-trivial useful parallel operation without
putting too many restrictions on users of the feature.
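The group locking Amit refers to comes down to one extra rule in the lock manager's conflict check: locks held by members of the same locking group never conflict with one another. A minimal sketch (Python, with a simplified conflict table and hypothetical names; this is not the actual lock.c logic):

```python
# Simplified subset of PostgreSQL lock modes and the pairs that conflict.
CONFLICTS = {
    ("AccessShareLock", "AccessExclusiveLock"),
    ("AccessExclusiveLock", "AccessShareLock"),
    ("AccessExclusiveLock", "AccessExclusiveLock"),
}

def conflicts(req_mode, req_group, held_mode, held_group):
    """Return True if a requested lock conflicts with a held lock.
    The one rule group locking adds: processes in the same locking
    group are assumed to cooperate, so their locks never conflict."""
    if req_group is not None and req_group == held_group:
        return False
    return (req_mode, held_mode) in CONFLICTS
```

So a worker requesting AccessShareLock on a table where its own leader holds AccessExclusiveLock is granted immediately, while an unrelated backend making the same request still conflicts as usual.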
On 30 October 2014 04:24, Amit Kapila <amit.kapila16@gmail.com> wrote: >> Locking the toast table of any main tables we access seems easily >> done. Though perhaps we should make weak locking of the toast table >> presumed. Do we have cases where the toast table can be accessed when >> the main table is not also strong locked first? > > I think it is possible to have a strong lock on toast table before > main table. > Cluster pg_toast.<toast_table_name> using <toast_index>; > Vacuum Full pg_toast.<toast_table_name>; > Reindex Index pg_toast.<toast_index_name>; > .. > > Now if take the lock on toast table in main task, it will block some of > the operations before actually they need to be blocked. Very strange. Those commands are not common operations; I doubt many people even know they exist. All of those commands would block SELECTs on the main table, so there is no significant benefit in that behaviour. In fact it would be more sensible to lock the toast table earlier. >> As for locking the enums table or collation table, that's simple stuff >> also. They're just AccessShareLocks. > > Yeah, taking AccessShareLocks is not a problem, however the main > problem is that they block any other operation on those tables which require > AccessExclusiveLock lock or any other strong lock required, before it is > actually required to block any such operation. What operations are those? How frequently do they occur? I can't see any need for major maintenance operations on small catalog tables. >> I can't imagine we'll be able to presume that all functions of every >> kind are parallel-safe. > > Yeah, thats right, so we need to either block such functions to get executed > in parallel worker or make arrangements so that they can be safely executed, > I think for first version blocking might be better. I think that also. That way we won't need complex locking.
> Now, I think we can find ways to avoid all such things by hacking code such > that conflicting operations can be banned/blocked, however eventually we > need to have some kind of group locking to avoid deadlocks which can arise > due to operations in parallel worker. So it seems to me it is not a bad > idea > to have a strong infrastructure first for the things (like group locking) > which > we think are required for any non-trivial useful parallel operation without > putting too much restrictions on users for using the feature. We are all agreed we need to block some functions from executing in parallel mode. Why don't we start by making a list of the functions that might cause problems in parallel workers, and why. That way we might find out easier ways of doing things. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Oct 30, 2014 at 2:28 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 30 October 2014 04:24, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> >> Locking the toast table of any main tables we access seems easily
> >> done. Though perhaps we should make weak locking of the toast table
> >> presumed. Do we have cases where the toast table can be accessed when
> >> the main table is not also strong locked first?
> >
> > I think it is possible to have a strong lock on toast table before
> > main table.
> > Cluster pg_toast.<toast_table_name> using <toast_index>;
> > Vacuum Full pg_toast.<toast_table_name>;
> > Reindex Index pg_toast.<toast_index_name>;
> > ..
> >
> > Now if take the lock on toast table in main task, it will block some of
> > the operations before actually they need to be blocked.
>
> Very strange.
>
> Those commands are not common operations? I doubt many people even
> know that exists.
>
> All of those commands would block SELECTs on the main table, so there
> is no significant benefit in that behaviour.
>
> In fact it would be more sensible to lock the toast table earlier.
It might make some sense to lock the toast table earlier for this
particular case, but I don't think in general it will be feasible to lock
all the tables (including catalog tables) which might get used in the
overall operation before starting parallel operation. I believe it will
make doing parallel stuff difficult if we try to go via this route, as
we need to always do this for any operation we want to make parallel.
There will always be the risk that we might miss something, and
it might not be feasible in all kinds of cases.
If I understand correctly, you are suggesting that to get the first version
of parallel operation out earlier, we should try to avoid all the stuff
(like group locking, ...), have some restrictions on the kinds of cases
Create Index can handle, do some hackery in the Create Index path to avoid
any kind of locking problems and do the parallel operation for External
Sort (which might avoid the need for a shared memory allocator) and then
call it a day, and then do the things which we need for other parallel
operations as and when they are required. I think this might be a good
approach in itself if somebody wants to target something to release
early; however, in the medium to long term, when we want to provide
non-trivial parallel operations, we have to eventually solve those problems,
and delaying will only make things worse.
In short, I agree that there is merit in your idea w.r.t. getting the first
version of parallel operation out early; however, if we look at the medium
to long term, I feel solving the group locking problem (or building the other
general infrastructure Robert mentioned upthread) can yield better results,
unless you feel that is not required for parallel operations at all.
On 31 October 2014 04:42, Amit Kapila <amit.kapila16@gmail.com> wrote: >> In fact it would be more sensible to lock the toast table earlier. >> > > It might make some sense to lock the toast table earlier for this > particular case, but I don't think in general it will be feasible to lock > all the tables (including catalog tables) which might get used in the > overall operation before starting parallel operation. I believe it will > make doing parallel stuff difficult if we try to go via this route, as > we need to always do this for any operation we want to make parallel. > It will always have the risk that we might miss something and another > thing is it might not be feasible in all kind of cases. > > If I understand correctly, you are suggesting that to get the first version > of parallel operation out earlier, we should try to avoid all the stuff > (like group locking, ...), I advocate trying to avoid it, to see. The problems cited so far are not so extreme they cannot be easily got around, if you have a will to do so. I've just posted about an idea to reduce catalog locking. > have some restrictions on which kind of cases > Create Index can work, do some hackery in Create Index path to avoid > any kind of locking problems and do the parallel operation for External > Sort (which might avoid need for shared memory allocator) and then > call it a day and then do the things which we need for other parallel > operations as and when they are required. > I think this might be a good > approach in itself if somebody wants to target something to release > early, however in medium to short term when we want to provide > non-trivial parallel operations we have to eventually solve those problems > and delaying will only make the things worse. My (by now) long experience on Postgres has shown me that developing something now is a good route to take. 
By moving towards a solution at a reasonable pace you learn things which may affect the longer term viability of the project. New functionality in each release is useful. Big bang developments that don't deliver until the end have a habit of not delivering at the end either. "MVP" is a term frequently used in VCs for a reason. Lack of urgency in delivering a useful outcome from the project looks strange. If all you develop in this release is core infrastructure for parallelism and a custom scan API, it begins to look like there may not be an open source version delivered at all. How people make their income is up to them, but it's a concern that's been raised my way before, so it seems reasonable to do so for others also. That is why we have spent much time explaining clearly the goals and licensing of BDR, for example. No need for big arguments; these are not allegations or implications, etc. > In short, I agree that there is a merit in your idea w.r.t getting the first > version of parallel operation out early, however if we see from medium > to long term, I feel solving group locking problem (or some other general > infrastructure what Robert mentioned upthread) can yield better results, > unless you feel that is not at all required for parallel operations. Is it genuinely required for most parallel operations? I think it's clear that none of us knows the answer. Sure, the general case needs it, but is the general case the same thing as the reasonably common case? Certainly, having a working prototype for something useful would help build the case for why these things are needed. It would certainly help the cause to have something working in this release. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Is it genuinely required for most parallel operations? I think it's > clear that none of us knows the answer. Sure, the general case needs > it, but is the general case the same thing as the reasonably common > case? Well, I think that the answer is pretty clear. Most of the time, perhaps in 99.9% of cases, group locking will make no difference as to whether a parallel operation succeeds or fails. Occasionally, however, it will cause an undetected deadlock. I don't hear anyone arguing that that's OK. Does anyone wish to make that argument? If not, then we must prevent it. The only design, other than what I've proposed here, that seems like it will do that consistently in all cases is to have the user backend lock every table that the child backend might possibly want to lock and retain those locks throughout the entire duration of the computation whether the child would actually need those locks or not. I think that could be made to work, but there are two problems: 1. Turing's theorem being what it is, predicting what catalog tables the child might lock is not necessarily simple. 2. It might end up taking many more locks than necessary and holding them for much longer than necessary. Right now, for example, a syscache lookup locks the table only if we actually need to read from it and releases the lock as soon as the actual read is complete. Under this design, every syscache that the parallel worker might conceivably consult needs to be locked for the entire duration of the parallel computation. I would expect this to provoke a violent negative reaction from at least one prominent community member (and I bet users wouldn't like it much, either). So, I am still of the opinion that group locking makes sense.
While I appreciate the urge to avoid solving difficult problems where it's reasonably avoidable, I think we're in danger of spending more effort trying to avoid solving this particular problem than it would take to actually solve it. Based on what I've done so far, I'm guessing that a complete group locking patch will be between 1000 and 1500 lines of code and that nearly all of the new logic will be skipped when it's not in use (i.e. no parallelism). That sounds to me like a hell of a deal compared to trying to predict what locks the child might conceivably take and preemptively acquire them all, which sounds annoyingly tedious even for simple things (like equality operators) and nearly impossible for anything more complicated. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
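The core rule behind the group locking proposal — locks held by members of the same locking group never conflict with one another, while everything else conflicts as usual — can be sketched as a toy conflict check. This is plain Python, not the lock.c logic; the three-mode conflict matrix is a simplification of PostgreSQL's eight modes, and the group IDs are purely illustrative:

```python
# Toy model of heavyweight-lock conflict checking with group locking.
# Lock modes, simplified to three of PostgreSQL's eight.
ACCESS_SHARE, ROW_EXCLUSIVE, ACCESS_EXCLUSIVE = 0, 1, 2

# CONFLICTS[a][b] is True when mode a conflicts with mode b.
CONFLICTS = [
    [False, False, True],   # AccessShareLock conflicts only with AccessExclusiveLock
    [False, False, True],   # RowExclusiveLock (simplified; Share modes omitted)
    [True,  True,  True],   # AccessExclusiveLock conflicts with everything
]

def lock_conflicts(req_mode, req_group, holders):
    """holders is a list of (mode, group) pairs for granted locks.
    A lock held by the requester's own locking group never blocks it."""
    for held_mode, held_group in holders:
        if held_group is not None and held_group == req_group:
            continue  # same locking group: treated as non-conflicting
        if CONFLICTS[req_mode][held_mode]:
            return True
    return False

# Parallel CLUSTER: worker B (group 1) wants AccessExclusiveLock while
# leader A (group 1) already holds it -- granted immediately.
held = [(ACCESS_EXCLUSIVE, 1)]
print(lock_conflicts(ACCESS_EXCLUSIVE, 1, held))  # False: same group
# An unrelated backend (no group) blocks as usual.
print(lock_conflicts(ACCESS_SHARE, None, held))   # True
```

The point of the sketch is only that the group test is one extra comparison on the conflict path, which is why the patch can plausibly be near-free when parallelism is not in use.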
On 2014-10-31 08:54:54 -0400, Robert Haas wrote: > On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Is it genuinely required for most parallel operations? I think it's > > clear that none of us knows the answer. Sure, the general case needs > > it, but is the general case the same thing as the reasonably common > > case? > > Well, I think that the answer is pretty clear. Most of the time, > perhaps in 99.9% of cases, group locking will make no difference as to > whether a parallel operation succeeds or fails. Occasionally, > however, it will cause an undetected deadlock. I don't hear anyone > arguing that that's OK. Does anyone wish to make that argument? Maybe we can, as a first step, make those edges in the lock graph visible to the deadlock detector? It's pretty clear that undetected deadlocks aren't ok, but detectable deadlocks in a couple corner cases might be acceptable. > If not, then we must prevent it. The only design, other than what > I've proposed here, that seems like it will do that consistently in > all cases is to have the user backend lock every table that the child > backend might possibly want to lock and retain those locks throughout > the entire duration of the computation whether the child would > actually need those locks or not. I think that could be made to work, > but there are two problems: > > 1. Turing's theorem being what it is, predicting what catalog tables > the child might lock is not necessarily simple. Well, there's really no need to be absolutely general here. We're only going to support a subset of functionality as parallelizable. And in that case we don't need anything complicated to acquire all locks. It seems quite possible to combine a heuristic approach with improved deadlock detection. > 2. It might end up taking many more locks than necessary and holding > them for much longer than necessary. 
Right now, for example, a > syscache lookup locks the table only if we actually need to read from > it and releases the lock as soon as the actual read is complete. > Under this design, every syscache that the parallel worker might > conceivably consult needs to be locked for the entire duration of the > parallel computation. I would expect this to provoke a violent > negative reaction from at least one prominent community member (and I > bet users wouldn't like it much, either). I see little reason to do this for system level relations. > So, I am still of the opinion that group locking makes sense. While > I appreciate the urge to avoid solving difficult problems where it's > reasonably avoidable, I think we're in danger of spending more effort > trying to avoid solving this particular problem than it would take to > actually solve it. Based on what I've done so far, I'm guessing that > a complete group locking patch will be between 1000 and 1500 lines of > code and that nearly all of the new logic will be skipped when it's > not in use (i.e. no parallelism). That sounds to me like a hell of a > deal compared to trying to predict what locks the child might > conceivably take and preemptively acquire them all, which sounds > annoyingly tedious even for simple things (like equality operators) > and nearly impossible for anything more complicated. What I'm worried about is the performance impact of group locking when it's not in use. The heavyweight locking code is already quite complex and often a noticeable bottleneck... Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Oct 31, 2014 at 9:07 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-10-31 08:54:54 -0400, Robert Haas wrote: >> On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > Is it genuinely required for most parallel operations? I think it's >> > clear that none of us knows the answer. Sure, the general case needs >> > it, but is the general case the same thing as the reasonably common >> > case? >> >> Well, I think that the answer is pretty clear. Most of the time, >> perhaps in 99.9% of cases, group locking will make no difference as to >> whether a parallel operation succeeds or fails. Occasionally, >> however, it will cause an undetected deadlock. I don't hear anyone >> arguing that that's OK. Does anyone wish to make that argument? > > Maybe we can, as a first step, make those edges in the lock graph > visible to the deadlock detector? It's pretty clear that undetected > deadlocks aren't ok, but detectable deadlocks in a couple corner cases > might be acceptable. Interesting idea. I agree that would be more acceptable. It would be kind of sad, though, if you got a deadlock from this: BEGIN; LOCK TABLE foo; SELECT * FROM foo; You can, of course, avoid that, but it's basically transferring the burden from the lock manager to every bit of parallel code we ever write. If it turns out to be easier to do this than what I'm currently proposing, I'm definitely amenable to going this route for version 1, but I don't think it's going to be appealing as a permanent solution. >> 1. Turing's theorem being what it is, predicting what catalog tables >> the child might lock is not necessarily simple. > > Well, there's really no need to be absolutely general here. We're only > going to support a subset of functionality as parallelizable. And in > that case we don't need anything complicated to acquire all locks. See, that's what I don't believe. Even very simple things like equality operators do a surprising amount of catalog locking. 
>> 2. It might end up taking many more locks than necessary and holding >> them for much longer than necessary. Right now, for example, a >> syscache lookup locks the table only if we actually need to read from >> it and releases the lock as soon as the actual read is complete. >> Under this design, every syscache that the parallel worker might >> conceivably consult needs to be locked for the entire duration of the >> parallel computation. I would expect this to provoke a violent >> negative reaction from at least one prominent community member (and I >> bet users wouldn't like it much, either). > > I see little reason to do this for system level relations. I'm not following you. >> So, I am still of the opinion that group locking makes sense. While >> I appreciate the urge to avoid solving difficult problems where it's >> reasonably avoidable, I think we're in danger of spending more effort >> trying to avoid solving this particular problem than it would take to >> actually solve it. Based on what I've done so far, I'm guessing that >> a complete group locking patch will be between 1000 and 1500 lines of >> code and that nearly all of the new logic will be skipped when it's >> not in use (i.e. no parallelism). That sounds to me like a hell of a >> deal compared to trying to predict what locks the child might >> conceivably take and preemptively acquire them all, which sounds >> annoyingly tedious even for simple things (like equality operators) >> and nearly impossible for anything more complicated. > > What I'm worried about is the performance impact of group locking when > it's not in use. The heavyweight locking code is already quite complex > and often a noticeable bottleneck... I certainly agree that it would be unacceptable for group locking to significantly regress the normal case. I'm pretty optimistic about reducing the overhead to near-zero, though - a few extra branch instructions on the non-fast-path case should be lost in the noise. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
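Andres's suggestion — make the group edges visible to the deadlock detector — amounts to adding implicit wait-for edges between group members, on the working assumption upthread that every member of a locking group may eventually wait for every other member. A toy wait-for-graph sketch (plain Python; the real algorithm lives in deadlock.c and looks nothing like this):

```python
# Toy wait-for-graph cycle detection with implicit group edges.
from itertools import permutations

def build_edges(waits_for, groups):
    """waits_for: dict proc -> set of procs it waits on (lock-queue edges).
    groups: list of sets of procs forming locking groups."""
    edges = {p: set(w) for p, w in waits_for.items()}
    for g in groups:
        for a, b in permutations(g, 2):
            edges.setdefault(a, set()).add(b)  # implicit group edge a -> b
    return edges

def has_cycle(edges):
    visited, on_stack = set(), set()
    def dfs(node):
        visited.add(node); on_stack.add(node)
        for nxt in edges.get(node, ()):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False
    return any(dfs(p) for p in list(edges) if p not in visited)

# The scenario from the start of the thread: A and B form a group; A holds
# AccessShareLock, unrelated X queues for AccessExclusiveLock, then B
# queues behind X.  Lock-queue edges alone: B -> X, X -> A.  No cycle:
print(has_cycle(build_edges({"B": {"X"}, "X": {"A"}}, [])))            # False
# With the implicit A -> B group edge, the deadlock becomes detectable:
print(has_cycle(build_edges({"B": {"X"}, "X": {"A"}}, [{"A", "B"}])))  # True
```

This is the difference between the two proposals in miniature: group locking prevents B from queueing behind X in the first place, while the detector-only approach lets the cycle form and then breaks it.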
On 2014-10-31 10:08:59 -0400, Robert Haas wrote: > On Fri, Oct 31, 2014 at 9:07 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > On 2014-10-31 08:54:54 -0400, Robert Haas wrote: > >> On Fri, Oct 31, 2014 at 6:41 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> > Is it genuinely required for most parallel operations? I think it's > >> > clear that none of us knows the answer. Sure, the general case needs > >> > it, but is the general case the same thing as the reasonably common > >> > case? > >> > >> Well, I think that the answer is pretty clear. Most of the time, > >> perhaps in 99.9% of cases, group locking will make no difference as to > >> whether a parallel operation succeeds or fails. Occasionally, > >> however, it will cause an undetected deadlock. I don't hear anyone > >> arguing that that's OK. Does anyone wish to make that argument? > > > > Maybe we can, as a first step, make those edges in the lock graph > > visible to the deadlock detector? It's pretty clear that undetected > > deadlocks aren't ok, but detectable deadlocks in a couple corner cases > > might be acceptable. > > Interesting idea. I agree that would be more acceptable. It would be > kind of sad, though, if you got a deadlock from this: > > BEGIN; > LOCK TABLE foo; > SELECT * FROM foo; I have serious doubts about the number of cases where it's correct to access relations in a second backend that are exclusively locked. Not so much when that happens for a user issued LOCK statement of course, but when the system has done so. Personally I think to stay sane we'll have to forbid access to anything normal other backends can't access either - otherwise things will get too hard to reason about. So just refusing parallelism as soon as anything has taken an access exclusive lock doesn't sound too bad to me. > You can, of course, avoid that, but it's basically transferring the > burden from the lock manager to every bit of parallel code we ever > write. 
If it turns out to be easier to do this than what I'm > currently proposing, I'm definitely amenable to going this route for > version 1, but I don't think it's going to be appealing as a permanent > solution. It's probably not the solution forever, I agree. But it might be the simplest way forward till we have more actual users. > >> 1. Turing's theorem being what it is, predicting what catalog tables > >> the child might lock is not necessarily simple. > > > > Well, there's really no need to be absolutely general here. We're only > > going to support a subset of functionality as parallelizable. And in > > that case we don't need anything complicated to acquire all locks. > > See, that's what I don't believe. Even very simple things like > equality operators do a surprising amount of catalog locking. So what? I don't see catalog locking as something problematic? Maybe I'm missing something, but I don't see cases where you would expect deadlocks or anything like it on catalog relations? > >> So, I am still of the opinion that group locking makes sense. While > >> I appreciate the urge to avoid solving difficult problems where it's > >> reasonably avoidable, I think we're in danger of spending more effort > >> trying to avoid solving this particular problem than it would take to > >> actually solve it. Based on what I've done so far, I'm guessing that > >> a complete group locking patch will be between 1000 and 1500 lines of > >> code and that nearly all of the new logic will be skipped when it's > >> not in use (i.e. no parallelism). That sounds to me like a hell of a > >> deal compared to trying to predict what locks the child might > >> conceivably take and preemptively acquire them all, which sounds > >> annoyingly tedious even for simple things (like equality operators) > >> and nearly impossible for anything more complicated. > > > > What I'm worried about is the performance impact of group locking when > > it's not in use. 
The heavyweight locking code is already quite complex > > and often a noticeable bottleneck... > > I certainly agree that it would be unacceptable for group locking to > significantly regress the normal case. I'm pretty optimistic about > reducing the overhead to near-zero, though - a few extra branch > instructions on the non-fast-path case should be lost in the noise. I'm less worried about the additional branches than about increasing the size of the datastructures. They're already pretty huge. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 31 October 2014 12:54, Robert Haas <robertmhaas@gmail.com> wrote: > 1. Turing's theorem being what it is, predicting what catalog tables > the child might lock is not necessarily simple. The Pareto principle offers ways to cope with the world's lack of simplicity. You mentioned earlier that functions would need to be marked proisparallel etc.. What conditions will that be protecting against? If we aren't going to support the general case where every single thing works, can we at least discuss what the list of cases is that we will support. I don't think we can argue that everything must be generic when we already admit it won't be. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Oct 31, 2014 at 10:38 AM, Andres Freund <andres@2ndquadrant.com> wrote: > I have serious doubts about the number of cases where it's correct to > access relations in a second backend that are exclusively locked. Not so > much when that happens for a user issued LOCK statement of course, but > when the system has done so. Personally I think to stay sane we'll have > to forbid access to anything normal other backends can't access either - > otherwise things will get too hard to reason about. > > So just refusing parallelism as soon as anything has taken an access > exclusive lock doesn't sound too bad to me. That restriction seems onerous to me; for example, it would prevent a parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL. Those seem like worthy targets for parallelism, and maybe even early targets, since I expect a lot of the real complexity here to be in the query planner, and you can avoid most of that complexity when planning DDL. I also think it's unnecessary. It's wrong to think of two parallel backends that both want AccessExclusiveLock on a given target relation as conflicting with each other; if the process were not running in parallel, those lock requests would both have come from the same process and both requests would have been granted. Parallelism is generally not about making different things happen; it's about spreading the stuff that would happen anyway across multiple backends, and things that would succeed if done in a single backend shouldn't fail when spread across 2 or more without some compelling justification. The cases where it's actually unsafe for a table to be accessed even by some other code within the same backend are handled not by the lock manager, but by CheckTableNotInUse(). That call should probably fail categorically if issued while parallelism is active. >> >> 1. Turing's theorem being what it is, predicting what catalog tables >> >> the child might lock is not necessarily simple. 
>> > >> > Well, there's really no need to be absolutely general here. We're only >> > going to support a subset of functionality as parallelizable. And in >> > that case we don't need anything complicated to acquire all locks. >> >> See, that's what I don't believe. Even very simple things like >> equality operators do a surprising amount of catalog locking. > > So what? I don't see catalog locking as something problematic? Maybe I'm > missing something, but I don't see cases where you would expect deadlocks or > anything like it on catalog relations? Suppose somebody fires off a parallel sort on a text column, or a parallel sequential scan involving a qual of the form textcol = 'zog'. We launch a bunch of workers to do comparisons; they do lookups against pg_collation. After some but not all of them have loaded the collation information they need into their local cache, the DBA types "cluster pg_collation". It queues up for an AccessExclusiveLock. The remaining parallel workers join the wait queue waiting for their AccessShareLock. The system is now deadlocked; the only solution is to move the parallel workers ahead of the AccessExclusiveLock request, but the deadlock detector won't do that unless it knows about the relationship among the parallel workers. It's possible to construct more obscure cases as well. For example, suppose a user (for reasons known only to himself and God) does LOCK TABLE pg_opclass and then begins a sort of a column full of enums. Meanwhile, another user tries to CLUSTER pg_enum. This will deadlock if, and only if, some of the enum OIDs are odd. I mention this example not because I think it's a particularly useful case but because it illustrates just how robust the current system is: it will catch that case, and break the deadlock somehow, and you as a PostgreSQL backend hacker do not need to think about it. The guy who added pg_enum did not need to think about it. It just works. 
If we decide that parallelism is allowed to break that guarantee, then every patch that does parallel anything has got to worry about what undetected deadlocks might result, and then argue about which of them are obscure enough that we shouldn't care and which are not. That doesn't sound like a fun way to spend my time. But, really, the first case is the one I'm more worried about: a bunch of parallel backends all grab the same locks, but slightly separated in time. A strong locker joins the queue midway through and we're hosed. Obviously any system cache lookups whatsoever are going to be separated in time like this, just because the world isn't synchronous. The pg_enum case is an example of how the lookups could be more widely spaced: each backend will scan pg_enum when it first hits an odd OID, and that could happen much later for some than others. But even a small window is enough to render undetected deadlock a possibility, and you will not convince me that "hope the race condition is narrow enough in practice that this never happens" is a remotely sane strategy. Down the road, it's not hard to imagine the same issues cropping up with user tables. For example, the user defines a non-inlinable SQL or PL/pgsql function getname() that does SELECT name FROM other_table WHERE id = $1 and returns the result. They mark this function parallel safe, or we infer that automatically. Then they do SELECT * FROM some_table WHERE somecol = 242 AND getname(othercol) ~ 'cranberry'. There is nothing whatsoever to make this query non-parallelizable; indeed, it's probably an excellent candidate for parallelism. But if you don't have group locking then it is again susceptible to the risk of undetected deadlock as soon as an unrelated process tries to grab AccessExclusiveLock on other_table, because it may be that some parallel backends already have AccessShareLock on it, and that others will request it later, queueing up behind the AccessExclusiveLock request. 
Even if you think that all of these particular issues are somehow avoidable or that they don't matter, I think it would be unwise to bet on them being the only issues. The lock manager looks like it's old, ill-maintained code, but that's in no small part because it solved the problem it aims at so thoroughly that few people have felt the need to modify it very much in the last decade. I think that's part of why there's so much skepticism on this thread - we don't think about the problems that the lock manager solves as being important problems not because they truly aren't important, but because they are so thoroughly solved. Then every once in a while we complain about the performance. > I'm less worried about the additional branches than about increasing the > size of the datastructures. They're already pretty huge. I share that concern. I aimed for a design which would be memory-efficient and have low impact on the non-group-locking case rather than a design which would be ideal for group locking. The patch could be made to handle large locking groups more efficiently by changing the shared memory structure around; e.g. by replacing PROCLOCK's holdMask and waitMask fields, now both of type LOCKMASK, with int [MAX_LOCKMODES] so that we can represent the entire group's locks held and awaited on a single object with a single PROCLOCK. That representation would be significantly more convenient than the one I actually chose, but it would also use up a LOT more shared memory and would penalize LockReleaseAll(). The design embodied by the proposed patch adds one additional pointer per PROCLOCK, which is still not great, but I haven't been able to think of a reasonable way to do better. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
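The shared-memory tradeoff Robert describes at the end — per-process LOCKMASK bitmasks versus a group-wide per-mode array — can be made concrete with a toy sketch. Python stands in for the C structures here; the value 10 matches lock.h's MAX_LOCKMODES, but everything else is illustrative:

```python
# Toy comparison of the two PROCLOCK representations discussed above.
MAX_LOCKMODES = 10  # lock.h reserves 10 mode slots

# (a) Current style: one bitmask per process per locked object,
# mode i -> bit i.  One machine word; release just clears bits.
def hold_mask(modes):
    mask = 0
    for m in modes:
        mask |= 1 << m
    return mask

# (b) The rejected alternative: one counter per mode for the whole
# group, so several members' identical locks fold into one entry --
# but releases must decrement counts, and the array is ~10x wider
# than a single LOCKMASK word.
def group_counts(procs_modes):
    counts = [0] * MAX_LOCKMODES
    for modes in procs_modes:
        for m in modes:
            counts[m] += 1
    return counts

# Two group members each holding mode 1 (say, AccessShareLock):
print(hold_mask([1]))               # 2 -- one mask per member
print(group_counts([[1], [1]])[1])  # 2 -- one shared array for the group
```

This makes visible why the count-array form is convenient (the whole group's state on one object lives in one entry) yet costs more shared memory per PROCLOCK and more work in LockReleaseAll(), which is why the draft patch instead adds a single pointer per PROCLOCK.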
On 31 October 2014 18:29, Robert Haas <robertmhaas@gmail.com> wrote: > Suppose somebody fires off a parallel sort on a text column, or a > parallel sequential scan involving a qual of the form textcol = 'zog'. > We launch a bunch of workers to do comparisons; they do lookups > against pg_collation. After some but not all of them have loaded the > collation information they need into their local cache, the DBA types > "cluster pg_collation". It queues up for an AccessExclusiveLock. The > remaining parallel workers join the wait queue waiting for their > AccessShareLock. The system is now deadlocked; the only solution is to > move the parallel workers ahead of the AccessExclusiveLock request, > but the deadlock detector won't do that unless it knows about the > relationship among the parallel workers. It's an obscure case and it's not the only solution either. I'm really surprised that having workers do their own locking doesn't scare you. Personally, it scares me. It's not like we're the first to do parallel query. Honestly, how many parallel databases do you think do this? I can't see this being a practical, workable solution for running a parallel query across a cluster of many nodes. Shared, distributed lock table? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Oct 31, 2014 at 11:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > You mentioned earlier that functions would need to be marked proisparallel etc.. > > What conditions will that be protecting against? If we aren't going to > support the general case where every single thing works, can we at > least discuss what the list of cases is that we will support. In general, any operation that relies on backend-private state not shared with a cooperating backend isn't parallel-safe. For example, consider: SELECT setseed(1); SELECT random() FROM generate_series(1,1000) g; This sequence of queries can be relied upon to produce the same output every time. But if some of the random() calls are executed in another backend, you'll get different results because random() works by using backend-private memory to store its pseudo-random state. It's unlikely to be worth the effort to move the pseudo-random state to a DSM for parallelism, so we probably just want to disallow parallel query when random() is in use, or, better still, disallow parallelizing only the particular query node that uses random(). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
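The setseed()/random() example boils down to per-process PRNG state. A toy Python model of the same effect (the random module stands in for the backend-private state; the two-worker split is a hypothetical illustration, not how PostgreSQL would divide the rows):

```python
# Why random() is not parallel-safe: each process owns its PRNG state,
# so splitting the calls across workers changes the output stream even
# though the leader ran setseed().
import random

def serial_draws(seed, n):
    rng = random.Random(seed)  # one backend, one private state
    return [rng.random() for _ in range(n)]

def parallel_draws(seed, n):
    # Two "backends": only the leader saw setseed(); model the worker's
    # unrelated private state with an arbitrary different seed.
    leader, worker = random.Random(seed), random.Random(seed + 12345)
    rngs = [leader, worker]
    return [rngs[i % 2].random() for i in range(n)]

# The serial run is repeatable; the parallel run diverges from it even
# though both start from the same setseed() call in the leader.
assert serial_draws(1, 4) == serial_draws(1, 4)
assert serial_draws(1, 4) != parallel_draws(1, 4)
```

This is exactly the condition for the proposed marking: either the private state would have to move into a DSM segment, or the node using random() must not be parallelized.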
On 2014-10-31 14:29:57 -0400, Robert Haas wrote: > On Fri, Oct 31, 2014 at 10:38 AM, Andres Freund <andres@2ndquadrant.com> wrote: > > I have serious doubts about the number of cases where it's correct to > > access relations in a second backend that are exclusively locked. Not so > > much when that happens for a user issued LOCK statement of course, but > > when the system has done so. Personally I think to stay sane we'll have > > to forbid access to anything normal other backends can't access either - > > otherwise things will get too hard to reason about. > > > > So just refusing parallelism as soon as anything has taken an access > > exclusive lock doesn't sound too bad to me. > > That restriction seems onerous to me; for example, it would prevent a > parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL. > Those seem like worthy targets for parallelism, and maybe even early > targets, since I expect a lot of the real complexity here to be in the > query planner, and you can avoid most of that complexity when planning > DDL. Ugh. I think that's aiming too high. You'll suddenly need to share cache invalidations between the backends. I think we should start by requiring that there's no DDL done *at all* in the transaction that starts the parallel activity. For CREATE INDEX PARALLEL that can be done by reusing the logic we have for CONCURRENTLY, thereby moving the parallelized part into a transaction that hasn't done any DDL yet. > I also think it's unnecessary. It's wrong to think of two parallel > backends that both want AccessExclusiveLock on a given target relation > as conflicting with each other; if the process were not running in > parallel, those lock requests would both have come from the same > process and both requests would have been granted. I don't think that's a realistic view. There's lots of backend private state around. If we try to make all that coherent between backends we'll fail. 
> Parallelism is > generally not about making different things happen; it's about > spreading the stuff that would happen anyway across multiple backends, > and things that would succeed if done in a single backend shouldn't > fail when spread across 2 or more without some compelling > justification. The cases where it's actually unsafe for a table to be > accessed even by some other code within the same backend are handled > not by the lock manager, but by CheckTableNotInUse(). That call > should probably fail categorically if issued while parallelism is > active. Which will e.g. prohibit CLUSTER and VACUUM FULL. > >> >> 1. Turing's theorem being what it is, predicting what catalog tables > >> >> the child might lock is not necessarily simple. > >> > > >> > Well, there's really no need to be absolutely general here. We're only > >> > going to support a subset of functionality as parallelizable. And in > >> > that case we don't need anything complicated to acquire all locks. > >> > >> See, that's what I don't believe. Even very simple things like > >> equality operators do a surprising amount of catalog locking. > > > > So what? I don't see catalog locking as something problematic? Maybe I'm > > missing something, but I don't see cases where you would expect deadlocks or > > anything like it on catalog relations? > > Suppose somebody fires off a parallel sort on a text column, or a > parallel sequential scan involving a qual of the form textcol = 'zog'. > We launch a bunch of workers to do comparisons; they do lookups > against pg_collation. After some but not all of them have loaded the > collation information they need into their local cache, the DBA types > "cluster pg_collation". It queues up for an AccessExclusiveLock. The > remaining parallel workers join the wait queue waiting for their > AccessShareLock. 
The system is now deadlocked; the only solution is to > move the parallel workers ahead of the AccessExclusiveLock request, > but the deadlock detector won't do that unless it knows about the > relationship among the parallel workers. I think you would need to make the case more complex - we only hold locks on system objects for a very short time, and mostly not in a nested fashion. But generally I think it's absolutely perfectly alright to fail in such a case. We need to fail reliably, but *none* of this is a new hazard. You can have pretty similar issues today, without any parallelism. > It's possible to construct more obscure cases as well. For example, > suppose a user (for reasons known only to himself and God) does LOCK > TABLE pg_opclass and then begins a sort of a column full of enums. > Meanwhile, another user tries to CLUSTER pg_enum. This will deadlock > if, and only if, some of the enum OIDs are odd. I mention this > example not because I think it's a particularly useful case but > because it illustrates just how robust the current system is: it will > catch that case, and break the deadlock somehow, and you as a > PostgreSQL backend hacker do not need to think about it. The guy who > added pg_enum did not need to think about it. It just works. If we > decide that parallelism is allowed to break that guarantee, then every > patch that does parallel anything has got to worry about what > undetected deadlocks might result, and then argue about which of them > are obscure enough that we shouldn't care and which are not. That > doesn't sound like a fun way to spend my time. Well, that's why I think we should teach the deadlock detector to catch these kind of cases. That doesn't sound particularly hard. I'm sorry to be a bit pessimistic here. But my intuition says that starting to do group locking opens a can of worms that'll take us a long time to close again. Maybe I'm just imagining complexity that won't actually be there. 
But I have a hard time believing that. I wonder if we couldn't implement a 'halfway' by allowing parallel workers to jump the lockqueue if the parent process already has the lock. That'd only work for nonconflicting locks, but I think that's actually good. > > I'm less worried about the additional branches than about increasing the > > size of the datastructures. They're already pretty huge. > > I share that concern. I aimed for a design which would be > memory-efficient and have low impact on the non-group-locking case > rather than a design which would be ideal for group locking. The > patch could be made to handle large locking groups more efficiently by > changing the shared memory structure around; e.g. by replacing > PROCLOCK's holdMask and waitMask fields, now both of type LOCKMASK, > with int [MAX_LOCKMODES] so that we can represent the entire group's > locks held and awaited on a single object with a single PROCLOCK. > That representation would be significantly more convenient than the > one I actually chose, but it would also use up a LOT more shared > memory and would penalize LockReleaseAll(). The design embodied by > the proposed patch adds one additional pointer per PROCLOCK, which is > still not great, but I haven't been able to think of a reasonable way > to do better. That's in the current prototype you sent or something newer you have privately? Greetings, Andres Freund --Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Oct 31, 2014 at 2:46 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 31 October 2014 18:29, Robert Haas <robertmhaas@gmail.com> wrote: >> Suppose somebody fires off a parallel sort on a text column, or a >> parallel sequential scan involving a qual of the form textcol = 'zog'. >> We launch a bunch of workers to do comparisons; they do lookups >> against pg_collation. After some but not all of them have loaded the >> collation information they need into their local cache, the DBA types >> "cluster pg_collate". It queues up for an AccessExclusiveLock. The >> remaining parallel workers join the wait queue waiting for their >> AccessShareLock. The system is now deadlocked; the only solution is to >> move the parallel workers ahead of the AccessExclusiveLock request, >> but the deadlock detector won't do that unless it knows about the >> relationship among the parallel workers. > > It's an obscure case and its not the only solution either. I don't think that's an obscure situation at all. Do you really think a patch that could cause an attempt to VACUUM FULL a system catalog to suffer an undetected deadlock meets this community's quality standards? Because that's what we're talking about. You are right that it isn't the only solution. I have said the same thing myself, multiple times, on this thread. > I'm really surprised that having workers do their own locking doesn't > scare you. Personally, it scares me. Frankly, the reverse decision scares me a heck of a lot more. It superficially seems like a good idea to have the master take locks on behalf of the workers, but when you start to really think about how many low-level parts of the code take locks, it quickly becomes evident, at least to me, that trying to make the resulting system robust will be a nightmare. Don't forget that there are not only relation locks but also page locks, tuple locks, relation extension locks, XID and VXID locks, locks on arbitrary database or shared objects. 
The deadlock detector handles them all. If avoiding deadlock outside the lock manager is so easy, why do we have a deadlock detector in the first place? Why not just change all of our other code to avoid them instead? > It's not like we're the first do parallel query. Honestly, how many > parallel databases do you think do this? All of them. I might be wrong, of course, but I see no reason at all to believe that Oracle or SQL Server are ignoring deadlock detection when parallel query is in use. > I can't see this being a practical, workable solution for running a > parallel query across a cluster of many nodes. Shared, distributed > lock table? I am not proposing a distributed lock manager that can run across many nodes. The considerations for such a piece of software are largely orthogonal to the problem that I'm actually trying to solve here, and at least an order of magnitude more complex. If you make it sound like that's the project that I'm doing, then of course it's going to sound frighteningly complicated. But I'm not. I'm sort of baffled by your concern on this point. Sure, the lock manager is a complex and performance-sensitive piece of code, and we have to be careful not to break it or make it slower. But we're not talking about launching a rocket to Mars here; we're just talking about making the lock manager interface look the same way to two cooperating backends issuing lock requests in parallel that it looks to a single backend making those requests serially. I don't think it's noticeably harder than getting Hot Standby conflict resolution to work, or getting historical snapshots working for logical decoding, or making the visibility map crash-safe - in fact, I'd bet on it being substantially easier than the first two of those, because all the logic is inside a single, relatively simple backend subsystem. Why we'd want to minimize complexity there at the cost of spreading it across half the backend is mysterious to me.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Oct 31, 2014 at 3:32 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> > So just refusing parallelism as soon as anything has taken an access >> > exclusive lock doesn't sound too bad to me. >> >> That restriction seems onerous to me; for example, it would prevent a >> parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL. >> Those seem like worthy targets for parallelism, and maybe even early >> targets, since I expect a lot of the real complexity here to be in the >> query planner, and you can avoid most of that complexity when planning >> DDL. > > Ugh. I think that's aiming too high. You'll suddenly need to share cache > invalidations between the backends. I think we should start by requiring > that there's no DDL done *at all* in the transaction that starts the > parallel activity. For CREATE INDEX PARALLEL that can be done by reusing > the logic we have for CONCURRENTLY, thereby moving the parallelized part > into a transaction that hasn't done any DDL yet. I don't think that's correct. We only need to process local invalidation messages after CommandCounterIncrement(), which I anticipate prohibiting during parallel execution (short thought should convince you that anything else is completely nuts). So I think there's no real problem with invalidation messages being generated during parallel execution. I wouldn't anticipate supporting that in V1, because they'd have to be propagated back to the user backend prior to commit, but in the longer term it seems pretty feasible. >> I also think it's unnecessary. It's wrong to think of two parallel >> backends that both want AccessExclusiveLock on a given target relation >> as conflicting with each other; if the process were not running in >> parallel, those lock requests would both have come from the same >> process and both requests would have been granted. > > I don't think that's a realistic view. There's lots of backend private > state around.
If we try to make all that coherent between backends we'll > fail. I agree that there's a lot of backend private state, and that's the principal problem with making this work at all. The challenge here is to figure out which parts of that backend-private state we absolutely must make coherent between backends to enable parallelism to have a useful, non-trivial scope, and which parts we can punt on. I have ideas about that - which are mostly documented at https://wiki.postgresql.org/wiki/Parallel_Sort - but I'm trying to keep an open mind on the details. >> Parallelism is >> generally not about making different things happen; it's about >> spreading the stuff that would happen anyway across multiple backends, >> and things that would succeed if done in a single backend shouldn't >> fail when spread across 2 or more without some compelling >> justification. The cases where it's actually unsafe for a table to be >> accessed even by some other code within the same backend are handled >> not by the lock manager, but by CheckTableNotInUse(). That call >> should probably fail categorically if issued while parallelism is >> active. > > Which will e.g. prohibit CLUSTER and VACUUM FULL. I don't think so. I think it's safe to get to the point where we check that the table is not in use, then do a parallel scan and sort, then collect the results and write them out. But I would not try to make it safe, for example, to be scanning a table and have some user-defined function invoked along the way to go try to do anything that involves CheckTableNotInUse(). Many of those would fall over anyway because of PreventTransactionChain(), but I've got no problem nailing that hatch shut twice. And conversely, I would expect we'd disallow the sort operators invoked by CLUSTER from reaching any code guarded by CheckTableNotInUse(). >> Suppose somebody fires off a parallel sort on a text column, or a >> parallel sequential scan involving a qual of the form textcol = 'zog'.
>> We launch a bunch of workers to do comparisons; they do lookups >> against pg_collation. After some but not all of them have loaded the >> collation information they need into their local cache, the DBA types >> "cluster pg_collate". It queues up for an AccessExclusiveLock. The >> remaining parallel workers join the wait queue waiting for their >> AccessShareLock. The system is now deadlocked; the only solution is to >> move the parallel workers ahead of the AccessExclusiveLock request, >> but the deadlock detector won't do that unless it knows about the >> relationship among the parallel workers. > > I think you would need to make the case more complex - we only hold > locks on system objects for a very short time, and mostly not in a nested > fashion. Hmm. You're right. If the parallel backends release their locks quickly, the CLUSTER could run and then the parallel job could finish. But it sounds as if you get the idea. > But generally I think it's absolutely perfectly alright to fail in such a > case. We need to fail reliably, but *none* of this is a new hazard. You > can have pretty similar issues today, without any parallelism. I think it's absolutely fine for it to *fail*. I think it's not OK for it to *hang* without triggering the deadlock detector. > Well, that's why I think we should teach the deadlock detector to catch > these kind of cases. That doesn't sound particularly hard. I could not agree more! In fact, that's the whole point of the patch. > I'm sorry to be a bit pessimistic here. But my intuition says that > starting to do group locking opens a can of worms that'll take us a long > time to close again. Maybe I'm just imagining complexity that won't > actually be there. But I have a hard time believing that. What's the distinction between "teach the deadlock detector to catch these kind of cases" and "group locking"? Because I think those are at least 90% the same thing.
> I wonder if we couldn't implement a 'halfway' by allowing parallel > workers to jump the lockqueue if the parent process already has the > lock. That'd only work for nonconflicting locks, but I think that's > actually good. The patch takes precisely that approach; that part of the logic is already implemented. I'd like to make it a bit more robust to handle cases where the deadlock detector gets involved, as I currently believe that it (a) can be done without unmanageable complexity, (b) will make programming in parallel mode much simpler, and (c) will significantly decrease the chances of an angry user or fellow-committer tracking me down and beating me to death with a keen-edged bug report. > That's in the current prototype you sent or something newer you have > privately? The former. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-10-31 15:56:42 -0400, Robert Haas wrote: > On Fri, Oct 31, 2014 at 3:32 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >> > So just refusing parallelism as soon as anything has taken an access > >> > exclusive lock doesn't sound too bad to me. > >> > >> That restriction seems onerous to me; for example, it would prevent a > >> parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL. > >> Those seem like worthy targets for parallelism, and maybe even early > >> targets, since I expect a lot of the real complexity here to be in the > >> query planner, and you can avoid most of that complexity when planning > >> DDL. > > > > Ugh. I think that's aiming too high. You'll suddenly need to share cache > > invalidations between the backends. I think we should start by requiring > > that there's no DDL done *at all* in the transaction that starts the > > parallel activity. For CREATE INDEX PARALLEL that can be done by reusing > > the logic we have for CONCURRENTLY, thereby moving the parallelized part > > into a transaction that hasn't done any DDL yet. > > I don't think that's correct. We only need to process local > invalidation messages after CommandCounterIncrement(), which I > anticipate prohibiting during parallel execution (short thought should > convince you that anything else is completely nuts). It is more complex, even without CCI. As long as you're doing anything inside a transaction that already has done DDL, you'll have to play nasty tricks. Imagine the primary backend has done a BEGIN; CLUSTER pg_class; and then starts a child backend. Which will get the same snapshot, combocids, yada yada. But it *also* will have preexisting cache entries. Those you need to invalidate before it can do anything correct. > > I'm sorry to be a bit pessimistic here. But my intuition says that > > starting to do group locking opens a can of worms that'll take us a long > > time to close again.
Maybe I'm just imagining complexity that won't > > actually be there. But I have a hard time believing that. > > What's the distinction between "teach the deadlock detector to catch > these kind of cases" and "group locking"? Because I think those are > at least 90% the same thing. I understand under 'group locking' that a primary/second backends can coown a lock that normally is self-exclusive. That's a fair bit more than adding an edge to the deadlock graph between primary/secondary backends to make the deadlock detector recognize problems. What I have serious doubts about is 'coowning' locks. Especially if two backends normally wouldn't be able to both get that lock. > > I wonder if we couldn't implement a 'halfway' by allowing parallel > > workers to jump the lockqueue if the parent process already has the > > lock. That'd only work for nonconflicting locks, but I think that's > > actually good. > > The patch takes precisely that approach; that part of the logic is > already implemented. Well, my point is that if the solution is just to jump the queue, there really isn't any data structure changes needed. Secondary acquirers just need to check whether a lock is already owned by the primary and then acquire the lock in the absolutely normal way - with a completely separate entry. Only that they ignored the queue. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Fri, Oct 31, 2014 at 4:10 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> I don't think that's correct. We only need to process local >> invalidation messages after CommandCounterIncrement(), which I >> anticipate prohibiting during parallel execution (short thought should >> convince you that anything else is completely nuts). > > It is more complex, even without CCI. As long as you're doing anything > inside a transaction that already has done DDL, you'll have to play > nasty tricks. Imagine the primary backend has done a BEGIN; CLUSTER > pg_class; and then then starts a child backend. Which will get the same > snapshot, combocids, yada yada. But it *also* will have preexisting > cache entries. Those you need to invalidate before it can do anything > correct. Where will those preexisting cache entries come from, exactly? The postmaster is forking the parallel worker, not the user backend. >> > I'm sorry to be a bit pessimistic here. But my intuition says that >> > starting to do group locking opens a can of worms that'll take us a long >> > time to close again. Maybe I'm just imagining complexity that won't >> > actually be there. But I have a hard time believing that. >> >> What's the distinction between "teach the deadlock detector to catch >> these kind of cases" and "group locking"? Because I think those are >> at least 90% the same thing. > > I understand under 'group locking' that a primary/second backends can > coown a lock that normally is self-exclusive. That's a fair bit more > than adding an edge to the deadlock graph between primary/secondary > backends to make the deadlock detector recognize problems. I guess. It seems like a pretty minor extension to me. Getting the deadlock detector to do the right thing is the hard part. > What I have serious doubts about is 'coowning' locks. Especially if two > backends normally wouldn't be able to both get that lock. Perhaps we should discuss that more. 
To me it seems pretty obvious that's both safe and important. >> > I wonder if we couldn't implement a 'halfway' by allowing parallel >> > workers to jump the lockqueue if the parent process already has the >> > lock. That'd only work for nonconflicting locks, but I think that's >> > actually good. >> >> The patch takes precisely that approach; that part of the logic is >> already implemented. > > Well, my point is that if the solution is just to jump the queue, there > really isn't any data structure changes needed. Secondary acquirers just > need to check whether a lock is already owned by the primary and then > acquire the lock in the absolutely normal way - with a completely > separate entry. Only that they ignored the queue. Well, they need to be able to identify who is in their group. You could possibly do that without any changes at all to the lock manager data structures, but it seems a bit complex. I took the approach of storing the group leader's PGPROC * in each PROCLOCK, which makes it trivial (and is enough information for the deadlock detector to work correctly, too). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 31 October 2014 19:36, Robert Haas <robertmhaas@gmail.com> wrote: >> It's an obscure case and its not the only solution either. > > I don't think that's an obscure situation at all. Do you really think > a patch that could cause an attempt to VACUUM FULL a system catalog to > suffer an undetected deadlock meets this community's quality > standards? Because that's what we're talking about. Nobody has said that allowing undetected deadlocks is acceptable, so your other comments are void. I've suggested *stricter* locking, which would obviously allow deadlock detection. You recognised that by claiming that the locking I had proposed was actually too strict, which is where the above example came from. Yes, I have proposed stricter locking, but as explained, the only things this would slow down are catalog VAC FULLs, which are already a problem. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 31 October 2014 18:47, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Oct 31, 2014 at 11:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> You mentioned earlier that functions would need to be marked proisparallel etc.. >> >> What conditions will that be protecting against? If we aren't going to >> support the general case where every single thing works, can we at >> least discuss what the list of cases is that we will support. > > In general, any operation that relies on backend-private state not > shared with a cooperating backend isn't parallel-safe. For example, Yes, the principle is easy to understand, but that was not the question. You are saying that placing restrictions on which functions can execute is necessary, yet at the same time saying that we must have generalised locking for workers because restrictions on locking are not acceptable. I don't point that out because I want an argument or to prove a point, but because there are important things on the table here. What are the specific restrictions you are suggesting we place on parallel workers? We need that definition clear so we can decide how to mark the functions. If we don't know, which is easily understandable, then the best way to find that out is to build a restricted solution and to expand slowly outwards to find problems. An obvious milestone there is whether a function contains SQL, which is the point chosen by some other databases. I personally hope to expand upon that, but it would be useful to reach it first and then continue from there. I was already thinking of implementing CONTAINS NO SQL as a SQL Standard function marking since it can be used to optimize COPY further. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 2014-10-31 21:25:02 -0400, Robert Haas wrote: > On Fri, Oct 31, 2014 at 4:10 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >> I don't think that's correct. We only need to process local > >> invalidation messages after CommandCounterIncrement(), which I > >> anticipate prohibiting during parallel execution (short thought should > >> convince you that anything else is completely nuts). > > > > It is more complex, even without CCI. As long as you're doing anything > > inside a transaction that already has done DDL, you'll have to play > > nasty tricks. Imagine the primary backend has done a BEGIN; CLUSTER > > pg_class; and then starts a child backend. Which will get the same > > snapshot, combocids, yada yada. But it *also* will have preexisting > > cache entries. Those you need to invalidate before it can do anything > > correct. > > Where will those preexisting cache entries come from, exactly? The > postmaster is forking the parallel worker, not the user backend. Several things: 1) The relcache init files load a fair bit 2) There's cache entries made just during startup while we look up user information and such 3) There's lots of places you can hook into where it's perfectly legal and expected that you access system caches. I don't think you can reasonably define those away. 4) I'm pretty sure that one of the early next steps will be to reuse a bgworker for multiple tasks. So there'll definitely be preexisting cache entries. I'm pretty sure that it's impossible to define the problem away. We *might* be able to say that we'll just do an InvalidateSystemCaches() at start of the parallelized task. > >> > I'm sorry to be a bit pessimistic here. But my intuition says that > >> > starting to do group locking opens a can of worms that'll take us a long > >> > time to close again. Maybe I'm just imagining complexity that won't > >> > actually be there.
> >> > >> What's the distinction between "teach the deadlock detector to catch > >> these kind of cases" and "group locking"? Because I think those are > >> at least 90% the same thing. > > > > I understand under 'group locking' that a primary/second backends can > > coown a lock that normally is self-exclusive. That's a fair bit more > > than adding an edge to the deadlock graph between primary/secondary > > backends to make the deadlock detector recognize problems. > > I guess. It seems like a pretty minor extension to me. Getting the > deadlock detector to do the right thing is the hard part. I doubt it. There's a whole bunch of problems that just exist by virtue of having a group leader. What if the leader dies but the worker backend isn't in a section of code that does a CHECK_FOR_INTERRUPTS()? I also still have serious doubts about allowing self exclusive locks to be shared between backends. That's going to cause pain. And that's pretty much the only reason why you need to share entries. > > What I have serious doubts about is 'coowning' locks. Especially if two > > backends normally wouldn't be able to both get that lock. > > Perhaps we should discuss that more. To me it seems pretty obvious > that's both safe and important. I completely fail to see why you'd generally think it's safe for two backends to hold the same self conflicting lock. Yes, within carefully restricted sections of code that might be doable. But I don't think you can fully enforce that. People *will* mark their own functions at parallel safe. And do stupid stuff. That shouldn't cause segfaults/exploits/low level data corruption/.. as long as it's done from !C functions. The aforementioned issue of how to deal with the leader dying while the other backend still progresses also is nontrivial in my opinion. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Nov 1, 2014 at 1:55 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> Where will those preexisting cache entries come from, exactly? The >> postmaster is forking the parallel worker, not the user backend. > > Several things: > 1) The relcache init files load a fair bit > 2) There's cache entries made just during startup while we look up user > information and such > 3) There's lots of places you can hook into where it's perfectly legal > and expected that you access system caches. I don't think you can > reasonably define those away. > 4) I'm pretty sure that one of the early next steps will be to reuse a > bgworker for multiple tasks. So there'll definitely be preexisting > cache entries. > > I'm pretty sure that it's impossible to define the problem away. We > *might* be able to say that we'll just do a InvalidateSystemCaches() at > start of the parallelized task. I see. I haven't thought deeply about those things, and there could indeed be problems there. >> > What I have serious doubts about is 'coowning' locks. Especially if two >> > backends normally wouldn't be able to both get that lock. >> >> Perhaps we should discuss that more. To me it seems pretty obvious >> that's both safe and important. > > I completely fail to see why you'd generally think it's safe for two > backends to hold the same self conflicting lock. Yes, within carefully > restricted sections of code that might be doable. But I don't think you > can fully enforce that. People *will* mark their own functions at > parallel safe. And do stupid stuff. That shouldn't cause > segfaults/exploits/low level data corruption/.. as long as it's done > from !C functions. I agree that people will mark their own functions as parallel-safe, and I also agree that shouldn't cause segfaults, exploits, or low-level data corruption. Clearly, we'll need to impose a bunch of restrictions to achieve that. 
But if you put a LOCK TABLE statement in an SQL function, it won't conflict with other locking requests made by the same or another function elsewhere in the same transaction; surely you don't want that behavior to be *different* in parallel mode. That would be a blatant, user-visible behavior difference justified by ... nothing at all. There's no hazard there. Where you start getting into crash/exploit/data corruption territory is when you are talking about DDL operations that change the physical structure of the table. That's why we have stuff like CheckTableNotInUse() to verify that, for example, there are no old cursors around that are still expecting the old relfilenode and tuple descriptor to be valid. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Nov 1, 2014 at 1:55 PM, Andres Freund <andres@2ndquadrant.com> wrote: > I doubt it. There's a whole bunch of problems that just exist by virtue > of having a group leader. What if the leader dies but the worker backend > isn't in a section of code that does a CHECK_FOR_INTERRUPTS()? In between all of the heat about whether I'm writing the correct patch, or whether I'm overall making progress fast enough, somebody could, I don't know, read the patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-11-01 17:00:59 -0400, Robert Haas wrote: > On Sat, Nov 1, 2014 at 1:55 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > I doubt it. There's a whole bunch of problems that just exist by virtue > > of having a group leader. What if the leader dies but the worker backend > > isn't in a section of code that does a CHECK_FOR_INTERRUPTS()? > > In between all of the heat about whether I'm writing the correct > patch, or whether I'm overall making progress fast enough, somebody > could, I don't know, read the patch. I plan to, but that takes a fair amount of time. And there's at least one patch of yours ahead in the queue... And I do think that most of the concerns are more general than a specific implementation. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Nov 1, 2014 at 1:09 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > What are the specific restrictions you are suggesting we place on > parallel workers? We need that definition clear so we can decide how > to mark the functions. If we don't know, which is easily > understandable, then the best way to find that out is to build a > restricted solution and to expand slowly outwards to find problems. > > An obvious milestone there is whether a function contains SQL, which > is the point chosen by some other databases. I personally hope to > expand upon that, but it would be useful to reach it first and then > continue from there. > > I was already thinking of implementing CONTAINS NO SQL as a SQL > Standard function marking since it can be used to optimize COPY > further. I've written some fairly extensive thoughts on this topic at http://wiki.postgresql.org/wiki/Parallel_Sort Personally, I tend to focus more on what needs to be allowed in parallel workers than what needs to be prohibited. The motivation for this patch is simple: 1. Any non-trivial piece of PostgreSQL code is likely to contain syscache lookups. 2. Syscache lookups had better work in parallel workers, or they'll be all but useless. 3. Some of those syscache lookups will be satisfied from cache, but sometimes we'll need to actually read the system catalogs. 4. That requires locks. 5. Gosh, we'd better not deadlock. There's a lot of infrastructure that I think will work fine in parallel mode without any changes. Buffer locking, buffer pins, buffer lookups, and lwlocks, for example, I expect to all just work without any real changes. But heavyweight locks and invalidation messages seem like they'll need a bit of surgery to work correctly in this environment. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Nov 1, 2014 at 5:06 PM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-11-01 17:00:59 -0400, Robert Haas wrote: >> On Sat, Nov 1, 2014 at 1:55 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> > I doubt it. There's a whole bunch of problems that just exist by virtue >> > of having a group leader. What if the leader dies but the worker backend >> > isn't in a section of code that does a CHECK_FOR_INTERRUPTS()? >> >> In between all of the heat about whether I'm writing the correct >> patch, or whether I'm overall making progress fast enough, somebody >> could, I don't know, read the patch. > > I plan to, but that takes a fair amount of time. And there's at least > one patch of yours ahead in the queue... And I do think that most of the > concerns are more general than a specific implementation. Fair enough. Well, the way I solved that specific problem is to allow the group leader to terminate but not to return its PGPROC to the free list until the last group member exits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2014-11-01 16:59:34 -0400, Robert Haas wrote: > > I completely fail to see why you'd generally think it's safe for two > > backends to hold the same self conflicting lock. Yes, within carefully > > restricted sections of code that might be doable. But I don't think you > > can fully enforce that. People *will* mark their own functions at > > parallel safe. And do stupid stuff. That shouldn't cause > > segfaults/exploits/low level data corruption/.. as long as it's done > > from !C functions. > > I agree that people will mark their own functions as parallel-safe, > and I also agree that shouldn't cause segfaults, exploits, or > low-level data corruption. Clearly, we'll need to impose a bunch of > restrictions to achieve that. > But if you put a LOCK TABLE statement > in an SQL function, it won't conflict with other locking request made > by the same or another function elsewhere in the same transaction; > surely you don't want that behavior to be *different* in parallel > mode. That would be a blatant, user-visible behavior difference > justified by ... nothing at all. That's just a mislabeled function. It's obviously not parallel safe at all. I see absolutely no problem with erroring out. > There's no hazard there. Where you > start getting into crash/exploit/data corruption territory is when you > are talking about DDL operations that change the physical structure of > the table. That's why we have stuff like CheckTableNotInUse() to > verify that, for example, there are no old cursors around that are > still expecting the old relfilenode and tuple descriptor to be valid. It's not just fully structural changes although they are a concern. It's also that we've amassed a number of hacks to deal with local state that just won't be nicely transported. What's with stuff like RelationSetIndexList() (which is infrequently enough used to not be a big problem in practice...)? 
If we only allow parallel access while independent backends could acquire the relevant locks, we can rely on us already having taken care of the concurrency hazards. Otherwise not. And before you say that'd prohibit a number of useful things - yes, it might. But e.g. CREATE INDEX can still be parallelized. The primary backend takes an Exclusive/ShareUpdateExclusive lock (depending on CONCURRENTLY), workers just an AccessShare. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Nov 1, 2014 at 5:38 PM, Andres Freund <andres@2ndquadrant.com> wrote: > That's just a mislabeled function. It's obviously not parallel safe at > all. I see absolutely no problem with erroring out. I disagree. It's entirely parallel-safe, as long as you don't arbitrarily decide to have the lock manager break it. >> There's no hazard there. Where you >> start getting into crash/exploit/data corruption territory is when you >> are talking about DDL operations that change the physical structure of >> the table. That's why we have stuff like CheckTableNotInUse() to >> verify that, for example, there are no old cursors around that are >> still expecting the old relfilenode and tuple descriptor to be valid. > > It's not just fully structural changes although they are a concern. > It's also that we've amassed a number of hacks to deal with local state > that just won't be nicely transported. What's with stuff like > RelationSetIndexList() (which is infrequently enough used to not be a > big problem in practice...)? If we only allow parallel access while > independent backends could acquire the relevant we can rely on us > already having taken care about the concurrency hazards. Otherwise not. RelationSetIndexList() is only used inside REINDEX, which I think illustrates my point that it's mostly DDL we need to be worried about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 15 October 2014 05:03, Robert Haas <robertmhaas@gmail.com> wrote: > At least to me, that simple scenario is clear-cut[1], but what do we > do in more complicated situations? For example, suppose backends A > and B are members of the same locking group. A locks a relation with > AccessShareLock, an unrelated process X queues up waiting for an > AccessExclusiveLock, and then B also requests AccessShareLock. The > normal behavior here is that B should wait for X to go first, but here > that's a problem. If A is the user backend and B is a worker backend, > A will eventually wait for B, which is waiting for X, which is waiting > for A: deadlock. Yes, deadlock. My understanding would be that the lead process would wait on a latch, not a heavyweight lock. So it would never perform a deadlock detection. Which leaves only X and B to perform the deadlock check. Are you aware that the deadlock detector will reorder the lock queue, if that presents a possible solution to the deadlock? Would the above example not be resolved simply with the existing code? -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
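The A/B/X scenario quoted above is worth making concrete. Below is a toy Python model (purely illustrative, not lock-manager code; node names are taken from the example): without edges between group members the detector sees no cycle, but once we add the assumed "every group member eventually waits for every other member" edges, the deadlock becomes visible.

```python
# Toy wait-for-graph model of the A/B/X example. Edges mean "waits for".
# Group edges (A <-> B) are IMPLICIT in the real system: the leader will
# eventually wait for its workers and vice versa.

def has_cycle(edges):
    """Detect a cycle in a directed wait-for graph given as {node: {nodes}}."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in edges}

    def visit(n):
        color[n] = GRAY                      # on the current DFS path
        for m in edges.get(n, ()):
            if color.get(m, WHITE) == GRAY:  # back-edge: cycle found
                return True
            if color.get(m, WHITE) == WHITE and visit(m):
                return True
        color[n] = BLACK                     # fully explored
        return False

    return any(color[n] == WHITE and visit(n) for n in list(edges))

# B waits for X's queued AccessExclusiveLock; X waits for A's AccessShareLock.
explicit = {"B": {"X"}, "X": {"A"}, "A": set()}

# Merge in the implicit group edges A <-> B before running detection.
group_edges = {"A": {"B"}, "B": {"A"}}
merged = {n: explicit.get(n, set()) | group_edges.get(n, set())
          for n in set(explicit) | set(group_edges)}

assert not has_cycle(explicit)  # invisible without the group edges
assert has_cycle(merged)        # visible once group members "wait" on each other
```

This is exactly why the existing deadlock detector can't simply resolve the example: the A-to-B dependency isn't a heavyweight-lock wait, so the cycle never appears in the graph it searches.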
On 2 November 2014 10:39, Simon Riggs <simon@2ndquadrant.com> wrote: > On 15 October 2014 05:03, Robert Haas <robertmhaas@gmail.com> wrote: > >> At least to me, that simple scenario is clear-cut[1], but what do we >> do in more complicated situations? For example, suppose backends A >> and B are members of the same locking group. A locks a relation with >> AccessShareLock, an unrelated process X queues up waiting for an >> AccessExclusiveLock, and then B also requests AccessShareLock. The >> normal behavior here is that B should wait for X to go first, but here >> that's a problem. If A is the user backend and B is a worker backend, >> A will eventually wait for B, which is waiting for X, which is waiting >> for A: deadlock. > > Yes, deadlock. > > My understanding would be that the lead process would wait on a latch, > not a heavyweight lock. So it would never perform a deadlock > detection. Which leaves only X and B to perform the deadlock check. > > Are you aware that the deadlock detector will reorder the lock queue, > if that presents a possible solution to the deadlock? > > Would the above example not be resolved simply with the existing code? Self Answer: Only if the lock request would not be self-conflicting. Reading the patch (yay!) you say * ...However, if a conflicting lock is present that is * not held by another member of our lock group, then we can't do this. * In that case we'll have to wait despite the deadlock risk and hope for * the best. I wasn't sure what that meant. Actually, I like the rest of the patch, assuming I understand it. Not sure that changes anything I've said about wanting to see something actually working, even as a prototype. For example, how will you test the code in this patch actually works? The best way is to make parallel query work in a restricted form and then drop that in isolation tester, otherwise you'll have to waste time on a test harness. 
The procgloballist stuff should be the subject of a separate patch which I agree with. The main question is still why any of this is necessary? If we are only acquiring fast path locks in workers, what need is there for any of this? The leader and workers never conflict with each other, so why would they ever wait for each other (except at a point where they are actively transferring data), which is the necessary pre-condition for a deadlock? Yes, running a command with a conflicting lock would disrupt the start of a parallel operation, but as I said above, the deadlock detector will eventually see that and re-arrange us so there is no deadlock. The only need I can see is if one of the workers/leader requests a Strong lock on an object, someone else requests a conflicting lock on that object and then we perform a parallel operation ourselves. We could easily solve that by saying that you can't go parallel if you hold a strong lock AND that workers can't take anything higher than RowShareLock, like HS backends. That's good enough for now. -- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Sat, Nov 1, 2014 at 9:09 PM, Robert Haas <robertmhaas@gmail.com> wrote: > 1. Any non-trivial piece of PostgreSQL code is likely to contain > syscache lookups. > 2. Syscache lookups had better work in parallel workers, or they'll be > all but useless. I've been using parallel sorts and index builds in my mental model of how this will be used. I note that sorts go out of their way to look up all the syscache entries in advance precisely so that tuplesort doesn't start doing catalog lookups in the middle of the sort. In general I think what people are imagining is that the parallel workers will be running low-level code like tuplesort that has all the databasey stuff like catalog lookups done in advance and just operates on C data structures like function pointers. And I think that's a valuable coding discipline to enforce; it avoids having low-level infrastructure calling up to higher-level abstractions, which quickly becomes hard to reason about. However in practice I think you're actually right -- but not for the reasons you've been saying. I think the parallel workers *should* be written as low-level infrastructure and not be directly doing syscache lookups or tuple locking etc. However there are a million ways in which Postgres is extensible which cause loops in the call graph that aren't apparent in the direct code structure. For instance, what happens if the index you're building is an expression index or partial index? Worse, what happens if those expressions have a plpython function that does queries using SPI.... But those kinds of user code exploiting extensibility are the situations where we need a deadlock detector and where you might need this infrastructure. We wouldn't and shouldn't need a deadlock detector for our own core server code.
In an ideal world we'd have some sort of compromise: enforce careful locking rules in the core code (all locks acquired in advance, parallel workers prohibited from obtaining locks) while still allowing users a free-for-all, with deadlocks detected at runtime for them. But I'm not sure there's any real middle ground here. -- greg
Greg Stark <stark@mit.edu> writes: > On Sat, Nov 1, 2014 at 9:09 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> 2. Syscache lookups had better work in parallel workers, or they'll be >> all but useless. > I've been using parallel sorts and index builds in my mental model of > how this will be used. I note that sorts go out of their way to look > up all the syscache entries in advance precisely so that tuplesort > doesn't start doing catalog lookups in the middle of the sort. Well, they try to avoid doing the lookups more than once, but that does not by any means imply that no lookups happen after the sort starts. In particular, if you're sorting a container type (array, record) there will be lookups during the first call of the sort type's comparison function. Enums could cause catalog lookups much later than the first call, too. I'm with Robert: if you cannot do catalog lookups in a worker process, the feature will be so crippled as to be essentially useless. regards, tom lane
On Mon, Nov 3, 2014 at 10:18 AM, Greg Stark <stark@mit.edu> wrote: > On Sat, Nov 1, 2014 at 9:09 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> 1. Any non-trivial piece of PostgreSQL code is likely to contain >> syscache lookups. >> 2. Syscache lookups had better work in parallel workers, or they'll be >> all but useless. > > I've been using parallel sorts and index builds in my mental model of > how this will be used. I note that sorts go out of their way to look > up all the syscache entries in advance precisely so that tuplesort > doesn't start doing catalog lookups in the middle of the sort. This isn't really true. Routines like varstr_cmp() do syscache lookups and de-TOASTing. We've recently tried to reduce that with the sortsupport stuff, but it hasn't eliminated it completely, and there are other cases. This is really the key point that I think everybody needs to understand: a lot of code that you think doesn't do anything complicated actually reads from the database - generally either in the form of syscache lookups, or in the form of de-TOASTing. Most of that code doesn't perform those operations *frequently*, but that's because there are backend-private caches that minimize the number of times we actually hit the database. But restructuring all of that code to do no database access whatsoever would be a very difficult proposition and would involve painful sacrifices, which is, I imagine, why it is the way it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > The main question is still why any of this is necessary? If we are > only acquiring fast path locks in workers, what need is there for any > of this? The leader and workers never conflict with each other, so why > would they ever wait for each other (except at a point where they are > actively transferring data), which is the necessary pre-condition for > a deadlock? > > Yes, running a command with a conflicting lock would disrupt the start > of a parallel operation, but as I said above, the deadlock detector > will eventually see that and re-arrange us so there is no deadlock. > > The only need I can see is if one of the workers/leader requests a > Strong lock on an object, someone else requests a conflicting lock on > that object and then we perform a parallel operation ourselves. We > could easily solve that by saying that you can't go parallel if you > hold a strong lock AND that workers can't take anything higher than > RowShareLock, like HS backends. That's good enough for now. That would be enough to keep the parallel group from deadlocking with itself, but it wouldn't be enough to prevent deadlock with other processes, which is the scenario that actually worries me a lot more. In particular, I'm worried about deadlocks where some other process gets sandwiched between two processes from the same parallel group. That is, let A1 be a parallel group leader and A2 be a process from the same parallel group, and let B be an unrelated process. I'm worried about the situation where A2 waits for B which waits for A1 which (outside the view of the deadlock detector) waits for A2. All that can happen even if the parallel backends hold no strong locks, because B can hold strong locks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3 November 2014 17:00, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> The main question is still why any of this is necessary? If we are >> only acquiring fast path locks in workers, what need is there for any >> of this? The leader and workers never conflict with each other, so why >> would they ever wait for each other (except at a point where they are >> actively transferring data), which is the necessary pre-condition for >> a deadlock? >> >> Yes, running a command with a conflicting lock would disrupt the start >> of a parallel operation, but as I said above, the deadlock detector >> will eventually see that and re-arrange us so there is no deadlock. >> >> The only need I can see is if one of the workers/leader requests a >> Strong lock on an object, someone else requests a conflicting lock on >> that object and then we perform a parallel operation ourselves. We >> could easily solve that by saying that you can't go parallel if you >> hold a strong lock AND that workers can't take anything higher than >> RowShareLock, like HS backends. That's good enough for now. > > That would be enough to keep the parallel group from deadlocking with > itself, but it wouldn't be enough to prevent deadlock with other > processes, which is the scenario that actually worries me a lot more. > In particular, I'm worried about deadlocks where some other process > gets sandwiched between two processes from the same parallel group. > That is, let A1 be a parallel group leader and A2 be a process from > the same parallel group, and let B be an unrelated process. I'm > worried about the situation where A2 waits for B which waits for A1 > which (outside the view of the deadlock detector) waits for A2. All > that can happen even if the parallel backends hold no strong locks, > because B can hold strong locks. OK, I think I'm happy with this as a general point. How will we automatically test this code? 
-- Simon Riggs http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On Mon, Nov 3, 2014 at 1:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > OK, I think I'm happy with this as a general point. Cool! > How will we automatically test this code? Good question. I can see two possible approaches: 1. We write a test_group_locking harness along the lines of test_shm_mq and test_decoding and put that in contrib. 2. We wait until higher-level facilities built on top of this are available and get regression test coverage of this code via those higher-level modules. Personally, I can't imagine debugging this code without writing some kind of test harness that only does locking; I don't want my locking bugs to be mixed with my planner and executor bugs. But I could go either way on actually putting that code in contrib. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > The procgloballist stuff should be the subject of a separate patch > which I agree with. Yes, I think that's probably a net improvement in robustness quite apart from what we decide to do about any of the rest of this. I've attached it here as revise-procglobal-tracking.patch and will commit that bit if nobody objects. The remainder is reattached without change as group-locking-v0.1.patch. Per your other comment, I've developed the beginnings of a testing framework which I attached here as test_group_locking-v0.patch. That doesn't look to have much hope of evolving into something we'd want even in contrib, but I think it'll be rather useful for debugging. It works like this: rhaas=# create table foo (a int); CREATE TABLE rhaas=# select test_group_locking('1.0:start,2.0:start,1.0:lock:AccessExclusiveLock:foo,2.0:lock:AccessExclusiveLock:foo'); NOTICE: starting worker 1.0 NOTICE: starting worker 2.0 NOTICE: instructing worker 1.0 to acquire AccessExclusiveLock on relation with OID 16387 NOTICE: instructing worker 2.0 to acquire AccessExclusiveLock on relation with OID 16387 ERROR: could not obtain AccessExclusiveLock on relation with OID 16387 CONTEXT: background worker, group 2, task 0 The syntax is a little arcane, I guess, but it's "documented" in the comments within. In this case I asked it to start up two background workers and have them both try to take AccessExclusiveLock on table foo. As expected, the second one fails. The idea is that workers are identified by a pair of numbers X.Y; two workers with the same X-value are in the same locking group. So if I call the second worker 1.1 rather than 2.0, it'll join the same locking group as worker 1.0 and ... then it does the wrong thing, and then it crashes the server, because my completely-untested code is unsurprisingly riddled with bugs. Eventually, this needs to be generalized a bit so that we can use it to test deadlock detection. 
That's tricky, because what you really want to do is tell worker A to wait for some lock and then, once you're sure it's on the wait queue, tell worker B to go take some other lock and check that you see the resulting deadlock. There doesn't seem to be a good API for the user backend to find out whether some background worker is waiting for some particular lock, so I may have to resort to the hacky expedient of having the driver process wait for a few seconds and assume that's long enough that the background worker will be on the wait queue by then. Or maybe I can drum up some solution, but anyway it's not done yet. The value of this test code is that we can easily reproduce locking scenarios which would be hard to reproduce in a real workload - e.g. because they're timing-dependent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Nov 5, 2014 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote: > Yes, I think that's probably a net improvement in robustness quite > apart from what we decide to do about any of the rest of this. I've > attached it here as revise-procglobal-tracking.patch and will commit > that bit if nobody objects. The remainder is reattached without > change as group-locking-v0.1.patch. > > Per your other comment, I've developed the beginnings of a testing > framework which I attached here as test_group_locking-v0.patch. Urk. I attached a version with some stupid bugs. Here's a slightly better one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/5/14, 8:26 PM, Robert Haas wrote: > rhaas=# create table foo (a int); > CREATE TABLE > rhaas=# select test_group_locking('1.0:start,2.0:start,1.0:lock:AccessExclusiveLock:foo,2.0:lock:AccessExclusiveLock:foo'); > NOTICE: starting worker 1.0 > NOTICE: starting worker 2.0 > NOTICE: instructing worker 1.0 to acquire AccessExclusiveLock on > relation with OID 16387 > NOTICE: instructing worker 2.0 to acquire AccessExclusiveLock on > relation with OID 16387 Damn that's cool to see! :) FWIW, some creative use of a composite type, plpgsql and string_to_array() would have saved a bunch of parsing code... it's surprisingly easy to parse stuff like this using string_to_array(). -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com
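For what it's worth, the spec grammar really is tiny. Here is a sketch in Python (the grammar is inferred from the example output above and may not match the actual contrib module's parser exactly): each comma-separated step is `group.task:command[:arg[:arg]]`.

```python
# Hypothetical re-implementation of the test_group_locking spec parser,
# inferred from the example '1.0:lock:AccessExclusiveLock:foo'. Workers are
# identified as group.task; same group number means same locking group.

def parse_spec(spec):
    """Parse a spec string into (group, task, command, args) tuples."""
    steps = []
    for step in spec.split(","):
        worker, command, *args = step.split(":")
        group, task = (int(part) for part in worker.split("."))
        steps.append((group, task, command, args))
    return steps

steps = parse_spec("1.0:start,2.0:start,"
                   "1.0:lock:AccessExclusiveLock:foo,"
                   "2.0:lock:AccessExclusiveLock:foo")
assert steps[0] == (1, 0, "start", [])
assert steps[2] == (1, 0, "lock", ["AccessExclusiveLock", "foo"])
```

The same few splits are what string_to_array() would do server-side, which is Jim's point.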
On Fri, Oct 31, 2014 at 9:07 AM, Andres Freund <andres@2ndquadrant.com> wrote: > Maybe we can, as a first step, make those edges in the lock graph > visible to the deadlock detector? It's pretty clear that undetected > deadlocks aren't ok, but detectable deadlocks in a couple corner cases > might be acceptable. An interesting point came to mind this morning on this topic, while discussing this issue with Amit. Suppose process A1 holds a lock on some object and process A2, a member of the same locking group, waits for a lock on that same object. It follows that there must exist a process B which either *holds* or *awaits* a lock which conflicts with the one requested by A2; otherwise, A2 would have been granted the lock at once. If B *holds* the conflicting lock, it may eventually release it; but if B *awaits* the conflicting lock, we have a soft deadlock, because A2 waits for B waits for A1, which we presume to wait for A2.[1] The logical consequence of this is that, if a process seeks to acquire a lock which is already held (in any mode) by a fellow group-member, it had better jump over the entire wait queue and acquire the lock immediately if no conflicting locks have already been granted. If it fails to do this, it immediately creates a soft deadlock. What if there *are* conflicting locks already granted? In that case, things are more complicated. We could, for example, have this situation: A1 holds AccessShareLock, B holds ExclusiveLock, C1 awaits ShareLock, and then (following it in the wait queue) A2 awaits RowExclusiveLock. This is not a deadlock, because B can finish, then C1 can get the lock, then C1 can finish, then A2 can get the lock. But if C2 now waits for some member of group A, then we have a soft deadlock between group A and group C, which can be resolved by moving A2's request ahead of C1.
(This example is relatively realistic, too: a lock upgrade from AccessShareLock to RowExclusiveLock is probably the most common type of lock upgrade, for reasons that are hopefully obvious.) In practice, I suspect it's best to take a more aggressive approach and have A2 jump straight to the head of the line right out of the chute. Letting some parallel workers make progress while holding up others out of a notion of locking fairness is probably stupid, because they're probably dependent on each other in some way that makes it useless for one to make progress while the others don't. For the same reason, I'd argue that we ought to wait until all pending lock requests from a given group can be satisfied before granting any of them. In fact, I suspect it's best in general for the locking machinery and the deadlock detector to view several simultaneous, pending lock requests as if they were a single request conflicting with everything that would conflict with any of the requested modes. Consider the situation where there are 10 lock waiters, 5 in each of two parallel groups. The deadlock detector could spend a good deal of time and energy trying various reorderings of the lock queue, but in practice it seems highly unlikely that it's sensible to do anything other than put all the locks from one group first and all of the locks from the other group afterwards, regardless of whether the processes want the same mode or different modes. Most of the other orderings will either be equivalent to one of those orderings (because commuting two non-conflicting locks in the wait queue has no effect) or soft deadlocks (because they interleave a conflicting lock request between two requests from the same group). And even if you can find some corner case where neither of those things is the case, it's probably still not useful to let half the lock group run and leave the other half blocked.
What will probably happen is that some internal queue will pretty quickly either fill up or drain completely and the process filling it, or draining it, will wait. Now we're no better off than if we'd waited to grant the lock in the first place, and we're now holding more locks that can block other things or cause deadlocks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] Here I'm assuming, as in previous discussion, that we regard all members of a parallel group as mutually waiting for each other, on the theory that the parallel computation won't complete until all workers complete and, therefore, the members of the group are either already waiting for each other or eventually will. While this might not be true in every case, I believe it's a good (and generally valid) simplifying assumption.
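The "single combined request" idea in the message above can be sketched as follows. The lock mode names and which modes conflict mirror PostgreSQL's table-level lock modes, but the bitmask encoding and function names here are invented for illustration, not the actual LockConflicts[] representation.

```python
# Sketch: treat several simultaneous pending requests from one locking group
# as a single request conflicting with everything that conflicts with ANY of
# the requested modes. Mode names and conflict pairs follow PostgreSQL's
# table-level lock modes; the encoding is invented for this example.

MODES = ["AccessShare", "RowShare", "RowExclusive", "ShareUpdateExclusive",
         "Share", "ShareRowExclusive", "Exclusive", "AccessExclusive"]
BIT = {m: 1 << i for i, m in enumerate(MODES)}

def mask(*names):
    total = 0
    for n in names:
        total |= BIT[n]
    return total

CONFLICTS = {
    "AccessShare":          mask("AccessExclusive"),
    "RowShare":             mask("Exclusive", "AccessExclusive"),
    "RowExclusive":         mask("Share", "ShareRowExclusive",
                                 "Exclusive", "AccessExclusive"),
    "ShareUpdateExclusive": mask("ShareUpdateExclusive", "Share",
                                 "ShareRowExclusive", "Exclusive",
                                 "AccessExclusive"),
    "Share":                mask("RowExclusive", "ShareUpdateExclusive",
                                 "ShareRowExclusive", "Exclusive",
                                 "AccessExclusive"),
    "ShareRowExclusive":    mask("RowExclusive", "ShareUpdateExclusive",
                                 "Share", "ShareRowExclusive",
                                 "Exclusive", "AccessExclusive"),
    "Exclusive":            mask("RowShare", "RowExclusive",
                                 "ShareUpdateExclusive", "Share",
                                 "ShareRowExclusive", "Exclusive",
                                 "AccessExclusive"),
    "AccessExclusive":      mask(*MODES),
}

def group_conflict_mask(requested_modes):
    """Combined conflict set for a group's simultaneous pending requests."""
    combined = 0
    for m in requested_modes:
        combined |= CONFLICTS[m]
    return combined

# The lock-upgrade example above: alone, the group's AccessShare does not
# conflict with C1's ShareLock; viewed together with A2's RowExclusive
# request, the combined request does, so C1 can't be granted in between.
assert group_conflict_mask(["AccessShare"]) & BIT["Share"] == 0
assert group_conflict_mask(["AccessShare", "RowExclusive"]) & BIT["Share"] != 0
```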
On Fri, Nov 7, 2014 at 3:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Oct 31, 2014 at 9:07 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Maybe we can, as a first step, make those edges in the lock graph
> > visible to the deadlock detector? It's pretty clear that undetected
> > deadlocks aren't ok, but detectable deadlocks in a couple corner cases
> > might be acceptable.
>
> An interesting point came to mind this morning on this topic, while
> discussing this issue with Amit. Suppose process A1 holds a lock on
> some object and process A2, a member of the same locking group, waits
> for a lock on that same object. It follows that there must exist a
> process B which either *holds* or *awaits* a lock which conflicts with
> the one requested by A2; otherwise, A2 would have been granted the
> lock at once. If B *holds* the conflicting lock, it may eventually
> release it; but if B *awaits* the conflicting lock, we have a soft
> deadlock, because A2 waits for B waits for A1, which we presume to
> wait for A1.[1]
>
> The logical consequence of this is that, if a process seeks to acquire
> a lock which is already held (in any mode) by a fellow group-member,
> it had better jump over the entire wait queue and acquire the lock
> immediately if no conflicting locks have already been granted. If it
> fails to do this, it immediately creates a soft deadlock. What if
> there *are* conflicting locks already granted? In that case, things
> are more complicated. We could, for example, have this situation: A1
> holds AccessShareLock, B holds ExclusiveLock, C1 awaits ShareLock, and
> then (following it in the wait queue) A2 awaits RowExclusiveLock.
> This is not a deadlock, because B can finish, then C can get the lock,
> then C1 can finish, then A2 can get the lock. But if C2 now waits for
> some member of group A, then we have a soft deadlock between group A
> and group C, which can be resolved by moving A2's request ahead of C1.
> (This example is relatively realistic, too: a lock upgrade from
> AccessShareLock to RowExclusiveLock is probably the most common type
> of lock upgrade, for reasons that are hopefully obvious.)
>
> In practice, I suspect it's best to take a more aggressive approach
> and have A2 jump straight to the head of the line right out of the
> chute. Letting some parallel workers make progress while holding up
> others out of a notion of locking fairness is probably stupid, because
> they're probably dependent on each other in some way that makes it
> useless for one to make progress while the others don't. For the same
> reason, I'd argue that we ought to wait until all pending lock
> requests from a given group can be satisfied before granting any of
> them.
>
> On Fri, Oct 31, 2014 at 9:07 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Maybe we can, as a first step, make those edges in the lock graph
> > visible to the deadlock detector? It's pretty clear that undetected
> > deadlocks aren't ok, but detectable deadlocks in a couple corner cases
> > might be acceptable.
>
> An interesting point came to mind this morning on this topic, while
> discussing this issue with Amit. Suppose process A1 holds a lock on
> some object and process A2, a member of the same locking group, waits
> for a lock on that same object. It follows that there must exist a
> process B which either *holds* or *awaits* a lock which conflicts with
> the one requested by A2; otherwise, A2 would have been granted the
> lock at once. If B *holds* the conflicting lock, it may eventually
> release it; but if B *awaits* the conflicting lock, we have a soft
> deadlock, because A2 waits for B waits for A1, which we presume to
> wait for A1.[1]
>
> The logical consequence of this is that, if a process seeks to acquire
> a lock which is already held (in any mode) by a fellow group-member,
> it had better jump over the entire wait queue and acquire the lock
> immediately if no conflicting locks have already been granted. If it
> fails to do this, it immediately creates a soft deadlock. What if
> there *are* conflicting locks already granted? In that case, things
> are more complicated. We could, for example, have this situation: A1
> holds AccessShareLock, B holds ExclusiveLock, C1 awaits ShareLock, and
> then (following it in the wait queue) A2 awaits RowExclusiveLock.
> This is not a deadlock, because B can finish, then C can get the lock,
> then C1 can finish, then A2 can get the lock. But if C2 now waits for
> some member of group A, then we have a soft deadlock between group A
> and group C, which can be resolved by moving A2's request ahead of C1.
> (This example is relatively realistic, too: a lock upgrade from
> AccessShareLock to RowExclusiveLock is probably the most common type
> of lock upgrade, for reasons that are hopefully obvious.)
>
> In practice, I suspect it's best to take a more aggressive approach
> and have A2 jump straight to the head of the line right out of the
> chute. Letting some parallel workers make progress while holding up
> others out of a notion of locking fairness is probably stupid, because
> they're probably dependent on each other in some way that makes it
> useless for one to make progress while the others don't. For the same
> reason, I'd argue that we ought to wait until all pending lock
> requests from a given group can be satisfied before granting any of
> them.
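The rule quoted above - a group member's request jumps the wait queue whenever the only conflicting granted locks belong to fellow group members - can be sketched as follows. This is a toy model with hypothetical types and a reduced mode set, not the actual lock manager code:

```c
#include <stdbool.h>

/* Hypothetical, reduced set of lock modes (a subset of PostgreSQL's). */
enum { ACCESS_SHARE, ROW_EXCLUSIVE, SHARE, EXCLUSIVE, ACCESS_EXCLUSIVE, NMODES };

/* Bit m set in conflict_tab[r] means mode r conflicts with mode m. */
static const unsigned conflict_tab[NMODES] = {
    [ACCESS_SHARE]     = 1u << ACCESS_EXCLUSIVE,
    [ROW_EXCLUSIVE]    = (1u << SHARE) | (1u << EXCLUSIVE) |
                         (1u << ACCESS_EXCLUSIVE),
    [SHARE]            = (1u << ROW_EXCLUSIVE) | (1u << EXCLUSIVE) |
                         (1u << ACCESS_EXCLUSIVE),
    [EXCLUSIVE]        = (1u << ROW_EXCLUSIVE) | (1u << SHARE) |
                         (1u << EXCLUSIVE) | (1u << ACCESS_EXCLUSIVE),
    [ACCESS_EXCLUSIVE] = (1u << ACCESS_SHARE) | (1u << ROW_EXCLUSIVE) |
                         (1u << SHARE) | (1u << EXCLUSIVE) |
                         (1u << ACCESS_EXCLUSIVE),
};

static bool
conflicts(int a, int b)
{
    return (conflict_tab[a] & (1u << b)) != 0;
}

/* One granted lock on the object: who holds it and in what mode. */
typedef struct
{
    int group;      /* locking group of the holder (unique id if ungrouped) */
    int mode;
} Holder;

/*
 * A request from a member of 'group' may jump the wait queue and be
 * granted at once, provided every already-granted conflicting lock is
 * held by a fellow group member.
 */
static bool
group_request_grantable(int reqmode, int group,
                        const Holder *held, int nheld)
{
    for (int i = 0; i < nheld; i++)
        if (held[i].group != group && conflicts(reqmode, held[i].mode))
            return false;   /* true conflict outside the group: must wait */
    return true;
}
```

With this rule, A2's RowExclusiveLock request in the example above is *not* immediately grantable, because B (outside the group) holds a conflicting ExclusiveLock; that is exactly the case that calls for queue reordering instead of an immediate grant.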
I think this logic can sometimes block processes from acquiring a lock
which no one is holding. Assume Group-1 (G-1) is waiting on two locks
(lock L1 on table T1 and lock L2 on table T2), which are held by
unrelated processes P-2 and P-3 respectively. Now P-3 releases the
lock on table T2, but it cannot be granted to G-1, which is waiting on
it, because lock L1 still cannot be granted to G-1. At this point no
one holds a lock on T2 and G-1 waits for it; however, no new process
will be able to acquire a lock on T2 either, as there is already a
waiter for it in the wait queue. OTOH, if we had granted the lock on
T2 to G-1, it might have finished its work and released the lock
(considering it is a short-duration lock on a catalog relation).
If it is possible to have such a situation, then even though it is
beneficial in certain cases to view several pending requests from the
same group as a single request, it can at the same time be harmful to
some unrelated processes.
On Mon, Nov 10, 2014 at 6:33 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> I think this logic can sometimes block processes from acquiring a lock
> which no one is holding. Assume Group-1 (G-1) is waiting on two locks
> (Lock L1 on Table T1 and Lock L2 on Table T2) which are held by
> unrelated processes P-2 and P-3 respectively; now P-3 releases the
> lock on table T-2 and cannot grant it to G-1 who is waiting on this
> lock because still L-1 lock cannot be granted to G-1.

That's not what I meant. I meant we shouldn't grant a lock *on a
particular object* to any process in the group until all lock requests
*for that object* can be granted. Waiting until every awaited lock in
the system can be granted in all requested modes would be really hard
to implement and would probably have all kinds of problems (as you
point out here).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
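Robert's per-object rule - don't wake any group member waiting on an object until every pending request from that group on that object can be granted - might look like this toy sketch (a hypothetical two-mode model, not the real ProcWakeup logic):

```c
#include <stdbool.h>

/* Toy two-mode model: SHARED conflicts only with EXCL; EXCL with both. */
enum { SHARED, EXCL };

static bool
conflicts(int a, int b)
{
    return a == EXCL || b == EXCL;
}

/* One lock slot on a single object: owning group and mode. */
typedef struct
{
    int group;
    int mode;
} LockSlot;

/*
 * Wake the members of 'group' waiting on this object only when every
 * pending request from that group can be granted at once, considering
 * only locks held outside the group.
 */
static bool
group_wakeable(int group,
               const LockSlot *held, int nheld,        /* granted locks */
               const LockSlot *waiting, int nwaiting)  /* wait queue */
{
    bool any = false;

    for (int w = 0; w < nwaiting; w++)
    {
        if (waiting[w].group != group)
            continue;
        any = true;
        for (int h = 0; h < nheld; h++)
            if (held[h].group != group &&
                conflicts(waiting[w].mode, held[h].mode))
                return false;   /* one member still blocked: wake none */
    }
    return any;
}
```

In the scenario from the first message (X holds AccessExclusiveLock; A and B queue up behind Y), this predicate stays false for group (A B) until the conflicting grants outside the group are gone, so A is not woken prematurely.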
On Wed, 2014-10-15 at 00:03 -0400, Robert Haas wrote:
> For parallelism, I think we need a concept of group locking. That is,
> suppose we have a user backend and N worker backends collaborating to
> execute some query. For the sake of argument, let's say it's a
> parallel CLUSTER, requiring a full table lock. We need all of the
> backends to be able to lock the table even though at least one of them
> holds AccessExclusiveLock. This suggests that the backends should all
> be members of a locking group, and that locks within the same locking
> group should be regarded as mutually non-conflicting.

Trying to catch up on this thread, please excuse me if these questions
are already covered.

You mention the possibility of undetected deadlocks, which is surely
unacceptable. But why not improve the deadlock detector? How bad are
_detected_ deadlocks? A lot of the concern was around catalog accesses,
but those lock requests would rarely wait anyway.

I also wonder if group locking is generally the wrong approach to
parallelism. Parallel scan/sort should work by assigning workers to
chunks of data, and then merging the results. In principle the workers
don't need to touch the same data at all, so why are we trying so hard
to get them to all take the same locks?

The reason, I assume, is that a full table is the lockable object, but
we are imagining the segment files as the chunks. But maybe there's a
way to address this more directly with an extra field in the lock tag,
and perhaps some catalog knowledge?

Regards,
Jeff Davis
On Wed, Nov 12, 2014 at 2:57 AM, Jeff Davis <pgsql@j-davis.com> wrote: > Trying to catch up on this thread, please excuse me if these questions > are already covered. Welcome to the party. The more, the merrier. :-) > You mention the possibility of undetected deadlocks, which is surely > unacceptable. But why not improve the deadlock detector? How bad are > _detected_ deadlocks? A lot of the concern was around catalog accesses, > but those lock requests would rarely wait anyway. Detected deadlocks are fine. Improving the deadlock detector is the heart of what needs to be done here. As you say, the lock requests we're talking about will rarely wait, so deadlocks won't be frequent. The issue is making sure that, if they do happen, we get a better behavior than "your parallel query hangs forever; good luck figuring out why". > I also wonder if group locking is generally the wrong approach to > parallelism. Parallel scan/sort should work by assigning workers to > chunks of data, and then merging the results. In principle the workers > don't need to touch the same data at all, so why are we trying so hard > to get them to all take the same locks? > > The reason, I assume, is that a full table is the lockable object, but > we are imagining the segment files as the chunks. But maybe there's a > way to address this more directly with an extra field in the lock tag, > and perhaps some catalog knowledge? In the common case, we're only taking AccessShareLock on some relation to prevent it from being concurrently dropped or rewritten. Locking only part of the relation wouldn't lead to any improvement in concurrency, because the full-relation lock isn't conflicting with anything that the partial-relation lock wouldn't also need to conflict with. More generally, I think there's some misunderstanding about the overall goal of the parallelism infrastructure that I'm trying to create. 
Many people have proposed interesting strategies for how we could do
various things (locking, planning, execution) better. Some of those
ideas are, doubtless, good ones. But my goal is in some ways the
opposite: I'm trying to make it possible to run as much existing
PostgreSQL backend code as possible inside a parallel worker without
any modification. To do that, I'm trying to modify the subsystems that
are widely used throughout the backend - such as locking - in such a
way that the people using that infrastructure need not put conditional
logic into their code to make it do one thing when in parallel mode and
another thing when not in parallel mode. And I'm also trying to avoid
fundamentally changing the way major subsystems work today as a
precondition of parallelism.

So I'm taking a very surgical approach to the patches I propose. I'm
not proposing patches that do anything fundamentally new or different;
instead, I'm proposing patches that let the stuff we already do
continue to work. In a single backend executing a query, we have lots
of code that assumes you can allocate memory, look data up in a
syscache, throw an error, lock a buffer, examine the value of a GUC,
test a tuple against the active snapshot, and so on. For parallel mode
to be useful, those same things need to work in a parallel worker. Of
course, not every crazy thing somebody may want to do in a parallel
worker needs to work, nor will it. But anything that's done in
thousands of places throughout the code had better work, or parallelism
will be so restricted as to be useless. So commit
2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, for example, is about letting
"ereport" work in a parallel worker, and this patch is about making
things like "SearchSysCache1" and "PG_GETARG_TEXT_PP" work *reliably*
in a parallel worker.

Now, the time to do new and different things is coming, and is maybe
even now not so very far away.
I'm not an expert on parallel execution, and I'm sure there are a number of people far more skilled than I at figuring out how to do a parallel scan, join, sort, or whatever quickly. What I'm trying to do is get the infrastructure to a point where those people don't have to worry about whether the basic facilities that we rely on throughout the backend are gonna work. By way of comparison, think about the periodic discussions about making the backend multi-threaded. Since the backend relies on global variables in an enormous number of places, it isn't thread-safe. If you spawn a thread inside a PostgreSQL backend, it can't safely ereport(), palloc(), LWLockAcquire(), or just about anything else, which means that, although anybody can write code that calls pthread_create(), it's not useful to do so because there's practically no existing backend code that you could safely call from the new thread. Using background worker processes eliminates a lot of those problems - palloc and lwlocks just work, for example - but not all of them. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, 2014-11-12 at 14:16 -0500, Robert Haas wrote: > Detected deadlocks are fine. Improving the deadlock detector is the > heart of what needs to be done here. OK, great. > As you say, the lock requests > we're talking about will rarely wait, so deadlocks won't be frequent. > The issue is making sure that, if they do happen, we get a better > behavior than "your parallel query hangs forever; good luck figuring > out why". Right. We can still use this patch's notion of a lock group in the deadlock detector, but we don't need it to actually affect the way a lock is granted. That should eliminate concerns about subtle bugs. Later, after we understand how this is actually used, and if we see deadlock problems, we can look for ways to solve/mitigate them. This seems to be what Andres was saying, here: http://www.postgresql.org/message-id/20141031130727.GF13584@awork2.anarazel.de So I'll follow up in that thread, because it's an interesting discussion. > More generally, I think there's some misunderstanding about the > overall goal of the parallelism infrastructure that I'm trying to > create. ... But my goal is in some ways > the opposite: I'm trying to make it possible to run as much existing > PostgreSQL backend code as possible inside a parallel worker without > any modification. Thank you for clarifying, I think this is a good approach. Back to the patch: If I understand correctly, the _primary_ goal of this patch is to make it safe to take out heavyweight locks in worker processes, even if the deadlock involves LWLocks/latches synchronizing among the processes within a lock group. For example, say processes A1 and A2 are in the same lock group, and B is in a different lock group. A1 is holding heavyweight lock H1 and waiting on a LW lock L1; A2 is holding L1 and waiting on heavyweight lock H2; and B is holding H2 and waiting on H1. 
The current deadlock detector would see a dependency graph like:

A2 -> B -> A1

But with lock groups, it would see:

(A1 A2) -> B -> (A1 A2)

which is a cycle, and can be detected regardless of the synchronization
method used between A1 and A2. There are some details to work out to
avoid false positives, of course.

Is that about right?

Regards,
Jeff Davis
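Jeff's collapsed graph can be sketched directly: treat each lock group as a single node, add an edge whenever any member of one group waits (via a heavyweight lock, LWLock, or latch) for a member of another, and run an ordinary cycle search. This is purely illustrative; the real deadlock detector traverses PROC wait queues:

```c
#include <stdbool.h>

#define MAXG 16                 /* max groups in this toy model */

/* waits[a][b]: some member of group a waits for some member of group b. */
typedef struct
{
    bool waits[MAXG][MAXG];
    int  ngroups;
} GroupGraph;

/* Depth-first search for a back edge (i.e. a cycle) reachable from 'node'. */
static bool
dfs(const GroupGraph *g, int node, bool *onstack, bool *done)
{
    if (onstack[node])
        return true;            /* reached a node on the current path */
    if (done[node])
        return false;
    onstack[node] = true;
    for (int next = 0; next < g->ngroups; next++)
        if (g->waits[node][next] && dfs(g, next, onstack, done))
            return true;
    onstack[node] = false;
    done[node] = true;
    return false;
}

static bool
has_deadlock(const GroupGraph *g)
{
    bool onstack[MAXG] = {false};
    bool done[MAXG] = {false};

    for (int n = 0; n < g->ngroups; n++)
        if (dfs(g, n, onstack, done))
            return true;
    return false;
}
```

Collapsing A1 and A2 into one node is what turns the acyclic A2 -> B -> A1 chain into the detectable cycle (A1 A2) -> B -> (A1 A2).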
Great discussion. Robert, I think you make a lot of very valid points, but philosophically I am in pretty strong agreement with Andres here. On Fri, 2014-10-31 at 14:29 -0400, Robert Haas wrote: > > So just refusing parallelism as soon as anything has taken an access > > exclusive lock doesn't sound too bad to me. > > That restriction seems onerous to me; for example, it would prevent a > parallel sort for CLUSTER or a parallel index rebuild for VACUUM FULL. > Those seems like worthy targets for parallelism, and maybe even early > targets, since I expect a lot of the real complexity here to be in the > query planner, and you can avoid most of that complexity when planning > DDL. If two backends both have an exclusive lock on the relation for a join operation, that implies that they need to do their own synchronization, because obviously the lock manager is not doing it for them. In turn, that implies that we need parallel versions of these operations; going to each one and parallelizing it. I don't think that's a good long-term strategy... it changes our simple executor (tuples in, tuples out) to something quite different (each node needs to know a lot about whatever is underneath it in order to divide the data up). > process and both requests would have been granted. Parallelism is > generally not about making different things happen; it's about > spreading the stuff that would happen anyway across multiple backends, This point would be much stronger for declarative queries. When the user issues LOCK TABLE, they are getting their hands dirty with the implementation to some degree. If they really want to serialize, they can serialize around the lock of an object that they don't want to also read (in parallel). You're right that this would be surprising to users, but hopefully we can find a way to disable parallelism in that case, or at least issue a WARNING that the user is probably doing the wrong thing. > "cluster pg_collate"... 
the only solution is to > move the parallel workers ahead of the AccessExclusiveLock request, > but the deadlock detector won't do that unless it knows about the > relationship among the parallel workers. As discussed later in this thread, I also consider failure a solution ;-) So we don't need to move anything. If we have to tell our users that exclusive-locking a catalog can cause deadlocks for parallel query, I think they will find that reasonable. The same admin could turn parallelism down to one for non-superusers before they start doing that, and restore it later. > patch that does parallel anything has got to worry about what > undetected deadlocks might result, and then argue about which of them > are obscure enough that we shouldn't care and which are not. That > doesn't sound like a fun way to spend my time. We're in agreement that deadlocks can't go undetected. > But, really, the first case is the one I'm more worried about: a bunch > of parallel backends all grab the same locks, but slightly separated > in time. A strong locker joins the queue midway through and we're > hosed. I agree the situation sounds bad in the abstract, but none of the examples seem very compelling. Who locks random catalogs exclusively? And if they do, why is it the lock manager's job to try to reconcile the resulting mess without canceling someone? (OK, don't answer that. It obviously is the lock manager's job to some degree, but we can only demand so much.) > Even if you think that all of these particular issues are somehow > avoidable or that they don't matter, I think it would be unwise to bet > on them being the only issues. The lock manager looks like it's old, > ill-maintained code, but that's in no small part because it solved the > problem it aims at so thoroughly that few people have felt the need to > modify it very much in the last decade. 
> I think that's part of why
> there's so much skepticism on this thread - we don't think about the
> problems that the lock manager solves as being important problems not
> because they truly aren't important, but because they are so
> thoroughly solved. Then every once in a while we complain about the
> performance.

Guilty as charged. I won't forever oppose changes to the lock manager,
but I resist making them a precondition of parallelism (aside from
making deadlock detection work, of course). Users taking out exclusive
locks recklessly can result in deadlocks today, and will result in more
when we do parallelism (though perhaps in more surprising situations).

I remember when doing Exclusion Constraints, I went to a lot of effort
to make deadlocks impossible, and was quite proud. When Tom saw it, he
told me not to bother, and to do it the simple way instead, because
deadlocks can happen even with UNIQUE constraints (which I didn't even
know). We should use the same strategy here. When we see deadlocks
becoming a problem for any reasonable workload, we make a series of
tweaks (perhaps some to the lock manager itself) to reduce them.

Regards,
Jeff Davis
On Thu, Nov 13, 2014 at 2:26 AM, Jeff Davis <pgsql@j-davis.com> wrote: > The current deadlock detector would see a dependency graph like: > > A2 -> B -> A1 > > But with lock groups, it would see: > > (A1 A2) -> B -> (A1 A2) > > which is a cycle, and can be detected regardless of the synchronization > method used between A1 and A2. There are some details to work out to > avoid false positives, of course. > > Is that about right? Yep. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Nov 13, 2014 at 3:38 AM, Jeff Davis <pgsql@j-davis.com> wrote: > If two backends both have an exclusive lock on the relation for a join > operation, that implies that they need to do their own synchronization, > because obviously the lock manager is not doing it for them. This doesn't make sense to me. Why would they need to synchronize access to a relation in order to join it? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Nov 13, 2014 at 3:38 AM, Jeff Davis <pgsql@j-davis.com> wrote: >> If two backends both have an exclusive lock on the relation for a join >> operation, that implies that they need to do their own synchronization, >> because obviously the lock manager is not doing it for them. > This doesn't make sense to me. Why would they need to synchronize > access to a relation in order to join it? What's more to the point: why would you take an exclusive lock just to do a join? regards, tom lane
On November 13, 2014 8:50:18 PM CET, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Nov 13, 2014 at 3:38 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>>> If two backends both have an exclusive lock on the relation for a join
>>> operation, that implies that they need to do their own synchronization,
>>> because obviously the lock manager is not doing it for them.
>
>> This doesn't make sense to me. Why would they need to synchronize
>> access to a relation in order to join it?
>
> What's more to the point: why would you take an exclusive lock just to
> do a join?

Robert's case basically rests on the premise that it's useful & correct
for normally conflicting locks not to conflict when a normal process
and its workers acquire them. I have serious doubts about the safety of
that and the complexity it requires.

Obviously a join won't need an exclusive lock itself - but with
Robert's design a join inside a transaction that locked a relation
could still be parallelized, even if that rel is involved.

Andres
--
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 13, 2014 at 11:26 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Nov 13, 2014 at 3:38 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> > If two backends both have an exclusive lock on the relation for a join
> > operation, that implies that they need to do their own synchronization,
> > because obviously the lock manager is not doing it for them.
>
> This doesn't make sense to me. Why would they need to synchronize
> access to a relation in order to join it?

Unfortunate typo: that was supposed to be "joint" operation, just
meaning that they are working together for something (e.g. CLUSTER,
VACUUM FULL as you suggest). Sorry for the confusion.

I meant that the backends need to divide up the work somehow. And if
each operator needs to divide up the work before operating, that means
we need to change every operator.

Regards,
Jeff Davis
On Thu, Nov 13, 2014 at 4:57 PM, Jeff Davis <pgsql@j-davis.com> wrote: > On Thu, Nov 13, 2014 at 11:26 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> On Thu, Nov 13, 2014 at 3:38 AM, Jeff Davis <pgsql@j-davis.com> wrote: >> > If two backends both have an exclusive lock on the relation for a join >> > operation, that implies that they need to do their own synchronization, >> > because obviously the lock manager is not doing it for them. >> >> This doesn't make sense to me. Why would they need to synchronize >> access to a relation in order to join it? > > Unfortunate typo: that was supposed to be "joint" operation, just meaning > that they are working together for something (e.g. CLUSTER, VACUUM FULL as > you suggest). Sorry for the confusion. Ah, well, that's different. > I meant that the backends need to divide up the work somehow. And if each > operator needs to divide up the work before operating, that means we need to > change every operator. I think I see your point, which it just so happens Amit articulated to me in different words while we were chatting about this problem this morning. We want to avoid waiving the mutual exclusion provided by the lock manager only to end up re-implementing something very much like the current lock manager to arbitrate resource contention among backends within a parallel group. However, we also don't want the mutual exclusion that the lock manager provides to serve as a barrier to implementing useful parallelism; that is, if the parallel backends want to manage conflicts themselves instead of letting the lock manager do it, that should be possible. In http://www.postgresql.org/message-id/CA+TgmoYGpLoJo+LG1beBOs8gdjwjTQ0qdmxsYJ4ihFyJ11Tr-g@mail.gmail.com I propose a new approach. The basic idea is that any heavyweight lock that we hold at the start of a parallel phase can't represent a bona fide conflict between the activities of two different backends, so we arrange to waive conflicts over those locks. 
The problem (as Andres points out) is that those locks could be
subsequently re-taken by operations that are not parallel-safe, and the
new locks would be granted locally regardless of the shared lock table
because the current backend would already hold them. However, I think
it's fine to make it the job of the parallelism infrastructure to plug
that hole, rather than trying to enforce it in the lock manager. There
are LOTS of things that need to be prohibited in parallel mode and only
allowed once they've been made sufficiently safe, and in the first
version, that's certainly going to include - among other things - any
operation that writes data. I think it's OK to make it the job of the
people who want to relax those prohibitions to deal with making it safe
to do so.

To reiterate the basic problem here: if we do nothing at all about the
lock manager, a parallel backend can stall trying to grab an
AccessExclusiveLock that the user backend already holds; and whether
that happens because the user backend holds an AccessExclusiveLock as
well or because some other process is waiting for one, we'll deadlock
and not notice. We have to do *something* about that.

I'd like to arrive at some consensus on how to move forward here as
soon as possible, because the clock is ticking.
http://www.postgresql.org/message-id/CA+TgmoYGpLoJo+LG1beBOs8gdjwjTQ0qdmxsYJ4ihFyJ11Tr-g@mail.gmail.com
articulates what I currently believe to be the best plan.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Mon, 2014-11-17 at 14:32 -0500, Robert Haas wrote: > I think I see your point, which it just so happens Amit articulated to > me in different words while we were chatting about this problem this > morning. We want to avoid waiving the mutual exclusion provided by > the lock manager only to end up re-implementing something very much > like the current lock manager to arbitrate resource contention among > backends within a parallel group. Even worse, it seems like you lose composability of executor nodes. Right now, you can rearrange executor nodes in the planner quite easily, because they all take tuples as input and give tuples as output (with few exceptions). What is the input to a parallelized executor node? What is the output? How do you hook one parallized node to another in the planner? Does it stay parallel all the way through, or do you have to serialize between nodes? You can answer that for Scan, because it's a leaf node; but how will we get to HashJoin, etc.? > However, we also don't want the > mutual exclusion that the lock manager provides to serve as a barrier > to implementing useful parallelism; that is, if the parallel backends > want to manage conflicts themselves instead of letting the lock > manager do it, that should be possible. Agreed. Backends are quite generic, and they should be usable for all kinds of things. > To reiterate the basic problem here, if we do nothing at all about the > lock manager, a parallel backend can stall trying to grab an > AccessExclusiveLock that the user backend alread holds, either because > the user backend holds an AccessExclusiveLock as well, or because some > other process is waiting for one, we'll deadlock and not notice. My feeling is that we should keep the concept of a group and group leader from your patch, and improve the deadlock detector to make use of that information (looking at the code, it looks doable but not trivial). 
But unless I am missing something, we should separate out the lock
sharing, and defer that until later.

I am actually starting to see that something like lock sharing could be
useful. I think your example of VACUUM (in the other thread) is a good
one. But I don't see anything forcing us to decide now if we can
instead detect the deadlocks and abort. We will have a lot more
information later, and I think we'll make a better decision about the
exact form it takes.

In other words: lock groups is important, but I don't see the rush for
lock sharing specifically.

Regards,
Jeff Davis
On Tue, Nov 18, 2014 at 3:10 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2014-11-17 at 14:32 -0500, Robert Haas wrote:
>
> > To reiterate the basic problem here, if we do nothing at all about the
> > lock manager, a parallel backend can stall trying to grab an
> > AccessExclusiveLock that the user backend already holds, either because
> > the user backend holds an AccessExclusiveLock as well, or because some
> > other process is waiting for one, we'll deadlock and not notice.
>
> My feeling is that we should keep the concept of a group and group
> leader from your patch, and improve the deadlock detector to make use of
> that information (looking at the code, it looks doable but not trivial).
> But unless I am missing something, we should separate out the lock
> sharing, and defer that until later.
>
+1 to initially having something like you describe, and maybe an
additional mechanism to grant the non-conflicting locks already held by
the master backend to child backends. I understand that allowing
additional non-conflicting locks could lead to problems if write
operations are allowed via child backends, as described in point 1 of
Robert's other mail [1]. However, since initially we want to allow only
read-only operations via child backends, this should be okay; later on,
if we want to support writes via child backends, we might want to
consider having an additional property on the lock which will allow the
lock manager or deadlock detector to consider such locks separately, so
that they won't be granted to child backends if they conflict with the
parent's.
On Tue, Nov 18, 2014 at 4:40 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> To reiterate the basic problem here, if we do nothing at all about the
>> lock manager, a parallel backend can stall trying to grab an
>> AccessExclusiveLock that the user backend already holds, either because
>> the user backend holds an AccessExclusiveLock as well, or because some
>> other process is waiting for one, we'll deadlock and not notice.
>
> My feeling is that we should keep the concept of a group and group
> leader from your patch, and improve the deadlock detector to make use of
> that information (looking at the code, it looks doable but not trivial).
> But unless I am missing something, we should separate out the lock
> sharing, and defer that until later.

Doing as you propose solves this problem:

1. Backend A-1 acquires AccessShareLock on relation R.
2. Backend B waits for AccessExclusiveLock on relation R.
3. Backend A-2 waits for AccessShareLock on relation R.

Knowing that A-1 and A-2 are related, the deadlock detector can move
A-2 ahead of B and grant the lock. All is well.

But your proposal does not solve this problem:

1. Backend A-1 acquires AccessExclusiveLock on relation R.
2. Backend A-2 waits for AccessShareLock on relation R.

The good news is that the deadlock detector should realize that since
A-1 and A-2 are in the same group, this is a deadlock. And it can abort
either A-1 or A-2, which will eventually abort them both. The bad news,
to borrow a phrase from Peter Geoghegan, is that it's an unprincipled
deadlock; a user confronted with the news that her parallel scan has
self-deadlocked will be justifiably dismayed. It's been proposed by
Andres, and maybe a few other people, that we can detect this situation
and just not use parallelism in these cases, but that's actually quite
hard to do.
Of course, it's pretty easy for a backend A-1 contemplating a parallel
scan of relation R to notice that it holds a lock that conflicts with
the AccessShareLock it expects a prospective parallel backend A-2 to
attempt to acquire. I don't think there's a lock manager method for
that today, but we could easily add one. However, A-1 might
incidentally hold other locks (from earlier statements in the
transaction) that conflict with locks that will be sought by A-2, and
as discussed *extensively* upthread, it is not easy to predict all of
the locks a parallel backend might try to take: even seemingly-trivial
code like the equality operators for built-in data types can sometimes
take relation locks, and we have no infrastructure for predicting which
ones.

Even if you could somehow solve that problem, there's also the issue of
plan caching. If we make a parallel plan for an SQL statement inside a
PL/pgsql procedure because our backend holds no strong locks, and then
we acquire a strong lock on a relevant relation, we have to invalidate
that plan. I think that will add complexity inside the plan cache
machinery. It seems to me that it would be better to handle these
issues inside the lock manager rather than letting them creep into
other parts of the system.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
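The "lock manager method" imagined above - letting A-1 check whether locks it already holds conflict with those a prospective worker is expected to take - could look roughly like this sketch (hypothetical types and a toy two-mode conflict rule, not actual PostgreSQL code):

```c
#include <stdbool.h>

/* Toy model: two modes, where AccessShareLock conflicts only with
 * AccessExclusiveLock. */
enum { ACCESS_SHARE, ACCESS_EXCLUSIVE };

static bool
conflicts(int a, int b)
{
    return a == ACCESS_EXCLUSIVE || b == ACCESS_EXCLUSIVE;
}

/* A lock on some object (e.g. a relation OID) in some mode. */
typedef struct
{
    int object;
    int mode;
} HeldLock;

/*
 * Would any lock the leader already holds block one of the locks a
 * prospective worker is expected to request?
 */
static bool
leader_blocks_worker(const HeldLock *held, int nheld,
                     const HeldLock *expected, int nexpected)
{
    for (int h = 0; h < nheld; h++)
        for (int e = 0; e < nexpected; e++)
            if (held[h].object == expected[e].object &&
                conflicts(expected[e].mode, held[h].mode))
                return true;
    return false;
}
```

As the message notes, this check is the easy half; the hard part is computing "expected" - predicting every lock a parallel worker might end up taking.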
On Wed, Nov 19, 2014 at 9:33 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Nov 18, 2014 at 4:40 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> >> To reiterate the basic problem here, if we do nothing at all about the
> >> lock manager, a parallel backend can stall trying to grab an
> >> AccessExclusiveLock that the user backend already holds, either because
> >> the user backend holds an AccessExclusiveLock as well, or because some
> >> other process is waiting for one, we'll deadlock and not notice.
> >
> > My feeling is that we should keep the concept of a group and group
> > leader from your patch, and improve the deadlock detector to make use of
> > that information (looking at the code, it looks doable but not trivial).
> > But unless I am missing something, we should separate out the lock
> > sharing, and defer that until later.
>
> Doing as you propose solves this problem:
>
> 1. Backend A-1 acquires AccessShareLock on relation R.
> 2. Backend B waits for AccessExclusiveLock on relation R.
> 3. Backend A-2 waits for AccessShareLock on relation R.
>
> Knowing that A-1 and A-2 are related, the deadlock detector can move
> A-2 ahead of B and grant the lock. All is well.
>
> But your proposal does not solve this problem:
>
> 1. Backend A-1 acquires AccessExclusiveLock on relation R.
> 2. Backend A-2 waits for AccessShareLock on relation R.
>
>
> Even if you could somehow solve that problem, there's also the issue
> of plan caching. If we make a parallel plan for an SQL statement
> inside a PL/pgsql procedure because our backend holds no strong locks,
> and then we acquire a strong lock on a relevant relation, we have to
> invalidate that plan. I think that will add complexity inside the
> plan cache machinery. It seems to me that it would be better to
> handle these issues inside the lock manager rather than letting them
> creep into other parts of the system.
>
>
> On Tue, Nov 18, 2014 at 4:40 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> >> To reiterate the basic problem here, if we do nothing at all about the
> >> lock manager, a parallel backend can stall trying to grab an
> >> AccessExclusiveLock that the user backend already holds, either because
> >> the user backend holds an AccessExclusiveLock as well, or because some
> >> other process is waiting for one, we'll deadlock and not notice.
> >
> > My feeling is that we should keep the concept of a group and group
> > leader from your patch, and improve the deadlock detector to make use of
> > that information (looking at the code, it looks doable but not trivial).
> > But unless I am missing something, we should separate out the lock
> > sharing, and defer that until later.
>
> Doing as you propose solves this problem:
>
> 1. Backend A-1 acquires AccessShareLock on relation R.
> 2. Backend B waits for AccessExclusiveLock on relation R.
> 3. Backend A-2 waits for AccessShareLock on relation R.
>
> Knowing that A-1 and A-2 are related, the deadlock detector can move
> A-2 ahead of B and grant the lock. All is well.
>
> But your proposal does not solve this problem:
>
> 1. Backend A-1 acquires AccessExclusiveLock on relation R.
> 2. Backend A-2 waits for AccessShareLock on relation R.
>
I think that could be doable under the above proposal by making the lock
conflicts (LockCheckConflicts) code aware of the parallel group. But
it might lead to a problem in case it allows a parallel worker to
acquire a lock it should not, as in the relation-extension case you
mentioned on another thread. However, I think we should block any such
operations and allow only read-only operations in parallel workers.
It seems to me that we would have to do a lot more work to build the
capability for parallel workers to perform write operations.
>
> Even if you could somehow solve that problem, there's also the issue
> of plan caching. If we make a parallel plan for an SQL statement
> inside a PL/pgsql procedure because our backend holds no strong locks,
> and then we acquire a strong lock on a relevant relation, we have to
> invalidate that plan. I think that will add complexity inside the
> plan cache machinery. It seems to me that it would be better to
> handle these issues inside the lock manager rather than letting them
> creep into other parts of the system.
>
How would changes in the lock manager guarantee that this won't be a
problem in any case?
On Wed, 2014-11-19 at 11:03 -0500, Robert Haas wrote:
> But your proposal does not solve this problem:
>
> 1. Backend A-1 acquires AccessExclusiveLock on relation R.
> 2. Backend A-2 waits for AccessShareLock on relation R.
>
> The good news is that the deadlock detector should realize that since
> A-1 and A-2 are in the same group, this is a deadlock. And it can
> abort either A-1 or A-2, which will eventually abort them both.

Right. It can even give a nice error hint to tell the user how to avoid the problem.

> The bad news, to borrow a phrase from Peter Geoghegan, is that it's an
> unprincipled deadlock; a user confronted with the news that her
> parallel scan has self-deadlocked will be justifiably dismayed.

You seem to be raising this as a show-stopping problem, and I'm not convinced that it is.

(a) There are no parallel operators yet, so this is speculative.

(b) Out of the parallel operators you imagine (e.g. Scan), we don't expect anything other than AccessShare locks in the "normal" case.

(c) The failure mode is not so bad; it's just an error, and the user can probably work around it.

You could argue that we know we're headed for this problem, and therefore we should solve it now. I disagree. You are assuming that sharing exclusive heavyweight locks among a group will be a fundamental part of everything postgres does with parallelism; but not every design requires it.

Regards,
Jeff Davis
On Thu, Nov 20, 2014 at 4:21 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> The bad news, to borrow a phrase from Peter Geoghegan, is that it's an
>> unprincipled deadlock; a user confronted with the news that her
>> parallel scan has self-deadlocked will be justifiably dismayed.
>
> You seem to be raising this as a show-stopping problem, and I'm not
> convinced that it is.

Well, what I'm saying is that at the very minimum we have to be able to detect deadlocks, and we have two plausible designs for avoiding that:

1. Modify the deadlock detector to know about lock groups.

2. Propagate pre-existing locks from the user backend to all the workers.

I initially proposed #1, but now I think #2 solves more of the problems for less code.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, 2014-11-20 at 11:22 -0500, Robert Haas wrote:
> 2. Propagate pre-existing locks from the user backend to all the workers.
>
> I initially proposed #1, but now I think #2 solves more of the
> problems for less code.

OK. The primary concern with that is unintended consequences. But it's reasonable for you to ask for something more concrete. I will think on this more.

A few things I'm thinking about now:

* What do you mean by "pre-existing"? Locks existing prior to what event? (I don't think that's exactly what you meant.)
* What's the conceptual difference between granting locks that would otherwise conflict with another process in the group (which is what this proposal is about) and having exactly the same set of locks? Is there any?
* Let's say you have processes A1 and A2 in one group, and B. A1 and B both have an AccessShare lock, and A2 tries to acquire an exclusive lock. B is waiting on A2. That's still a deadlock, right?

Regards,
Jeff Davis
On 2014-11-20 11:22:37 -0500, Robert Haas wrote:
> On Thu, Nov 20, 2014 at 4:21 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> >> The bad news, to borrow a phrase from Peter Geoghegan, is that it's an
> >> unprincipled deadlock; a user confronted with the news that her
> >> parallel scan has self-deadlocked will be justifiably dismayed.
> >
> > You seem to be raising this as a show-stopping problem, and I'm not
> > convinced that it is.
>
> Well, what I'm saying is that at the very minimum we have to be able to
> detect deadlocks, and we have two plausible designs for avoiding that:
>
> 1. Modify the deadlock detector to know about lock groups.
>
> 2. Propagate pre-existing locks from the user backend to all the workers.
>
> I initially proposed #1, but now I think #2 solves more of the
> problems for less code.

Except that it opens us up to all kinds of concurrency bugs. I'm pretty strictly set against granting any self-exclusive locks en masse. If we do this by default for all granted locks when starting a worker backend, it'll get *so* much harder to reason about correctness. Suddenly locks don't guarantee what they used to anymore. We'll, e.g., no longer be able to rely on CheckTableNotInUse() + AEL making it safe to drop/rewrite/whatever a relation - even if that happens in the main backend.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 20, 2014 at 12:12 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2014-11-20 at 11:22 -0500, Robert Haas wrote:
>> 2. Propagate pre-existing locks from the user backend to all the workers.
>>
>> I initially proposed #1, but now I think #2 solves more of the
>> problems for less code.
>
> OK. The primary concern with that is unintended consequences. But it's
> reasonable for you to ask for something more concrete. I will think on
> this more.
>
> A few things I'm thinking about now:
>
> * What do you mean by "pre-existing"? Locks existing prior to what
>   event? (I don't think that's exactly what you meant.)
> * What's the conceptual difference between granting locks that would
>   otherwise conflict with another process in the group (which is what
>   this proposal is about) and having exactly the same set of locks? Is
>   there any?
> * Let's say you have processes A1 and A2 in one group, and B. A1 and B
>   both have an AccessShare lock, and A2 tries to acquire an exclusive
>   lock. B is waiting on A2. That's still a deadlock, right?

I think I discussed all of these issues on the other thread already. Am I wrong?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Thu, Nov 20, 2014 at 12:19 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> Except that it opens us up to all kinds of concurrency bugs. I'm pretty
> strictly set against granting any self-exclusive locks en masse. If we
> do this by default for all granted locks when starting a worker backend
> it'll get *so* much harder to reason about correctness. Suddenly locks
> don't guarantee what they used to anymore. We'll, e.g., no longer be
> able to rely on CheckTableNotInUse() + AEL making it safe to
> drop/rewrite/whatever a relation - even if that happens in the main
> backend.

Haven't I responded to this a few times already? I see no way, even theoretically, that it can be sane for CheckTableNotInUse() to succeed in a parallel context. Ever. If the deadlock detector would kill the processes anyway, it doesn't matter, because CheckTableNotInUse() should do it first, so that we get a better error and without waiting for deadlock_timeout. So any scenario that's predicated on the assumption that CheckTableNotInUse() will succeed in a parallel context is 100% unconvincing to me as an argument for anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-11-20 13:57:25 -0500, Robert Haas wrote:
> On Thu, Nov 20, 2014 at 12:19 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > Except that it opens us up to all kinds of concurrency bugs. I'm pretty
> > strictly set against granting any self-exclusive locks en masse. If we
> > do this by default for all granted locks when starting a worker backend
> > it'll get *so* much harder to reason about correctness. Suddenly locks
> > don't guarantee what they used to anymore. We'll, e.g., no longer be
> > able to rely on CheckTableNotInUse() + AEL making it safe to
> > drop/rewrite/whatever a relation - even if that happens in the main
> > backend.
>
> Haven't I responded to this a few times already?

Not in a particularly convincing way, at least.

> I see no way, even theoretically, that it can be sane for
> CheckTableNotInUse() to succeed in a parallel context. Ever.

It's not exactly inconceivable that you'd want a parallel CLUSTER/REINDEX/VACUUM or something similar. That'll require cooperating backends, true, but it's still a pretty sane case.

You haven't really made your case for why you want to allow access to AEL-locked relations in a parallel context in the first place.

> If the deadlock detector would kill the processes anyway, it doesn't
> matter, because CheckTableNotInUse() should do it first, so that we get
> a better error and without waiting for deadlock_timeout. So any
> scenario that's predicated on the assumption that CheckTableNotInUse()
> will succeed in a parallel context is 100% unconvincing to me as an
> argument for anything.

Meh, there are plenty of cases where CheckTableNotInUse() isn't used where we still rely on locks actually working. And it's not just postgres code; this *will* break user code.

Your argument essentially is that we don't actually need to lock objects if the other side only reads. Yes, you add a condition or two on top (e.g. CheckTableNotInUse() fails), but that's not that much.

If we want to do this we really need to forbid *any* modification to catalog/user tables while parallel operations are ongoing, in the primary and worker backends. I think that's going to end up being much more problematic/restrictive than not allowing non-cooperative parallelism after >= ShareUpdateExclusive. If we ever want to parallelize writing operations we'll regret this quite a bit.

Breaking the locking system - and that's what you're doing by removing the usual guarantees - seems just fundamentally wrong. Yes, I can't name all the dangers that I think are out there.

The 'let's grant all the current locks at start of parallel query' approach seems quite a bit safer than always doing that during parallel query, don't get me wrong.

Btw, what are your thoughts about SERIALIZABLE and parallel query? Afaics that'd be pretty hard to support?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 20, 2014 at 3:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> Haven't I responded to this a few times already?
>
> Not in a particularly convincing way, at least.

That's not a very helpful response.

>> I see no way, even theoretically, that it can be sane for
>> CheckTableNotInUse() to succeed in a parallel context. Ever.
>
> It's not exactly inconceivable that you'd want a parallel
> CLUSTER/REINDEX/VACUUM or something similar. That'll require cooperating
> backends, true, but it's still a pretty sane case.
>
> You haven't really made your case for why you want to allow access to
> AEL-locked relations in a parallel context in the first place.

Well, what the heck was I doing here?

http://www.postgresql.org/message-id/CA+Tgmoa_+u-qCcFZ9zAVZFk+TGXbqQB1+0ktPZRGgPJ5PvrJVw@mail.gmail.com

I've *repeatedly* pointed out that if you *don't* allow this, stuff breaks. And you haven't answered how you'd like to fix that. The only answer you've given so far, that I can see, is that maybe we can foresee all the locks that the child might take and not initiate parallelism if any of them conflict with locks we already hold. But that's not a reasonable approach; we have no reasonable way of predicting what locks the child will need, and even if we did, things can change between plan time and execution time.

>> If the deadlock detector would kill the processes anyway, it doesn't
>> matter, because CheckTableNotInUse() should do it first, so that we get
>> a better error and without waiting for deadlock_timeout. So any
>> scenario that's predicated on the assumption that CheckTableNotInUse()
>> will succeed in a parallel context is 100% unconvincing to me as an
>> argument for anything.
>
> Meh, there are plenty of cases where CheckTableNotInUse() isn't used
> where we still rely on locks actually working. And it's not just
> postgres code; this *will* break user code.

It would help if you'd provide some specific examples of the problems you foresee. In the absence of specificity, it's hard to come to any conclusions about whether your concerns are well-founded, or propose any way to overcome them.

> Your argument essentially is that we don't actually need to lock objects
> if the other side only reads. Yes, you add a condition or two on top
> (e.g. CheckTableNotInUse() fails), but that's not that much.

I don't think that's my argument at all. What I'm arguing is that there's more than one reason for taking locks. Broadly, I think we take some locks to preserve transaction isolation guarantees and others to preserve data integrity. If you're adding or dropping a column, nobody else had better be looking at that relation, because they might get confused and crash the server or something. Backends, even cooperating parallel backends, won't be able to cope with the tuple descriptor changing under them. But once that operation is complete, there's no reason why the *next* statement in that transaction can't see the relation you just modified, even though it's still exclusively locked. As long as we make sure that new parallel backends use our snapshot, they'll see the right metadata for that relation and can safely access it. We still can't let *other* people see that relation because the ALTER TABLE is an uncommitted change, but that's no argument against letting our own parallel workers look at it.

> If we want to do this we really need to forbid *any* modification to
> catalog/user tables while parallel operations are ongoing, in the
> primary and worker backends. I think that's going to end up being much
> more problematic/restrictive than not allowing non-cooperative
> parallelism after >= ShareUpdateExclusive. If we ever want to
> parallelize writing operations we'll regret this quite a bit.

I can't really follow how this follows from anything I said.

> Breaking the locking system - and that's what you're doing by removing
> the usual guarantees - seems just fundamentally wrong. Yes, I can't
> name all the dangers that I think are out there.
>
> The 'let's grant all the current locks at start of parallel query'
> approach seems quite a bit safer than always doing that during parallel
> query, don't get me wrong.

"Let's grant all the current locks at start of parallel query" is the proposal I'm focused on right now. Based on your arguments and those of others, I have long since abandoned the original proposal of having locks within a group never conflict. It seemed like a good idea at the time, but it would put way too much burden on the parallel workers to coordinate their activity with each other, so it's dead. What I want to establish at this time is what types of problems, if any, you or others can foresee with "let's grant all the current locks at start of parallel query" - because I sense considerable skepticism, but cannot put my finger on anything concretely wrong with it.

> Btw, what are your thoughts about SERIALIZABLE and parallel query?
> Afaics that'd be pretty hard to support?

Hmm, I haven't thought about that. Since we never *wait* for SIREAD locks, and there's no deadlock detection to worry about, we might be able to finagle things so that the parallel backends just take the same SIREAD locks the parent would have taken, using the parent's PGPROC. But that could easily turn out to be unworkable, insufficient, or otherwise brain-damaged.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-11-20 15:47:39 -0500, Robert Haas wrote:
> On Thu, Nov 20, 2014 at 3:15 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> Haven't I responded to this a few times already?
> > Not in a particularly convincing way, at least.
>
> That's not a very helpful response.

Well...

> >> I see no way, even theoretically, that it can be sane for
> >> CheckTableNotInUse() to succeed in a parallel context. Ever.
> >
> > It's not exactly inconceivable that you'd want a parallel
> > CLUSTER/REINDEX/VACUUM or something similar. That'll require
> > cooperating backends, true, but it's still a pretty sane case.
> >
> > You haven't really made your case for why you want to allow access to
> > AEL-locked relations in a parallel context in the first place.
>
> Well, what the heck was I doing here?
>
> http://www.postgresql.org/message-id/CA+Tgmoa_+u-qCcFZ9zAVZFk+TGXbqQB1+0ktPZRGgPJ5PvrJVw@mail.gmail.com

I don't see anything about actual use cases there.

> I've *repeatedly* pointed out that if you *don't* allow this, stuff
> breaks.

You've pointed out some strange corner cases that might break, yes.

> And you haven't answered how you'd like to fix that. The only answer
> you've given so far, that I can see, is that maybe we can foresee all
> the locks that the child might take and not initiate parallelism if any
> of them conflict with locks we already hold. But that's not a
> reasonable approach; we have no reasonable way of predicting what locks
> the child will need, and even if we did, things can change between plan
> time and execution time.

We don't need to predict all future locks. We only need to check which self-exclusive locks we are already holding.

I don't buy the plan time/execution time argument. We'll need to be able to adapt plans to the availability of bgworker slots *anyway*. Otherwise we either need to set the number of bgworkers to "MaxConnections * MaxParallelism" or live with errors as soon as too many processes use parallelism. The former is clearly unrealistic given PG's per-backend overhead, and the latter will be a *much* bigger PITA than all the deadlock scenarios talked about here. It'll constantly fail, even if everything is ok.

> >> If the deadlock detector would kill the processes anyway, it doesn't
> >> matter, because CheckTableNotInUse() should do it first, so that we
> >> get a better error and without waiting for deadlock_timeout. So any
> >> scenario that's predicated on the assumption that
> >> CheckTableNotInUse() will succeed in a parallel context is 100%
> >> unconvincing to me as an argument for anything.
> >
> > Meh, there are plenty of cases where CheckTableNotInUse() isn't used
> > where we still rely on locks actually working. And it's not just
> > postgres code; this *will* break user code.
>
> It would help if you'd provide some specific examples of the problems
> you foresee. In the absence of specificity, it's hard to come to any
> conclusions about whether your concerns are well-founded, or propose
> any way to overcome them.

How about a frontend process having created a relation that then starts a parallel query. Then the frontend process ERRORs out and, in the course of that, does smgrDoPendingDeletes(). Which will delete the new relation. Boom. The background process might just be accessing it. If you think that's harmless, think e.g. of what'd happen with heap cleanup records generated in the background process. They'd not replay.

Another one is that the frontend process does, from some function or so, CREATE POLICY. Triggering a relcache flush. And RelationClearRelation() essentially assumes that a referenced relcache entry can't change (which is the reason the keep_* stuff is in RelationClearRelation()).

I'm pretty sure there are at least a dozen other such hazards around.

> > Your argument essentially is that we don't actually need to lock
> > objects if the other side only reads. Yes, you add a condition or two
> > on top (e.g. CheckTableNotInUse() fails), but that's not that much.
>
> I don't think that's my argument at all. What I'm arguing is that
> there's more than one reason for taking locks. Broadly, I think we take
> some locks to preserve transaction isolation guarantees and others to
> preserve data integrity. If you're adding or dropping a column, nobody
> else had better be looking at that relation, because they might get
> confused and crash the server or something. Backends, even cooperating
> parallel backends, won't be able to cope with the tuple descriptor
> changing under them. But once that operation is complete, there's no
> reason why the *next* statement in that transaction can't see the
> relation you just modified, even though it's still exclusively locked.
> As long as we make sure that new parallel backends use our snapshot,
> they'll see the right metadata for that relation and can safely access
> it. We still can't let *other* people see that relation because the
> ALTER TABLE is an uncommitted change, but that's no argument against
> letting our own parallel workers look at it.

I think the transaction abort example from above refutes that theory.

> > Btw, what are your thoughts about SERIALIZABLE and parallel query?
> > Afaics that'd be pretty hard to support?
>
> Hmm, I haven't thought about that. Since we never *wait* for SIREAD
> locks, and there's no deadlock detection to worry about, we might be
> able to finagle things so that the parallel backends just take the same
> SIREAD locks the parent would have taken, using the parent's PGPROC.
> But that could easily turn out to be unworkable, insufficient, or
> otherwise brain-damaged.

I think there's some local state as well, so that's probably not so easy. I guess this needs input from Kevin.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 20, 2014 at 4:45 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> And you haven't answered how you'd like to fix that. The only answer
>> you've given so far, that I can see, is that maybe we can foresee all
>> the locks that the child might take and not initiate parallelism if
>> any of them conflict with locks we already hold. But that's not a
>> reasonable approach; we have no reasonable way of predicting what
>> locks the child will need, and even if we did, things can change
>> between plan time and execution time.
>
> We don't need to predict all future locks. We only need to check which
> self-exclusive locks we are already holding.

I can't follow that logic. Why do you think self-exclusive locks are the only ones that matter?

> I don't buy the plan time/execution time argument. We'll need to be
> able to adapt plans to the availability of bgworker slots *anyway*.
> Otherwise we either need to set the number of bgworkers to
> "MaxConnections * MaxParallelism" or live with errors as soon as too
> many processes use parallelism. The former is clearly unrealistic given
> PG's per-backend overhead, and the latter will be a *much* bigger PITA
> than all the deadlock scenarios talked about here. It'll constantly
> fail, even if everything is ok.

I certainly think that any parallel node needs to be able to cope with getting fewer workers than it would ideally like to have. But that's different from saying that when our own backend takes new locks, we have to invalidate plans. Those seem like largely unrelated problems.

>>>> If the deadlock detector would kill the processes anyway, it doesn't
>>>> matter, because CheckTableNotInUse() should do it first, so that we
>>>> get a better error and without waiting for deadlock_timeout. So any
>>>> scenario that's predicated on the assumption that
>>>> CheckTableNotInUse() will succeed in a parallel context is 100%
>>>> unconvincing to me as an argument for anything.
>>>
>>> Meh, there are plenty of cases where CheckTableNotInUse() isn't used
>>> where we still rely on locks actually working. And it's not just
>>> postgres code; this *will* break user code.
>>
>> It would help if you'd provide some specific examples of the problems
>> you foresee. In the absence of specificity, it's hard to come to any
>> conclusions about whether your concerns are well-founded, or propose
>> any way to overcome them.
>
> How about a frontend process having created a relation that then starts
> a parallel query. Then the frontend process ERRORs out and, in the
> course of that, does smgrDoPendingDeletes(). Which will delete the new
> relation. Boom. The background process might just be accessing it. If
> you think that's harmless, think e.g. of what'd happen with heap
> cleanup records generated in the background process. They'd not replay.

OK, that's an excellent example. Thanks. I'll think about that.

> Another one is that the frontend process does, from some function or
> so, CREATE POLICY. Triggering a relcache flush. And
> RelationClearRelation() essentially assumes that a referenced relcache
> entry can't change (which is the reason the keep_* stuff is in
> RelationClearRelation()).

I don't quite follow that one. Are you saying that the frontend would do a CREATE POLICY while there are backend workers running? I seriously doubt that can ever be made safe, but I'd choose to prohibit it using something other than the heavyweight lock manager.

> I think there's some local state as well, so that's probably not so
> easy. I guess this needs input from Kevin.

Yeah. I'd be OK with having v1 disable parallelism at serializable, and fix that in v2, but of course if we can avoid that it's even better. The heavyweight locking issue is really killing me, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2014-11-20 20:21:07 -0500, Robert Haas wrote:
> On Thu, Nov 20, 2014 at 4:45 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> >> And you haven't answered how you'd like to fix that. The only answer
> >> you've given so far, that I can see, is that maybe we can foresee all
> >> the locks that the child might take and not initiate parallelism if
> >> any of them conflict with locks we already hold. But that's not a
> >> reasonable approach; we have no reasonable way of predicting what
> >> locks the child will need, and even if we did, things can change
> >> between plan time and execution time.
> >
> > We don't need to predict all future locks. We only need to check which
> > self-exclusive locks we are already holding.
>
> I can't follow that logic. Why do you think self-exclusive locks are
> the only ones that matter?

All the others can be acquired by jumping in front of the lock's waitqueue?

> > I don't buy the plan time/execution time argument. We'll need to be
> > able to adapt plans to the availability of bgworker slots *anyway*.
> > Otherwise we either need to set the number of bgworkers to
> > "MaxConnections * MaxParallelism" or live with errors as soon as too
> > many processes use parallelism. The former is clearly unrealistic
> > given PG's per-backend overhead, and the latter will be a *much*
> > bigger PITA than all the deadlock scenarios talked about here. It'll
> > constantly fail, even if everything is ok.
>
> I certainly think that any parallel node needs to be able to cope with
> getting fewer workers than it would ideally like to have. But that's
> different from saying that when our own backend takes new locks, we
> have to invalidate plans. Those seem like largely unrelated problems.

No need to invalidate them if you have the infrastructure to run with no parallel workers - just use the path that's used if there are no workers available.

> > Another one is that the frontend process does, from some function or
> > so, CREATE POLICY. Triggering a relcache flush. And
> > RelationClearRelation() essentially assumes that a referenced relcache
> > entry can't change (which is the reason the keep_* stuff is in
> > RelationClearRelation()).
>
> I don't quite follow that one. Are you saying that the frontend would
> do a CREATE POLICY while there are backend workers running?

Yes.

> I seriously doubt that can ever be made safe, but I'd choose to
> prohibit it using something other than the heavyweight lock manager.

Right. It's perfectly fine to forbid it, imo. But there are lots of places that have assumptions about the locking behaviour baked in, and the relevant bugs will be hard to find.

> > I think there's some local state as well, so that's probably not so
> > easy. I guess this needs input from Kevin.
>
> Yeah. I'd be OK with having v1 disable parallelism at serializable, and
> fix that in v2, but of course if we can avoid that it's even better.

I don't have a problem with that either.

> The heavyweight locking issue is really killing me, though.

I don't really understand why you're not content with just detecting deadlocks for now. Everything else seems like bells and whistles for later.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, Nov 20, 2014 at 8:31 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> I can't follow that logic. Why do you think self-exclusive locks are >> the only ones that matter? > > All the others can be acquired by jumping in front of the lock's > waitqueue? That's is true, as a practical matter, in many cases. But from the point of view of the lock manager, it's really not true. It is almost true if you assume that the lock level acquired by the parallel backend will be weaker than the lock level held by the user backed, but not quite, because RowExclusiveLock conflicts with ShareLock. The assumption that the parallel backend's lock level will always be weaker seems shaky. It's probably true for relation locks, but if we ever want to support parallel workers that can write any data whatsoever, they're going to need to be able to take relation extension locks. If one backend in a parallel group attempts to acquire the relation extension lock while another worker holds it, the rule that all group members should be regarded as waiting for each other will result in instantaneous deadlock. That's pretty undesirable, and suggests to me that the whole model is defective in some way. >> > I don't buy the plan time/execution time argument. We'll need to be able >> > to adapt plans to the availability of bgworker slots *anyway*. Otherwise >> > we either need to to set the number of bgworkers to "MaxConnections * >> > MaxParallelism" or live with errors as soon as too many processes use >> > parallelism. The former is clearly unrealistic given PG's per-backend >> > overhead and the latter will be a *much* bigger PITA than all the >> > deadlock scenarios talked about here. It'll constantly fail, even if >> > everything is ok. >> >> I certainly think that any parallel node needs to be able to cope with >> getting fewer workers than it would ideally like to have. But that's >> different from saying that when our own backend takes new locks, we >> have to invalidate plans. 
Those seem like largely unrelated problems. > > No need to invalidate them if you have the infrastructure to run with no > parallel workers - just use the path that's used if there's no workers > available. Well, it's OK for a query to run with less parallelism than the planner thought optimal. It's not OK for it to attempt to run with the requested degree of parallelism and deadlock. I think if the deadlock detector gets involved it's quite unlikely that you're going to be able to make things degrade smoothly to non-parallel operation. >> I >> seriously doubt that can ever be made safe, but I'd choose to prohibit >> it using something other than the heavyweight lock manager. > > Right. It's perfectly fine to forbid it imo. But there's lots of places > that have assumptions about the locking behaviour baked in, and the > relevant bugs will be hard to find. That's pretty vague. I think we can rule out the great bulk of the problem cases by prohibiting backends from starting a DDL command while parallel mode is in effect. (We may be parallelizing a DDL command or just a regular query, but once we've initiated parallelism, no NEW DDL commands can be started.) We should focus on trying to foresee what can go wrong apart from that. That's why I think your example about smgrDoPendingDeletes() is a pretty helpful one to think about, but this one seems like something that has no chance of slipping through the cracks. >> The heavyweight locking issue is really killing me, though. > > I don't really understand why you're not content with just detecting > deadlocks for now. Everything else seems like bells and whistles for > later. I don't think it's OK for a parallel query to just randomly deadlock. It's OK for parallel query to excuse itself politely and retire from the field, but I don't think it's OK for it to say, oh, sure, I can parallelize that for you, and then fail by deadlocking with itself. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
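To make the conflict semantics under discussion concrete, here is a toy Python sketch (not PostgreSQL code; the function and group representation are invented for illustration) of the group-locking rule from the head of the thread: lock requests from members of the same locking group are treated as mutually non-conflicting, while everyone else consults the ordinary conflict table. The conflict sets below are a subset of PostgreSQL's documented table-level lock conflicts, which also shows Robert's RowExclusiveLock-vs-ShareLock point:

```python
# Subset of PostgreSQL's relation lock conflict table, by mode name.
CONFLICTS = {
    "AccessShareLock":     {"AccessExclusiveLock"},
    "RowExclusiveLock":    {"ShareLock", "ShareRowExclusiveLock",
                            "ExclusiveLock", "AccessExclusiveLock"},
    "ShareLock":           {"RowExclusiveLock", "ShareUpdateExclusiveLock",
                            "ShareRowExclusiveLock", "ExclusiveLock",
                            "AccessExclusiveLock"},
    "AccessExclusiveLock": {"AccessShareLock", "RowShareLock",
                            "RowExclusiveLock", "ShareUpdateExclusiveLock",
                            "ShareLock", "ShareRowExclusiveLock",
                            "ExclusiveLock", "AccessExclusiveLock"},
}

def conflicts(req_mode, req_group, holder_mode, holder_group):
    """Members of the same locking group never conflict with each
    other; everyone else uses the normal conflict table."""
    if req_group is not None and req_group == holder_group:
        return False
    return holder_mode in CONFLICTS.get(req_mode, set())

# Leader (group 1) holds AccessExclusiveLock for a parallel CLUSTER;
# a worker in the same group can still lock the table...
assert not conflicts("AccessShareLock", 1, "AccessExclusiveLock", 1)
# ...while an unrelated backend conflicts as usual.
assert conflicts("AccessShareLock", None, "AccessExclusiveLock", 1)
# RowExclusiveLock conflicts with ShareLock even though neither mode is
# self-exclusive -- the exception Robert points out above.
assert conflicts("RowExclusiveLock", None, "ShareLock", None)
```

The sketch also makes the extension-lock problem visible: if a worker requests a self-exclusive mode that another group member already holds, the group exemption grants it immediately, which is exactly the situation where "all group members wait for each other" would otherwise report a deadlock.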
On 2014-11-21 08:31:01 -0500, Robert Haas wrote: > On Thu, Nov 20, 2014 at 8:31 PM, Andres Freund <andres@2ndquadrant.com> wrote: > >> I can't follow that logic. Why do you think self-exclusive locks are > >> the only ones that matter? > > > > All the others can be acquired by jumping in front of the lock's > > waitqueue? > > That is true, as a practical matter, in many cases. But from the > point of view of the lock manager, it's really not true. It is almost > true if you assume that the lock level acquired by the parallel > backend will be weaker than the lock level held by the user backend, > but not quite, because RowExclusiveLock conflicts with ShareLock. The assumption that a parallel backend will acquire a lock level <= the worker's lock imo is a pretty sane one - we obviously can't predict things if not. And I don't think any of the other presented approaches can do that. I'm not really bothered by RowExclusiveLock vs. ShareLock. Unless I miss something that can only be problematic if the primary acquires a ShareLock and the worker tries a RowExclusive. That just can't be sane. > The assumption that the parallel backend's lock level will always be > weaker seems shaky. It's probably true for relation locks, but if we > ever want to support parallel workers that can write any data > whatsoever, they're going to need to be able to take relation > extension locks. I don't think extension locks are a good argument here. As you noted upthread, they really need to be used consistently across processes. And they shouldn't/wouldn't be granted by your suggested mechanism either, right? > If one backend in a parallel group attempts to > acquire the relation extension lock while another worker holds it, the > rule that all group members should be regarded as waiting for each > other will result in instantaneous deadlock. That's pretty > undesirable, and suggests to me that the whole model is defective in > some way. 
Shouldn't the deadlock checker actually check that all involved processes are actually waiting for a lock? It seems a bit pointless to deadlock when one node is actually progressing. That seems like it'd be an absolute PITA for anything involving system tables and such. > >> > I don't buy the plan time/execution time argument. We'll need to be able > >> > to adapt plans to the availability of bgworker slots *anyway*. Otherwise > >> > we either need to set the number of bgworkers to "MaxConnections * > >> > MaxParallelism" or live with errors as soon as too many processes use > >> > parallelism. The former is clearly unrealistic given PG's per-backend > >> > overhead and the latter will be a *much* bigger PITA than all the > >> > deadlock scenarios talked about here. It'll constantly fail, even if > >> > everything is ok. > >> > >> I certainly think that any parallel node needs to be able to cope with > >> getting fewer workers than it would ideally like to have. But that's > >> different from saying that when our own backend takes new locks, we > >> have to invalidate plans. Those seem like largely unrelated problems. > > > > No need to invalidate them if you have the infrastructure to run with no > > parallel workers - just use the path that's used if there's no workers > > available. > > Well, it's OK for a query to run with less parallelism than the > planner thought optimal. It's not OK for it to attempt to run with > the requested degree of parallelism and deadlock. I think if the > deadlock detector gets involved it's quite unlikely that you're going > to be able to make things degrade smoothly to non-parallel operation. That was in response to your argument about plan invalidation - which I don't buy because it'd only reuse the graceful degradation capabilities we need anyway. > >> The heavyweight locking issue is really killing me, though. > > > > I don't really understand why you're not content with just detecting > > deadlocks for now. 
Everything else seems like bells and whistles for > > later. > > I don't think it's OK for a parallel query to just randomly deadlock. I think that should be the end goal, yes. > It's OK for parallel query to excuse itself politely and retire from > the field, but I don't think it's OK for it to say, oh, sure, I can > parallelize that for you, and then fail by deadlocking with itself. But I also think that right now it doesn't necessarily need to be the focus immediately. This is a topic that I think will be much easier to approach if some demonstration of actual parallel users would be available. Doesn't have to be committable or such, but I think it'll help to shape how this has to look. I think unrecognized deadlocks will make things too annoying to use, but I don't think it needs to be guaranteed deadlock free or such. Greetings, Andres Freund
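The disagreement above is easy to state as a wait-for graph. Here is a small sketch (invented names, nothing from the actual deadlock detector) contrasting the two policies: if the checker assumes every group member waits for every other member, a worker blocking on a lock held by its own leader — e.g. a relation extension lock — forms a cycle immediately; if, as Andres suggests, only processes that are *actually* waiting contribute edges, no cycle exists while the leader is still making progress:

```python
def has_cycle(edges):
    """Detect a cycle in a wait-for graph given as {proc: set(procs)},
    using a standard three-color depth-first search."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {p: WHITE for p in edges}
    def visit(p):
        color[p] = GREY
        for q in edges.get(p, ()):
            c = color.get(q, WHITE)
            if c == GREY:
                return True          # back edge: cycle found
            if c == WHITE and visit(q):
                return True
        color[p] = BLACK
        return False
    return any(visit(p) for p in edges if color[p] == WHITE)

# Worker B waits on the extension lock held by leader A; A is running.
# Counting only real waits, there is no deadlock.
actual_waits = {"A": set(), "B": {"A"}}
assert not has_cycle(actual_waits)

# Under "every group member eventually waits for every other member"
# we also add the assumed edge A -> B, and a cycle appears even though
# A is actually progressing -- Robert's instantaneous-deadlock case.
assumed_waits = {"A": {"B"}, "B": {"A"}}
assert has_cycle(assumed_waits)
```

This is only a model of the policy question, not of PostgreSQL's deadlock checker, which walks shared lock tables rather than an explicit graph.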
On Thu, Nov 20, 2014 at 4:45 PM, Andres Freund <andres@2ndquadrant.com> wrote: > How about a frontend process having created a relation that then starts > a parallel query. Then the frontend process ERRORs out and, in the > course of that, does smgrDoPendingDeletes(). Which will delete the new > relation. Boom. The background process might just be accessing it. If > you think thats harmless, think e.g. what'd happen with heap cleanup > records generated in the background process. They'd not replay. I spent some time thinking about this case. I don't think this is really principally a locking issue; I think it's really a timing issue around transaction abort. I believe that, if we wanted to be able to write any tuples at all from within a parallel worker, there are certain phases of transaction abort processing that would need to happen only once we're sure that all of the backends involved have done their local abort processing. smgrDoPendingDeletes() is one, but I think we have pretty much the same problem with RecordTransactionAbort() and ProcArrayEndTransaction(). Without some synchronization, it would be possible for a parallel backend to stamp a tuple with an XID after the abort record is already written, or after the transaction has already been removed from the ProcArray. Are we prepared to view that sort of thing as harmless? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
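The ordering constraint Robert describes can be sketched with a simple barrier (this is an illustration in Python threads, not a proposal for the actual implementation; the worker count and log are invented): the leader must not perform the once-per-transaction-tree parts of abort — writing the abort record, clearing the ProcArray entry, deleting pending relation files — until every worker has finished its local abort and can no longer stamp tuples with the XID:

```python
import threading

NWORKERS = 2
# All workers plus the leader must reach this point before the leader
# proceeds to the shared abort steps.
done_local_abort = threading.Barrier(NWORKERS + 1)
log = []  # append is atomic under CPython's GIL

def worker(n):
    log.append(f"worker {n}: local abort")  # release local resources
    done_local_abort.wait()                 # report in, then exit

def leader():
    log.append("leader: local abort")
    done_local_abort.wait()                 # wait for every worker
    # Only now is it safe to do the steps that must happen after all
    # backends have stopped using the transaction's XID and relations.
    log.append("RecordTransactionAbort")
    log.append("ProcArrayEndTransaction")
    log.append("smgrDoPendingDeletes")

threads = [threading.Thread(target=worker, args=(n,)) for n in range(NWORKERS)]
for t in threads:
    t.start()
leader()
for t in threads:
    t.join()

# The shared steps always come after every local abort.
assert log[-3:] == ["RecordTransactionAbort",
                    "ProcArrayEndTransaction",
                    "smgrDoPendingDeletes"]
```

The barrier guarantees exactly the property at issue: no worker can stamp a tuple after the abort record is written, because the record isn't written until every worker has passed its local abort.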
On Sun, Nov 23, 2014 at 3:59 PM, Andres Freund <andres@2ndquadrant.com> wrote: >> >> The heavyweight locking issue is really killing me, though. >> > >> > I don't really understand why you're not content with just detecting >> > deadlocks for now. Everything else seems like bells and whistles for >> > later. >> >> I don't think it's OK for a parallel query to just randomly deadlock. > > I think that should be the end goal, yes. > >> It's OK for parallel query to excuse itself politely and retire from >> the field, but I don't think it's OK for it to say, oh, sure, I can >> parallelize that for you, and then fail by deadlocking with itself. > > But I also think that right now it doesn't necessarily need to be the > focus immediately. I am in strong agreement with Robert that unprincipled deadlocks are not okay. Now, what that actually means in practice might be kind of hard to delineate. Take my ON CONFLICT UPDATE patch. The locking there is optimistic, and so in theory there are risks of lock starvation [1]. There is no axiomatic reason why a single backend cannot be starved of a lock in the event of a perfect storm of conflicts. Similarly, the Lehman and Yao paper makes what can only be called a plausibility argument around how livelocks are not a concern in practice [2] when an indefinite number of "move right" operations are required after a page split; having written a bunch of theorems around the correctness of other aspects of their algorithm, the argument here is more or less: "Come on, that'll never happen!". Concurrency is hard. If you want another example of this kind of thing, then there's how Oracle deals with READ COMMITTED row conflicts -- the statement is rolled back and restarted. Who is to say with certainty that it'll make any progress the second time? Or the third or fourth? 
Note that the optimistic locking stuff was actually my solution to the "unprincipled deadlocks" problem, a solution that, as I say, also implies the theoretical risk of lock starvation (but not livelock). I think I would also have found it acceptable if there was a solution that reduced the original risk, the risk of *deadlocks* to merely a theoretical one, as opposed to a very real one (that could be shown with a simple testcase). > This is a topic that I think will be much easier to > approach if some demonstration of actual parallel users would be > available. Doesn't have to be committable or such, but I think it'll > help to shape how this has to look. I think unrecognized deadlocks will > make things too annoying to use, but I don't think it needs to be > guaranteed deadlock free or such. I couldn't agree more. The greater point about livelock, which I believe applies here as well, is that there are different standards by which we might consider that unprincipled deadlocks won't happen. Basically, it seems inevitable that a practical standard sometimes must be applied. If a well informed person, say myself or Andres, cannot make any given practical implementation deadlock (or alternatively be starved for locks, if we're talking about an "optimistic locking" work-around to unprincipled deadlocks that introduces that risk), then I guess that means that there are no unprincipled deadlocks/livelocks, and I for one will probably find that acceptable given enough testing. I realize that that might sound terribly loose (I probably forgot a few caveats), but what it boils down to is that we *don't* have the tools to analyze certain types of problems like this, or at least I don't think we do, because no one does. The closest thing we have is some well written custom stress testing, and lots of scrutiny. 
Of course I'd prefer a solution that is provably correct from first principles (in other words, I'd prefer to use algorithms that are in some formal sense non-blocking [3] or something along those lines), but that will often be cost prohibitive or otherwise elusive when designing concurrent algorithms. I think I agree with you both. [1] https://wiki.postgresql.org/wiki/UPSERT#Theoretical_lock_starvation_hazards [2] http://www.cs.cornell.edu/courses/cs4411/2009sp/blink.pdf , Section 6.4 Livelock [3] https://en.wikipedia.org/wiki/Non-blocking_algorithm -- Peter Geoghegan
On Thu, Nov 20, 2014 at 09:15:51PM +0100, Andres Freund wrote: > On 2014-11-20 13:57:25 -0500, Robert Haas wrote: > > On Thu, Nov 20, 2014 at 12:19 PM, Andres Freund <andres@2ndquadrant.com> wrote: > > If the > > deadlock detector would kill the processes anyway, it doesn't matter, > > because CheckTableNotInUse() should do it first, so that we get a > > better error and without waiting for deadlock_timeout. So any > > scenario that's predicated on the assumption that CheckTableNotInUse() > > will succeed in a parallel context is 100% unconvincing to me as an > > argument for anything. > > Meh, there's plenty cases where CheckTableNotInUse() isn't used where we > still rely on locks actually working. And it's not just postgres code, > this *will* break user code. Today, holding a relation AccessExclusiveLock ensures no other process is using the table. That is one of two prerequisites for making an alteration that threatens concurrent users of the table. The other prerequisite is verifying that no other subsystem of the current process is using the table, hence CheckTableNotInUse(). Robert's designs do soften AccessExclusiveLock for relations; holding it will not preclude use of the table in workers of the same parallel group. Pessimizing CheckTableNotInUse() compensates. In every scenario where AccessExclusiveLock is softer than before, CheckTableNotInUse() will fail unconditionally. Code that, today, performs the correct checks before making a threatening table alteration will remain correct. ANALYZE and lazy VACUUM are the only core operations that take self-exclusive relation locks and skip CheckTableNotInUse(). PreventTransactionChain() will block naive concurrent VACUUM. (If that weren't the case, VACUUM would need CheckTableNotInUse() even without parallelism in the picture. I wouldn't oppose adding CheckTableNotInUse() to VACUUM as defense in depth.) ANALYZE is special. 
It's compatible with most concurrent use of the table, so CheckTableNotInUse() is unnecessary. Blocking ANALYZE during parallelism, even if we someday allow other database writes, is a wart I can live with. Given a choice between no parallelism and parallelism when the initiating transaction holds no self-exclusive locks, I'd pick the latter. But you have overstated the significance of holding a lock and the degree to which Robert's proposals change the same. > Your argument essentially is that we don't actually need to lock objects > if the other side only reads. Yes, you add a condition or two ontop > (e.g. CheckTableNotInUse() fails), but that's not that much. If we want > to do this we really need to forbid *any* modification to catalog/user > tables while parallel operations are ongoing. In the primary and worker > backends. I think that's going to end up being much more > problematic/restrictive than not allowing non-cooperative paralellism > after >= ShareUpdateExclusive. If we ever want to parallelize writing > operations we'll regret this quite a bit. https://wiki.postgresql.org/wiki/Parallel_Sort already contemplates blocking every kind of database write, due to challenges around combo CIDs (affects UPDATE and DELETE) and invalidation messages (mostly affects DDL). I doubt we'll ever care to allow a worker to start an independent DDL operation. We might want to allow INSERT, UPDATE, and DELETE. I don't see that allowing multiple workers to hold AccessExclusiveLock will make that more difficult. Difficulties around XactLockTableWait() are more noteworthy, and they stand regardless of what we do with heavyweight locking. > The 'lets grant all the current locks at start of parallel query' > approach seems quite a bit safer than always doing that during parallel > query, don't get me wrong. Agreed. > Btw, what are your thoughts about SERIALIZABLE and parallel query? > Afaics that'd be pretty hard to support? 
It didn't look obviously awful, because most of the relevant data structures live in shared memory. We already have the concept of one backend adjusting the predicate locks pertaining to another backend's transaction. (That's not to say the number of SERIALIZABLE-specific changes will be zero, either.) Thanks, nm
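Noah's compensation rule — wherever AccessExclusiveLock has been softened by group locking, CheckTableNotInUse() fails unconditionally — can be sketched as follows (a toy Python model; the function signature, refcount, and holder list are all invented for illustration, and the real CheckTableNotInUse() inspects relcache reference counts rather than anything like this):

```python
def check_table_not_in_use(local_refcount, holders, my_pid, my_group):
    """Raise if a threatening table alteration would be unsafe.

    holders is a list of (pid, lock_group) pairs for processes holding
    any lock on the table.  Cross-group holders are impossible here,
    since AccessExclusiveLock still conflicts across groups.
    """
    # Classic check: no other subsystem of this backend has the table
    # open (e.g. an open cursor or a pending trigger event).
    if local_refcount > 1:
        raise RuntimeError("cannot perform operation: table is in use")
    # New check: if any *other* member of our locking group holds a
    # lock, the AccessExclusiveLock guarantee was softened, so fail
    # unconditionally -- code that checks before altering stays correct.
    other_members = {pid for pid, grp in holders
                     if grp == my_group and pid != my_pid}
    if other_members:
        raise RuntimeError("cannot perform operation: "
                           "table is in use by parallel workers")

# Alone in the group: behaves exactly as before.
check_table_not_in_use(1, [(42, 7)], my_pid=42, my_group=7)

# A worker (pid 43) in the same group also holds a lock on the table:
# the softened AccessExclusiveLock no longer proves exclusive use, so
# the check must fail.
try:
    check_table_not_in_use(1, [(42, 7), (43, 7)], my_pid=42, my_group=7)
    raised = False
except RuntimeError:
    raised = True
assert raised
```

The point of the rule is visible in the sketch: nothing that passes the check today starts failing for a new reason unless group locking actually weakened the guarantee it relied on.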
On Wed, Nov 5, 2014 at 9:26 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Nov 2, 2014 at 7:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> The procgloballist stuff should be the subject of a separate patch >> which I agree with. > > Yes, I think that's probably a net improvement in robustness quite > apart from what we decide to do about any of the rest of this. I've > attached it here as revise-procglobal-tracking.patch and will commit > that bit if nobody objects. In reviewing this thread I realized that I never got around to committing this bit. And it still seems like a good idea, so I've done that now. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company