Thread: Admission Control
Recent discussions involving the possible benefits of a connection pool for certain users have reminded me of a brief discussion at The Royal Oak last month, where I said I would post a reference to a concept which might alleviate the need for external connection pools. For those interested, check out section 2.4 of this document: Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. Architecture of a Database System. Foundations and Trends(R) in Databases Vol. 1, No. 2 (2007) 141-259. http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf -Kevin
On Fri, Jun 25, 2010 at 1:33 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Recent discussions involving the possible benefits of a connection > pool for certain users have reminded me of a brief discussion at The > Royal Oak last month, where I said I would post a reference to a > concept which might alleviate the need for external connection > pools. For those interested, check out section 2.4 of this > document: > > Joseph M. Hellerstein, Michael Stonebraker and James Hamilton. 2007. > Architecture of a Database System. Foundations and Trends(R) in > Databases Vol. 1, No. 2 (2007) 141-259. > > http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf I think a good admission control system for memory would be huge for us. There are innumerable threads on pgsql-performance where we tell people to set work_mem to a tiny value (like 4MB or 16MB) because any higher value risks driving the machine into swap in the event that they get an unusually large number of connections or those connections issue queries that require an unusual number of hashes or sorts. There are also innumerable postings from people complaining that external sorts are way slower than in-memory sorts, and of course a hash join frequently mops the floor with a nested loop with inner index-scan. A really trivial admission control system might let you set a system-wide limit on work_mem. As we build a plan, we could estimate the total amount of work_mem it will require by examining all the sort, hash, and hash aggregate nodes it contains. In shared memory, we keep a total of this value across all back-ends. Just before beginning to execute a plan that uses >0 work_mem, we bump this value up by the value for the current plan, unless that would make us exceed the system-wide limit. In that case, we sleep, and the next person to finish executing (and decrease the value in shared memory) will wake us up to try again. (Ideally, we'd want to make maintenance_work_mem part of this accounting process also; otherwise, a system that was humming along nicely might suddenly start thrashing when a maintenance operation kicks off.) I suppose this would take a good deal of performance testing to see how well it actually works. A refinement might be to try to consider an inferior plan that uses less memory when the system is tight on memory, rather than waiting. But you'd have to be careful about that, because waiting might be better (it's worth waiting 15 s if it means the execution time will decrease by > 15 s). The idea of doling out queries to engine processes is an interesting one, but seems very different from our current query execution model. I can't even begin to speculate as to whether there's anything interesting we could do in that area without reading some literature on the topic - got any pointers? But even if we can't or don't want to do that, we already know that limiting the number of backends and round-robining queries among them performs MUCH better than setting max_connections to a large value, and not just because of memory exhaustion. Our answer to that problem is "use a connection pooler", but that's not a very good solution even aside from the possible administrative nuisance, because it only solves half the problem.
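
A rough sketch of the accounting described above, with Python threads standing in for backends and a condition variable for the sleep/wake step; the limit and per-plan figures are invented for illustration, and this is not PostgreSQL code.

import threading

class WorkMemGovernor:
    """Track total planned work_mem across backends; block when the system-wide limit is hit."""
    def __init__(self, limit_mb):
        self.limit_mb = limit_mb
        self.in_use_mb = 0
        self.cond = threading.Condition()

    def acquire(self, plan_mb):
        # Bump the shared total just before executing a plan that uses > 0 work_mem;
        # if that would exceed the limit, sleep until someone finishes and wakes us.
        with self.cond:
            while self.in_use_mb + plan_mb > self.limit_mb:
                self.cond.wait()
            self.in_use_mb += plan_mb

    def release(self, plan_mb):
        # Decrease the shared total and wake a waiter so it can retry.
        with self.cond:
            self.in_use_mb -= plan_mb
            self.cond.notify()

def run_query(gov, name, plan_mb):
    gov.acquire(plan_mb)
    try:
        print(f"{name}: executing with {plan_mb} MB of planned sort/hash memory")
    finally:
        gov.release(plan_mb)

if __name__ == "__main__":
    gov = WorkMemGovernor(limit_mb=1024)                     # system-wide cap (assumed)
    plans = [("q1", 512), ("q2", 512), ("q3", 256)]          # per-plan estimates (assumed)
    threads = [threading.Thread(target=run_query, args=(gov, n, mb)) for n, mb in plans]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

A real implementation would use shared memory and an LWLock rather than a condition variable, and would need a rule for plans whose estimate alone exceeds the limit (in this toy version such a plan would wait forever).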
In the general case, the question is not whether we can currently afford to allocate 0 or 1 backends to a given query, but whether we can afford to allocate 0, 1, or >1; furthermore, if we can afford to allocate >1 backend, we'd ideally like to reuse an existing backend rather than starting a new one. I don't think anyone's going to be too happy with a parallel query implementation with a dependency on an external connection pooler. One of the difficulties in reusing an existing backend for a new query, or in maintaining a pool of backends that could be used as workers for parallel queries, is that our current system does not require us to have, nor do we have, a way of resetting a backend to a completely pristine state. DISCARD ALL is probably pretty close, because AIUI that's what connection poolers are using today, and if it didn't work reasonably well, we'd have gotten complaints. But DISCARD ALL will not let you rebind to a different database, for example, which would be a pretty useful thing to do in a pooling environment, so that you needn't maintain separate pools for each database, and it doesn't let you restart the authentication cycle (with no backsies) either. Of course, addressing these problems wouldn't by itself give us a built-in connection pooler or parallel query execution, and there's some danger of destabilizing the code, but possibly it would be a good way to get off the ground. Not sure, though. Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> wrote: > Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> check out section 2.4 of this > A really trivial admission control system might let you set a > system-wide limit on work_mem. Heck, I think an even *more* trivial admission control policy which limits the number of active database transactions released to execution might solve a lot of problems. Of course, what you propose is more useful, although I'd be inclined to think that we'd want an admission control layer which could be configured to support both of these and much more. Done correctly, it could almost completely eliminate the downward slope after you hit the "knee" in many performance graphs. > A refinement might be to try to consider an inferior plan that > uses less memory when the system is tight on memory, rather than > waiting. I wouldn't try messing with that until we have the basics down. ;-) It is within the scope of what an execution admission controller is intended to be able to do, though. > The idea of doling out queries to engine processes is an > interesting one, but seems very different from our current query > execution model. That wasn't in section 2.4 itself -- you must have read the whole chapter. I think any discussion of that should spin off a separate thread -- the techniques are really orthogonal. And frankly, that's more ambitious a topic than *I'm* inclined to want to get into at the moment. An "execution admission controller" that starts simple but leaves room for growth seems within the realm of possibility. -Kevin
On 6/25/10 12:15 PM, Robert Haas wrote: > I think a good admission control system for memory would be huge for > us. There are innumerable threads on pgsql-performance where we tell > people to set work_mem to a tiny value (like 4MB or 16MB) because any > higher value risks driving the machine into swap in the event that > they get an unusually large number of connections or those connections > issue queries that require an unusual number of hashes or sorts. Greenplum did this several years ago with the Bizgres project; it had a resource control manager which was made available for PostgreSQL core. However, it would have required a large and unpredictable amount of work to make it compatible with OLTP workloads. The problem with centralized resource control is the need for centralized locking on requests for resources. That forces transactions to be serialized in order to make sure resources are not double-allocated. This isn't much of a problem in a DW application, but in a web app with thousands of queries per second it's deadly. Performance engineering for PostgreSQL over the last 7 years has been partly about eliminating centralized locking; we don't want to add new locking. That means that a realistic admissions control mechanism would need to operate based on projections and estimates and "best effort" policies. Not only is this mathematically more complex, it's an open question whether it puts us ahead of where we are now vis-a-vis underallocation of memory. Realistically, a lot of tuning and testing would be required before such a tool was actually an improvement. Or, to put it another way: the "poor man's admission control" is a waste of time because it doesn't actually help performance. We're basically facing doing the hard version, or not bothering. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
Josh Berkus <josh@agliodbs.com> wrote: > Greenplum did this several years ago with the Bizgres project > However, it [was not] compatible with OLTP workloads. > the "poor man's admission control" is a waste of time because it > doesn't actually help performance. We're basically facing doing > the hard version, or not bothering. I think it's premature to assume that without any evidence. I'm sure it's possible to create a policy which does more harm than good for any particular workload; there's no denying that could happen, but things such as limiting open transactions (as just one example) might be accomplished at very low cost. Since I have seen dramatic performance improvements from restricting this through a connection pool, I'm inclined to believe there could be benefit from such a simple policy as this. The total work memory policy Robert proposed sounds likely to more than pay for itself by allowing larger work_mem settings without risking cache flushing or swapping. One thing that seems clear to me is that the admission policy should be configurable, so that it can be tuned based on workload. That would also be consistent with a "start simple and expand the capabilities" approach. C'mon, don't be such a buzz-kill. Why should Greenplum have all the fun? ;-) -Kevin
On Fri, Jun 25, 2010 at 4:10 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 6/25/10 12:15 PM, Robert Haas wrote: >> I think a good admission control system for memory would be huge for >> us. There are innumerable threads on pgsql-performance where we tell >> people to set work_mem to a tiny value (like 4MB or 16MB) because any >> higher value risks driving the machine into swap in the event that >> they get an unusually large number of connections or those connections >> issue queries that require an unusual number of hashes or sorts. > > Greenplum did this several years ago with the Bizgres project; it had a > resource control manager which was made available for PostgreSQL core. > However, it would have required a large and unpredictable amount of work > to make it compatible with OLTP workloads. > > The problem with centralized resource control is the need for > centralized locking on requests for resources. That forces transactions > to be serialized in order to make sure resources are not > double-allocated. This isn't much of a problem in a DW application, but > in a web app with thousands of queries per second it's deadly. > Performance engineering for PostgreSQL over the last 7 years has been > partly about eliminating centralized locking; we don't want to add new > locking. I haven't seen the Greenplum code - how did it actually work? The mechanism I just proposed would (except in the case of an overloaded system) only require holding a lock for long enough to test and update a single integer in shared memory, which doesn't seem like it would cause a serious serialization problem. I might be missing something, or it might suck for lots of other reasons, but if we already know that then let's try to be more specific about what the problems are. > That means that a realistic admissions control mechanism would need to > operate based on projections and estimates and "best effort" policies. > Not only is this mathematically more complex, it's an open question > whether it puts us ahead of where we are now vis-a-vis underallocation > of memory. Realistically, a lot of tuning and testing would be required > before such a tool was actually an improvement. Before today, that's the only approach I'd ever considered, but this article made me rethink that. If you have a stream of queries that can be run quickly with 1GB of memory and much more slowly with any lesser amount, the only sensible thing to do is wait until there's a GB of memory available for you to grab. What projection or estimate of "best effort" would arrive at even approximately the same result? > Or, to put it another way: the "poor man's admission control" is a waste > of time because it doesn't actually help performance. We're basically > facing doing the hard version, or not bothering. I think it's an oversimplification to group all approaches as "easy" and "hard", and even more of an oversimplification to say that all of the easy ones suck. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Fri, Jun 25, 2010 at 3:52 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Heck, I think an even *more* trivial admission control policy which > limits the number of active database transactions released to > execution might solve a lot of problems. That wouldn't have any benefit over what you can already do with a connection pooler, though, I think. In fact, it would probably be strictly worse, since enlarging the number of backends slows the system down even if they aren't actually doing anything much. > Of course, what you > propose is more useful, although I'd be inclined to think that we'd > want an admission control layer which could be configured so support > both of these and much more. Done correctly, it could almost > completely eliminate the downward slope after you hit the "knee" in > many performance graphs. And world peace! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Robert Haas <robertmhaas@gmail.com> wrote: > Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> Heck, I think an even *more* trivial admission control policy >> which limits the number of active database transactions released >> to execution might solve a lot of problems. > > That wouldn't have any benefit over what you can already do with a > connection pooler, though, I think. In fact, it would probably be > strictly worse, since enlarging the number of backends slows the > system down even if they aren't actually doing anything much. Agreed -- *if* the connection pool is on another computer. Otherwise it'll probably consume about the same resources on the same machine as what I'm proposing, only with more setup and configuration involved. Until we build a connection pooler into the base product, though, you know what an uphill battle it is to get people to control contention that way. We can usually get someone to tweak a GUC when they report a problem, and maybe the tuning tools could start to take core count and effective spindle count into consideration and suggest a good setting for this, if we had it. With the right connection pooler built in to core, though, this one could go right out the window, and they could tune at that layer instead. [thinks] Actually, the transaction count limit doesn't need the planner to run first, so it could be considered part of the first-tier admission control. Essentially, it *would be* a simple but effective built-in connection pooler. I still think an execution admission controller would be worthwhile, but the policy I proposed doesn't belong there; it might be the subject of a pretty simple patch which might solve a lot of performance problems. Gotta sleep on that.... -Kevin
On Fri, Jun 25, 2010 at 03:15:59PM -0400, Robert Haas wrote: > A > refinement might be to try to consider an inferior plan that uses less > memory when the system is tight on memory, rather than waiting. But > you'd have to be careful about that, because waiting might be better > (it's worth waiting 15 s if it means the execution time will decrease > by > 15 s). I think you could go a long way by doing something much simpler. We already generate multiple plans and compare costs; why not just include memory usage as a cost? If you start doing accounting for memory across the cluster you can assign a "cost" to memory. When there are only a few processes running it's cheap and you get plans like now. But as the total memory usage increases you increase the "cost" of memory and there will be increased pressure to produce lower memory usage plans. I think this is better than just cutting plans out at a certain threshold since it would allow plans that *need* memory to work efficiently to still do so. (It doesn't help in situations where you can't accurately predict memory usage, like hash tables.) Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
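
A sketch of how memory could be folded into plan comparison; the pricing formula and every number below are invented purely to show the shape of the idea (price per MB rising as cluster-wide usage approaches a total budget), not anything from the planner.

def memory_cost(plan_mb, cluster_in_use_mb, cluster_total_mb, base_price_per_mb=0.02):
    # Charge more per MB as cluster-wide usage approaches the total budget:
    # a near-idle cluster prices memory at roughly the base rate, a loaded one much higher.
    pressure = cluster_in_use_mb / cluster_total_mb
    return plan_mb * base_price_per_mb / max(1.0 - pressure, 0.05)

def total_cost(cpu_io_cost, plan_mb, cluster_in_use_mb, cluster_total_mb):
    return cpu_io_cost + memory_cost(plan_mb, cluster_in_use_mb, cluster_total_mb)

hash_join = {"cpu_io": 100.0, "mem_mb": 512}    # fast but memory-hungry (invented numbers)
nested_loop = {"cpu_io": 150.0, "mem_mb": 4}    # slower but frugal (invented numbers)
for in_use in (0, 7000):                        # cluster-wide MB already claimed
    hj = total_cost(hash_join["cpu_io"], hash_join["mem_mb"], in_use, 8192)
    nl = total_cost(nested_loop["cpu_io"], nested_loop["mem_mb"], in_use, 8192)
    print(f"in_use={in_use} MB: hash join {hj:.1f} vs nested loop {nl:.1f}")
# When the cluster is idle the hash join wins; at 7000 MB in use its memory is
# priced high enough that the nested loop comes out cheaper.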
On Sat, Jun 26, 2010 at 11:03 AM, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Fri, Jun 25, 2010 at 03:15:59PM -0400, Robert Haas wrote: >> A >> refinement might be to try to consider an inferior plan that uses less >> memory when the system is tight on memory, rather than waiting. But >> you'd have to be careful about that, because waiting might be better >> (it's worth waiting 15 s if it means the execution time will decrease >> by > 15 s). > > I think you could go a long way by doing something much simpler. We > already generate multiple plans and compare costs, why not just include > memory usage as a cost? If you start doing accounting for memory across > the cluster you can assign a "cost" to memory. When there are only a > few processes running it's cheap and you get plans like now. But as the > total memory usage increases you increase the "cost" of memory and > there will be increased pressure to produce lower memory usage plans. > > I think this is better than just cutting plans out at a certain > threshold since it would allow plans that *need* memory to work > efficiently will still be able to. That's an interesting idea. > (It doesn't help in situations where you can't accurately predict > memory usage, like hash tables.) Not sure what you mean by this part. We already predict how much memory a hash table will use. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On Sat, Jun 26, 2010 at 11:37:16AM -0400, Robert Haas wrote: > On Sat, Jun 26, 2010 at 11:03 AM, Martijn van Oosterhout > > (It doesn't help in situations where you can't accurately predict > > memory usage, like hash tables.) > > Not sure what you mean by this part. We already predict how much > memory a hash table will use. By this I mean cases where the memory usage of the HashAggregate depends on how many groups there are, and it's sometimes very difficult to predict that beforehand. Though maybe that got fixed. Another issue is cached plans. Say there is increased memory pressure, at what point do you start replanning existing plans? While this does have the advantage of being relatively simple to implement, I think it would be a bitch to tune... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patriotism is when love of your own people comes first; nationalism, > when hate for people other than your own comes first. > - Charles de Gaulle
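
A back-of-the-envelope illustration of the problem being pointed at; the 64-byte per-group figure is an assumption, and the only point is that the memory estimate is off by the same factor as the group-count estimate.

# HashAggregate memory grows with the number of groups, so a bad ndistinct
# estimate throws the memory estimate off by the same factor.
per_group_bytes = 64              # hash entry plus per-group transition state (assumed)
estimated_groups = 10_000         # planner's guess
actual_groups = 2_000_000         # what the data really contains

estimated_mb = estimated_groups * per_group_bytes / (1024 * 1024)
actual_mb = actual_groups * per_group_bytes / (1024 * 1024)
print(f"estimated ~{estimated_mb:.1f} MB, actual ~{actual_mb:.1f} MB "
      f"({actual_groups // estimated_groups}x over budget)")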
On Sat, Jun 26, 2010 at 11:59 AM, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Sat, Jun 26, 2010 at 11:37:16AM -0400, Robert Haas wrote: >> On Sat, Jun 26, 2010 at 11:03 AM, Martijn van Oosterhout >> > (It doesn't help in situations where you can't accurately predict >> > memory usage, like hash tables.) >> >> Not sure what you mean by this part. We already predict how much >> memory a hash table will use. > > By this I mean cases where the memory usage of the HashAggregate depends on > how many groups there are, and it's sometimes very difficult to predict > that beforehand. Though maybe that got fixed. Oh, I see. Well, yeah, it's possible the estimates aren't that good in that case. I think it's a fairly rare query that has more than one aggregate in it, though, so you're probably OK as long as you're not TOO far off - where you can really use up a lot of memory, I think, is on a query that has lots of sorts or hash joins. > Another issue is cached plans. Say there is increased memory pressure, > at what point do you start replanning existing plans? The obvious thing to do would be to send an invalidation message whenever you changed the system-wide cost value for use of memory, but maybe if you're changing it in small increments you'd want to be a bit more selective. > While this does have the advantage of being relatively simple to > implement, I think it would be a bitch to tune... I'm not sure. What does seem clear is that it's fundamentally at odds with the "admission control" approach Kevin is advocating. When you start to run short on a resource (perhaps memory), you have to decide between (a) waiting for memory to become available and (b) switching to a more memory-efficient plan. The danger of (b) is that using less memory probably means using more of some other resource, like CPU or disk, and now you've just switched around which resource you're overloading - but on the other hand, if the difference in CPU/disk is small and the memory savings is large, maybe it makes sense. Perhaps in the end we'll find we need both capabilities. I can't help feeling like some good instrumentation would be helpful in answering some of these questions, although I don't know where to put it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
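
The wait-versus-replan comparison above, spelled out as a sketch; all three inputs would have to come from estimates that, as this thread keeps noting, may not be trustworthy.

def pick_strategy(expected_wait_s, fast_plan_runtime_s, low_memory_plan_runtime_s):
    # Waiting for memory pays off only if the wait plus the fast plan's runtime
    # still beats running the memory-frugal (slower) plan immediately.
    if expected_wait_s + fast_plan_runtime_s < low_memory_plan_runtime_s:
        return "wait for memory, then run the fast plan"
    return "run the low-memory plan now"

print(pick_strategy(15, 20, 40))   # waiting 15 s saves 20 s of runtime, so wait
print(pick_strategy(15, 20, 30))   # the frugal plan finishes sooner overall, so don't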
On Sat, Jun 26, 2010 at 01:19:57PM -0400, Robert Haas wrote: > > I'm not sure. What does seem clear is that it's fundamentally at odds > with the "admission control" approach Kevin is advocating. When you > start to run short on a resource (perhaps memory), you have to decide > between (a) waiting for memory to become available and (b) switching > to a more memory-efficient plan. The danger of (b) is that using less > memory probably means using more of some other resource, like CPU or > disk, and now you've just switched around which resource you're > overloading - but on the other hand, if the difference in CPU/disk is > small and the memory savings is large, maybe it makes sense. Perhaps > in the end we'll find we need both capabilities. > > I can't help feeling like some good instrumentation would be helpful > in answering some of these questions, although I don't know where to > put it. One issue with this is that no matter how expensive you make a query, it will run - it just may take a very long time (if the cost is a reasonable estimate). This is also an implied suggestion for a dynamically self-modifying cost param, since the memory cost isn't absolute, but rather relative to free memory. In addition, as Robert points out, the tradeoff between resources is dynamic, as well. Hmm, I'm suddenly struck by the idea of having a max_cost parameter, that refuses to run (or delays?) queries that have "too high" a cost. That might have some interactive-SQL uses, as well: catch the cases where you forgot a join condition and so have an unintended Cartesian explosion, etc. Could also be a belt-and-suspenders last defense for DB admins who aren't sure the client software completely stops the users from doing something stupid. Clearly, default to current behavior, -1 (infinity). Ross -- Ross Reedstrom, Ph.D. reedstrm@rice.edu Systems Engineer & Admin, Research Scientist phone: 713-348-6166 The Connexions Project http://cnx.org fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
"Ross J. Reedstrom" <reedstrm@rice.edu> writes: > Hmm, I'm suddenly struck by the idea of having a max_cost parameter, > that refuses to run (or delays?) queries that have "too high" a cost. That's been suggested before, and shot down on the grounds that the planner's cost estimates are not trustworthy enough to rely on for purposes of outright-failing a query. If you didn't want random unexpected failures, you'd have to set the limit so much higher than your regular queries cost that it'd be pretty much useless. Maybe it'd be all right if it were just used to delay launching the query a bit, but I'm not entirely sure I see the point of that. regards, tom lane
> While this does have the advantage of being relatively simple to > implement, I think it would be a bitch to tune... Precisely. So, there are a number of issues to solve here: 1) We'd need to add accounting for total memory usage to explain plans (worth doing on its own, really, even without admission control). 2) Our memory usage estimates aren't that accurate, which would still force us to underallocate (as we do with work_mem) if we were letting queries through based on memory usage, or force us to abort queries whose actual memory usage was too far off estimated usage. 3) Due to concurrency, by the time the query executes, there will be a different set of queries executing on the server than were executing at evaluation time. This will also cause underallocation to continue. 4) Many other processes can use major hunks of memory (autovacuum, for example) and would need to be brought under centralized management if this was to be a real fix for the underallocation issue. 5) Running out of CPU is, in my experience, more common than running out of memory on modern systems. So it's questionable how useful managing overall memory usage at all would be. Having tinkered with it, I'll tell you that (2) is actually a very hard problem, so any solution we implement should delay as long as possible in implementing (2). In the case of Greenplum, what Mark did originally IIRC was to check against the global memory pool for each work_mem allocation. This often resulted in 100's of global locking checks per query ... like I said, feasible for DW, not for OLTP. The same is the case with (3). In some ways, (3) is an even harder issue because it allows any kind of admission system to be its own enemy; you can imagine "admission storms" where the server tries to serve 150 queries which have been waiting for admission at once as soon as current load drops. Given this, I think that actually solving the issue of underallocation, or even materially improving on memory underallocation compared to where we are now, is a much harder issue than a lot of people realize. However, it does point towards two areas for further work: First, (1) would be valuable on its own as part of diagnostics, logging, pg_stat_statements, etc., and seems like a good "easy" TODO. We can go back to Kevin's originally proposed simple feature: just allowing the DBA to limit the number of concurrently executing queries by role and overall. This would consist of two parameters, max_concurrent_statements and max_statement_wait; the second would say how long the connection would wait for a query slot before giving up and returning an error. This might help the overall underallocation issue, and would be far simpler than schemes involving RAM, I/O and CPU accounting. However, even this solution has two thorny issues to be solved: a) the "waiting query storm" issue mentioned above, and b) pending queries are sitting on idle connections, which could easily block higher-priority queries, so managing max_connections per role would become much more complex. Overall, it sounds like a lot of work for improving underallocation issues for users who can't make effective use of connection pooling but nevertheless have high-performance needs. Are there enough of these users to make it worthwhile? -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
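
A minimal sketch of the two knobs, with Python threads standing in for connections; the GUC names come from the proposal above, while the semaphore-with-timeout mechanics and the error behaviour are assumptions about how it might act.

import threading

class StatementGate:
    """Admit at most max_concurrent_statements at once; give up after max_statement_wait."""
    def __init__(self, max_concurrent_statements, max_statement_wait_s):
        self.slots = threading.Semaphore(max_concurrent_statements)
        self.max_wait = max_statement_wait_s

    def run(self, statement_fn):
        # Wait up to max_statement_wait for a query slot, then return an error.
        if not self.slots.acquire(timeout=self.max_wait):
            raise RuntimeError("timed out waiting for a query slot")
        try:
            return statement_fn()
        finally:
            self.slots.release()

gate = StatementGate(max_concurrent_statements=10, max_statement_wait_s=2.0)
print(gate.run(lambda: "ran inside a slot"))

Note that this sketch does nothing about the two thorny issues just listed: releasing a slot wakes waiters immediately (the storm problem), and a waiter still occupies its connection while it sleeps.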
Josh Berkus <josh@agliodbs.com> wrote: > We can go back to Kevin's originally proposed simple feature: > just allowing the DBA to limit the number of concurrently > executing queries by role and overall. Well, that's more sophisticated than what I proposed, but it's an interesting twist on it. > This would consist of two parameters, > max_concurrent_statements and max_statement_wait; the second would > say how long the connection would wait for a query slot before > giving up and returning an error. The timeout is also an embellishment to what I proposed, but another interesting idea. > even this solution has two thorny issues to be solved: > > a) the "waiting query storm" issue mentioned above I fleshed out the idea a bit more on the thread titled "Built-in connection pool", since this would effectively function in a very similar way to a connection pool. If you look at that proposal, at most one transaction would be released to execution when one transaction completed. I'm not seeing anything resembling a "storm" in that, so you must be envisioning something rather different. Care to clarify? > b) pending queries are sitting on idle connections, which could > easily block higher-priority queries, so managing max_connections > per role would become much more complex. That is a good point. The biggest difference between the functionality of the proposal on the other thread and the connection pool built in to our application framework is that the latter has a prioritized FIFO queue, with ten levels of priority. A small query which is run as you tab between controls in a GUI window runs at a much higher priority than a query which fills a list with a large number of rows. This involves both connections reserved for higher priorities and having higher priority transactions "jump ahead" of lower priority transactions. This helps the user perception of performance in the fat-client GUI applications. I suppose if we had limits by role, we could approach this level of functionality within PostgreSQL. On the other hand, our web apps run everything at the same priority, so there would effectively be *no* performance difference between what I proposed we build in to PostgreSQL and what our shop currently puts in front of PostgreSQL for a connection pool. -Kevin
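
A sketch of the prioritized FIFO described above sitting in front of the database: ten levels, FIFO within a level, higher-priority work jumping ahead. The class and request names are invented, and the reserved-connections-per-priority part is left out.

import heapq
import itertools

class PriorityAdmissionQueue:
    """Ten priority levels (0 = most urgent); FIFO order within a level."""
    def __init__(self, levels=10):
        self.levels = levels
        self._heap = []
        self._arrival = itertools.count()   # ties within a level broken by arrival order

    def __len__(self):
        return len(self._heap)

    def submit(self, priority, request):
        assert 0 <= priority < self.levels
        heapq.heappush(self._heap, (priority, next(self._arrival), request))

    def next_request(self):
        # Higher-priority (lower-numbered) work jumps ahead of anything queued below it.
        priority, _, request = heapq.heappop(self._heap)
        return priority, request

q = PriorityAdmissionQueue()
q.submit(5, "fill a list with a large number of rows")
q.submit(0, "small query while tabbing between GUI controls")
q.submit(5, "another large list fill")
while len(q):
    print(q.next_request())
# prints the level-0 lookup first, then the two level-5 requests in arrival order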
On 2010-06-25 22:44, Robert Haas wrote: > On Fri, Jun 25, 2010 at 3:52 PM, Kevin Grittner > <Kevin.Grittner@wicourts.gov> wrote: > >> Heck, I think an even *more* trivial admission control policy which >> limits the number of active database transactions released to >> execution might solve a lot of problems. >> > That wouldn't have any benefit over what you can already do with a > connection pooler, though, I think. In fact, it would probably be > strictly worse, since enlarging the number of backends slows the > system down even if they aren't actually doing anything much. > Sorry if I'm asking silly questions, but how do transactions and connection poolers interact? Say you have 100 clients all doing "fairly inactive" database work in transactions lasting a couple of minutes at the same time. If I understand connection poolers, they don't help much in those situations, where an "accounting" system for "limited resources" across all backends definitely would help. (yes, it's a real-world application here, whether it is clever or not...) In a fully web environment where all transactions last 0.1s ... a pooler might make perfect sense (when traffic goes up). -- Jesper
Jesper Krogh <jesper@krogh.cc> wrote: > Sorry if I'm asking silly questions, but how does transactions and > connection pooler's interact? That depends a great deal on the pooler and its configuration, as well as your client architecture. Our shop gathers up the information needed for our database transaction and submits it to a server application which has all the logic needed to use that data to apply the transaction. We determined long ago that it is a Very Bad Idea for us to have an open database transaction which is waiting for a user to do something before it can proceed. > Say if you have 100 clients all doing "fairly inactive" database > work in transactions lasting a couple of minutes at the same time. I'm not sure what you mean by that. If you mean that part of a database transaction hits the database, and then it takes a while for the rest of the statements for the transaction to be generated (based on network latency or waits for user input), then it is hard to see how you can effectively use a connection pool. I have yet to see an environment where I think that's a good approach, but I haven't seen everything yet. :-) On the other hand, if the issue is that 100 transactions are fired off at the same time and it takes two minutes for them all to finish, unless you have *a lot* of CPUs and spindles, that's not efficient use of your resources. A connection pool might indeed help with that; you might start getting transactions back in one second, and have them all done in a minute and a half if you used a properly configured connection pool. -Kevin
On 2010-06-28 21:24, Kevin Grittner wrote: > Jesper Krogh <jesper@krogh.cc> wrote: > > >> Sorry if I'm asking silly questions, but how do transactions and >> connection poolers interact? >> > > That depends a great deal on the pooler and its configuration, as > well as your client architecture. Our shop gathers up the > information needed for our database transaction and submits it to a > server application which has all the logic needed to use that data > to apply the transaction. We determined long ago that it is a Very > Bad Idea for us to have an open database transaction which is > waiting for a user to do something before it can proceed. > The situation is more: 1) Grab a bunch of data (using pg_try_advisory_lock() to lock out other processes from grabbing the same). 2) Process the data (in external software). 3) Push results back into the database, including a flag indicating that the data has been processed. 4) Release advisory locks. Step 2 takes somewhere between a couple of seconds and a couple of minutes depending on the task to be done. It might not be "optimal" but it is extremely robust and simple to wrap 1 to 4 within a BEGIN / COMMIT block. On the application side it is really nice not having to deal with "partly processed" data in the database, which I can avoid by just keeping the transaction open. From my POV, a connection pooler doesn't buy anything, and I cannot stop all processes from executing at the same time, although it "most likely" will not happen. There is no "wait for user" involved. And that means somewhere in the region of 100+ backends, although they are "mostly" idle, seen from a database perspective. I have not hit any issues with the work_mem being too high, but I'm absolutely sure that I could flood the system if they happened to be working at the same time. Jesper -- Jesper
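
A sketch of steps 1 through 4 with psycopg2; the table, columns, and connection string are invented, and it of course assumes a reachable database. The point is just that one transaction stays open across the external processing, which is exactly the long-lived-but-mostly-idle backend being discussed.

import psycopg2

conn = psycopg2.connect("dbname=jobs")    # assumed connection string
cur = conn.cursor()

# 1) Grab a batch, taking session-level advisory locks so other workers skip these rows.
cur.execute("SELECT id, payload FROM work_queue WHERE NOT processed LIMIT 50")
claimed = []
for item_id, payload in cur.fetchall():
    cur.execute("SELECT pg_try_advisory_lock(%s)", (item_id,))
    if cur.fetchone()[0]:
        claimed.append((item_id, payload))

# 2) Process the data in external software; the transaction stays open the whole time.
results = [(item_id, len(payload)) for item_id, payload in claimed]   # stand-in for real work

# 3) Push results back and flag the rows as processed, all in the same transaction.
for item_id, result in results:
    cur.execute(
        "UPDATE work_queue SET processed = true, result = %s WHERE id = %s",
        (result, item_id),
    )
conn.commit()

# 4) Release the advisory locks (session-level, so they survived the commit).
for item_id, _ in claimed:
    cur.execute("SELECT pg_advisory_unlock(%s)", (item_id,))
conn.commit()
conn.close()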
Jesper Krogh <jesper@krogh.cc> wrote: > I have not hit any issues with the work_mem being too high, but > I'm absolutely sure that I could flood the system if they happened > to be working at the same time. OK, now that I understand your workload, I agree that a connection pool at the transaction level won't do you much good. Something which limited the concurrent *query* count, or an execution admission controller based on resource usage, could save you from occasional random incidents of resource over-usage. -Kevin
On 29/06/10 04:48, Tom Lane wrote: > "Ross J. Reedstrom" <reedstrm@rice.edu> writes: > >> Hmm, I'm suddenly struck by the idea of having a max_cost parameter, >> that refuses to run (or delays?) queries that have "too high" a cost. >> > That's been suggested before, and shot down on the grounds that the > planner's cost estimates are not trustworthy enough to rely on for > purposes of outright-failing a query. If you didn't want random > unexpected failures, you'd have to set the limit so much higher than > your regular queries cost that it'd be pretty much useless. > > I wrote something along the lines of this for Greenplum (it is probably still available in the Bizgres CVS). Yes, cost is not an ideal metric to use for bounding workload (but was perhaps better than nothing at all in the case it was intended for). One difficulty with looking at things from the statement cost point of view is that all the requisite locks are already taken by the time you have a plan - so if you delay execution, these are still held, so deadlock likelihood is increased (unless you release locks for waiters and retry for them later - but then you possibly need to restart the executor from scratch to cope with possible table or schema changes). > Maybe it'd be all right if it were just used to delay launching the > query a bit, but I'm not entirely sure I see the point of that. > I recall handling this by having a configurable option to let these queries run if nothing else was running. Clearly to have this option on you would have to be confident that no single query could bring the system down. Cheers Mark
On Fri, 2010-06-25 at 13:10 -0700, Josh Berkus wrote: > The problem with centralized resource control We should talk about the problem of lack of centralized resource control as well, to balance. Another well observed problem is that work_mem is user settable, so many programs acting together with high work_mem can cause swapping. The reality is that inefficient resource control leads to about 50% resource wastage. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Training and Services
On 29/06/10 05:36, Josh Berkus wrote: > > Having tinkered with it, I'll tell you that (2) is actually a very > hard problem, so any solution we implement should delay as long as > possible in implementing (2). In the case of Greenplum, what Mark did > originally IIRC was to check against the global memory pool for each > work_mem allocation. This often resulted in 100's of global locking > checks per query ... like I said, feasible for DW, not for OLTP. Actually only 1 lock check per query, but certainly extra processing and data structures to maintain the pool information... so, yes certainly much more suitable for DW (AFAIK we never attempted to measure the additional overhead for non DW workload). Cheers Mark
Simon, Mark, > Actually only 1 lock check per query, but certainly extra processing and > data structures to maintain the pool information... so, yes certainly > much more suitable for DW (AFAIK we never attempted to measure the > additional overhead for non DW workload). I recall testing it when the patch was submitted for 8.2, and the overhead was substantial in the worst case ... like 30% for an in-memory one-liner workload. I've been going over the Greenplum docs and it looks like the attempt to ration work_mem was dropped. At this point, Greenplum 3.3 only rations by # of concurrent queries and total cost. I know that work_mem rationing was in the original plans; what made that unworkable? My argument in general is that in the general case ... where you can't count on a majority of long-running queries ... any kind of admission control or resource management is a hard problem (if it weren't, Oracle would have had it before 11). I think that we'll need to tackle it, but I don't expect the first patches we make to be even remotely usable. It's definitely not an SOC project. I should write more about this. -- -- Josh Berkus PostgreSQL Experts Inc. http://www.pgexperts.com
On 09/07/10 05:10, Josh Berkus wrote: > Simon, Mark, > >> Actually only 1 lock check per query, but certainly extra processing and >> data structures to maintain the pool information... so, yes certainly >> much more suitable for DW (AFAIK we never attempted to measure the >> additional overhead for non DW workload). > > I recall testing it when the patch was submitted for 8.2., and the > overhead was substantial in the worst case ... like 30% for an > in-memory one-liner workload. > Interesting - quite high! However I recall you tested the initial committed version, later additions dramatically reduced the overhead (what is in the Bizgres repo *now* is the latest). > I've been going over the greenplum docs and it looks like the attempt > to ration work_mem was dropped. At this point, Greenplum 3.3 only > rations by # of concurrent queries and total cost. I know that > work_mem rationing was in the original plans; what made that unworkable? > That certainly was my understanding too. I left Greenplum about the time this was being discussed, and I think the other staff member involved with the design left soon afterwards as well, which might have been a factor! > My argument in general is that in the general case ... where you can't > count on a majority of long-running queries ... any kind of admission > control or resource management is a hard problem (if it weren't, > Oracle would have had it before 11). I think that we'll need to > tackle it, but I don't expect the first patches we make to be even > remotely usable. It's definitely not an SOC project. > > I should write more about this. > +1 Cheers Mark
On 09/07/10 12:58, Mark Kirkwood wrote: > On 09/07/10 05:10, Josh Berkus wrote: >> Simon, Mark, >> >>> Actually only 1 lock check per query, but certainly extra processing >>> and >>> data structures to maintain the pool information... so, yes certainly >>> much more suitable for DW (AFAIK we never attempted to measure the >>> additional overhead for non DW workload). >> >> I recall testing it when the patch was submitted for 8.2, and the >> overhead was substantial in the worst case ... like 30% for an >> in-memory one-liner workload. >> > > Interesting - quite high! However I recall you tested the initial > committed version, later additions dramatically reduced the overhead > (what is in the Bizgres repo *now* is the latest). Purely out of interest, since the old repo is still there, I had a quick look at measuring the overhead, using 8.4's pgbench to run two custom scripts: one consisting of a single 'SELECT 1', the other having 100 'SELECT 1' - the latter being probably the worst-case scenario. Running 1,2,4,8 clients and 1000-10000 transactions gives an overhead in the 5-8% range [1] (i.e. transactions/s decrease by this amount with the scheduler turned on [2]). While a lot better than 30% (!) it is certainly higher than we'd like. Cheers Mark [1] I got the same range for pgbench select-only using its usual workload [2] As compared to Bizgres (8.2.4) and also standard Postgres 8.2.12.
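
(For reference, that kind of run amounts to something like "pgbench -n -f select1.sql -c 8 -t 10000 bench", with select1.sql holding a single "SELECT 1;" and a second script repeating it 100 times; the script and database names here are made up, and -n just skips the vacuum of the standard pgbench tables, which custom scripts don't touch.)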
On Thu, Jul 8, 2010 at 10:21 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: > Purely out of interest, since the old repo is still there, I had a quick > look at measuring the overhead, using 8.4's pgbench to run two custom > scripts: one consisting of a single 'SELECT 1', the other having 100 'SELECT > 1' - the latter being probably the worst case scenario. Running 1,2,4,8 > clients and 1000-10000 tramsactions gives an overhead in the 5-8% range [1] > (i.e transactions/s decrease by this amount with the scheduler turned on > [2]). While a lot better than 30% (!) it is certainly higher than we'd like. Isn't the point here to INCREASE throughput? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 09/07/10 14:26, Robert Haas wrote: > On Thu, Jul 8, 2010 at 10:21 PM, Mark Kirkwood > <mark.kirkwood@catalyst.net.nz> wrote: > >> Purely out of interest, since the old repo is still there, I had a quick >> look at measuring the overhead, using 8.4's pgbench to run two custom >> scripts: one consisting of a single 'SELECT 1', the other having 100 'SELECT >> 1' - the latter being probably the worst case scenario. Running 1,2,4,8 >> clients and 1000-10000 tramsactions gives an overhead in the 5-8% range [1] >> (i.e transactions/s decrease by this amount with the scheduler turned on >> [2]). While a lot better than 30% (!) it is certainly higher than we'd like. >> > Isn't the point here to INCREASE throughput? > > LOL - yes it is! Josh wanted to know what the overhead was for the queue machinery itself, so I'm running a test to show that (i.e so I have a queue with the thresholds set higher than the test will load them). In the situation where (say) 11 concurrent queries of a certain type make your system become unusable, but 10 are fine, then constraining it to have a max of 10 will tend to improve throughput. By how much is hard to say, for instance preventing the Linux OOM killer shutting postgres down would be infinite I guess :-) Cheers Mark
On Thu, Jul 8, 2010 at 11:00 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: > On 09/07/10 14:26, Robert Haas wrote: >> >> On Thu, Jul 8, 2010 at 10:21 PM, Mark Kirkwood >> <mark.kirkwood@catalyst.net.nz> wrote: >> >>> >>> Purely out of interest, since the old repo is still there, I had a quick >>> look at measuring the overhead, using 8.4's pgbench to run two custom >>> scripts: one consisting of a single 'SELECT 1', the other having 100 >>> 'SELECT >>> 1' - the latter being probably the worst case scenario. Running 1,2,4,8 >>> clients and 1000-10000 tramsactions gives an overhead in the 5-8% range >>> [1] >>> (i.e transactions/s decrease by this amount with the scheduler turned on >>> [2]). While a lot better than 30% (!) it is certainly higher than we'd >>> like. >>> >> >> Isn't the point here to INCREASE throughput? >> >> > > LOL - yes it is! Josh wanted to know what the overhead was for the queue > machinery itself, so I'm running a test to show that (i.e so I have a queue > with the thresholds set higher than the test will load them). > > In the situation where (say) 11 concurrent queries of a certain type make > your system become unusable, but 10 are fine, then constraining it to have a > max of 10 will tend to improve throughput. By how much is hard to say, for > instance preventing the Linux OOM killer shutting postgres down would be > infinite I guess :-) Hmm. Well, those numbers seem awfully high, for what you're doing, then. An admission control mechanism that's just letting everything in shouldn't knock 5% off performance (let alone 30%). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
On 09/07/10 15:57, Robert Haas wrote: > Hmm. Well, those numbers seem awfully high, for what you're doing, > then. An admission control mechanism that's just letting everything > in shouldn't knock 5% off performance (let alone 30%). > > Yeah it does, on the other hand both Josh and I were trying to elicit the worst case overhead. Cheers Mark
On Fri, Jul 9, 2010 at 12:03 AM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: > On 09/07/10 15:57, Robert Haas wrote: >> >> Hmm. Well, those numbers seem awfully high, for what you're doing, >> then. An admission control mechanism that's just letting everything >> in shouldn't knock 5% off performance (let alone 30%). > > Yeah it does, on the other hand both Josh and I were trying to elicit the > worst case overhead. Even so... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: > Purely out of interest, since the old repo is still there, I had a > quick look at measuring the overhead, using 8.4's pgbench to run > two custom scripts: one consisting of a single 'SELECT 1', the > other having 100 'SELECT 1' - the latter being probably the worst > case scenario. Running 1,2,4,8 clients and 1000-10000 transactions > gives an overhead in the 5-8% range [1] (i.e transactions/s > decrease by this amount with the scheduler turned on [2]). While a > lot better than 30% (!) it is certainly higher than we'd like. Hmmm... In my first benchmarks of the serializable patch I was likewise stressing a RAM-only run to see how much overhead was added to a very small database transaction, and wound up with about 8%. By profiling where the time was going with and without the patch, I narrowed it down to lock contention. Reworking my LW locking strategy brought it down to 1.8%. I'd bet there's room for similar improvement in the "active transaction" limit you describe. In fact, if you could bring the code inside blocks of code already covered by locks, I would think you could get it down to where it would be hard to find in the noise. -Kevin
On 10/07/10 03:54, Kevin Grittner wrote: > Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote: > > >> Purely out of interest, since the old repo is still there, I had a >> quick look at measuring the overhead, using 8.4's pgbench to run >> two custom scripts: one consisting of a single 'SELECT 1', the >> other having 100 'SELECT 1' - the latter being probably the worst >> case scenario. Running 1,2,4,8 clients and 1000-10000 transactions >> gives an overhead in the 5-8% range [1] (i.e transactions/s >> decrease by this amount with the scheduler turned on [2]). While a >> lot better than 30% (!) it is certainly higher than we'd like. >> > > Hmmm... In my first benchmarks of the serializable patch I was > likewise stressing a RAM-only run to see how much overhead was added > to a very small database transaction, and wound up with about 8%. > By profiling where the time was going with and without the patch, > I narrowed it down to lock contention. Reworking my LW locking > strategy brought it down to 1.8%. I'd bet there's room for similar > improvement in the "active transaction" limit you describe. In fact, > if you could bring the code inside blocks of code already covered by > locks, I would think you could get it down to where it would be hard > to find in the noise. > > Yeah, excellent suggestion - I suspect there is room for considerable optimization along the lines you suggest... at the time the focus was heavily biased toward a purely DW workload where the overhead vanished against large plan and execute times, but this could be revisited. Having said that, I suspect a re-architecting is needed for a more wide-ranging solution suitable for Postgres (as opposed to Bizgres or Greenplum). Cheers Mark