Thread: Mini improvement: statement_cost_limit
hello ... i picked up csaba nagy's idea and implemented a very simple yet very useful extension. i introduced a GUC called statement_cost_limit which can be used to error out if a statement is expected to be too expensive. the advantage over statement_timeout is that we are actually able to error out before spending many seconds on a query that statement_timeout would kill anyway. best regards, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: www.postgresql-support.de
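The posted patch itself is not reproduced in the archive. Purely as an illustration of the idea, a check of this shape could run once planning is complete; the GUC name, error code and placement below are assumptions, not details taken from the actual patch.

    #include "postgres.h"
    #include "nodes/plannodes.h"
    #include "utils/errcodes.h"

    /* Illustrative only -- not the posted patch. 0 disables the check. */
    double statement_cost_limit = 0.0;

    /*
     * Reject a finished plan whose estimated total cost exceeds the limit,
     * before any executor work has been done.
     */
    static void
    check_statement_cost(PlannedStmt *stmt)
    {
        if (statement_cost_limit > 0.0 &&
            stmt->planTree->total_cost > statement_cost_limit)
            ereport(ERROR,
                    (errcode(ERRCODE_STATEMENT_TOO_COMPLEX),
                     errmsg("estimated statement cost %.0f exceeds statement_cost_limit %.0f",
                            stmt->planTree->total_cost, statement_cost_limit)));
    }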
Hans-Jürgen Schönig wrote: > hello ... > > i picked up csaba nagy's idea and implemented a very simple yet very > useful extension. > i introduced a GUC called statement_cost_limit which can be used to > error out if a statement is expected to be too expensive. > the advantage over statement_timeout is that we are actually able to > error out before spending many seconds which is killed by > statement_timeout anyway. > > > You clearly have far more faith in the cost estimates than I do. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > Hans-Jürgen Schönig wrote: >> i introduced a GUC called statement_cost_limit which can be used to >> error out if a statement is expected to be too expensive. > You clearly have far more faith in the cost estimates than I do. Wasn't this exact proposal discussed and rejected awhile back? regards, tom lane
On Aug 2, 2008, at 8:38 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> Hans-Jürgen Schönig wrote: >>> i introduced a GUC called statement_cost_limit which can be used to >>> error out if a statement is expected to be too expensive. > >> You clearly have far more faith in the cost estimates than I do. > > Wasn't this exact proposal discussed and rejected awhile back? > > regards, tom lane > i don't remember precisely. i have seen it on simon's wiki page and it is something which would have been useful in some cases in the past. many thanks, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: www.postgresql-support.de
On Sat, Aug 02, 2008 at 09:30:08PM +0200, Hans-Jürgen Schönig wrote: > On Aug 2, 2008, at 8:38 PM, Tom Lane wrote: > > >Andrew Dunstan <andrew@dunslane.net> writes: > >>Hans-Jürgen Schönig wrote: > >>>i introduced a GUC called statement_cost_limit which can be used to > >>>error out if a statement is expected to be too expensive. > > > >>You clearly have far more faith in the cost estimates than I do. > > > >Wasn't this exact proposal discussed and rejected awhile back? > > > > regards, tom lane > > > > > i don't remember precisely. > i have seen it on simon's wiki page and it is something which would > have been useful in some cases in the past. I think a variation on this could be very useful in development and test environments. Suppose it raised a warning or notice if the cost was over the limit. Then one could set a limit of a few million on the development and test servers and developers would at least have a clue that they needed to look at explain for that query. As it is now, one can exhort them to run explain, but it has no effect. Instead we later see queries killed by a 24 hour timeout with estimated costs ranging from "until they unplug the machine and dump it" to "until the sun turns into a red giant". -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
On Sun, 2008-08-03 at 00:44 -0700, daveg wrote: > On Sat, Aug 02, 2008 at 09:30:08PM +0200, Hans-Jürgen Schönig wrote: > > On Aug 2, 2008, at 8:38 PM, Tom Lane wrote: > > > > >Andrew Dunstan <andrew@dunslane.net> writes: > > >>Hans-Jürgen Schönig wrote: > > >>>i introduced a GUC called statement_cost_limit which can be used to > > >>>error out if a statement is expected to be too expensive. > > > > > >>You clearly have far more faith in the cost estimates than I do. > > > > > >Wasn't this exact proposal discussed and rejected awhile back? > > > > > i don't remember precisely. > > i have seen it on simon's wiki page and it is something which would > > have been useful in some cases in the past. I still support it. Regrettably, many SQL developers introduce product joins and other unintentional errors. Why let problem queries through? Security-wise they're great Denial of Service attacks, bringing the server to its knees better than most ways I know, in conjunction with a nice hefty work_mem setting. 27 table product joins: memory, CPU, I/O and diskspace resources used all in a simple killer query. If anybody thinks costs are inaccurate, don't use it. Or better still improve the cost models. It isn't any harder or easier to find a useful value than it is to use statement_timeout. What's the difference between picking an arbitrary time and an arbitrary cost? You need to alter the value according to people's complaints in both cases. > I think a variation on this could be very useful in development and test > environments. Suppose it raised a warning or notice if the cost was over > the limit. Then one could set a limit of a few million on the development > and test servers and developers would at least have a clue that they needed > to look at explain for that query. As it is now, one can exhort them to > run explain, but it has no effect. Instead we later see queries killed > by a 24 hour timeout with estimated costs ranging from "until they unplug > the machine and dump it" to "until the sun turns into a red giant". Great argument. So that's 4 in favour at least. A compromise would be to have log_min_statement_cost (or warn_min_statement_cost) which will at least help find these problems in testing before we put things live, but that still won't help with production issues. Another alternative would be to have a plugin that can examine the plan immediately after planner executes, so you can implement this yourself, plus some other possibilities. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
hello ... > > I still support it. Regrettably, many SQL developers introduce product > joins and other unintentional errors. Why let problem queries through? i think the killer is that we don't have to wait until the query dies with a statement_timeout. it is way more elegant to kill things before they have already eaten too many cycles. one thing which is important as well: statement_cost_limit does not kill queries which have just been waiting for a lock. this makes things slightly more predictable. > Security-wise they're great Denial of Service attacks, bringing the > server to its knees better than most ways I know, in conjunction > with a > nice hefty work_mem setting. 27 table product joins: memory, CPU, I/O > and diskspace resources used all in a simple killer query. > i am not too concerned about DoS, i have to admit. i would rather see it as a way to make developers do better things. > If anybody thinks costs are inaccurate, don't use it. Or better still > improve the cost models. It isn't any harder or easier to find a > useful > value than it is to use statement_timeout. What's the difference > between > picking an arbitrary time and an arbitrary cost? You need to alter the > value according to people's complaints in both cases. the cost model is good enough to see if something is good or bad. this is basically all we want to do here --- killing all evil. > *snip* > > > A compromise would be to have log_min_statement_cost (or > warn_min_statement_cost) which will at least help find these > problems in > testing before we put things live, but that still won't help with > production issues. > definitely. a good idea as well - but people will hardly read it, i guess :(. > Another alternative would be to have a plugin that can examine the > plan > immediately after planner executes, so you can implement this > yourself, > plus some other possibilities. > this would be really fancy. what could a plugin like that look like? hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: www.postgresql-support.de
On Sun, 2008-08-03 at 22:09 +0200, Hans-Jürgen Schönig wrote: > > Another alternative would be to have a plugin that can examine the > > plan > > immediately after planner executes, so you can implement this > > yourself, > > plus some other possibilities. > > > this would be really fancy. > how could a plugin like that look like? Hmm...thinks: exactly like the existing planner_hook(). So, rewrite this as a planner hook and submit as a contrib module. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
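For readers unfamiliar with the hook: planner_hook lets a loadable module intercept planning and inspect (or reject) the finished plan without touching core. A bare-bones sketch of such a contrib module, written against the 8.3-era hook signature, might look like the following; the module and variable names are invented for illustration, and the custom-GUC registration is omitted because that API differs between releases.

    #include "postgres.h"
    #include "fmgr.h"
    #include "optimizer/planner.h"

    PG_MODULE_MAGIC;

    /* Would normally be registered as a custom GUC; 0 disables the check. */
    static double plan_cost_limit = 0.0;

    static planner_hook_type prev_planner_hook = NULL;

    static PlannedStmt *
    limit_planner(Query *parse, int cursorOptions, ParamListInfo boundParams)
    {
        PlannedStmt *result;

        /* Let the regular planner (or any previously installed hook) plan the query. */
        if (prev_planner_hook)
            result = prev_planner_hook(parse, cursorOptions, boundParams);
        else
            result = standard_planner(parse, cursorOptions, boundParams);

        /* Refuse to run plans whose estimated total cost exceeds the limit. */
        if (plan_cost_limit > 0.0 && result->planTree->total_cost > plan_cost_limit)
            ereport(ERROR,
                    (errmsg("estimated plan cost %.0f exceeds plan_cost_limit %.0f",
                            result->planTree->total_cost, plan_cost_limit)));

        return result;
    }

    void
    _PG_init(void)
    {
        prev_planner_hook = planner_hook;
        planner_hook = limit_planner;
    }

Loaded via shared_preload_libraries (or LOAD), something like this rejects over-limit plans entirely outside core; a log-only variant would simply swap ERROR for WARNING or LOG.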
Tom, > Wasn't this exact proposal discussed and rejected awhile back? We rejected Greenplum's much more invasive resource manager, because it created a large performance penalty on small queries whether or not it was turned on. However, I don't remember any rejection of an idea as simple as a cost limit rejection. This would, IMHO, be very useful for production instances of PostgreSQL. The penalty for mis-rejection of a poorly costed query is much lower than the penalty for having a bad query eat all your CPU. -- --Josh Josh Berkus PostgreSQL San Francisco
On Sunday 03 August 2008 15:12:22 Simon Riggs wrote: > On Sun, 2008-08-03 at 00:44 -0700, daveg wrote: > > On Sat, Aug 02, 2008 at 09:30:08PM +0200, Hans-Jürgen Schönig wrote: > > > On Aug 2, 2008, at 8:38 PM, Tom Lane wrote: > > > >Andrew Dunstan <andrew@dunslane.net> writes: > > > >>Hans-Jürgen Schönig wrote: > > > >>>i introduced a GUC called statement_cost_limit which can be used to > > > >>>error out if a statement is expected to be too expensive. > > > >> > > > >>You clearly have far more faith in the cost estimates than I do. > > > > > > > >Wasn't this exact proposal discussed and rejected awhile back? > > > > > > i don't remember precisely. > > > i have seen it on simon's wiki page and it is something which would > > > have been useful in some cases in the past. > > I still support it. Regrettably, many SQL developers introduce product > joins and other unintentional errors. Why let problem queries through? > Security-wise they're great Denial of Service attacks, bringing the > server to its knees better than most ways I know, in conjunction with a > nice hefty work_mem setting. 27 table product joins: memory, CPU, I/O > and diskspace resources used all in a simple killer query. > ISTR that what ended up killing the enthusiasm for this was that most people realized that this GUC was just a poor tool to take a stab at solving other problems (ie. rate limiting cpu for queries). > If anybody thinks costs are inaccurate, don't use it. Or better still > improve the cost models. It isn't any harder or easier to find a useful > value than it is to use statement_timeout. What's the difference between > picking an arbitrary time and an arbitrary cost? You need to alter the > value according to people's complaints in both cases. > I think the original argument for statement_timeout was that long running queries were known to cause havoc wrt vacuum strategies (remember, that one has been in the back end a long time). ISTR some recent threads on -hackers questioning whether statement_timeout should be eliminated itself. > > I think a variation on this could be very useful in development and test > > environments. Suppose it raised a warning or notice if the cost was over > > the limit. Then one could set a limit of a few million on the development > > and test servers and developers would at least have a clue that they > > needed to look at explain for that query. As it is now, one can exhort > > them to run explain, but it has no effect. Instead we later see queries > > killed by a 24 hour timeout with estimated costs ranging from "until they > > unplug the machine and dump it" to "until the sun turns into a red > > giant". > > Great argument. So that's 4 in favour at least. > Not such a great argument. Cost models on development servers can and often are quite different from those on production, so you might be putting an artificial limit on top of your developers. > A compromise would be to have log_min_statement_cost (or > warn_min_statement_cost) which will at least help find these problems in > testing before we put things live, but that still won't help with > production issues. > > Another alternative would be to have a plugin that can examine the plan > immediately after planner executes, so you can implement this yourself, > plus some other possibilities. > I still think it is worth revisiting what problems people are trying to solve, and see if there are better tools they can be given to solve them.
Barring that, I suppose a crude solution is better than nothing, though I fear people might point at the crude solution as a good enough solution to justify not working on better solutions. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Josh Berkus wrote: > Tom, > > >> Wasn't this exact proposal discussed and rejected awhile back? >> > > We rejected Greenplum's much more invasive resource manager, because it > created a large performance penalty on small queries whether or not it was > turned on. However, I don't remember any rejection of an idea as simple > as a cost limit rejection. > > This would, IMHO, be very useful for production instances of PostgreSQL. > The penalty for mis-rejection of a poorly costed query is much lower than > the penalty for having a bad query eat all your CPU. > > Greenplum introduced a way of creating a cost "threshold", a bit like the way Simon was going to do "shared" work_mem. It did 2 things: 1/ Counted the cost of an about-to-be-run query against the threshold, and made the query wait if it would exhaust it 2/ Aborted the query if its cost was greater than the threshold Initially there was quite a noticeable performance penalty with it enabled - but as the guy working on it (me) redid bits and pieces, the penalty decreased massively. Note that in all cases, disabling the feature meant there was no penalty. The latest variant of the code is around in the Bizgres repository (src/backend/utils/resscheduler I think) - some bits might be worth looking at! Best wishes Mark P.S.: I'm not working for Greenplum now.
On Sun, 2008-08-03 at 22:57 -0400, Robert Treat wrote: > I still think it is worth revisiting what problems people are trying > to solve, and see if there are better tools they can be given to solve > them. Barring that, I suppose a crude solution is better than > nothing, though I fear people might point at the crude solution as a > good enough solution to justify not working on better solutions. I advocate solutions to the problems of users I've worked with. My preference is to help people in the next release, then improve from there. We need to work with what we have. In this case, an existing solution has been found. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote: > > ISTR that what ended up killing the enthusiasm for this was that most people > realized that this GUC was just a poor tool to take a stab at solving other > problems (ie. rate limiting cpu for queries). I'm not concerned with that, I want developers to have feedback on costs in a way that is obvious. > > > I think a variation on this could be very useful in development and test > > > environments. Suppose it raised a warning or notice if the cost was over > > > the limit. Then one could set a limit of a few million on the development > > > and test servers and developers would at least have a clue that they > > > needed to look at explain for that query. As it is now, one can exhort > > > them to run explain, but it has no effect. Instead we later see queries > > > killed by a 24 hour timeout with estimated costs ranging from "until they > > > unplug the machine and dump it" to "until the sun turns into a red > > > giant". > > > > Great argument. So that's 4 in favour at least. > > > > Not such a great argument. Cost models on development servers can and often > are quite different from those on production, so you might be putting an > artifical limit on top of your developers. We load the production dumps into our dev environment, which are the same hardware spec, so the costs should be identical. > I still think it is worth revisiting what problems people are trying to solve, > and see if there are better tools they can be given to solve them. Barring > that, I suppose a crude solution is better than nothing, though I fear people > might point at the crude solution as a good enough solution to justify not > working on better solutions. Alerting developers and QA to potentially costly queries would help solve some of the problems we are trying to solve. Better tools are welcome, an argument that the good is the enemy of the best so we should be content with nothing is not. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
"Josh Berkus" <josh@agliodbs.com> writes: > Tom, > >> Wasn't this exact proposal discussed and rejected awhile back? > > We rejected Greenplum's much more invasive resource manager, because it > created a large performance penalty on small queries whether or not it was > turned on. However, I don't remember any rejection of an idea as simple > as a cost limit rejection. The idea's certainly come up before. It probably received the usual non-committal cold shoulder rather than an outright "rejection". > This would, IMHO, be very useful for production instances of PostgreSQL. > The penalty for mis-rejection of a poorly costed query is much lower than > the penalty for having a bad query eat all your CPU. Well that's going to depend on the application.... But I suppose there's nothing wrong with having options which aren't always a good idea to use. The real question I guess is whether there's ever a situation where it would be a good idea to use this. I'm not 100% sure. What I would probably use myself is an option to print a warning before starting the query. That would be handy for interactive sessions so you would be able to hit C-c instead of waiting for several minutes and then wondering whether you got the query wrong. I wonder if it would be useful to have a flag on some GUC options to make them not globally settable. That is, for example, you could set enable_seqscan in an individual session but not in postgres.conf. Or perhaps again just print a warning that it's not recommended as a global configuration. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
On Monday 04 August 2008 03:50:40 daveg wrote: > On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote: > > ISTR that what ended up killing the enthusiasm for this was that most > > people realized that this GUC was just a poor tool to take a stab at > > solving other problems (ie. rate limiting cpu for queries). > > I'm not concerned with that, I want developers to have feed back on costs > in a way that is obvious. > That was one of the other use cases that was pushed forward in the past. > > > > I think a variation on this could be very useful in development and > > > > test environments. Suppose it raised a warning or notice if the cost > > > > was over the limit. Then one could set a limit of a few million on > > > > the development and test servers and developers would at least have a > > > > clue that they needed to look at explain for that query. As it is > > > > now, one can exhort them to run explain, but it has no effect. > > > > Instead we later see queries killed by a 24 hour timeout with > > > > estimated costs ranging from "until they unplug the machine and dump > > > > it" to "until the sun turns into a red giant". > > > > > > Great argument. So that's 4 in favour at least. > > > > Not such a great argument. Cost models on development servers can and > > often are quite different from those on production, so you might be > > putting an artifical limit on top of your developers. > > We load the production dumps into our dev environment, which are the same > hardware spec, so the costs should be identical. > That's great for you, I am talking in the scope of a general solution. (Note I'd also bet that even given the same hardware, different production loads can produce different relative mappings of cost vs. performance, but whatever) > > I still think it is worth revisiting what problems people are trying to > > solve, and see if there are better tools they can be given to solve them. > > Barring that, I suppose a crude solution is better than nothing, though > > I fear people might point at the crude solution as a good enough solution > > to justify not working on better solutions. > > Alerting developers and QA to potentially costly queries would help solve > some of the probems we are trying to solve. Better tools are welcome, an > argument that the good is the enemy of the best so we should be content > with nothing is not. > And you'll note, I specifically said that a crude tool is better than nothing. But you're completely ignoring that a crude tool can often end up as a foot-gun once released into the wild. -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Greg, > Well that's going to depend on the application.... But I suppose there's > nothing wrong with having options which aren't always a good idea to use. The > real question I guess is whether there's ever a situation where it would be a > good idea to use this. I'm not 100% sure. I can think of *lots*. Primarily, simple web applications, where queries are never supposed to take more than 50ms. If a query turns up with an estimated cost of 10000000000, then you know something's wrong; in the statistics if not in the query. In either case, that query has a good chance of dragging down the whole system. In such a production application, it is better to have false positives and reject otherwise-OK queries because their costing is wrong, than to let a single cartesian join bog down an application serving 5000 simultaneous users. Further, with a SQL error, this would allow the query rejection to be handled in a user-friendly way from the UI ("Search too complex. Try changing search terms.") rather than timing out, which is very difficult to handle well. The usefulness of this feature for interactive sessions is limited-to-nonexistent. It's for production applications. --Josh Berkus
On Mon, 4 Aug 2008, daveg wrote: > On Sun, Aug 03, 2008 at 10:57:55PM -0400, Robert Treat wrote: >> Not such a great argument. Cost models on development servers can and often >> are quite different from those on production, so you might be putting an >> artifical limit on top of your developers. > > We load the production dumps into our dev environment, which are the same > hardware spec, so the costs should be identical. Not identical, just close. ANALYZE samples data from your table randomly. The statistics used to compute the costs will therefore be slightly different on the two servers even if the data is the same. The problem of discovering one plan on production and another on development is not quite that easy to remove. Ultimately, if your developers aren't thorough enough to do things like look at EXPLAIN plans enough to discover things that are just bad, I just chuckle at your thinking that putting a single limiter on their bad behavior will somehow magically make that better. Anyway, if your production server is small enough that you can afford to have another one just like it for the developers to work on, that's great. Robert's point is that many installs don't work like that. The development teams in lots of places only get a subset of the production data because it's too large to deploy on anything but a big server, which often is hard to cost justify buying just for development purposes. I like the concept of a cost limit, but I'm a bit horrified by the thought of it being exposed simply through the internal cost numbers because they are so arbitrary. One of the endless projects I think about but never start coding is to write something that measures the things the planner cost constants estimate on a particular machine, so that all those numbers actually can be tied to some real-world time measure. If you did that, you'd actually have a shot at accomplishing the real goal here, making statement_cost_limit cut off statements expected to take longer than statement_timeout before they even get started. -- * Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
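To make that mapping concrete with invented numbers: if a query whose EXPLAIN total cost is 100000 reliably runs in about 2 seconds on a given box, one cost unit corresponds to roughly 20 microseconds there, so a 5 minute statement_timeout would translate to a statement_cost_limit in the neighbourhood of 15 million. The point of such a calibration step is that the limit could then be expressed in terms people already reason about (time) rather than raw cost units.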
On Mon, Aug 04, 2008 at 03:09:34PM -0400, Greg Smith wrote: > On Mon, 4 Aug 2008, daveg wrote: > >We load the production dumps into our dev environment, which are the same > >hardware spec, so the costs should be identical. > > Not identical, just close. ANALYZE samples data from your table randomly. > The statistics used to compute the costs will therefore be slightly > different on the two servers even if the data is the same. The problem of > discovering one plan on production and another on development is not quite > that easy to remove. Ultimately, if your developers aren't thorough > enough to do thinks like look at EXPLAIN plans enough to discover things > that are just bad, I just chuckle at your thinking that putting a single > limiter on their bad behavior will somehow magically make that better. Not all developers can be persuaded to run explain on every change. However, many will investigate a new message. I'm only hoping to try to focus their attention toward possible problem queries. > Anyway, if your production server is small enough that you can afford to > have another one just like it for the developers to work on, that's great. > Robert's point is that many installs don't work like that. The > development teams in lots of places only get a subset of the production > data because it's too large to deploy on anything but a big server, which > often is hard to cost justify buying just for development purposes. Not to get into a size war ;-), but the production environment I'd like this feature for is over 40 32GB 16 scsi drive quadcore boxes. These are dedicated to postgresql and run one or just a few databases. There are also a bunch of client boxes that we will not speak of. The staging and test environments are similar hardware but have only a subset of the databases copied to them. There are probably more than a dozen DB hosts for that. > I like the concept of a cost limit, but I'm a bit horrified by the thought > of it being exposed simply through the internal cost numbers because they > are so arbitrary. One of the endless projects I think about but never > start coding is to write something that measures the things the planner > cost constants estimate on a particular machine, so that all those numbers > actually can be tied to some real-world time measure. If you did that, > you'd actually have a shot at accomplishing the real goal here, making > statement_cost_limit cut off statements expected to take longer than > statement_timeout before they even get started. That is a nice idea. Possibly it could be a utility like the fsync tester. But planner estimates are never going to be all that accurate even with solid cost numbers because for some classes of queries, particularly those with many joins, the stats can be good at each level but the error accumulates exponentially. Which is why I think a warning is appropriate instead of an error. Even a notice in the logs would be useful. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
On Mon, Aug 04, 2008 at 02:35:07PM -0400, Robert Treat wrote: > On Monday 04 August 2008 03:50:40 daveg wrote: > > That's great for you, I am talking in the scope of a general solution. (Note > I'd also bet that even given the same hardware, different production loads > can produce different relative mappings of cost vs. performance, but > whatever) Even on different hardware it would still likely warn of mistakes like products due to missing join conditions etc. > > > I still think it is worth revisiting what problems people are trying to > > > solve, and see if there are better tools they can be given to solve them. > > > Barring that, I suppose a crude solution is better than nothing, though > > > I fear people might point at the crude solution as a good enough solution > > > to justify not working on better solutions. > > > > Alerting developers and QA to potentially costly queries would help solve > > some of the probems we are trying to solve. Better tools are welcome, an > > argument that the good is the enemy of the best so we should be content > > with nothing is not. > > And you'll note, I specifically said that a crude tool is better than nothing. I realized somewhat after I sent the above that it might have sounded a bit snippy. I hope I have not offended. > But your completely ignoring that a crude tool can often end-up as a foot-gun > once relased into the wild. I'm suggesting a warning, or even just a notice into the logs, I don't see the footgun. What am I missing? Regards -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote: > On Monday 04 August 2008 03:50:40 daveg wrote: ... > > > I still think it is worth revisiting what problems people are trying to > > > solve, and see if there are better tools they can be given to solve them. > > > Barring that, I suppose a crude solution is better than nothing, though > > > I fear people might point at the crude solution as a good enough solution > > > to justify not working on better solutions. > > > > Alerting developers and QA to potentially costly queries would help solve > > some of the probems we are trying to solve. Better tools are welcome, an > > argument that the good is the enemy of the best so we should be content > > with nothing is not. > > > > And you'll note, I specifically said that a crude tool is better than nothing. > But your completely ignoring that a crude tool can often end-up as a foot-gun > once relased into the wild. On the other other hand, _anything_ can end up as a foot-gun in the hands of ingenious users. I was once told about a company that claimed to have produced a positively fool-proof lawn-mower, only to find out that a university professor had tried to use it to trim a hedge and cut off his toes. ------------- Hannu
"Josh Berkus" <josh@agliodbs.com> writes: > In such a production application, it is better to have false positives and > reject otherwise-OK queries becuase their costing is wrong, than to let a > single cartesian join bog down an application serving 5000 simultaneous users. > Further, with a SQL error, this would allow the query rejection to be handled > in a user-friendly way from the UI ("Search too complex. Try changing search > terms.") rather than timing out, which is very difficult to handle well. > > The usefulness of this feature for interactive sessions is > limited-to-nonexistant. It's for production applications. Wow. I couldn't disagree more. For such an application this would be a major foot-gun which would give a false sense of security simultaneously causing random outages and not providing even the protection you're counting on. It would be quite likely to miss some cartesian joins and allow problematic queries through randomly and block other perfectly legitimate queries. I's no substitute for writing your search engine query generator to actually check that it has enough constraints to avoid any disallowed cartesion joins. That people might think it's reliable enough to use for such applications is my major concern and if my guess is right, Tom's as well. I suspect you may have just sunk any chance of getting him on-side. Where I see it useful is a) during development when it might help catch erroneous queries as a kind of sql-lint. and b) when running ad-hoc DBA queries where it might let the DBA catch the error before letting it run for a while. I'm sure I'm not the only DBA who let a query run for 5 minutes before wondering if it should really be taking that long. I would be much more comfortable if it produced a warning, not an error. And much more if we implemented my previous thought of having some settings which generate warnings if they're set at startup saying that's not recommended. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support!
On Mon, Aug 04, 2008 at 11:59:03AM -0700, Josh Berkus wrote: > Greg, > > >Well that's going to depend on the application.... But I suppose there's > >nothing wrong with having options which aren't always a good idea to use. > >The > >real question I guess is whether there's ever a situation where it would > >be a > >good idea to use this. I'm not 100% sure. > > I can think of *lots*. Primarily, simple web applications, where > queries are never supposed to take more than 50ms. If a query turns up > with an estimated cost of 10000000000, then you know something's wrong; > in the statistics if not in the query. In either case, that query has a > good chance of dragging down the whole system. > > In such a production application, it is better to have false positives > and reject otherwise-OK queries becuase their costing is wrong, than to > let a single cartesian join bog down an application serving 5000 > simultaneous users. Further, with a SQL error, this would allow the > query rejection to be handled in a user-friendly way from the UI > ("Search too complex. Try changing search terms.") rather than timing > out, which is very difficult to handle well. > > The usefulness of this feature for interactive sessions is > limited-to-nonexistant. It's for production applications. Ok, that is a different use case where an error seems very useful. What about slightly extending the proposal to have the severity of exceeding the limit configurable too. Something like: costestimate_limit = 1000000000 # default 0 to ignore limit costestimate_limit_severity = error # debug, notice,warning, error -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
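If both knobs existed, the reporting side reduces to picking an elog level from the severity setting. A small sketch of that shape, with illustrative names and the GUC plumbing left out:

    #include "postgres.h"
    #include "nodes/plannodes.h"

    /* Illustrative variables; a real patch would expose these as GUCs. */
    static double costestimate_limit = 0.0;          /* 0 = ignore */
    static int    costestimate_limit_elevel = ERROR; /* DEBUG1, NOTICE, WARNING or ERROR */

    static void
    report_costly_plan(PlannedStmt *stmt)
    {
        if (costestimate_limit > 0.0 &&
            stmt->planTree->total_cost > costestimate_limit)
            ereport(costestimate_limit_elevel,
                    (errmsg("estimated plan cost %.0f exceeds costestimate_limit %.0f",
                            stmt->planTree->total_cost, costestimate_limit),
                     errhint("Check the statement with EXPLAIN or raise costestimate_limit.")));
    }

With any level below ERROR the statement still runs after the message is emitted, which matches the development/test use case upthread; ERROR gives the hard rejection Josh describes for production.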
>>> Hannu Krosing <hannu@krosing.net> wrote: > I was once told about a company, who claimed to have produced a > positively fool-proof lawn-mower, only to find out, that a university > professor had tried to use it to trim a hedge and cut off his toes. Odd. Seriously, about 45 years ago I lived next door to a university botany professor who cut off his fingertips that way. I wonder if professors are more prone to this or whether the story got mangled over time. Perhaps software should have special protection for professors.... -Kevin
On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote: > On Monday 04 August 2008 03:50:40 daveg wrote: > And you'll note, I specifically said that a crude tool is better than > nothing. But your completely ignoring that a crude tool can often > end-up as a foot-gun once relased into the wild. The proposal is for an option with no consequences when turned off. We respect your right not to use it. What is the danger exactly? If we cancel stupid queries before people run them, everybody is a winner. Even the person who submitted the stupid query, since they find out faster. Sure, it's an estimate, but it's got to be based upon an estimate if it acts *before* it runs. And surely there is no better estimate of the cost than the plan cost? It doesn't stop anyone from putting in resource limits later. We'll have to do something with enable_seqscan, BTW, chaps. -- Simon Riggs www.2ndQuadrant.com PostgreSQL Training, Services and Support
Greg, > For such an application this would be a major foot-gun which would give > a false sense of security simultaneously causing random outages and not > providing even the protection you're counting on. Hmmm. That sounds like a call for some testing. While our cost estimation has some issues, I don't think it's as unreliable as all that. And it's easy enough to document the limitations. If it's 80% accurate, then it's fixing more problems than it causes. If it's 30% accurate, then obviously it's a bad idea. Personally, I don't see much use for this interactively, because an experienced DBA can easily enough run an EXPLAIN before running the query. I usually do, on production systems. -- --Josh Josh Berkus PostgreSQL San Francisco
On Monday 04 August 2008 16:49:43 Simon Riggs wrote: > On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote: > > On Monday 04 August 2008 03:50:40 daveg wrote: > > > > And you'll note, I specifically said that a crude tool is better than > > nothing. But your completely ignoring that a crude tool can often > > end-up as a foot-gun once relased into the wild. > > The proposal is for an option with no consequences when turned off. We > respect your right not to use it. What is the danger exactly? > All of the proposals I have seen for adding query hints would also have no consequence if not used, but no one seems to care about that argument. :-) -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Monday 04 August 2008 15:56:25 daveg wrote: > On Mon, Aug 04, 2008 at 02:35:07PM -0400, Robert Treat wrote: > > On Monday 04 August 2008 03:50:40 daveg wrote: > > > > That's great for you, I am talking in the scope of a general solution. > > (Note I'd also bet that even given the same hardware, different > > production loads can produce different relative mappings of cost vs. > > performance, but whatever) > > Even on different hardware it would still likely warn of mistakes like > products due to missing join conditions etc. > See, this is what we ended up talking about before. Someone will say "I'd like to prevent my devs from accidentally doing queries with cartesian products" and they will use this to do it... but that will only work in some cases, so it becomes a poor tool to solve a different problem. BTW, what I really love about statement costs, is that they aren't even reliable on the same machine with the same data. I have seen query plans which run on the same data on the same machine where the resultant query runtime can vary from 2 hours to 5 hours, depending on how much other concurrent traffic is on the machine. Awesome eh? > > > > I still think it is worth revisiting what problems people are trying > > > > to solve, and see if there are better tools they can be given to > > > > solve them. Barring that, I suppose a crude solution is better than > > > > nothing, though I fear people might point at the crude solution as a > > > > good enough solution to justify not working on better solutions. > > > > > > Alerting developers and QA to potentially costly queries would help > > > solve some of the probems we are trying to solve. Better tools are > > > welcome, an argument that the good is the enemy of the best so we > > > should be content with nothing is not. > > > > And you'll note, I specifically said that a crude tool is better than > > nothing. > > I released somewhat after I sent the above that it might have sounded a bit > snippy. I hope I have not offended. > > > But your completely ignoring that a crude tool can often end-up as a > > foot-gun once relased into the wild. > > I'm suggesting a warning, or even just a notice into the logs, I don't see > the footgun. What am I missing? > The footgun in my mind is that people will think this solves a number of problems even though it doesnt solve them well. However, the footgun for you might be that the current proposal will actually abort the query, not emit a warning (not sure if that changes your opinion of it). -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
On Mon, Aug 04, 2008 at 05:19:50PM -0400, Robert Treat wrote: > See, this is what we ended up talking about before. Someone will say "I'd like > to prevent my devs from accidentally doing queries with cartesian products" > and they will use this to do it... but that will only work in some cases, so > it becomes a poor tool to solve a different problem. > > BTW, what I really love about statement costs, is that they aren't even > reliable on the same machine with the same data. I have seen query plans > which run on the same data on the same machine where the resultant query > runtime can vary from 2 hours to 5 hours, depending on how much other > concurrent traffic is on the machine. Awesome eh? Sure, I don't think anyone believes that costs are precise. But the case that is interesting is 2 hours versus years and years. > The footgun in my mind is that people will think this solves a number of > problems even though it doesnt solve them well. However, the footgun for yo I suspect that a good solution to this problem is impossible as it is more or less the halting problem. So I'm willing to accept a poor solution based on costs and then hope we improve the cost model. -dg -- David Gould daveg@sonic.net 510 536 1443 510 282 0869 If simplicity worked, the world would be overrun with insects.
Simon Riggs wrote: > On Sun, 2008-08-03 at 22:09 +0200, Hans-Jürgen Schönig wrote: > >>> Another alternative would be to have a plugin that can examine the >>> plan >>> immediately after planner executes, so you can implement this >>> yourself, >>> plus some other possibilities. >>> > >> this would be really fancy. >> how could a plugin like that look like? > > Hmm...thinks: exactly like the existing planner_hook(). > > So, rewrite this as a planner hook and submit as a contrib module. Now that's a good idea! I personally don't think this feature is a good idea, for all the reasons others have mentioned, but as a pgfoundry project it can be downloaded by those who want it, and perhaps prove its usefulness for others as well. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Aug 3, 2008, at 9:57 PM, Robert Treat wrote: >>> I think a variation on this could be very useful in development >>> and test >>> environments. Suppose it raised a warning or notice if the cost >>> was over >>> the limit. Then one could set a limit of a few million on the >>> development >>> and test servers and developers would at least have a clue that they >>> needed to look at explain for that query. As it is now, one can >>> exhort >>> them to run explain, but it has no effect. Instead we later see >>> queries >>> killed by a 24 hour timeout with estimated costs ranging from >>> "until they >>> unplug the machine and dump it" to "until the sun turns into a red >>> giant". >> >> Great argument. So that's 4 in favour at least. >> > > Not such a great argument. Cost models on development servers can > and often > are quite different from those on production, so you might be > putting an > artifical limit on top of your developers. We should have an approved API for dumping stats from one database and loading them into another. pg_dump needs this as well, IMO. -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
On Aug 4, 2008, at 3:49 PM, Simon Riggs wrote: > On Mon, 2008-08-04 at 14:35 -0400, Robert Treat wrote: >> On Monday 04 August 2008 03:50:40 daveg wrote: > >> And you'll note, I specifically said that a crude tool is better than >> nothing. But your completely ignoring that a crude tool can often >> end-up as a foot-gun once relased into the wild. > > The proposal is for an option with no consequences when turned off. We > respect your right not to use it. What is the danger exactly? > > If we cancel stupid queries before people run them, everybody is a > winner. Even the person who submitted the stupid query, since they > find > out faster. I could *really* use this. Unfortunately, we have a lot of folks writing some horrible queries and killing our slave databases. I'd *love* to be able to throw out any queries that had insane limits... > We'll have to do something with enable_seqscan, BTW, chaps. My thought would be to back the cost penalty out if we end up with a seqscan anyway. Speaking of which, there is a semi-related issue... if you have a large enough table the fixed-size cost we add to a seqscan won't be enough to make an alternative plan come out cheaper. Instead of adding a fixed cost, I think we should multiply by the estimated number of rows. -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
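To make the last point concrete: with enable_seqscan off, the planner today adds a single flat disable_cost penalty to the scan, so a sufficiently large table can still leave the penalised seqscan as the cheapest path. The suggestion is to let the penalty grow with the relation size, roughly along these lines (a hedged sketch of the idea, not actual source; details vary by version):

    #include "postgres.h"
    #include "optimizer/cost.h"      /* enable_seqscan, disable_cost */

    /*
     * Sketch of the suggested behaviour: a disabled-seqscan penalty that
     * scales with the number of tuples scanned, so that very large tables
     * cannot swamp a flat penalty and win anyway.
     */
    static Cost
    seqscan_disable_penalty(double tuples)
    {
        if (enable_seqscan)
            return 0;
        return disable_cost * Max(tuples, 1.0);
    }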
On Aug 4, 2008, at 1:04 PM, daveg wrote: > Ok, that is a different use case where an error seems very useful. > What > about slightly extending the proposal to have the severity of > exceeding > the limit configurable too. Something like: > > costestimate_limit = 1000000000 # default 0 to ignore limit > costestimate_limit_severity = error # debug, notice, warning, > error I very much like this idea, and I would definitely use something like this on our production oltp app. We had a case recently where a query joining two large tables was very fast 99.9% of the time (i.e., a few ms), but for particular, rare key combinations the planner would make a poor choice turning into a multi-minute monster. It ran longer than the web server timeout, and the client was programmed to retry on error, essentially causing a database DoS. The monster version of the plan had an outrageous cost estimate, many orders of magnitude higher than any regular app query, and would be easy to peg using even a crudely chosen limit value. The problem was first mitigated by setting a query timeout a little longer than the web server timeout (since the query results are discarded for anything running longer), but even this was not a solution, since the client would retry on timeout, still keeping the db too busy. The real solution was to not do the query, but it would have been better to identify this via ERRORs in the logs than by the database becoming saturated in the middle of the day. For our application it is far better for an expensive query to be rejected outright than to attempt to run it in vain. Just thought I'd throw that out as anecdotal support. -Casey
Josh Berkus wrote: > Greg, > > > Well that's going to depend on the application.... But I suppose there's > > nothing wrong with having options which aren't always a good idea to use. The > > real question I guess is whether there's ever a situation where it would be a > > good idea to use this. I'm not 100% sure. > > I can think of *lots*. Primarily, simple web applications, where > queries are never supposed to take more than 50ms. If a query turns up > with an estimated cost of 10000000000, then you know something's wrong; > in the statistics if not in the query. In either case, that query has a > good chance of dragging down the whole system. > > In such a production application, it is better to have false positives > and reject otherwise-OK queries becuase their costing is wrong, than to > let a single cartesian join bog down an application serving 5000 > simultaneous users. Further, with a SQL error, this would allow the How about a simpler approach that throws an error or warning for cartesian products? That seems fool-proof. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote: > Josh Berkus wrote: >> ...simple web applications, where >> queries are never supposed to take more than 50ms. If a query turns up >> with an estimated cost of 10000000000, then you know something's wrong; >> ... > > How about a simpler approach that throws an error or warning for > cartesian products? That seems fool-proof. Seems less fool-proof to me. Sometimes cartesian products produce plans that run 200 times faster than plans that don't use the cartesian product. The first link below shows a cartesian join that took 1.1 seconds (within the range of OK for some web apps), while plans for the same query that don't use one took 200 seconds. http://archives.postgresql.org/pgsql-performance/2008-03/msg00391.php http://archives.postgresql.org/pgsql-performance/2007-12/msg00090.php http://archives.postgresql.org/pgsql-performance/2008-03/msg00361.php
Ron Mayer wrote: > Bruce Momjian wrote: > > Josh Berkus wrote: > >> ...simple web applications, where > >> queries are never supposed to take more than 50ms. If a query turns up > >> with an estimated cost of 10000000000, then you know something's wrong; > >> ... > > > > How about a simpler approach that throws an error or warning for > > cartesian products? That seems fool-proof. > > Seems less fool-proof to me. > > Sometimes cartesian products produce plans that run 200 times > faster than plans that don't use the cartesian product. > > The first link below shows a cartesian join that took 1.1 > seconds (within the range of OK for some web apps), while > plans for the same query that don't use one took 200 seconds. > > http://archives.postgresql.org/pgsql-performance/2008-03/msg00391.php > http://archives.postgresql.org/pgsql-performance/2007-12/msg00090.php > http://archives.postgresql.org/pgsql-performance/2008-03/msg00361.php My point is that people should _know_ they are using a cartesian product, and a warning would do that for users who have no need for a cartesian product and want to be warned about a possible error. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
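To show the shape of such a warning, and why it is trickier than it sounds, here is a hedged sketch that walks a finished plan tree and warns on join nodes carrying no join clause at all. The function name is illustrative, and as the comments note a real version would have to handle cases like a nestloop whose quals were pushed down into a parameterized inner index scan.

    #include "postgres.h"
    #include "nodes/plannodes.h"

    /* Walk the plan and warn about joins with no join clause (possible cartesian products). */
    static void
    warn_on_unqualified_joins(Plan *plan)
    {
        bool    has_clause = true;

        if (plan == NULL)
            return;

        if (IsA(plan, HashJoin))
            has_clause = (((HashJoin *) plan)->hashclauses != NIL ||
                          ((Join *) plan)->joinqual != NIL);
        else if (IsA(plan, MergeJoin))
            has_clause = (((MergeJoin *) plan)->mergeclauses != NIL ||
                          ((Join *) plan)->joinqual != NIL);
        else if (IsA(plan, NestLoop))
        {
            /*
             * Caution: a nestloop driven by a parameterized inner index scan
             * also has an empty joinqual, yet is not a cartesian product.
             */
            has_clause = (((Join *) plan)->joinqual != NIL);
        }

        if (!has_clause)
            ereport(WARNING,
                    (errmsg("plan contains a join with no join condition (possible cartesian product)")));

        warn_on_unqualified_joins(plan->lefttree);
        warn_on_unqualified_joins(plan->righttree);
    }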
Bruce Momjian <bruce@momjian.us> writes: > Ron Mayer wrote: >> Seems less fool-proof to me. > My point is that people should _know_ they are using a cartesian > product, and a warning would do that for users who have no need for a > cartesian product and want to be warned about a possible error. There are quite a lot of standard applications where small cartesian products make sense --- star schemas are the traditional example. I recall some discussions awhile back about how to persuade the planner to consider such cases, in fact. Right now it'll generally use a series of hash joins where maybe just one would be better. I concur with Ron that the only merit of this proposal is that it's (relatively) simple to implement. regards, tom lane
> My point is that people should _know_ they are using a cartesian > product, and a warning would do that for users who have no need for a > cartesian product and want to be warned about a possible error. I think Cartesian products are a red herring. Cartesian products are primarily bad if they generate bad performance, and bad performance can be generated without Cartesian products. I've certainly written them intentionally, from time to time. The bigger issue is - if you start critiquing people's query-writing, where will you stop? SELECT * FROM foo WHERE a = NULL WARNING: a = NULL is always false. Did you mean a IS NULL? SELECT * FROM foo LEFT JOIN bar ON foo.a = bar.a LEFT JOIN baz ON foo.b = bar.b AND foo.c = baz.c WARNING: Maybe you meant foo.b = baz.b instead of foo.b = bar.b? I'm sure there are a hundred others - these just happen to be a few of my old mistakes (the first one was generated by some buggy Perl code... the second by poor cut-and-paste skills). In any event, I don't think it's the job of the database to argue with you about whether you really want the data you asked for - its job is just to get you that data. Of course if doing so will take longer than the amount of time remaining before the heat death of the universe, a warning might be appropriate. ...Robert
Bruce, > How about a simpler approach that throws an error or warning for > cartesian products? That seems fool-proof. Well, throwing a warning is pretty useless for an unattended application. Also, it's perfectly possible to write queries which will never complete without a cartesian join. Basically, *unless* someone has a plan to integrate the greenplum resource management stuff sometime soon, I think we should take Hans' idea (pending patch quality, of course). There's an argument to be made that even if we took the greenplum resource controller, statement_cost_limit would be much simpler and worth having for the small-effort simple-application users. FYI, the concept behind the Greenplum RM is similar, except that it maintains a total "pool" of query costs on a per-role basis. -- --Josh Josh Berkus PostgreSQL San Francisco
this entire thing is not about cartesian products at all. it is about kicking out "expensive" queries before they even start to eat up tons of CPU. imagine a user asking for "give me all phone calls in the US within the past 10 years". you could kill the guy instantly because you know that this would take ages. in addition to that you know that in an OLTP context everything which is expected to take longer than X cannot be useful anyway. this has nothing to do with cartesian products or other bad things you can do in SQL. it is just a simple and heuristic check. many thanks, hans > My point is that people should _know_ they are using a cartesian > product, and a warning would do that for users who have no need for a > cartesian product and want to be warned about a possible error. > > -- > Bruce Momjian <bruce@momjian.us> http://momjian.us > EnterpriseDB http://enterprisedb.com > > + If your life is a hard drive, Christ can be your backup. + -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt Web: www.postgresql-support.de