Thread: Re: [PERFORM] Hints proposal
[ This is off-topic for -performance, please continue the thread in -hackers ] "Jim C. Nasby" <jim@nasby.net> writes: > These hints would outright force the planner to do things a certain way. > ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */ This proposal seems to deliberately ignore every point that has been made *against* doing things that way. It doesn't separate the hints from the queries, it doesn't focus on fixing the statistical or cost misestimates that are at the heart of the issue, and it takes no account of the problem of hints being obsoleted by system improvements. > It would also be useful to allow tweaking of planner cost estimates. > This would take the general form of > node operator value This is at least focusing on the right sort of thing, although I still find it completely misguided to be attaching hints like this to individual queries. What I would like to see is information *stored in a system catalog* that affects the planner's cost estimates. As an example, the DBA might know that a particular table is touched sufficiently often that it's likely to remain RAM-resident, in which case reducing the page fetch cost estimates for just that table would make sense. (BTW, this is something the planner could in principle know, but we're unlikely to do it anytime soon, for a number of reasons including a desire for plan stability.) The other general category of thing I think we need is a way to override selectivity estimates for particular forms of WHERE clauses. regards, tom lane
On 10/12/06, Tom Lane <tgl@sss.pgh.pa.us> wrote: > [ This is off-topic for -performance, please continue the thread in > -hackers ] > This proposal seems to deliberately ignore every point that has been > made *against* doing things that way. It doesn't separate the hints > from the queries, it doesn't focus on fixing the statistical or cost > misestimates that are at the heart of the issue, and it takes no account > of the problem of hints being obsoleted by system improvements. what about extending the domain system so that we can put in ranges that override the statistics or (imo much more importantly) provide information when the planner would have to restort to a guess. my case for this is prepared statements with a parameterized limit clause. prepare foo(l int) as select * from bar limit $1; maybe: create domain foo_lmt as int hint 1; -- probably needs to be fleshed out prepare foo(l foolmt) as select * from bar limit $1; this says: "if you have to guess me, please use this" what I like about this over previous attempts to persuade you is the grammar changes are localized and also imo future proofed. planner can ignore the hints if they are not appropriate for the oparation. merlin
On Thu, Oct 12, 2006 at 11:42:32AM -0400, Tom Lane wrote: > [ This is off-topic for -performance, please continue the thread in > -hackers ] > > "Jim C. Nasby" <jim@nasby.net> writes: > > These hints would outright force the planner to do things a certain way. > > ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */ > > This proposal seems to deliberately ignore every point that has been > made *against* doing things that way. It doesn't separate the hints > from the queries, it doesn't focus on fixing the statistical or cost > misestimates that are at the heart of the issue, and it takes no account > of the problem of hints being obsoleted by system improvements. Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See also my comment below. > > It would also be useful to allow tweaking of planner cost estimates. > > This would take the general form of > > node operator value > > This is at least focusing on the right sort of thing, although I still > find it completely misguided to be attaching hints like this to > individual queries. Yes, but as I mentioned the idea here was to come up with something that is (hopefully) easy to define and implement. In other words, something that should be doable for 8.3. Because this proposal essentially amounts to limiting plans the planner will consider and tweaking it's cost estimates, I'm hoping that it should be (relatively) easy to implement. > What I would like to see is information *stored in a system catalog* > that affects the planner's cost estimates. As an example, the DBA might > know that a particular table is touched sufficiently often that it's > likely to remain RAM-resident, in which case reducing the page fetch > cost estimates for just that table would make sense. (BTW, this is > something the planner could in principle know, but we're unlikely to > do it anytime soon, for a number of reasons including a desire for plan > stability.) All this stuff is great and I would love to see it! But this is all so abstract that I'm doubtful this could make it into 8.4, let alone 8.3. Especially if we want a comprehensive system that will handle most/all cases. I don't know if we even have a list of all the cases we need to handle. > The other general category of thing I think we need is a > way to override selectivity estimates for particular forms of WHERE > clauses. I hadn't thought about that for hints, but it would be a good addition. I think the stats-tweaking model would work, but we'd probably want to allow "=" as well (which could go into the other stats tweaking hints as well). ... WHERE a = b /* SELECTIVITY {+|-|*|/|=} value */ -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
OK, I just have to comment... "Jim C. Nasby" <jim@nasby.net> writes: > > These hints would outright force the planner to do things a certain way. > > ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */ > > This proposal seems to deliberately ignore every point that has been > made *against* doing things that way. It doesn't separate the hints > from the queries, it doesn't focus on fixing the statistical or cost > misestimates that are at the heart of the issue, and it takes no account > of the problem of hints being obsoleted by system improvements. But whatever arguments you made about planner improvements and the like, it will NEVER be possible to correctly estimate in all cases the statistics for a query, even if you perfectly know WHAT statistics you need, which is also not the case all the time. Tom, you're the one who knows best how the planner works... can you bet anything you care about on the fact that one day the planner will never ever generate a catastrophic plan without DBA tweaking ? And how far in time we'll get to that point ? Until that point is achieved, the above proposal is one of the simplest to understand for the tweaking DBA, and the fastest to deploy when faced with catastrophic plans. And I would guess it is one of the simplest to be implemented and probably not very high maintenance either, although this is just a guess. If I could hint some of my queries, I would enable anonymous prepared statements to take into account the parameter values, but I can't because that results in runaway queries every now and then, so I had to force postgres generate generic queries without knowing anything about parameter values... so the effect for me is an overall slower postgres system because I couldn't fix the particular problems I had and had to tweak general settings. And when I have a problem I can't wait until the planner is fixed, I have to solve it immediately... the current means to do that are suboptimal. The argument that planner hints would hide problems from being solved is a fallacy. To put a hint in place almost the same amount of analysis is needed from the DBA as solving the problem now, so users who ask now for help will further do it even in the presence of hints. The ones who wouldn't are not coming for help now either, they know their way out of the problems... and the ones who still report a shortcoming of the planner will do it with hints too. I would even say it would be an added benefit, cause then you could really see how well a specific plan will do without having the planner capable to generate alone that plan... so knowledgeable users could come to you further down the road when they know where the planner is wrong, saving you time. I must say it again, this kind of query-level hinting would be the easiest to understand for the developers... there are many trial-end-error type of programmers out there, if you got a hint wrong, you fix it and move on, doesn't need to be perfect, it just have to be good enough. I heavily doubt that postgres will get bad publicity because user Joe sot himself in the foot by using bad hints... the probability for that is low, you must actively put those hints there, and if you take the time to do that then you're not the average Joe, and probably not so lazy either, and if you're putting random hints, then you would probably mess it up some other way anyway. And the thing about missing new features is also not very founded. If I would want to exclude a full table scan on a specific table for a specific query, than that's about for sure that I want to do that regardless what new features postgres will offer in the future. Picking one specific access method is more prone to missing new access methods, but even then, when I upgrade the DB server to a new version, I usually have enough other compatibility problems (till now I always had some on every upgrade I had) that making a round of upgrading hints is not an outstanding problem. And if the application works good enough with suboptimal plans, why would I even take that extra effort ? I guess the angle is: I, as a practicing DBA would like to be able to experiment and get most out of the imperfect tool I have, and you, the developers, want to make the tool perfect... I don't care about perfect tools, it just have to do the job... hints or anything else, if I can make it work GOOD ENOUGH, it's all fine. And hints is something I would understand and be able to use. Thanks for your patience if you're still reading this... Cheers, Csaba.
"Jim C. Nasby" <jim@nasby.net> writes: > Yes, but as I mentioned the idea here was to come up with something that > is (hopefully) easy to define and implement. In other words, something > that should be doable for 8.3. Sorry, but that is not anywhere on my list of criteria for an important feature. Having to live with a quick-and-dirty design for the foreseeable future is an ugly prospect --- and anything that puts hints into application code is going to lock us down to supporting it forever. regards, tom lane
Jim, >>> These hints would outright force the planner to do things a certain way. >>> ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */ >> This proposal seems to deliberately ignore every point that has been >> made *against* doing things that way. It doesn't separate the hints >> from the queries, it doesn't focus on fixing the statistical or cost >> misestimates that are at the heart of the issue, and it takes no account >> of the problem of hints being obsoleted by system improvements. > > Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See > also my comment below. I don't see how adding extra tags to queries is easier to implement than an ability to modify the system catalogs. Quite the opposite, really. And, as I said, if you're going to push for a feature that will be obsolesced in one version, then you're going to have a really rocky row to hoe. > Yes, but as I mentioned the idea here was to come up with something that > is (hopefully) easy to define and implement. In other words, something > that should be doable for 8.3. Because this proposal essentially amounts > to limiting plans the planner will consider and tweaking it's cost > estimates, I'm hoping that it should be (relatively) easy to implement. Even I, the chief marketing geek, am more concerned with getting a feature that we will still be proud of in 5 years than getting one in the next nine months. Keep your pants on! I actually think the way to attack this issue is to discuss the kinds of errors the planner makes, and what tweaks we could do to correct them. Here's the ones I'm aware of: -- Incorrect selectivity of WHERE clause -- Incorrect selectivity of JOIN -- Wrong estimate of rows returned from SRF -- Incorrect cost estimate for index use Can you think of any others? I also feel that a tenet of the design of the "planner tweaks" system ought to be that the tweaks are collectible and analyzable in some form. This would allow DBAs to mail in their tweaks to -performance or -hackers, and then allow us to continue improving the planner. --Josh Berkus
Csaba, > I guess the angle is: I, as a practicing DBA would like to be able to > experiment and get most out of the imperfect tool I have, and you, the > developers, want to make the tool perfect... I don't care about perfect > tools, it just have to do the job... hints or anything else, if I can > make it work GOOD ENOUGH, it's all fine. And hints is something I would > understand and be able to use. Hmmm, if you already understand Visual Basic syntax, should we support that too? Or maybe we should support MySQL's use of '0000-00-00' as the "zero" date because people "understand" that? We're just not going to adopt a bad design because Oracle DBAs are used to it. If we wanted to do that, we could shut down the project and join a proprietary DB staff. The current discussion is: a) Planner tweaking is sometimes necessary; b) Oracle HINTS are a bad design for planner tweaking; c) Can we come up with a good design for planner tweaking? So, how about suggestions for a good design? --Josh Berkus
> Hmmm, if you already understand Visual Basic syntax, should we support > that too? Or maybe we should support MySQL's use of '0000-00-00' as the > "zero" date because people "understand" that? You completely misunderstood me... I have no idea about oracle hints, never used Oracle in fact. My company uses oracle, but I have only very very limited contact with oracle issues, and never touched a hint. I'm only talking about ease of use, learning curves, and complexity in general. While I do like the idea of an all automatic system optimizer which takes your query portofolio and analyzes the data based on those queries and creates you all the indexes you need and all that, that's not gonna happen soon, because it's a very complex thing to implement. The alternative is that you take your query portofolio, analyze it yourself, figure out what statistics you need, create indexes, tweak queries, hint the planner for correlations and stuff... which is a complex task, and if you have to tell the server about some correlations with the phase of the moon, you're screwed cause there will never be any DB engine which will understand that. But you always can put the corresponding hint in the query when you know the correlation is there... The problem is that the application sometimes really knows better than the server, when the correlations are not standard. > We're just not going to adopt a bad design because Oracle DBAs are used > to it. If we wanted to do that, we could shut down the project and > join a proprietary DB staff. I have really nothing to do with Oracle. I think you guys are simply too blinded by Oracle hate... I don't care about Oracle. > The current discussion is: > > a) Planner tweaking is sometimes necessary; > b) Oracle HINTS are a bad design for planner tweaking; While there are plenty of arguments you made against query level hints (can we not call them Oracle-hints ?), there are plenty of users of postgres who expressed they would like them. I guess they were tweaking postgres installations when they needed it, and not Oracle installations. I expressed it clearly that for me query level hinting would give more control and better understanding of what I have to do for the desired result. Perfect planning -> forget it, I only care about good enough with reasonable tuning effort. If I have to tweak statistics I will NEVER be sure postgres will not backfire on me again. On the other hand if I say never do a seq scan on this table for this query, I could be sure it won't... > c) Can we come up with a good design for planner tweaking? Angles again: good enough now is better for end users, but programmers always go for perfect tomorrow... pity. Cheers, Csaba.
Csaba Nagy <nagy@ecircle-ag.com> writes: > Until that point is achieved, the above proposal is one of the simplest > to understand for the tweaking DBA, and the fastest to deploy when faced > with catastrophic plans. And I would guess it is one of the simplest to > be implemented and probably not very high maintenance either, although > this is just a guess. That guess is wrong ... but more to the point, if you think that "simple and easy to implement" should be the overriding concern for designing a new feature, see mysql. They've used that design approach for years and look what a mess they've got. This project has traditionally done things differently and I feel no need to change that mindset now. regards, tom lane
On Thu, Oct 12, 2006 at 11:25:25AM -0500, Jim C. Nasby wrote: > Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See > also my comment below. If I may argue in the other direction, speaking as one whose career (if we may be generous enough to call it that) has been pretty much exclusively on the operations end of things, I think that's an awful idea. There are two ways that quick-fix solve-the-problem-now hints are going to be used. One is in the sort of one-off query that a DBA has to run from time to time, that takes a long time, but that isn't really a part of regular application load. The thing is, if you already know your data well enough to provide a useful hint, you also know your data well enough to work around the problem in the short run (with some temp table tricks and the like). The _other_ way it's going to be used is as a stealthy alteration to regular behaviour, to solve a particular nasty performance problem that happens to result on a given day. And every single time I've seen anything like that done, the long term effect is always monstrous. Two releases later, all your testing and careful inspection and planning goes to naught one Saturday night at 3 am (because we all know computers know what time it is _where you are_) when the one-off trick that you pulled last quarter to solve the manager's promise (which was made while out golfing, so nobody wrote anything down) turns out to have a nasty effect now that the data distribution is different. Or you think so. But now you're not sure, because the code was tweaked a little to take some advantage of something you now have because of the query plans that you ended up getting because of the hint that was there because of the golf game, so now if you start fiddling with the hints, maybe you break something else. And you're tired, but the client is on the phone from Hong King _right now_. The second case is, from my experience, exactly the sort of thing you want really a lot when the golf game is just over, and the sort of thing you end up kicking yourself for in run-on sentences in the middle of the night six months after the golf game is long since forgotten. The idea for knobs on the planner that allows the DBA to give directed feedback, from which new planner enhancements can also come, seems to me a really good idea. But any sort of quick and dirty hint for right now gives me the willies. A -- Andrew Sullivan | ajs@crankycanuck.ca "The year's penultimate month" is not in truth a good way of saying November. --H.W. Fowler
On 10/12/06, Andrew Sullivan <ajs@crankycanuck.ca> wrote: > On Thu, Oct 12, 2006 at 11:25:25AM -0500, Jim C. Nasby wrote: > > Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See > > also my comment below. > > If I may argue in the other direction, speaking as one whose career > (if we may be generous enough to call it that) has been pretty much > exclusively on the operations end of things, I think that's an awful > idea. > > There are two ways that quick-fix solve-the-problem-now hints are > going to be used. One is in the sort of one-off query that a DBA has third way: to solve the problem of data (especially constants) not being available to the planner at the time the plan was generated. this happens most often with prepared statements and sql udfs. note that changes to the plan generation mechanism (i think proposed by peter e a few weeks back) might also solve this. In a previous large project I had to keep bitmap scan and seqscan off all the time because of this problem (the project used a lot of prepared statements). or am i way off base here? merlin
On Thu, Oct 12, 2006 at 02:21:55PM -0400, Merlin Moncure wrote: > third way: to solve the problem of data (especially constants) not > being available to the planner at the time the plan was generated. > this happens most often with prepared statements and sql udfs. note > that changes to the plan generation mechanism (i think proposed by > peter e a few weeks back) might also solve this. You're right about this, but you also deliver the reason why we don't need hints for that: the plan generation mechanism is a better solution to that problem. It's this latter thing that I keep coming back to. As a user of PostgreSQL, the thing that I really like about it is its pragmatic emphasis on correctness. In my experience, it's a system that feels very UNIX-y: there's a willingness to accept "80/20" answers to a problem in the event you at least have a way to get the last 20, but the developers are opposed to anything that seems really kludgey. In the case you're talking about, it seems to me that addressing the problems where they come from is a better solution that trying to find some way to work around them. And most of the use-cases I hear for a statement-level hints system fall into this latter category. A -- Andrew Sullivan | ajs@crankycanuck.ca Unfortunately reformatting the Internet is a little more painful than reformatting your hard drive when it gets out of whack. --Scott Morris
On Thu, Oct 12, 2006 at 09:40:30AM -0700, Josh Berkus wrote: > Jim, > > >>>These hints would outright force the planner to do things a certain way. > >>>... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */ > >>This proposal seems to deliberately ignore every point that has been > >>made *against* doing things that way. It doesn't separate the hints > >>from the queries, it doesn't focus on fixing the statistical or cost > >>misestimates that are at the heart of the issue, and it takes no account > >>of the problem of hints being obsoleted by system improvements. > > > >Yes, but it does one key thing: allows DBAs to fix problems *NOW*. See > >also my comment below. > > I don't see how adding extra tags to queries is easier to implement than > an ability to modify the system catalogs. Quite the opposite, really. > > And, as I said, if you're going to push for a feature that will be > obsolesced in one version, then you're going to have a really rocky row > to hoe. Unless you've got a time machine or a team of coders in your back pocket, I don't see how the planner will suddenly become perfect in 8.4... > >Yes, but as I mentioned the idea here was to come up with something that > >is (hopefully) easy to define and implement. In other words, something > >that should be doable for 8.3. Because this proposal essentially amounts > >to limiting plans the planner will consider and tweaking it's cost > >estimates, I'm hoping that it should be (relatively) easy to implement. > > Even I, the chief marketing geek, am more concerned with getting a > feature that we will still be proud of in 5 years than getting one in > the next nine months. Keep your pants on! Hey, I wrote that email while dressed! :P We've been seeing the same kinds of problems that are very difficult (or impossible) to fix cropping up for literally years... it'd be really good to at least be able to force the planner to do the sane thing even if we don't have the manpower to fix it right now... > I actually think the way to attack this issue is to discuss the kinds of > errors the planner makes, and what tweaks we could do to correct them. > Here's the ones I'm aware of: > > -- Incorrect selectivity of WHERE clause > -- Incorrect selectivity of JOIN > -- Wrong estimate of rows returned from SRF > -- Incorrect cost estimate for index use > > Can you think of any others? There's a range of correlations where the planner will incorrectly choose a seqscan over an indexscan. Function problems aren't limited to SRFs... we have 0 statistics ability for functions. There's the whole issue of multi-column statistics. > I also feel that a tenet of the design of the "planner tweaks" system > ought to be that the tweaks are collectible and analyzable in some form. > This would allow DBAs to mail in their tweaks to -performance or > -hackers, and then allow us to continue improving the planner. Well, one nice thing about the per-query method is you can post before and after EXPLAIN ANALYZE along with the hints. But yes, as we move towards a per-table/index/function solution, there should be an easy way to see how those hints are affecting the system and to report that data back to the community. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Jim, > > I don't see how adding extra tags to queries is easier to implement > > than an ability to modify the system catalogs. Quite the opposite, > > really. > > > > And, as I said, if you're going to push for a feature that will be > > obsolesced in one version, then you're going to have a really rocky > > row to hoe. > > Unless you've got a time machine or a team of coders in your back > pocket, I don't see how the planner will suddenly become perfect in > 8.4... Since you're not a core code contributor, I really don't see why you continue to claim that query hints are going to be easier to implement than relation-level statistics modification. You think it's easier, but the people who actually work on the planner don't believe that it is. > We've been seeing the same kinds of problems that are very difficult (or > impossible) to fix cropping up for literally years... it'd be really > good to at least be able to force the planner to do the sane thing even > if we don't have the manpower to fix it right now... As I've said to other people on this thread, you keep making the incorrect assumption that Oracle-style query hints are the only possible way of manual nuts-and-bolts query tuning. They are not. > > I actually think the way to attack this issue is to discuss the kinds > > of errors the planner makes, and what tweaks we could do to correct > > them. Here's the ones I'm aware of: > > > > -- Incorrect selectivity of WHERE clause > > -- Incorrect selectivity of JOIN > > -- Wrong estimate of rows returned from SRF > > -- Incorrect cost estimate for index use > > > > Can you think of any others? > > There's a range of correlations where the planner will incorrectly > choose a seqscan over an indexscan. Please list some if you have ones which don't fall into one of the four problems above. > Function problems aren't limited to SRFs... we have 0 statistics ability > for functions. > > There's the whole issue of multi-column statistics. Sure, but again that falls into the category of "incorrect selectivity for WHERE/JOIN". Don't make things more complicated than they need to be. > Well, one nice thing about the per-query method is you can post before > and after EXPLAIN ANALYZE along with the hints. One bad thing is that application designers will tend to use the hint, fix the immediate issue, and never report a problem at all. And query hints would not be collectable in any organized way except the query log, which would then require very sophisticated text parsing to get any useful information at all. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
> > Well, one nice thing about the per-query method is you can post before > > and after EXPLAIN ANALYZE along with the hints. > > One bad thing is that application designers will tend to use the hint, fix > the immediate issue, and never report a problem at all. And query hints > would not be collectable in any organized way except the query log, which > would then require very sophisticated text parsing to get any useful > information at all. > Or they'll report it when the next version of Postgres "breaks" their app because the hints changed, or because the planner does something else which makes those hints obsolete. My main concern with hints (aside from the fact I'd rather see more intelligence in the planner/stats) is managing them appropriately. I have two general types of SQL where I'd want to use hints- big OLAP stuff (where I have a lot of big queries, so it's not just one or two where I'd need them) or large dynamically generated queries (Users building custom queries). Either way, I don't want to put them on a query itself. What about using regular expressions, plus, if you have a function (views, or any other statement that is stored), you can assign a rule to that particular function. So you get matching, plus explicit selection. This way it's easy to find all your hints, turn them off, manage them, etc. (Not to mention dynamically generated SQL is ugly enough without having to put hints in there). - Bucky
Bucky Jordan wrote: > What about using regular expressions, plus, if you have a function > (views, or any other statement that is stored), you can assign a rule to > that particular function. So you get matching, plus explicit selection. > This way it's easy to find all your hints, turn them off, manage them, > etc. (Not to mention dynamically generated SQL is ugly enough without > having to put hints in there). The regular expression idea that's being floated around makes my brain feel like somebody is screeching a blackboard nearby. I don't think it's a sane idea. I think you could achieve something similar by using stored plan representations, like we do for rewrite rules. So you'd look for, say, a matching join combination in a catalog, and get a selectivity from a function that would get the selectivities of the conditions on the base tables. Or something like that anyway. That gets ugly pretty fast when you have to extract selectivities for all the possible join paths in any given query. But please don't talk about regular expressions. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
> The _other_ way it's going to be used is as a stealthy > alteration to regular behaviour, to solve a particular nasty > performance problem that happens to result on a given day. > And every single time I've seen anything like that done, the > long term effect is always monstrous. Funny, I very seldom use Informix hints (mostly none, maybe 2 per project), but I have yet to see one that backfires on me, even lightly. I use hints like: don't use that index, use that join order, use that index Can you give us an example that had such a monstrous effect in Oracle, other than that the hint was a mistake in the first place ? Andreas
> I actually think the way to attack this issue is to discuss the kinds of errors the planner makes, and what tweaks we could do to correct them. > Here's the ones I'm aware of: > > -- Incorrect selectivity of WHERE clause > -- Incorrect selectivity of JOIN > -- Wrong estimate of rows returned from SRF > -- Incorrect cost estimate for index use > > Can you think of any others? I think your points are too generic, there is no way to get them all 100% correct from statistical data even with data hints (and it is usually not at all necessary for good enough plans). I think we need to more precisely define the problems of our system with point in time statistics -- no reaction to degree of other concurrent activity -- no way to react to abnormal skew that only persists for a very short duration -- too late reaction to changing distribution (e.g. current date column when a new year starts) and the variant: too late adaption when a table is beeing filled -- missing cost/selectivity estimates for several parts of the system Andreas
On Fri, Oct 13, 2006 at 10:41:36AM +0200, Zeugswetter Andreas ADI SD wrote: > Can you give us an example that had such a monstrous effect in Oracle, > other than that the hint was a mistake in the first place ? Of course the hint was a mistake in the first place; the little story I told was exactly an example of such a case. The hint shouldn't have been put in place at the beginning; instead, the root cause should have been uncovered. It was not, the DBA added a hint, and later that hint turned out to have unfortunate consequences for some other use case. And it's a long-term monstrosity, remember, not a short one: the problem is in maintenance overall. This is a particularly sensitive area for PostgreSQL, because the planner has been making giant leaps forward with every release. Indeed, as Oracle's planner got better, the hints people had in place sometimes started to cause them to have to re-tune everything. My Oracle-using acquaintances tell me this has gotten better in recent releases; but in the last two days, one person pointed out that hints are increasingly relied on by one part of Oracle, even as another Oracle application insists that they never be used. That's exactly the sort of disagreement I'd expect to see when people have come to rely on what is basically a kludge in the first place. And remember, the places where PostgreSQL is getting used most heavily are still the sort of environments where people will take a lot of short cuts to achieve an immediate result, and be annoyed when that short cut later turns out to have been expensive. Postgres will get a black eye from that ("Too hard to manage! Upgrades cause all sorts of breakage!"). A -- Andrew Sullivan | ajs@crankycanuck.ca The plural of anecdote is not data. --Roger Brinner
> And remember, the places where PostgreSQL is getting used most > heavily are still the sort of environments where people will take a > lot of short cuts to achieve an immediate result, and be annoyed when > that short cut later turns out to have been expensive. Postgres will > get a black eye from that ("Too hard to manage! Upgrades cause all > sorts of breakage!"). Those guys will do their shortcuts anyway, and possibly reject postgres as not suitable even before that if they can't take any shortcuts. And upgrades are always causing breakage, I didn't have one upgrade without some things to fix, so I would expect people is expecting that. And that's true for Oracle too, our oracle guys always have something to fix after an upgrade. And I repeat, I always had something to fix for postgres too on all upgrades I've done till now. Cheers, Csaba.
> > Can you give us an example that had such a monstrous effect in Oracle, > > other than that the hint was a mistake in the first place ? > > Of course the hint was a mistake in the first place; the > little story I told was exactly an example of such a case. > The hint shouldn't have been put in place at the beginning; > instead, the root cause should have been uncovered. This is not an example. For us to understand, we need an actual case with syntax and all, and what happened. Imho the use of a stupid hint, that was added without analyzing the cause and background of the problem is no proof that statement hints are bad, only that the person involved was not doing his job. Andreas
On Thu, Oct 12, 2006 at 01:58:22PM -0700, Josh Berkus wrote: > > Unless you've got a time machine or a team of coders in your back > > pocket, I don't see how the planner will suddenly become perfect in > > 8.4... > > Since you're not a core code contributor, I really don't see why you > continue to claim that query hints are going to be easier to implement > than relation-level statistics modification. You think it's easier, but > the people who actually work on the planner don't believe that it is. Well, that's not what I said (my point being that until the planner and stats are perfect you need a way to over-ride them)... but I've also never said hints would be faster or easier than stats modification (I said I hope they would). But we'll never know which will be faster or easier until there's actually a proposal for improving the stats. > > We've been seeing the same kinds of problems that are very difficult (or > > impossible) to fix cropping up for literally years... it'd be really > > good to at least be able to force the planner to do the sane thing even > > if we don't have the manpower to fix it right now... > > As I've said to other people on this thread, you keep making the incorrect > assumption that Oracle-style query hints are the only possible way of > manual nuts-and-bolts query tuning. They are not. No, I've never said that. What I've said is a) I doubt that any system will always be correct for every query, meaning you need to be able to change things on a per-query basis, and b) I'm hoping that simple hints will be easy enough to implement that they can go into 8.3. I completely agree that it's much better *in the long run* to improve the planner and the statistics system so that we don't need hints. But there's been no plan put forward for how to do that, which means we also have no idea when some of these problems will be resolved. If someone comes up with a plan for that, then we can actually look at which options are better and how soon we can get fixes for these problems in place. Unfortunately, this problem is difficult enough that I suspect it could take a long time just to come up with an idea of how to fix these problems, which means that without some way to override the planner our users are stuck in the same place for the foreseeable future. If that turns out to be the case, then I think we should implement per-query hints now so that users can handle bad plans while we focus on how to improve the stats and planner so that in the future hints will become pointless. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
"Jim C. Nasby" <jim@nasby.net> writes: > I completely agree that it's much better *in the long run* to improve > the planner and the statistics system so that we don't need hints. But > there's been no plan put forward for how to do that, which means we also > have no idea when some of these problems will be resolved. You keep arguing on the assumption that the planner is static and there's no one working on it. That is false --- although this thread is certainly wasting a lot of time that could have been used more productively ;-). I also dispute your assumption that hints of the style you propose will be easier to implement or maintain than the sort of statistical-assumption tweaking that's been counter-proposed. Just for starters, how are you going to get those hints through the parser and rewriter? That's going to take an entire boatload of very ugly code that isn't needed at all in a saner design. regards, tom lane
Josh Berkus <josh@agliodbs.com> writes: >> > I actually think the way to attack this issue is to discuss the kinds >> > of errors the planner makes, and what tweaks we could do to correct >> > them. Here's the ones I'm aware of: >> > >> > -- Incorrect selectivity of WHERE clause >> > -- Incorrect selectivity of JOIN >> > -- Wrong estimate of rows returned from SRF >> > -- Incorrect cost estimate for index use >> > >> > Can you think of any others? -- Incorrect estimate for result of DISTINCT or GROUP BY. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Thu, 2006-10-12 at 18:02 -0400, Alvaro Herrera wrote: > Bucky Jordan wrote: > > > What about using regular expressions, plus, if you have a function > > (views, or any other statement that is stored), you can assign a rule to > > that particular function. So you get matching, plus explicit selection. > > This way it's easy to find all your hints, turn them off, manage them, > > etc. (Not to mention dynamically generated SQL is ugly enough without > > having to put hints in there). > > The regular expression idea that's being floated around makes my brain > feel like somebody is screeching a blackboard nearby. I don't think > it's a sane idea. I think you could achieve something similar by using > stored plan representations, like we do for rewrite rules. So you'd > look for, say, a matching join combination in a catalog, and get a > selectivity from a function that would get the selectivities of the > conditions on the base tables. Or something like that anyway. > > That gets ugly pretty fast when you have to extract selectivities for > all the possible join paths in any given query. > > But please don't talk about regular expressions. > It sounds horrible to me too, and I'm the one that thought of it (or at least I'm the one that introduced it to this thread). However, everything is relative. Since the other idea floating around is to put the same hinting information into the client queries themselves, regexes look great by comparison (in my opinion). Regards,Jeff Davis
> -----Original Message----- > From: Jeff Davis [mailto:pgsql@j-davis.com] > Sent: Friday, October 13, 2006 1:00 PM > To: Alvaro Herrera > Cc: Bucky Jordan; josh@agliodbs.com; Jim C. Nasby; pgsql- > hackers@postgresql.org > Subject: Re: [HACKERS] [PERFORM] Hints proposal > > On Thu, 2006-10-12 at 18:02 -0400, Alvaro Herrera wrote: > > Bucky Jordan wrote: > > > > > What about using regular expressions, plus, if you have a function > > > (views, or any other statement that is stored), you can assign a rule > to > > > that particular function. So you get matching, plus explicit > selection. > > > This way it's easy to find all your hints, turn them off, manage them, > > > etc. (Not to mention dynamically generated SQL is ugly enough without > > > having to put hints in there). > > > > The regular expression idea that's being floated around makes my brain > > feel like somebody is screeching a blackboard nearby. I don't think > > it's a sane idea. I think you could achieve something similar by using > > stored plan representations, like we do for rewrite rules. So you'd > > look for, say, a matching join combination in a catalog, and get a > > selectivity from a function that would get the selectivities of the > > conditions on the base tables. Or something like that anyway. > > > > That gets ugly pretty fast when you have to extract selectivities for > > all the possible join paths in any given query. > > > > But please don't talk about regular expressions. > > > > It sounds horrible to me too, and I'm the one that thought of it (or at > least I'm the one that introduced it to this thread). > > However, everything is relative. Since the other idea floating around is > to put the same hinting information into the client queries themselves, > regexes look great by comparison (in my opinion). I was merely expressing the same opinion. But I'm not one of those working on the planner, and all I can say to those of you who are is your efforts on good design are most appreciated, even if they do take longer than we users would like at times. My only point was that they should *NOT* be put in queries themselves as this scatters the nightmare into user code as well. Of course, other more sane ideas are most welcome. I don't like screeching on blackboards either. (regular expressions, although very valuable at times, seem to have that effect quite often...) - Bucky
On Fri, 2006-10-13 at 13:08 -0400, Bucky Jordan wrote: > > It sounds horrible to me too, and I'm the one that thought of it (or > at > > least I'm the one that introduced it to this thread). > > > > However, everything is relative. Since the other idea floating around > is > > to put the same hinting information into the client queries > themselves, > > regexes look great by comparison (in my opinion). > > I was merely expressing the same opinion. But I'm not one of those I didn't mean to imply otherwise. > working on the planner, and all I can say to those of you who are is > your efforts on good design are most appreciated, even if they do take > longer than we users would like at times. > > My only point was that they should *NOT* be put in queries themselves as > this scatters the nightmare into user code as well. Of course, other > more sane ideas are most welcome. I don't like screeching on blackboards > either. (regular expressions, although very valuable at times, seem to > have that effect quite often...) Right. And I think the sane ideas are along the lines of estimate & cost corrections (like Tom is saying). Regards,Jeff Davis
On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote: > Right. And I think the sane ideas are along the lines of estimate & cost > corrections (like Tom is saying). Let me ask this... how long do you (and others) want to wait for those? It's great that the planner is continually improving, but it also appears that there's still a long road ahead. Having a dune-buggy to get to your destination ahead of the road might not be a bad idea... :) -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Jim C. Nasby wrote: > On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote: >> Right. And I think the sane ideas are along the lines of estimate & cost >> corrections (like Tom is saying). > > Let me ask this... how long do you (and others) want to wait for those? > It's great that the planner is continually improving, but it also > appears that there's still a long road ahead. Having a dune-buggy to get > to your destination ahead of the road might not be a bad idea... :) It's all about resources Jim.. I have yet to see anyone step up and offer to help work on the planner in this thread (except Tom of course). Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutionssince 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
> I completely agree that it's much better *in the long run* to improve > the planner and the statistics system so that we don't need hints. But > there's been no plan put forward for how to do that, which means we also > have no idea when some of these problems will be resolved. If someone > comes up with a plan for that, then we can actually look at which options > are better and how soon we can get fixes for these problems in place. > Would it be helpful to have a database of EXPLAIN ANALYZE results and related details that developers could search through? I guess we sort of have that on the mailing list, but search/reporting features on that are pretty limited. Something like the "Report Bug" feature that seems to be growing popular in other software (Windows, OS X, Firefox, etc) might allow collection of useful data. The goal would be to identify the most common problems, and some hints at what's causing them. Maybe have a form based submission so you could ask the user some required questions, ensure that they aren't just submitting EXPLAIN results (parse and look for times maybe?), etc? I guess the general question is, what information could the users provide developers to help with this, and how can it be made easy for the users to submit the information, and easy for the developers to access in a meaningful way? As a developer/contributor, what questions would you want to ask a user? From reading the mailing lists, these seem to be common ones: - Copy of your postgres.conf - Basic hardware info - Explain Analyze Results of poor performing query - Explain Analyze Results of anything you've gotten to run better - Comments If there's interest- web development is something I can actually do (unlike pg development) so I might actually be able to help with something like this. - Bucky
On Fri, Oct 13, 2006 at 12:30:24PM -0500, Jim C. Nasby wrote: > On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote: > > Right. And I think the sane ideas are along the lines of estimate > > & cost corrections (like Tom is saying). > > Let me ask this... how long do you (and others) want to wait for > those? That's a good question, but see below. > It's great that the planner is continually improving, but it > also appears that there's still a long road ahead. Having a > dune-buggy to get to your destination ahead of the road might not be > a bad idea... :) What evidence do you have that adding per-query hints would take less time and be less work, even in the short term, than the current strategy of continuously improving the planner and optimizer? Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote!
On Fri, 2006-10-13 at 12:30 -0500, Jim C. Nasby wrote: > On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote: > > Right. And I think the sane ideas are along the lines of estimate & cost > > corrections (like Tom is saying). > > Let me ask this... how long do you (and others) want to wait for those? > It's great that the planner is continually improving, but it also > appears that there's still a long road ahead. Having a dune-buggy to get > to your destination ahead of the road might not be a bad idea... :) Fair enough. I can wait indefinitely right now, because I don't have any serious problems with the planner as-is. I am trying to empathize with people who are desperate to force plans sometimes. Your original proposal included hints in the client queries. I suggested that regexes on the server can accomplish the same goal while avoiding a serious drawback. Don't you think some kind of server matching rule is better? I think an idea to get the hints into the server, regardless of the types of hints you want to use, make it more likely to be accepted. Don't you think that's a better road to take? Regards,Jeff Davis
Joshua D. Drake wrote: > Jim C. Nasby wrote: > >> On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote: >> >>> Right. And I think the sane ideas are along the lines of estimate & cost >>> corrections (like Tom is saying). >>> >> Let me ask this... how long do you (and others) want to wait for those? >> It's great that the planner is continually improving, but it also >> appears that there's still a long road ahead. Having a dune-buggy to get >> to your destination ahead of the road might not be a bad idea... :) >> > > It's all about resources Jim.. I have yet to see anyone step up and > offer to help work on the planner in this thread (except Tom of course). > > > It's worse than that. Dune buggies do not run cost free. They require oil, petrol, and maintenance. Somebody wants to build a Maserati and you want to divert resources to maintaining dune buggies? cheers andrew
Jim C. Nasby wrote: > On Fri, Oct 13, 2006 at 10:23:31AM -0700, Jeff Davis wrote: >> Right. And I think the sane ideas are along the lines of estimate & cost >> corrections (like Tom is saying). > > Let me ask this... how long do you (and others) want to wait for those? well - we waited and got other features in the past and we will wait and get new features in the future too... > It's great that the planner is continually improving, but it also > appears that there's still a long road ahead. Having a dune-buggy to get > to your destination ahead of the road might not be a bad idea... :) well the planner has improved dramatically in the last years - we have apps here that are magnitudes faster with 8.1(even faster with 8.2) then they were with 7.3/7.4 - most of that is pure plain planner improvements. And I don't really believe that adding proper per statement hint support is any easier then continuing to improve the planner or working on (imho much more useful) improvements to the statistics infrastructure or functionality to tweak the statistics usage of the planner. The later however would prove to be much more useful for most of the current "issues" and have benefits for most of the userbase. Stefan
Jim, > Well, that's not what I said (my point being that until the planner and > stats are perfect you need a way to over-ride them)... but I've also > never said hints would be faster or easier than stats modification (I > said I hope they would). Yes, you did. Repeatedly. On this and other threads, you've made the statement at least three times that per-query hints are the only way to go for 8.3. Your insistence on this view has been so strident that if I didn't know you better, I would assume some kind of hidden agenda. Stop harping on the "per-query hints are the true way and the only way", or prepare to have people start simply ignoring you. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Andreas, > I think we need to more precisely define the problems of our system with > point in time statistics > > -- no reaction to degree of other concurrent activity > -- no way to react to abnormal skew that only persists for a very short > duration > -- too late reaction to changing distribution (e.g. current date column > when a new year starts) > and the variant: too late adaption when a table is beeing filled > -- missing cost/selectivity estimates for several parts of the system How would we manage point-in-time statistics? How would we collect them & store them? I think this is an interesting idea, but very, very hard to do ... -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
Csaba, > And upgrades are always causing breakage, I didn't have one upgrade > without some things to fix, so I would expect people is expecting that. > And that's true for Oracle too, our oracle guys always have something to > fix after an upgrade. And I repeat, I always had something to fix for > postgres too on all upgrades I've done till now. Really? Since 7.4, I've been able to do most upgrades without any troubleshooting. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
On Fri, Oct 13, 2006 at 03:57:23PM -0700, Josh Berkus wrote: > Jim, > > > Well, that's not what I said (my point being that until the planner and > > stats are perfect you need a way to over-ride them)... but I've also > > never said hints would be faster or easier than stats modification (I > > said I hope they would). > > Yes, you did. Repeatedly. On this and other threads, you've made the > statement at least three times that per-query hints are the only way to go > for 8.3. Your insistence on this view has been so strident that if I > didn't know you better, I would assume some kind of hidden agenda. Let me clarify, because that's not what I meant. Right now, there's not even a shadow of a design for anything else, and this is a tough nut to crack. That means it doesn't appear that anything else could be done for 8.3. If I'm wrong, great. If not, we should get something in place for users now while we come up with something better. So, does anyone out there have a plan for how we could give user's the ability to control the planner at a per-table level in 8.3 or even 8.4? -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Jim C. Nasby wrote: > So, does anyone out there have a plan for how we could give user's the > ability to control the planner at a per-table level in 8.3 or even 8.4? Per-table level? Some of the problems that have been put forward have to do with table combinations (for example selectivity of joins), so not all problems will be solved with a per-table design. I think if it were per table, you could get away with storing stuff in pg_statistics or some such. But how do you express statistics for joins? How do you express cross-column correlation? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
"Jim C. Nasby" <jim@nasby.net> writes: > Let me clarify, because that's not what I meant. Right now, there's not > even a shadow of a design for anything else, and this is a tough nut to > crack. I think you are not exactly measuring on a level playing field. On the textually-embedded-hints side, I see a very handwavy suggestion of a syntax and absolutely nothing about how it might be implemented --- in particular, nothing about how the information would be transmitted through to the planner, and nothing about exactly how the planner would use it if it had it. (No, I don't think "the planner will obey the hints" is an implementation sketch.) On the other side, the concept of system catalog(s) containing overrides for statistical or costing estimates is pretty handwavy too, but at least it's perfectly clear where it would plug into the planner: before running one of the current stats estimation or costing functions, we'd look for a matching override command in the catalogs. The main question seems to be what we'd like to be able to match on ... but that doesn't sound amazingly harder than specifying what an embedded hint does. IMO a textual hint facility will actually require *more* infrastructure code to be written than what's being suggested for alternatives. regards, tom lane
On Friday 13 October 2006 12:46, Gregory Stark wrote: > Josh Berkus <josh@agliodbs.com> writes: > >> > I actually think the way to attack this issue is to discuss the kinds > >> > of errors the planner makes, and what tweaks we could do to correct > >> > them. Here's the ones I'm aware of: > >> > > >> > -- Incorrect selectivity of WHERE clause > >> > -- Incorrect selectivity of JOIN > >> > -- Wrong estimate of rows returned from SRF > >> > -- Incorrect cost estimate for index use > >> > > >> > Can you think of any others? > > -- Incorrect estimate for result of DISTINCT or GROUP BY. Yeah, that one is bad. I also ran into one the other day where the planner did not seem to understand the distinctness of a columns values across table partitions... -- Robert Treat Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL