Thread: enforcing a plan (in brief)
Is there a way to bypass the optimizer and to specify a plan to be executed? Something like: Limit -> Nested Loop -> Nested Loop -> Seq Scan on tab00 t00 -> Index Scan using tab03_pkey on tab03t03 Index Cond: ("outer".id = t03.id) -> Index Scan using tab01_pkey on tab01 t01 IndexCond: ("outer".id = t01.id)
Hicham G. Elmongui wrote: > Is there a way to bypass the optimizer and to specify a plan to be executed? > > Something like: > > > > Limit > -> Nested Loop > -> Nested Loop > -> Seq Scan on tab00 t00 > -> Index Scan using tab03_pkey on tab03 t03 > Index Cond: ("outer".id = t03.id) > -> Index Scan using tab01_pkey on tab01 t01 > Index Cond: ("outer".id = t01.id) No, we feel that is of limited value. If the optimizer isn't doing things properly, we will fix it. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Hicham For your experiments (VLDB ? :-) your best bet of specifically bolting on a plan (if you can't convince the optimizer to do the right thing) is to hack it in the query planner. I've done similar hacks in the past, but only in the TelegraphCQ code and not in PostgreSQL. -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
On Thu, 2005-02-10 at 14:37 -0500, Bruce Momjian wrote: > No, we feel that is of limited value. If the optimizer isn't doing > things properly, we will fix it. I agree that improving the optimizer is the right answer for normal usage, so I can't get excited about query-level plan hints, but I can see the capability to instruct the planner being useful in an academic context. -Neil
> On Thu, 2005-02-10 at 14:37 -0500, Bruce Momjian wrote: >> No, we feel that is of limited value. If the optimizer isn't doing >> things properly, we will fix it. > > I agree that improving the optimizer is the right answer for normal > usage, so I can't get excited about query-level plan hints, but I can > see the capability to instruct the planner being useful in an academic > context. > I think that is sort of arrogant. Look at Oracle, you can give the planner hints in the form of comments. The idea that constructing a planner that will always do the best job is like creating a program that can predict the weather. There are too many subtle variations in datasets that are impossible to really evalute. I posted a message last week called "One Big trend vs multiple smaller trends." and you'll see what I mean. Yea, on a simple data organization, you could make a great planner, but someone who has studied the nature of their data can almost always toss their hands up in frustration because the planner isn't working right. I have had multiple issues with the inability to guide the planner on its decisions. I'll give a couple examples: A music database where the artist name is "Various Artists," given any normal database of music recordings, "Various Artists" will be *THE* most popular artist, usually close to almost half the data. Most of the time I've had to turn off sequential scans for these queries. (I filter out "various artists") Being able to say: select * from cdtitles where artist = 'foo' /* index scan ctitles_artist_ndx */ ; Would be helpful as disabling sequential scan isn't always the right think either. The whole "Query optimizer 8.0.1 (and 8.0)" series of posts show a different problem. It all comes down to that the planner *can not* be perfect, and thus will always be lacking in some respect. This is because you can not anticipate every physical data storage pattern, therefore, the analyzer will not correctly characterize them, and the planner will not create an optimal plan. Allowing the user to suggest alternate query strategies is a good idea.
pgsql@mohawksoft.com writes: > I think that is sort of arrogant. Look at Oracle, you can give the planner > hints in the form of comments. Arrogant or not, that's the general view of the people who work on the planner. The real issue is not so much whether the planner will always get things right --- it won't, and no one imagines that it will ever be perfect. The real issue is that we have limited manpower, and designing and implementing a useful hint facility is a nontrivial project. (Not to mention that maintaining such a thing in the face of frequent, fundamental changes to the underlying planner and executor capabilities would be an outright nightmare.) The people who are actually doing the work think their time is more usefully spent on improving the planner's intelligence than on devising ways to override it. regards, tom lane
Tom Lane wrote: > pgsql@mohawksoft.com writes: > > I think that is sort of arrogant. Look at Oracle, you can give the planner > > hints in the form of comments. > > Arrogant or not, that's the general view of the people who work on the > planner. > > The real issue is not so much whether the planner will always get things > right --- it won't, and no one imagines that it will ever be perfect. > The real issue is that we have limited manpower, and designing and > implementing a useful hint facility is a nontrivial project. (Not to > mention that maintaining such a thing in the face of frequent, > fundamental changes to the underlying planner and executor capabilities > would be an outright nightmare.) And the user maintenance of updating those hints for every release of PostgreSQL as we improve the database engine. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Mon, 2005-02-14 at 22:56 -0500, Bruce Momjian wrote: > And the user maintenance of updating those hints for every release of > PostgreSQL as we improve the database engine. ... and maintaining those hints as the data changes over time. But I think this thread has been hijacked toward a subject that has been beaten to death in the past, and away from something that I think might be worth exploring. IMHO, people deploying PostgreSQL for production use are just one of the groups of users of this project. Another group are those people using PostgreSQL in an academic environment. I think it would be really cool to make it absolutely simple to use PostgreSQL as a starting point for DBMS research. That would mean things like: - good, thorough documentation of the internals (naturally this would help attract OSS developers as well) - APIs that allow people to drive the planner and executor programmatically (as in the original question) - plugin APIs that make it relatively easy to replace the implementation of a subsystem whole-sale (if there's a cost to these APIs in terms of complexity or performance, it is perhaps not worth doing) (Of course, I'm partially guessing here -- but if those people who actually _are_ using PostgreSQL in an academic context have some additional ideas for how we can make your lives easier, I'd be curious to hear them.) -Neil
> pgsql@mohawksoft.com writes: >> I think that is sort of arrogant. Look at Oracle, you can give the >> planner >> hints in the form of comments. > > Arrogant or not, that's the general view of the people who work on the > planner. > > The real issue is not so much whether the planner will always get things > right --- it won't, and no one imagines that it will ever be perfect. > The real issue is that we have limited manpower, and designing and > implementing a useful hint facility is a nontrivial project. (Not to > mention that maintaining such a thing in the face of frequent, > fundamental changes to the underlying planner and executor capabilities > would be an outright nightmare.) > > The people who are actually doing the work think their time is more > usefully spent on improving the planner's intelligence than on devising > ways to override it. > I know I come on strong, and I know I'm probably irritating in many ways, however, I have been a PostgreSQL user since just after it was named postgreSQL from Postgres95. I've seen a lot of changes, and almost all of them have been quite good. I have over 10 years of using it on various projects. While I have not been lucky enough to get a gig in which I could contribute more, I do try to contribute and sometimes it is quite difficult. The one thing that I think you guys miss is actually using PostgreSQL in some projects where the company and the deliverables don't give a rat's ass about whether you use PostgreSQL or Oracle or something else. Over the years I have beat my head against the walls suggesting features, most of which eventually have come to PostgreSQL, but every one was a fight. I have some well founded opinions about PostgreSQL hard earned from real world situations. Agree or not, I have experience with this database and I have hit many of its short comings. One consistent problem is the planner not being able to handle this or that scenario. At this stage, the *best* way to improve the planner is to add the ability to place hints in the plan. It *is* good enough for 90% of the types of queries you would ever want to do. I am dubious that you can get it demonstrably better in the last 10% or so without making it worse. Simple hints would go a HUGE way to improving the last 10%. Many of the "Why doesn't PostgreSQL use my index" questions would go away. Most of the time Tom spends looking at people's pg_stats info would drop. It would actually save time. As a PostgreSQL user, I can tell you with 100% confidence, if I had this tool, I could do my job easier. I can also tell you that while I have genuine appreciation for the current quality of the planner, I still would like to be able to tailor queries specifically to test various approaches for performance reasons.
Neil Conway <neilc@samurai.com> writes: > - good, thorough documentation of the internals (naturally this would > help attract OSS developers as well) I don't know what software you work with but the Postgres source is far and away the best documented source I've had the pleasure to read. I think it's challenging to jump into because it's a legitimately complex piece of software, not because of any deficiency in the documentation. > - plugin APIs that make it relatively easy to replace the implementation > of a subsystem whole-sale (if there's a cost to these APIs in terms of > complexity or performance, it is perhaps not worth doing) And Postgres's extensibility features like plugin languages and indexing methods are one of its strengths. > - APIs that allow people to drive the planner and executor > programmatically (as in the original question) Actually, I think that would be a neat experiment. I've often wondered about an environment where SQL is the source language and it's compiled statically into data structures representing plans. But you have to be careful, it would be easy to come up with nonsensical plans, or even plans that would be infinite loops or cause crashes. -- greg
Although this is all true, consider that adding hints will mean that the Pg developers *never* get bug reports to drive the optimizer improvement process. This will have the effect of stagnating its development. I think this would be a bad thing :-) As an aside note that DB2 UDB does not let you hint its optimizer either...I have heard it argued (by a IBM acquaintance of mine) that their optimizer is better than that other database's whose name begins with O, precisely because of this (He is biased of coarse, but it is an interesting point). regards Mark pgsql@mohawksoft.com wrote: > One consistent problem is the planner not being able to handle this or > that scenario. At this stage, the *best* way to improve the planner is to > add the ability to place hints in the plan. It *is* good enough for 90% of > the types of queries you would ever want to do. I am dubious that you can > get it demonstrably better in the last 10% or so without making it worse. > > Simple hints would go a HUGE way to improving the last 10%. Many of the > "Why doesn't PostgreSQL use my index" questions would go away. Most of the > time Tom spends looking at people's pg_stats info would drop. It would > actually save time. > > As a PostgreSQL user, I can tell you with 100% confidence, if I had this > tool, I could do my job easier. I can also tell you that while I have > genuine appreciation for the current quality of the planner, I still would > like to be able to tailor queries specifically to test various approaches > for performance reasons.
> And the user maintenance of updating those hints for every release of > PostgreSQL as we improve the database engine. I don't think so. Basically an optimizer hint simply raises or lowers the cost of an index, mandates a certain join order, allows or disallows a seq scan ... Imho it is not so far from the things people currently do with the set seq_scan= type of commands. (I don't think actually giving a certain plan is a good idea) A good optimizer hint system would imho not circumvent the optimizer, but only give it hints. The hints should be very specifically aimed, like "an index on column x" is going to be more expensive than you (the optimizer) think, if used with this query. like: select /*+ avoid_index(atab atab_x0) */ * from atab ... > > The people who are actually doing the work think their time is more > > usefully spent on improving the planner's intelligence than on devising > > ways to override it. The subject of this mail and "override it" imho goes too far, I would like to be able to give advice in the form of hints to the optimizer. > One consistent problem is the planner not being able to handle this or > that scenario. At this stage, the *best* way to improve the planner is to > add the ability to place hints in the plan. ^^^^ sql statement I agree. Andreas
On Tue, 2005-02-15 at 02:38 -0500, Greg Stark wrote: > I don't know what software you work with but the Postgres source is far and > away the best documented source I've had the pleasure to read. I agree the PostgreSQL source is very nice (for the most part), but I think there could be more higher-level documentation of the internals. For example, until a few days ago the access method API was completely undocumented (in SGML, at least). Tom has now written some good docs for it -- that's an example of the kind of improvement I'm talking about. Having documents describing "how to add a new index type", "how to add a new planner node", "how to add a new DML/DDL command", and so forth would be cool. -Neil