Thread: On query rewrite
Hackers Are there any default query rewrite rules that kick in, in the absence of any user-defined RULE or VIEW ? Also, is there any place that lists any "interesting" query rewrite that PG does on queries for perf. improvement ? For instance, in the presence of a view or a subquery, does PG do a subquery to join transformation ? Thanks ! -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
On Thu, May 27, 2004 at 05:14:48PM -0700, Sailesh Krishnamurthy wrote: > Are there any default query rewrite rules that kick in, in the absence > of any user-defined RULE or VIEW ? > > Also, is there any place that lists any "interesting" query rewrite > that PG does on queries for perf. improvement ? > > For instance, in the presence of a view or a subquery, does PG do a > subquery to join transformation ? Yes, there are transformations of this sort, but they are not called query rewrite in the code's terminology, but "optimization" -- rewrite (rules and views) happens to the parsed statement, and the optimizer works on the output of rewriting. So actually the optimizations happen whether there were or not rules or views. The query's path is SQL -> parse -> rewrite -> optimize -> execute -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "El número de instalaciones de UNIX se ha elevado a 10, y se espera que este número aumente" (UPM, 1972)
>>>>> "Alvaro" == Alvaro Herrera <alvherre@dcc.uchile.cl> writes: >> For instance, in the presence of a view or a subquery, does PG >> do a subquery to join transformation ? Alvaro> Yes, there are transformations of this sort, but they are Alvaro> not called query rewrite in the code's terminology,but Alvaro> "optimization" -- rewrite (rules and views) happens to the Alvaro> parsed statement, and theoptimizer works on the output of Alvaro> rewriting. So actually the optimizations happen whether Alvaro> there wereor not rules or views. Interesting .. so these are rule-based then ? Not cost-based ? I understand that there is a cost-based optimizer anyway that does the planning and selects the right plan .. but does this come _after_ all these transformations ? Or does it happen along with the transformations ? Alvaro> The query's path is SQL -> parse -> rewrite -> optimize -> Alvaro> execute Can you please point me to the code that indeed does such transformations ? -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
On Thu, May 27, 2004 at 06:27:47PM -0700, Sailesh Krishnamurthy wrote: > >>>>> "Alvaro" == Alvaro Herrera <alvherre@dcc.uchile.cl> writes: > > >> For instance, in the presence of a view or a subquery, does PG > >> do a subquery to join transformation ? > > Alvaro> Yes, there are transformations of this sort, but they are > Alvaro> not called query rewrite in the code's terminology, but > Alvaro> "optimization" -- rewrite (rules and views) happens to the > Alvaro> parsed statement, and the optimizer works on the output of > Alvaro> rewriting. So actually the optimizations happen whether > Alvaro> there were or not rules or views. > > Interesting .. so these are rule-based then ? Not cost-based ? > I understand that there is a cost-based optimizer anyway that does the > planning and selects the right plan .. but does this come _after_ all > these transformations ? Or does it happen along with the > transformations ? No, there's no rules optimizer, only the cost-based one you already know of. > Alvaro> The query's path is SQL -> parse -> rewrite -> optimize -> > Alvaro> execute > > Can you please point me to the code that indeed does such > transformations ? Sorry, I don't know the optimizer code. You can find a lot of detail in backend/optimizer/README. Probably you want to look at what happens to JOIN_IN nodes, for example, regarding the conversion of a WHERE foo IN (SELECT bar FROM ...) into some kind of join. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "La espina, desde que nace, ya pincha" (Proverbio africano)
>>>>> "Alvaro" == Alvaro Herrera <alvherre@dcc.uchile.cl> writes: >> I understand that there is a cost-based optimizer anyway that >> does the planning and selects the right plan ..but does this >> come _after_ all these transformations ? Or does it happen >> along with the transformations ? Alvaro> No, there's no rules optimizer, only the cost-based one Alvaro> you already know of. Okay ... Alvaro> The query's path is SQL -> parse -> rewrite -> optimize -> Alvaro> execute >> Can you please point me tothe code that indeed does such >> transformations ? Alvaro> Sorry, I don't know the optimizer code. You can find a Alvaro> lot of detail in backend/optimizer/README. Probably you Alvaro> want to look at what happens to JOIN_IN nodes, for Alvaro> example, regarding the conversion ofa Couldn't find the README but I'm operating on an older souce base. Anyway, thanks for the tips. I think I found what I'm looking for: the function is probably pull_up_subqueries .. and what it tries to do is check if a subquery is "simple" or not .. "simple" means not having Aggs or something more complicated. If that's the case, and if some NULLable conditions are safe then, the subquery gets pulled up - essentially, a subquery to join transformation. Now my next question is more subtle .. Are these alternatives (pulling up vs not pulling up subqueries) considered in different plans ? Because, if not, it seems that this is really just a query rewrite .. it's just that it happens as part of the "optimizer" component. In fact, there is an implicit rule here in operation (by rule, I don't mean a pgsql RULE). Are more such transformations spread around the optimizer component ? Is there any reason to have it integrated with the planner as opposed to having it be part of the rewrite component (apart from historical .. plus the code is solid and works etc.) ? Sorry for all the questions .. as I stated before I'm working on a chapter of a text book that is a case-study of pgsql (the 4th ed contains case studies of Oracle, DB2 and MSSQL) and the 5th ed is gonna have pgsql. Another question about regular RULE processing .. suppose after applying a rule the resultant query tree is eligible for another rule, does pgsql's rule system keep iterating over and over until it reaches a fixed point or is there some heuristic in operation (just apply the rules twice ..) ? From my cursory inspection of the code it looks like the latter, but I'd like to know for sure. Thanks ! -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
On Thu, May 27, 2004 at 07:35:56PM -0700, Sailesh Krishnamurthy wrote: > Anyway, thanks for the tips. I think I found what I'm looking for: the > function is probably pull_up_subqueries .. and what it tries to do is > check if a subquery is "simple" or not .. "simple" means not having > Aggs or something more complicated. If that's the case, and if some > NULLable conditions are safe then, the subquery gets pulled up - > essentially, a subquery to join transformation. Hmm, this code has been heavily hacked during the last few versions so if you want to know the state of the art be sure to check a recent version (for example if you don't see pull_up_IN_clauses() you are missing some of the fun.) > Are these alternatives (pulling up vs not pulling up subqueries) > considered in different plans ? Yes, AFAIU. > Are more such transformations spread around the optimizer component ? Yes, AFAIU. > Is there any reason to have it integrated with the planner as opposed > to having it be part of the rewrite component (apart from historical > .. plus the code is solid and works etc.) ? The current code uses cost estimation to determine whether to apply them or not; in some situations they would lead to a more expensive (== slower) plan, or to using huge amounts of memory (for example hash based aggregation). -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Dios hizo a Adán, pero fue Eva quien lo hizo hombre.
Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes: > Now my next question is more subtle .. > Are these alternatives (pulling up vs not pulling up subqueries) > considered in different plans ? That particular choice is not --- we do it if we can, else not. Comparisons between different alternatives are always handled by computing cost estimates and choosing the lowest. In most cases that's mechanized as generating Paths for all alternatives and letting the fittest Path survive. I think there are one or two places where there's a more hard-wired cost comparison, because there are only a couple of possibilities. > Is there any reason to have it integrated with the planner as opposed > to having it be part of the rewrite component (apart from historical Yes --- the rewriter generally does not have as much semantic or statistical information available as the planner does. > Another question about regular RULE processing .. suppose after > applying a rule the resultant query tree is eligible for another rule, > does pgsql's rule system keep iterating over and over until it reaches > a fixed point or is there some heuristic in operation (just apply the > rules twice ..) ? As of recent releases it expands till it runs out of rules or detects infinite recursion. You may be looking at a version that had a give-up-after-N-levels heuristic instead of actual recursion detection. regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> Are these alternatives (pulling up vs not pulling up >> subqueries) considered in different plans ? Tom> That particular choice is not --- we do it if we can, else Tom> not. Comparisons between different alternativesare always Tom> handled by computing cost estimates and choosing the lowest. Thank you Tom (and Alvarro) So I've understood pull_up_subqueries (called SELECT MERGE in Starburst). What about things like: 1. DISTINCT PULLUP (Where you realize that you don't have to have an explicit duplicate elimination operation because of what's done in the subquery) 2. DISTINCT pushdown (when a dup elim. can be pushed down if the upper querytree is performign DISTINCT set operations (UNION, INTERSECT etc) 3. Discarding DISTINCT in a subquery because the upper query uses the subquery with existential quantification In general, I'm trying to understand all the transformations that pgsql will try to do .. I'm not trying to figure out plan enumeration for basic boxes (simple query tree). -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes: > What about things like: > 1. DISTINCT PULLUP (Where you realize that you don't have to have an > explicit duplicate elimination operation because of what's done in the > subquery) > 2. DISTINCT pushdown (when a dup elim. can be pushed down if the upper > querytree is performign DISTINCT set operations (UNION, INTERSECT etc) > 3. Discarding DISTINCT in a subquery because the upper query uses the > subquery with existential quantification Our bottom-up planning approach isn't very conducive to #2 or #3, but we do make a stab at #1. See create_unique_path() and is_distinct_query() in optimizer/util/pathnode.c (note this is new code in CVS tip, 7.4 did not have any such optimization). > In general, I'm trying to understand all the transformations that > pgsql will try to do .. I'm not trying to figure out plan enumeration > for basic boxes (simple query tree). This particular issue is handled as part of our Path enumeration mechanism, but the more hard-wired sorts of transformations that you are asking about live mostly in optimizer/prep/* and plan/planner.c. In particular you probably want to look at prepjointree.c and prepqual.c. (Note prepqual also looks considerably different in CVS tip than in prior releases.) regards, tom lane
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> This particular issue is handled as part of our Path Tom> enumeration mechanism, but the more hard-wired sortsof Tom> transformations that you are asking about live mostly in Thanks again. To confirm the actual cost comparison with plan enumeration is a dynamic programming algorithm, is it not ? Selinger-style with 2-way join paths enumerated, then 3-way using the best 2-way etc. ? BTW, do lots of people use the GEQO ? Tom> optimizer/prep/* and plan/planner.c. In particular you Tom> probably want to look at prepjointree.c and prepqual.c. Tom> (Note prepqual also looks considerably different in CVS tip Tom> than in prior releases.) Thanks .. I've extracted cvstip .. sigh .. one of these days I'll have to do another merge with the TelegraphCQ code. You guys hack too much :-) -- Pip-pip Sailesh http://www.cs.berkeley.edu/~sailesh
On Thu, May 27, 2004 at 19:35:56 -0700, Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> wrote: > > Another question about regular RULE processing .. suppose after > applying a rule the resultant query tree is eligible for another rule, > does pgsql's rule system keep iterating over and over until it reaches > a fixed point or is there some heuristic in operation (just apply the > rules twice ..) ? From my cursory inspection of the code it looks like > the latter, but I'd like to know for sure. Rule processing continues as long as there are rules to apply or the query is terminated. You might want to read up on rules in the documentation. They are the mechanism used to make updateable views and can do some other interesting things. And because they are fully visible to the optimizer (unlike triggers) they don't prevent optimization.
Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes: > Thanks again. To confirm the actual cost comparison with plan > enumeration is a dynamic programming algorithm, is it not ? > Selinger-style with 2-way join paths enumerated, then 3-way using the > best 2-way etc. ? Correct. For details see make_one_rel_by_joins in path/allpaths.c and make_rels_by_joins in path/joinrels.c (dunno why what's basically a single algorithm is split across two files). There are some heuristics involved concerning whether to consider clauseless joins, so it's not totally trivial. > BTW, do lots of people use the GEQO ? Only people writing queries that join more than a dozen or so tables. GEQO is another thing we've improved (I think) recently, but it's still pretty weak IMHO. The algorithm is really designed to solve Traveling Salesman problems, which bear only a crude resemblance to the behavior of join problems. I'd like to see a more principled solution in there someday. regards, tom lane
Sailesh, > BTW, do lots of people use the GEQO ? I do. I've several clients with data mining databases that literally require 45-way joins on some queries. Even a state-of-the-art CPU balks at that. -- -Josh BerkusAglio Database SolutionsSan Francisco