Thread: On query rewrite

On query rewrite

From
Sailesh Krishnamurthy
Date:
Hackers

Are there any default query rewrite rules that kick in, in the absence
of any user-defined RULE or VIEW ?

Also, is there any place that lists any "interesting" query rewrite
that PG does on queries for perf. improvement ? 

For instance, in the presence of a view or a subquery, does PG do a
subquery to join transformation ?

Thanks ! 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh




Re: On query rewrite

From
Alvaro Herrera
Date:
On Thu, May 27, 2004 at 05:14:48PM -0700, Sailesh Krishnamurthy wrote:

> Are there any default query rewrite rules that kick in, in the absence
> of any user-defined RULE or VIEW ?
> 
> Also, is there any place that lists any "interesting" query rewrite
> that PG does on queries for perf. improvement ? 
> 
> For instance, in the presence of a view or a subquery, does PG do a
> subquery to join transformation ?

Yes, there are transformations of this sort, but they are not called
query rewrite in the code's terminology, but "optimization" -- rewrite
(rules and views) happens to the parsed statement, and the optimizer
works on the output of rewriting.  So actually the optimizations happen
whether there were or not rules or views.

The query's path is
SQL -> parse -> rewrite -> optimize -> execute

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"El número de instalaciones de UNIX se ha elevado a 10,
y se espera que este número aumente" (UPM, 1972)



Re: On query rewrite

From
Sailesh Krishnamurthy
Date:
>>>>> "Alvaro" == Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
   >> For instance, in the presence of a view or a subquery, does PG   >> do a subquery to join transformation ?
   Alvaro> Yes, there are transformations of this sort, but they are   Alvaro> not called query rewrite in the code's
terminology,but   Alvaro> "optimization" -- rewrite (rules and views) happens to the   Alvaro> parsed statement, and
theoptimizer works on the output of   Alvaro> rewriting.  So actually the optimizations happen whether   Alvaro> there
wereor not rules or views.
 

Interesting .. so these are rule-based then ? Not cost-based ?

I understand that there is a cost-based optimizer anyway that does the
planning and selects the right plan .. but does this come _after_ all
these transformations ? Or does it happen along with the
transformations ? 
   Alvaro> The query's path is SQL -> parse -> rewrite -> optimize ->   Alvaro> execute

Can you please point me to the code that indeed does such
transformations ? 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh




Re: On query rewrite

From
Alvaro Herrera
Date:
On Thu, May 27, 2004 at 06:27:47PM -0700, Sailesh Krishnamurthy wrote:
> >>>>> "Alvaro" == Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> 
>     >> For instance, in the presence of a view or a subquery, does PG
>     >> do a subquery to join transformation ?
> 
>     Alvaro> Yes, there are transformations of this sort, but they are
>     Alvaro> not called query rewrite in the code's terminology, but
>     Alvaro> "optimization" -- rewrite (rules and views) happens to the
>     Alvaro> parsed statement, and the optimizer works on the output of
>     Alvaro> rewriting.  So actually the optimizations happen whether
>     Alvaro> there were or not rules or views.
> 
> Interesting .. so these are rule-based then ? Not cost-based ?

> I understand that there is a cost-based optimizer anyway that does the
> planning and selects the right plan .. but does this come _after_ all
> these transformations ? Or does it happen along with the
> transformations ? 

No, there's no rules optimizer, only the cost-based one you already know
of.

>     Alvaro> The query's path is SQL -> parse -> rewrite -> optimize ->
>     Alvaro> execute
> 
> Can you please point me to the code that indeed does such
> transformations ? 

Sorry, I don't know the optimizer code.  You can find a lot of detail in
backend/optimizer/README.  Probably you want to look at what happens to
JOIN_IN nodes, for example, regarding the conversion of a 

WHERE foo IN (SELECT bar FROM ...)

into some kind of join.

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"La espina, desde que nace, ya pincha" (Proverbio africano)



Re: On query rewrite

From
Sailesh Krishnamurthy
Date:
>>>>> "Alvaro" == Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
   >> I understand that there is a cost-based optimizer anyway that   >> does the planning and selects the right plan
..but does this   >> come _after_ all these transformations ? Or does it happen   >> along with the transformations ?
 
   Alvaro> No, there's no rules optimizer, only the cost-based one   Alvaro> you already know of.

Okay ...
   Alvaro> The query's path is SQL -> parse -> rewrite -> optimize ->   Alvaro> execute   >>  Can you please point me
tothe code that indeed does such   >> transformations ?
 
   Alvaro> Sorry, I don't know the optimizer code.  You can find a   Alvaro> lot of detail in backend/optimizer/README.
Probably you   Alvaro> want to look at what happens to JOIN_IN nodes, for   Alvaro> example, regarding the conversion
ofa
 

Couldn't find the README but I'm operating on an older souce base. 

Anyway, thanks for the tips. I think I found what I'm looking for: the
function is probably pull_up_subqueries .. and what it tries to do is
check if a subquery is "simple" or not .. "simple" means not having
Aggs or something more complicated. If that's the case, and if some
NULLable conditions are safe then, the subquery gets pulled up -
essentially, a subquery to join transformation.


Now my next question is more subtle .. 

Are these alternatives (pulling up vs not pulling up subqueries)
considered in different plans ? 

Because, if not, it seems that this is really just a query rewrite
.. it's just that it happens as part of the "optimizer" component. In
fact, there is an implicit rule here in operation (by rule, I don't
mean a pgsql RULE).

Are more such transformations spread around the optimizer component ?
Is there any reason to have it integrated with the planner as opposed
to having it be part of the rewrite component (apart from historical
.. plus the code is solid and works etc.) ? 

Sorry for all the questions .. as I stated before I'm working on a
chapter of a text book that is a case-study of pgsql (the 4th ed
contains case studies of Oracle, DB2 and MSSQL) and the 5th ed is
gonna have pgsql. 

Another question about regular RULE processing .. suppose after
applying a rule the resultant query tree is eligible for another rule,
does pgsql's rule system keep iterating over and over until it reaches
a fixed point or is there some heuristic in operation (just apply the
rules twice ..) ? From my cursory inspection of the code it looks like
the latter, but I'd like to know for sure. 

Thanks ! 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh




Re: On query rewrite

From
Alvaro Herrera
Date:
On Thu, May 27, 2004 at 07:35:56PM -0700, Sailesh Krishnamurthy wrote:

> Anyway, thanks for the tips. I think I found what I'm looking for: the
> function is probably pull_up_subqueries .. and what it tries to do is
> check if a subquery is "simple" or not .. "simple" means not having
> Aggs or something more complicated. If that's the case, and if some
> NULLable conditions are safe then, the subquery gets pulled up -
> essentially, a subquery to join transformation.

Hmm, this code has been heavily hacked during the last few versions so
if you want to know the state of the art be sure to check a recent
version (for example if you don't see pull_up_IN_clauses() you are
missing some of the fun.)

> Are these alternatives (pulling up vs not pulling up subqueries)
> considered in different plans ? 

Yes, AFAIU.

> Are more such transformations spread around the optimizer component ?

Yes, AFAIU.

> Is there any reason to have it integrated with the planner as opposed
> to having it be part of the rewrite component (apart from historical
> .. plus the code is solid and works etc.) ? 

The current code uses cost estimation to determine whether to apply them
or not; in some situations they would lead to a more expensive (==
slower) plan, or to using huge amounts of memory (for example hash based
aggregation).

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
Dios hizo a Adán, pero fue Eva quien lo hizo hombre.



Re: On query rewrite

From
Tom Lane
Date:
Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes:
> Now my next question is more subtle .. 

> Are these alternatives (pulling up vs not pulling up subqueries)
> considered in different plans ? 

That particular choice is not --- we do it if we can, else not.
Comparisons between different alternatives are always handled by
computing cost estimates and choosing the lowest.  In most cases
that's mechanized as generating Paths for all alternatives and letting
the fittest Path survive.  I think there are one or two places where
there's a more hard-wired cost comparison, because there are only a
couple of possibilities.

> Is there any reason to have it integrated with the planner as opposed
> to having it be part of the rewrite component (apart from historical

Yes --- the rewriter generally does not have as much semantic or
statistical information available as the planner does.

> Another question about regular RULE processing .. suppose after
> applying a rule the resultant query tree is eligible for another rule,
> does pgsql's rule system keep iterating over and over until it reaches
> a fixed point or is there some heuristic in operation (just apply the
> rules twice ..) ?

As of recent releases it expands till it runs out of rules or detects
infinite recursion.  You may be looking at a version that had a
give-up-after-N-levels heuristic instead of actual recursion detection.
        regards, tom lane


Re: On query rewrite

From
Sailesh Krishnamurthy
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
   >> Are these alternatives (pulling up vs not pulling up   >> subqueries) considered in different plans ?
   Tom> That particular choice is not --- we do it if we can, else   Tom> not.  Comparisons between different
alternativesare always   Tom> handled by computing cost estimates and choosing the lowest.
 

Thank you Tom (and Alvarro)

So I've understood pull_up_subqueries (called SELECT MERGE in
Starburst). 

What about things like:

1. DISTINCT PULLUP (Where you realize that you don't have to have an
explicit duplicate elimination operation because of what's done in the
subquery)

2. DISTINCT pushdown (when a dup elim. can be pushed down if the upper
querytree is performign DISTINCT set operations (UNION, INTERSECT etc)

3. Discarding DISTINCT in a subquery because the upper query uses the
subquery with existential quantification

In general, I'm trying to understand all the transformations that
pgsql will try to do .. I'm not trying to figure out plan enumeration
for basic boxes (simple query tree).

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh




Re: On query rewrite

From
Tom Lane
Date:
Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes:
> What about things like:

> 1. DISTINCT PULLUP (Where you realize that you don't have to have an
> explicit duplicate elimination operation because of what's done in the
> subquery)
> 2. DISTINCT pushdown (when a dup elim. can be pushed down if the upper
> querytree is performign DISTINCT set operations (UNION, INTERSECT etc)
> 3. Discarding DISTINCT in a subquery because the upper query uses the
> subquery with existential quantification

Our bottom-up planning approach isn't very conducive to #2 or #3, but we
do make a stab at #1.  See create_unique_path() and is_distinct_query()
in optimizer/util/pathnode.c (note this is new code in CVS tip, 7.4 did
not have any such optimization).

> In general, I'm trying to understand all the transformations that
> pgsql will try to do .. I'm not trying to figure out plan enumeration
> for basic boxes (simple query tree).

This particular issue is handled as part of our Path enumeration
mechanism, but the more hard-wired sorts of transformations that you are
asking about live mostly in optimizer/prep/* and plan/planner.c.  In
particular you probably want to look at prepjointree.c and prepqual.c.
(Note prepqual also looks considerably different in CVS tip than in
prior releases.)
        regards, tom lane


Re: On query rewrite

From
Sailesh Krishnamurthy
Date:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
   Tom> This particular issue is handled as part of our Path   Tom> enumeration mechanism, but the more hard-wired
sortsof   Tom> transformations that you are asking about live mostly in
 

Thanks again. To confirm the actual cost comparison with plan
enumeration is a dynamic programming algorithm, is it not ?
Selinger-style with 2-way join paths enumerated, then 3-way using the
best 2-way etc. ? 

BTW, do lots of people use the GEQO ? 
   Tom> optimizer/prep/* and plan/planner.c.  In particular you   Tom> probably want to look at prepjointree.c and
prepqual.c.  Tom> (Note prepqual also looks considerably different in CVS tip   Tom> than in prior releases.)
 

Thanks .. I've extracted cvstip .. sigh .. one of these days I'll have
to do another merge with the TelegraphCQ code. You guys hack too much :-)

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh




Re: On query rewrite

From
Bruno Wolff III
Date:
On Thu, May 27, 2004 at 19:35:56 -0700, Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> wrote:
> 
> Another question about regular RULE processing .. suppose after
> applying a rule the resultant query tree is eligible for another rule,
> does pgsql's rule system keep iterating over and over until it reaches
> a fixed point or is there some heuristic in operation (just apply the
> rules twice ..) ? From my cursory inspection of the code it looks like
> the latter, but I'd like to know for sure. 

Rule processing continues as long as there are rules to apply or the
query is terminated.

You might want to read up on rules in the documentation. They are the
mechanism used to make updateable views and can do some other interesting
things. And because they are fully visible to the optimizer (unlike
triggers) they don't prevent optimization.


Re: On query rewrite

From
Tom Lane
Date:
Sailesh Krishnamurthy <sailesh@cs.berkeley.edu> writes:
> Thanks again. To confirm the actual cost comparison with plan
> enumeration is a dynamic programming algorithm, is it not ?
> Selinger-style with 2-way join paths enumerated, then 3-way using the
> best 2-way etc. ? 

Correct.  For details see make_one_rel_by_joins in path/allpaths.c
and make_rels_by_joins in path/joinrels.c (dunno why what's basically
a single algorithm is split across two files).  There are some
heuristics involved concerning whether to consider clauseless joins,
so it's not totally trivial.

> BTW, do lots of people use the GEQO ? 

Only people writing queries that join more than a dozen or so tables.
GEQO is another thing we've improved (I think) recently, but it's still
pretty weak IMHO.  The algorithm is really designed to solve Traveling
Salesman problems, which bear only a crude resemblance to the behavior
of join problems.  I'd like to see a more principled solution in there
someday.
        regards, tom lane


Re: On query rewrite

From
Josh Berkus
Date:
Sailesh,

> BTW, do lots of people use the GEQO ? 

I do.   I've several clients with data mining databases that literally require 
45-way joins on some queries.  Even a state-of-the-art CPU balks at that.

-- 
-Josh BerkusAglio Database SolutionsSan Francisco