Re: Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project - Mailing list pgsql-hackers
From | Arun Chaitanya |
---|---|
Subject | Re: Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project |
Date | |
Msg-id | CAFDHfCSmuwQ3m3AYFsvR3jfD4=f5fY3KGYJ7CyePAxS1Y67CrQ@mail.gmail.com Whole thread Raw |
In response to | Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project (Arun Chaitanya <chaitan64arun@gmail.com>) |
Responses |
Re: Optimizing Nested Correlated Queries by decorrelation: GSOC 2012 Project
|
List | pgsql-hackers |
Thanks a lot Heikki. I have already posted an example in the mail. The link to the paper is http://www.iith.ac.in/~ravig/courses/cs5050/papers/decorrelation-cesar.pdf Hope this helps, Arun On Fri, Mar 30, 2012 at 5:32 PM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote: > (off-list) > > You'll want to post a link to the paper, otherwise people who are not GSoC > mentors will have no idea what you're talking about ;-). Posting an example > query and access plans would illustrate the point, too. > > > On 30.03.2012 14:33, Arun Chaitanya wrote: >> >> Hi, >> >> I wanted to take up this as a GSOC 2012 project. >> >> SQL supports nested queries. When the inner query contains a >> correlation variable the present optimizer takes an iterative >> execution plan. If the inner query scans over a relation, the >> iterative plan chosen can be sub-optimal. >> >> The goal of this project is to enable De-correlation for all possible >> cases. The iterative execution plan can be converted to a set oriented >> plan by taking a join over the base relations involved. Sufficient >> work has already been done in this area and has been implemented in >> SQL Server. >> >> The changes required to incorporate the above mentioned strategy is in >> rewriting phase to the best of my knowledge. The key idea is to >> introduce the APPLY operator in the raw parse tree. In the above >> mentioned Papers, the author has mentioned the process of removing the >> apply. The author has proposed a set of rules which will allow us to >> achieve the goal. The present postgresql optimizer community has done >> some work in these lines for simple subqueries involving =,> ,< in >> the predicates [ I observed it by seeing the result of EXPLAIN for >> relevant queries ]. The optimization is not done for subqueries >> containing aggregate queries and existential and containment queries. >> >> An example query from TPCH benchmark discussed by the author: >> >> select c_custkey >> from customer >> where 1000000< (select sum(o_totalprice) >> from orders >> where o_custkey = c_custkey) >> >> In the above case, c_custkey is a correlation variable (parameterized) >> coming from the outer query. Hence in the present system, the inner >> query is executed as many times as the tuples in the customer >> relation. As the subQuery involves a scan over orders relation, the >> total I/O cost involved is pretty high. >> >> Using the transformation proposed by the author, the query can be >> re-written as >> >> select c_custkey >> from customer left outer join >> orders on o_custkey = c custkey >> group by c_custkey >> having 1000000< sum(o_totalprice) >> >> This allows the optimizer to chose a better plan for left outer join >> and avoid iterative execution. The idea here is not to get a rewritten >> query as output but to generate a plan tree that does the same as the >> above query. >> >> Regards, >> Arun >> > > > -- > Heikki Linnakangas > EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: