Thread: Recursive optimization of IN subqueries
Hi As far as I can tell, the pull_up_IN_clauses does not optimize recursively. Am I totally misguided here? Index: plan/subselect.c =================================================================== RCS file: /projects/cvsroot/pgsql-server/src/backend/optimizer/plan/subselect.c,v retrieving revision 1.85 diff -u -r1.85 subselect.c --- plan/subselect.c 25 Nov 2003 23:59:12 -0000 1.85 +++ plan/subselect.c 20 Jan 2004 15:21:55 -0000 @@ -716,6 +716,14 @@ if (contain_volatile_functions((Node *) sublink->lefthand)) return NULL; + /* + * Optimize recursive + */ + subselect->in_info_list = NIL; + if (subselect->hasSubLinks) + subselect->jointree->quals = pull_up_IN_clauses(subselect, + subselect->jointree->quals); + /* * Okay, pull up the sub-select into top range table and jointree. * -- Dennis
Dennis Haney <davh@diku.dk> writes: > As far as I can tell, the pull_up_IN_clauses does not optimize > recursively. Am I totally misguided here? Yes. The subquery is not being physically folded into the outer query (so the comment about "pulling up" may be a poor choice of words). It will still get planned separately by a recursive call to subquery_planner, and any internal INs will get fixed at that time. It is possible and even rather likely that the subsequent run of pull_up_subqueries will flatten the subquery into the outer query, and if so its internal INs are fixed during pull_up_subqueries. But doing it here would be redundant. You can easily prove by experiment that multi-level flattening does happen, for instance: regression=# explain select * from tenk1 a where unique1 in regression-# (select unique2 from tenk1 b where unique1 in regression(# (select thousand from tenk1 c where hundred = 99)); QUERY PLAN -------------------------------------------------------------------------------- ------------------------ Nested Loop (cost=411.66..471.82 rows=10 width=244) -> HashAggregate (cost=411.66..411.66 rows=10 width=4) -> Nested Loop (cost=351.47..411.63 rows=10 width=4) -> HashAggregate (cost=351.47..351.47 rows=10 width=4) -> Index Scan using tenk1_hundred on tenk1 c (cost=0.00..351.23 rows=99 width=4) Index Cond: (hundred = 99) -> Index Scan using tenk1_unique1 on tenk1 b (cost=0.00..6.00 rows=1 width=8) Index Cond: (b.unique1 = "outer".thousand) -> Index Scan using tenk1_unique1 on tenk1 a (cost=0.00..6.00 rows=1 width=244) Index Cond: (a.unique1 = "outer".unique2) (10 rows) regards, tom lane
Tom Lane wrote:
I saw it as though convert_IN_to_join rewrote the query from
select a.* from tenk1 a where a.unique1 in
(select c.thousand from tenk1 c where c.hundred = 99);
to
select a.* from tenk1 a, tenk1 c where a.unique1 = c.thousand AND c.hundred = 99;
But after looking at it, I've reached the conclusion that the rewrite is to this instead:
select a.* from tenk1 a, (select d.thousand from tenk1 d where d.hundred = 99) as c where a.unique1 = c.thousand;
except the subselect is added as a range table entry instead of a subselect in the from-list (not that I understand this particular part, do you mind explaining?).
Or am I still totally lost?
I think I figured it out now, after looking at it for hours...Dennis Haney <davh@diku.dk> writes:As far as I can tell, the pull_up_IN_clauses does not optimize recursively. Am I totally misguided here?Yes. The subquery is not being physically folded into the outer query (so the comment about "pulling up" may be a poor choice of words). It will still get planned separately by a recursive call to subquery_planner, and any internal INs will get fixed at that time. It is possible and even rather likely that the subsequent run of pull_up_subqueries will flatten the subquery into the outer query, and if so its internal INs are fixed during pull_up_subqueries. But doing it here would be redundant.
I saw it as though convert_IN_to_join rewrote the query from
select a.* from tenk1 a where a.unique1 in
(select c.thousand from tenk1 c where c.hundred = 99);
to
select a.* from tenk1 a, tenk1 c where a.unique1 = c.thousand AND c.hundred = 99;
But after looking at it, I've reached the conclusion that the rewrite is to this instead:
select a.* from tenk1 a, (select d.thousand from tenk1 d where d.hundred = 99) as c where a.unique1 = c.thousand;
except the subselect is added as a range table entry instead of a subselect in the from-list (not that I understand this particular part, do you mind explaining?).
Or am I still totally lost?
-- Dennis
Dennis Haney <davh@diku.dk> writes: > I saw it as though convert_IN_to_join rewrote the query from > select a.* from tenk1 a where a.unique1 in > (select c.thousand from tenk1 c where c.hundred = 99); > to > select a.* from tenk1 a, tenk1 c where a.unique1 = c.thousand AND > c.hundred = 99; > But after looking at it, I've reached the conclusion that the rewrite is > to this instead: > select a.* from tenk1 a, (select d.thousand from tenk1 d where > d.hundred = 99) as c where a.unique1 = c.thousand; Right. We do that, and then subsequently pull_up_subqueries transforms it to the other representation. The reason for this two-step approach is that the intermediate form is still a useful improvement if the subquery can't be pulled up for some reason (e.g., it's got grouping). > except the subselect is added as a range table entry instead of a > subselect in the from-list (not that I understand this particular part, > do you mind explaining?). Same thing. Every entry in the from-list will have both an RTE and an entry in the join tree. This representation is partly historical (before we had outer joins, there was only the range table and no join tree at all), but it is convenient for many purposes. regards, tom lane PS: this is a bit off-topic for pgsql-general, please pursue it on -hackers if you have more questions.