Thread: Recursive optimization of IN subqueries

Recursive optimization of IN subqueries

From
Dennis Haney
Date:
Hi

As far as I can tell, the pull_up_IN_clauses does not optimize
recursively. Am I totally misguided here?

Index: plan/subselect.c
===================================================================
RCS file:
/projects/cvsroot/pgsql-server/src/backend/optimizer/plan/subselect.c,v
retrieving revision 1.85
diff -u -r1.85 subselect.c
--- plan/subselect.c    25 Nov 2003 23:59:12 -0000      1.85
+++ plan/subselect.c    20 Jan 2004 15:21:55 -0000
@@ -716,6 +716,14 @@
        if (contain_volatile_functions((Node *) sublink->lefthand))
                return NULL;

+    /*
+     * Optimize recursive
+     */
+       subselect->in_info_list = NIL;
+       if (subselect->hasSubLinks)
+         subselect->jointree->quals = pull_up_IN_clauses(subselect,
+
subselect->jointree->quals);
+
        /*
         * Okay, pull up the sub-select into top range table and jointree.
         *

--
Dennis



Re: Recursive optimization of IN subqueries

From
Tom Lane
Date:
Dennis Haney <davh@diku.dk> writes:
> As far as I can tell, the pull_up_IN_clauses does not optimize
> recursively. Am I totally misguided here?

Yes.  The subquery is not being physically folded into the outer query
(so the comment about "pulling up" may be a poor choice of words).
It will still get planned separately by a recursive call to
subquery_planner, and any internal INs will get fixed at that time.

It is possible and even rather likely that the subsequent run of
pull_up_subqueries will flatten the subquery into the outer query,
and if so its internal INs are fixed during pull_up_subqueries.
But doing it here would be redundant.

You can easily prove by experiment that multi-level flattening does
happen, for instance:

regression=# explain select * from tenk1 a where unique1 in
regression-# (select unique2 from tenk1 b where unique1 in
regression(# (select thousand from tenk1 c where hundred = 99));
                                               QUERY PLAN

--------------------------------------------------------------------------------
------------------------
 Nested Loop  (cost=411.66..471.82 rows=10 width=244)
   ->  HashAggregate  (cost=411.66..411.66 rows=10 width=4)
         ->  Nested Loop  (cost=351.47..411.63 rows=10 width=4)
               ->  HashAggregate  (cost=351.47..351.47 rows=10 width=4)
                     ->  Index Scan using tenk1_hundred on tenk1 c  (cost=0.00..351.23 rows=99 width=4)
                           Index Cond: (hundred = 99)
               ->  Index Scan using tenk1_unique1 on tenk1 b  (cost=0.00..6.00 rows=1 width=8)
                     Index Cond: (b.unique1 = "outer".thousand)
   ->  Index Scan using tenk1_unique1 on tenk1 a  (cost=0.00..6.00 rows=1 width=244)
         Index Cond: (a.unique1 = "outer".unique2)
(10 rows)


            regards, tom lane

Re: Recursive optimization of IN subqueries

From
Dennis Haney
Date:
Tom Lane wrote:
Dennis Haney <davh@diku.dk> writes: 
As far as I can tell, the pull_up_IN_clauses does not optimize 
recursively. Am I totally misguided here?   
Yes.  The subquery is not being physically folded into the outer query
(so the comment about "pulling up" may be a poor choice of words).
It will still get planned separately by a recursive call to
subquery_planner, and any internal INs will get fixed at that time.

It is possible and even rather likely that the subsequent run of
pull_up_subqueries will flatten the subquery into the outer query,
and if so its internal INs are fixed during pull_up_subqueries.
But doing it here would be redundant. 
I think I figured it out now, after looking at it for hours...

I saw it as though convert_IN_to_join rewrote the query from

select a.* from tenk1 a where a.unique1 in
(select c.thousand from tenk1 c where c.hundred = 99);

to

select a.* from tenk1 a, tenk1 c where a.unique1 = c.thousand AND c.hundred = 99;

But after looking at it, I've reached the conclusion that the rewrite is to this instead:

select a.* from tenk1 a,  (select d.thousand from tenk1 d where d.hundred = 99) as c where a.unique1 = c.thousand;

except the subselect is added as a range table entry instead of a subselect in the from-list (not that I understand this particular part, do you mind explaining?).

Or am I still totally lost?

-- 
Dennis

Re: Recursive optimization of IN subqueries

From
Tom Lane
Date:
Dennis Haney <davh@diku.dk> writes:
> I saw it as though convert_IN_to_join rewrote the query from

> select a.* from tenk1 a where a.unique1 in
> (select c.thousand from tenk1 c where c.hundred = 99);

> to

> select a.* from tenk1 a, tenk1 c where a.unique1 = c.thousand AND
> c.hundred = 99;

> But after looking at it, I've reached the conclusion that the rewrite is
> to this instead:

> select a.* from tenk1 a,  (select d.thousand from tenk1 d where
> d.hundred = 99) as c where a.unique1 = c.thousand;

Right.  We do that, and then subsequently pull_up_subqueries transforms
it to the other representation.  The reason for this two-step approach
is that the intermediate form is still a useful improvement if the
subquery can't be pulled up for some reason (e.g., it's got grouping).

> except the subselect is added as a range table entry instead of a
> subselect in the from-list (not that I understand this particular part,
> do you mind explaining?).

Same thing.  Every entry in the from-list will have both an RTE and an
entry in the join tree.  This representation is partly historical
(before we had outer joins, there was only the range table and no join
tree at all), but it is convenient for many purposes.

            regards, tom lane

PS: this is a bit off-topic for pgsql-general, please pursue it on
-hackers if you have more questions.