Thread: Planner drops unreferenced tables --- bug, no?

Planner drops unreferenced tables --- bug, no?

From
Tom Lane
Date:
Whilst chasing down a few more aggregate-related bug reports,
I realized that the planner is doing the Wrong Thing when
a query's FROM clause mentions tables that are not used
elswhere in the query.  For example, I make a table with
three rows:

play=> select x.f1 from x;
f1
--123
(3 rows)

Now:

play=> select x.f1 from x, x as x2;
f1
--123
(3 rows)

It seems to me that the latter query must yield 9 rows (three
occurrences of each value) to satisfy the SQL spec.  The spec defines
the result of a two-query FROM clause to be the Cartesian product of the
two tables, period.  It doesn't say anything about "only if one or more
columns of each table are actually used somewhere".

The particular case that led me into this was for an aggregate:

play=> select count(f1) from x;
count
-----   3
(1 row)

play=> select count(1) from x;
count
-----   1
(1 row)

Now IMHO count(1) should yield the same count as for any other non-null
expression, ie, the number of rows in the source table, because the spec
effectively says "evaluate the expression for each row and count the
number of non-null results".  The reason you get 1 here is that the
planner is dropping the "unreferenced" x, deciding that the query looks
like "select 2+2;", and generating a single-row Result plan.

Before I look into ways of fixing this, is there anyone who wants
to argue that the current behavior is correct?  It looks all wrong
to me, but...
        regards, tom lane


Re: [HACKERS] Planner drops unreferenced tables --- bug, no?

From
wieck@debis.com (Jan Wieck)
Date:

			
		

Re: [HACKERS] Planner drops unreferenced tables --- bug, no?

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>> It seems to me that the latter query must yield 9 rows (three
>> occurrences of each value) to satisfy the SQL spec.  The spec defines
>> the result of a two-query FROM clause to be the Cartesian product of the
>> two tables, period.  It doesn't say anything about "only if one or more
>> columns of each table are actually used somewhere".

>     Caution here!

>     After  rewriting  there can be many unused rangetable entries
>     floating around. Especially if you SELECT from  a  view,  the
>     view's relation is still mentioned in the rangetable.

I was thinking of forcing rangetable entries that are marked as
'inFromCl' to be included in the planner's target relation set,
but those not so marked would only get added if referenced, same as now.
Do you think that will not work?

            regards, tom lane


Re: [HACKERS] Planner drops unreferenced tables --- bug, no?

From
wieck@debis.com (Jan Wieck)
Date:

			
		

Re: [HACKERS] Planner drops unreferenced tables --- bug, no?

From
Don Baccus
Date:
At 10:34 AM 9/29/99 -0400, Tom Lane wrote:

>play=> select x.f1 from x, x as x2;
>f1
>--
> 1
> 2
> 3
>(3 rows)
>
>It seems to me that the latter query must yield 9 rows (three
>occurrences of each value) to satisfy the SQL spec.  The spec defines
>the result of a two-query FROM clause to be the Cartesian product of the
>two tables, period.  It doesn't say anything about "only if one or more
>columns of each table are actually used somewhere".

AFAIK, this is correct.  For the heck of it, I tried it in 
Oracle, and indeed the full cartesian product's returned:


SQL> select x2.i from x, x x2;
        I
----------        1        1        1        2        2        2        3        3        3

9 rows selected.

>play=> select count(1) from x;
>count
>-----
>    1
>(1 row)

Again, Oracle 8:

SQL> select count(1) from x, x x2;
 COUNT(1)
----------        9

SQL> 



- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.
 


Re: [HACKERS] Planner drops unreferenced tables --- bug, no?

From
Tom Lane
Date:
I wrote:
>>>>> It seems to me that the latter query must yield 9 rows (three
>>>>> occurrences of each value) to satisfy the SQL spec.  The spec defines
>>>>> the result of a two-query FROM clause to be the Cartesian product of the
>>>>> two tables, period.  It doesn't say anything about "only if one or more
>>>>> columns of each table are actually used somewhere".

On further investigation, it turns out that there is actually code in
the planner that tries to do this right!  (and at one time probably
did do it right...)  add_missing_vars_to_tlist() in initsplan.c has the
specific mission of making sure any tables that are mentioned only in
the FROM clause get included into the planner's target relation set.
Unfortunately, it's been dead code for a while, because there are two
different upstream routines that do the wrong thing.

50% of the problem is in query_planner, which tries to short-circuit
the whole planning process if it doesn't see any vars at all in the
targetlist or qual.  I think this code can simply be diked out as
misguided optimization, although there might be a couple of small
changes needed elsewhere to handle the case where the target relation
set is truly empty.  (But this error is not what's preventing
"SELECT foo.f1 FROM foo, bar" from generating a join between foo and
bar; add_missing_vars_to_tlist() could still fire, so long as at least
one var appears in the query.)


wieck@debis.com (Jan Wieck) writes:
>>>> Caution here!
>> 
>>>> After  rewriting  there can be many unused rangetable entries
>>>> floating around. Especially if you SELECT from  a  view,  the
>>>> view's relation is still mentioned in the rangetable.

The other 50% of the problem is that the rewriter is overly enthusiastic
about clearing the inFromCl flag in order to prevent views from being
taken as valid join targets.  rewriteHandler.c has two different
routines that will clear inFromCl flags, and they're both bogus:

1. fireRIRrules will zap a table's inFromCl flag if the table is not
referenced by any Var in the parsetree, *whether or not the table
actually has any rules*.  This is why add_missing_vars_to_tlist() is
currently dead code in all cases.  It's wrong in another way too: if
the table has no referencing vars, fireRIRrules doesn't look for rules
applicable to the table.  That's wrong for the same reasons that
removing the table from the join set is wrong: it can still affect
the results, so the lookup and substitution should still occur.

2. If ApplyRetrieveRule is fired, it resets the inFromCl flag on *all*
tables in the query, not only the one being substituted for.  This is
just plain wrong.


I believe the right thing to do is remove ApplyRetrieveRule's change
of inFromCl entirely, and to modify fireRIRrules so that it only clears
a table's inFromCl flag if it finds an ON SELECT DO INSTEAD rule for it.
That will remove views from the join set without side-effects on the
behavior for normal tables.  (Also, fireRIRrules should try to apply
rules for a table whether it finds references to it or not; which means
that rangeTableEntry_used() isn't needed at all.)

Jan, what do you think of this?  In particular, what do you think should
happen in the following cases: 1. Table has an ON SELECT *not* INSTEAD rule. 2. There is an ON SELECT (with or without
INSTEAD)rule for one or    more fields of the table, but not for the whole table.
 

I'm not at all clear on the semantics of those kinds of rules, so I
don't know if they should remove the original table from the join set
or not.  (I'm also confused about ON SELECT INSTEAD where the INSTEAD
is not a select; is that even legal?)

Also, would it be a good idea to propagate a source view's inFromCl
flag into the substituted tables?  (That is, when firing a select rule
for a table that wasn't marked inFromCl to begin with, clear the
inFromCl flags of all RTEs that it adds to the query.)  I am not sure
if this is appropriate or not.


Actually, it would probably be cleanest if we split the functions of
inFromCl into two separate flags, say "inFromCl" and "inJoinSet".
The point of inFromCl is that when we add an implicit RTE using the
Postquel extension, it shouldn't suddenly become part of the available
namespace for unqualified column names processed later in the query.
So inFromCl controls the use of the RTE to look up unqualified names.
However, if we believe that the planner should subsequently treat that
implicit RTE as if it were a normal join target, then we need a separate
flag that carries that information.

This didn't use to be an issue, because the implicit RTE could only be
there if there were a Var reference to it; add_missing_vars_to_tlist()
should never need to do anything for it, because the RTE would have been
added to the join set already because of its Var, right?  Well, that
*used* to be true up till last week.  Now that we have a constant-
expression folder that understands about boolean and case expression
short-circuiting, it is possible for Var nodes to disappear from the
tree during optimization.  It would be a bad thing if that changed the
join semantics.  So, I think we need a flag that can force RTEs
to be included in the planner's join set regardless of whether their
Vars survive the optimizer.  And that can't be the same as inFromCl,
or there's no such thing as an implicit RTE.

With this split, inFromCl would be looked at only by the parser code
that resolves unqualified names, and inJoinSet would be looked at
by add_missing_vars_to_tlist().  The rewriter's machinations would
only need to consider whether to set/clear inJoinSet or not.

(Thomas, does any of this strike a chord with your inner/outer join
stuff?)
        regards, tom lane


Re: [HACKERS] Planner drops unreferenced tables --- bug, no?

From
Bruce Momjian
Date:
> Actually, it would probably be cleanest if we split the functions of
> inFromCl into two separate flags, say "inFromCl" and "inJoinSet".
> The point of inFromCl is that when we add an implicit RTE using the
> Postquel extension, it shouldn't suddenly become part of the available
> namespace for unqualified column names processed later in the query.
> So inFromCl controls the use of the RTE to look up unqualified names.
> However, if we believe that the planner should subsequently treat that
> implicit RTE as if it were a normal join target, then we need a separate
> flag that carries that information.

Two different flags seems like the perfect way.  Let me know if you need
any help adding the flag.  I would be glad to do it.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Planner drops unreferenced tables --- bug, no?

From
wieck@debis.com (Jan Wieck)
Date:
Tom Lane wrote:

> [...]
>
> wieck@debis.com (Jan Wieck) writes:
> >>>> Caution here!
> >>
> >>>> After  rewriting  there can be many unused rangetable entries
> >>>> floating around. Especially if you SELECT from  a  view,  the
> >>>> view's relation is still mentioned in the rangetable.
>
> The other 50% of the problem is that the rewriter is overly enthusiastic
> about clearing the inFromCl flag in order to prevent views from being
> taken as valid join targets.  rewriteHandler.c has two different
> routines that will clear inFromCl flags, and they're both bogus:
>
> [...]
>
> Jan, what do you think of this?  In particular, what do you think should
> happen in the following cases:
>   1. Table has an ON SELECT *not* INSTEAD rule.
>   2. There is an ON SELECT (with or without INSTEAD) rule for one or
>      more fields of the table, but not for the whole table.
>
> I'm not at all clear on the semantics of those kinds of rules, so I
> don't know if they should remove the original table from the join set
> or not.  (I'm also confused about ON SELECT INSTEAD where the INSTEAD
> is not a select; is that even legal?)
>
> Also, would it be a good idea to propagate a source view's inFromCl
> flag into the substituted tables?  (That is, when firing a select rule
> for a table that wasn't marked inFromCl to begin with, clear the
> inFromCl flags of all RTEs that it adds to the query.)  I am not sure
> if this is appropriate or not.

    Don't  worry  about it, those rules cannot occur and I'm sure
    we'll never reincarnate them in the future.

    The only allowed rule ON SELECT is one that

    - IS INSTEAD
    - is named "_RET<relation-name>"
    - has  one  action  which  must  be  another  SELECT  with  a
      targetlist  producing exactly the relations attribute list.

    Again: If a relation has a rule ON SELECT, it IS A  VIEW.  No
    relation can have more that one rule ON SELECT.

    I've  disabled all the other cases in RewriteDefine() on v6.4
    - I think - because of the unclear semantics. Rules ON SELECT
    where  planned  to  have  different actions or rewrite single
    attributes too. But ON SELECT rules must be  applied  on  all
    relations  which  get scanned, so if there would be a rule ON
    SELECT that inserts some logging into another relation,  this
    would actually occur ON UPDATE and ON DELETE to it's relation
    too because to do the UPDATE/DELETE it's relation has  to  be
    scanned.

    I  think  it's correct to MOVE the inFromCl from the relation
    rewritten to the join relations coming with the view's  rule.
    Thus  clear  it  on the RTE rewritten and on the first two of
    the rules (which are allways NEW and OLD for all rules). Then
    set  all  other  RTE's which come from the view to the former
    inFromCl state of the rewritten RTE.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [HACKERS] Planner drops unreferenced tables --- bug, no?

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     I  think  it's correct to MOVE the inFromCl from the relation
>     rewritten to the join relations coming with the view's  rule.
>     Thus  clear  it  on the RTE rewritten and on the first two of
>     the rules (which are allways NEW and OLD for all rules). Then
>     set  all  other  RTE's which come from the view to the former
>     inFromCl state of the rewritten RTE.

OK, I will do that.  My first-cut code (attached, please look it over)
passes regress test without it, but we know how much that's worth ;-).

Actually I think moving inJoinSet is now the important thing...
        regards, tom lane


All mention of inFromCl/inJoinSet removed from ApplyRetrieveRule;
fireRIRrules loop looks like:
   rt_index = 0;   while (rt_index < length(parsetree->rtable))   {       ++rt_index;
       rte = nth(rt_index - 1, parsetree->rtable);
       /*        * If the table is not one named in the original FROM clause        * then it must be referenced in the
query,or we ignore it.        * This prevents infinite expansion loop due to new rtable        * entries inserted by
expansionof a rule.        */       if (! rte->inFromCl && rt_index != parsetree->resultRelation &&           !
rangeTableEntry_used((Node*) parsetree, rt_index, 0))       {           /* Make sure the planner ignores it too... */
       rte->inJoinSet = false;           continue;       }
 
       rel = heap_openr(rte->relname, AccessShareLock);       rules = rel->rd_rules;       if (rules == NULL)       {
       heap_close(rel, AccessShareLock);           continue;       }
 
       locks = NIL;
       /*        * Collect the RIR rules that we must apply        */       for (i = 0; i < rules->numLocks; i++)
{          rule = rules->rules[i];           if (rule->event != CMD_SELECT)               continue;
 
           if (rule->attrno > 0)           {               /* per-attr rule; do we need it? */               if (!
attribute_used((Node*) parsetree,                                    rt_index,
rule->attrno,0))                   continue;           }           else           {               /* Rel-wide ON SELECT
DOINSTEAD means this is a view.                * Remove the view from the planner's join target set,                *
orwe'll get no rows out because view itself is empty!                */               if (rule->isInstead)
    rte->inJoinSet = false;           }
 
           locks = lappend(locks, rule);       }
       /*        * Check permissions        */       checkLockPerms(locks, parsetree, rt_index);
       /*        * Now apply them        */       foreach(l, locks)       {           rule = lfirst(l);
           RIRonly.event = rule->event;           RIRonly.attrno = rule->attrno;           RIRonly.qual = rule->qual;
       RIRonly.actions = rule->actions;
 
           parsetree = ApplyRetrieveRule(parsetree,                                         &RIRonly,
                     rt_index,                                         RIRonly.attrno == -1,
            rel,                                         &modified);       }
 
       heap_close(rel, AccessShareLock);   }