Thread: Planner drops unreferenced tables --- bug, no?
Whilst chasing down a few more aggregate-related bug reports, I realized that the planner is doing the Wrong Thing when a query's FROM clause mentions tables that are not used elswhere in the query. For example, I make a table with three rows: play=> select x.f1 from x; f1 --123 (3 rows) Now: play=> select x.f1 from x, x as x2; f1 --123 (3 rows) It seems to me that the latter query must yield 9 rows (three occurrences of each value) to satisfy the SQL spec. The spec defines the result of a two-query FROM clause to be the Cartesian product of the two tables, period. It doesn't say anything about "only if one or more columns of each table are actually used somewhere". The particular case that led me into this was for an aggregate: play=> select count(f1) from x; count ----- 3 (1 row) play=> select count(1) from x; count ----- 1 (1 row) Now IMHO count(1) should yield the same count as for any other non-null expression, ie, the number of rows in the source table, because the spec effectively says "evaluate the expression for each row and count the number of non-null results". The reason you get 1 here is that the planner is dropping the "unreferenced" x, deciding that the query looks like "select 2+2;", and generating a single-row Result plan. Before I look into ways of fixing this, is there anyone who wants to argue that the current behavior is correct? It looks all wrong to me, but... regards, tom lane
wieck@debis.com (Jan Wieck) writes: >> It seems to me that the latter query must yield 9 rows (three >> occurrences of each value) to satisfy the SQL spec. The spec defines >> the result of a two-query FROM clause to be the Cartesian product of the >> two tables, period. It doesn't say anything about "only if one or more >> columns of each table are actually used somewhere". > Caution here! > After rewriting there can be many unused rangetable entries > floating around. Especially if you SELECT from a view, the > view's relation is still mentioned in the rangetable. I was thinking of forcing rangetable entries that are marked as 'inFromCl' to be included in the planner's target relation set, but those not so marked would only get added if referenced, same as now. Do you think that will not work? regards, tom lane
At 10:34 AM 9/29/99 -0400, Tom Lane wrote: >play=> select x.f1 from x, x as x2; >f1 >-- > 1 > 2 > 3 >(3 rows) > >It seems to me that the latter query must yield 9 rows (three >occurrences of each value) to satisfy the SQL spec. The spec defines >the result of a two-query FROM clause to be the Cartesian product of the >two tables, period. It doesn't say anything about "only if one or more >columns of each table are actually used somewhere". AFAIK, this is correct. For the heck of it, I tried it in Oracle, and indeed the full cartesian product's returned: SQL> select x2.i from x, x x2; I ---------- 1 1 1 2 2 2 3 3 3 9 rows selected. >play=> select count(1) from x; >count >----- > 1 >(1 row) Again, Oracle 8: SQL> select count(1) from x, x x2; COUNT(1) ---------- 9 SQL> - Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert Serviceand other goodies at http://donb.photo.net.
I wrote: >>>>> It seems to me that the latter query must yield 9 rows (three >>>>> occurrences of each value) to satisfy the SQL spec. The spec defines >>>>> the result of a two-query FROM clause to be the Cartesian product of the >>>>> two tables, period. It doesn't say anything about "only if one or more >>>>> columns of each table are actually used somewhere". On further investigation, it turns out that there is actually code in the planner that tries to do this right! (and at one time probably did do it right...) add_missing_vars_to_tlist() in initsplan.c has the specific mission of making sure any tables that are mentioned only in the FROM clause get included into the planner's target relation set. Unfortunately, it's been dead code for a while, because there are two different upstream routines that do the wrong thing. 50% of the problem is in query_planner, which tries to short-circuit the whole planning process if it doesn't see any vars at all in the targetlist or qual. I think this code can simply be diked out as misguided optimization, although there might be a couple of small changes needed elsewhere to handle the case where the target relation set is truly empty. (But this error is not what's preventing "SELECT foo.f1 FROM foo, bar" from generating a join between foo and bar; add_missing_vars_to_tlist() could still fire, so long as at least one var appears in the query.) wieck@debis.com (Jan Wieck) writes: >>>> Caution here! >> >>>> After rewriting there can be many unused rangetable entries >>>> floating around. Especially if you SELECT from a view, the >>>> view's relation is still mentioned in the rangetable. The other 50% of the problem is that the rewriter is overly enthusiastic about clearing the inFromCl flag in order to prevent views from being taken as valid join targets. rewriteHandler.c has two different routines that will clear inFromCl flags, and they're both bogus: 1. fireRIRrules will zap a table's inFromCl flag if the table is not referenced by any Var in the parsetree, *whether or not the table actually has any rules*. This is why add_missing_vars_to_tlist() is currently dead code in all cases. It's wrong in another way too: if the table has no referencing vars, fireRIRrules doesn't look for rules applicable to the table. That's wrong for the same reasons that removing the table from the join set is wrong: it can still affect the results, so the lookup and substitution should still occur. 2. If ApplyRetrieveRule is fired, it resets the inFromCl flag on *all* tables in the query, not only the one being substituted for. This is just plain wrong. I believe the right thing to do is remove ApplyRetrieveRule's change of inFromCl entirely, and to modify fireRIRrules so that it only clears a table's inFromCl flag if it finds an ON SELECT DO INSTEAD rule for it. That will remove views from the join set without side-effects on the behavior for normal tables. (Also, fireRIRrules should try to apply rules for a table whether it finds references to it or not; which means that rangeTableEntry_used() isn't needed at all.) Jan, what do you think of this? In particular, what do you think should happen in the following cases: 1. Table has an ON SELECT *not* INSTEAD rule. 2. There is an ON SELECT (with or without INSTEAD)rule for one or more fields of the table, but not for the whole table. I'm not at all clear on the semantics of those kinds of rules, so I don't know if they should remove the original table from the join set or not. (I'm also confused about ON SELECT INSTEAD where the INSTEAD is not a select; is that even legal?) Also, would it be a good idea to propagate a source view's inFromCl flag into the substituted tables? (That is, when firing a select rule for a table that wasn't marked inFromCl to begin with, clear the inFromCl flags of all RTEs that it adds to the query.) I am not sure if this is appropriate or not. Actually, it would probably be cleanest if we split the functions of inFromCl into two separate flags, say "inFromCl" and "inJoinSet". The point of inFromCl is that when we add an implicit RTE using the Postquel extension, it shouldn't suddenly become part of the available namespace for unqualified column names processed later in the query. So inFromCl controls the use of the RTE to look up unqualified names. However, if we believe that the planner should subsequently treat that implicit RTE as if it were a normal join target, then we need a separate flag that carries that information. This didn't use to be an issue, because the implicit RTE could only be there if there were a Var reference to it; add_missing_vars_to_tlist() should never need to do anything for it, because the RTE would have been added to the join set already because of its Var, right? Well, that *used* to be true up till last week. Now that we have a constant- expression folder that understands about boolean and case expression short-circuiting, it is possible for Var nodes to disappear from the tree during optimization. It would be a bad thing if that changed the join semantics. So, I think we need a flag that can force RTEs to be included in the planner's join set regardless of whether their Vars survive the optimizer. And that can't be the same as inFromCl, or there's no such thing as an implicit RTE. With this split, inFromCl would be looked at only by the parser code that resolves unqualified names, and inJoinSet would be looked at by add_missing_vars_to_tlist(). The rewriter's machinations would only need to consider whether to set/clear inJoinSet or not. (Thomas, does any of this strike a chord with your inner/outer join stuff?) regards, tom lane
> Actually, it would probably be cleanest if we split the functions of > inFromCl into two separate flags, say "inFromCl" and "inJoinSet". > The point of inFromCl is that when we add an implicit RTE using the > Postquel extension, it shouldn't suddenly become part of the available > namespace for unqualified column names processed later in the query. > So inFromCl controls the use of the RTE to look up unqualified names. > However, if we believe that the planner should subsequently treat that > implicit RTE as if it were a normal join target, then we need a separate > flag that carries that information. Two different flags seems like the perfect way. Let me know if you need any help adding the flag. I would be glad to do it. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Tom Lane wrote: > [...] > > wieck@debis.com (Jan Wieck) writes: > >>>> Caution here! > >> > >>>> After rewriting there can be many unused rangetable entries > >>>> floating around. Especially if you SELECT from a view, the > >>>> view's relation is still mentioned in the rangetable. > > The other 50% of the problem is that the rewriter is overly enthusiastic > about clearing the inFromCl flag in order to prevent views from being > taken as valid join targets. rewriteHandler.c has two different > routines that will clear inFromCl flags, and they're both bogus: > > [...] > > Jan, what do you think of this? In particular, what do you think should > happen in the following cases: > 1. Table has an ON SELECT *not* INSTEAD rule. > 2. There is an ON SELECT (with or without INSTEAD) rule for one or > more fields of the table, but not for the whole table. > > I'm not at all clear on the semantics of those kinds of rules, so I > don't know if they should remove the original table from the join set > or not. (I'm also confused about ON SELECT INSTEAD where the INSTEAD > is not a select; is that even legal?) > > Also, would it be a good idea to propagate a source view's inFromCl > flag into the substituted tables? (That is, when firing a select rule > for a table that wasn't marked inFromCl to begin with, clear the > inFromCl flags of all RTEs that it adds to the query.) I am not sure > if this is appropriate or not. Don't worry about it, those rules cannot occur and I'm sure we'll never reincarnate them in the future. The only allowed rule ON SELECT is one that - IS INSTEAD - is named "_RET<relation-name>" - has one action which must be another SELECT with a targetlist producing exactly the relations attribute list. Again: If a relation has a rule ON SELECT, it IS A VIEW. No relation can have more that one rule ON SELECT. I've disabled all the other cases in RewriteDefine() on v6.4 - I think - because of the unclear semantics. Rules ON SELECT where planned to have different actions or rewrite single attributes too. But ON SELECT rules must be applied on all relations which get scanned, so if there would be a rule ON SELECT that inserts some logging into another relation, this would actually occur ON UPDATE and ON DELETE to it's relation too because to do the UPDATE/DELETE it's relation has to be scanned. I think it's correct to MOVE the inFromCl from the relation rewritten to the join relations coming with the view's rule. Thus clear it on the RTE rewritten and on the first two of the rules (which are allways NEW and OLD for all rules). Then set all other RTE's which come from the view to the former inFromCl state of the rewritten RTE. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #========================================= wieck@debis.com (Jan Wieck) #
wieck@debis.com (Jan Wieck) writes: > I think it's correct to MOVE the inFromCl from the relation > rewritten to the join relations coming with the view's rule. > Thus clear it on the RTE rewritten and on the first two of > the rules (which are allways NEW and OLD for all rules). Then > set all other RTE's which come from the view to the former > inFromCl state of the rewritten RTE. OK, I will do that. My first-cut code (attached, please look it over) passes regress test without it, but we know how much that's worth ;-). Actually I think moving inJoinSet is now the important thing... regards, tom lane All mention of inFromCl/inJoinSet removed from ApplyRetrieveRule; fireRIRrules loop looks like: rt_index = 0; while (rt_index < length(parsetree->rtable)) { ++rt_index; rte = nth(rt_index - 1, parsetree->rtable); /* * If the table is not one named in the original FROM clause * then it must be referenced in the query,or we ignore it. * This prevents infinite expansion loop due to new rtable * entries inserted by expansionof a rule. */ if (! rte->inFromCl && rt_index != parsetree->resultRelation && ! rangeTableEntry_used((Node*) parsetree, rt_index, 0)) { /* Make sure the planner ignores it too... */ rte->inJoinSet = false; continue; } rel = heap_openr(rte->relname, AccessShareLock); rules = rel->rd_rules; if (rules == NULL) { heap_close(rel, AccessShareLock); continue; } locks = NIL; /* * Collect the RIR rules that we must apply */ for (i = 0; i < rules->numLocks; i++) { rule = rules->rules[i]; if (rule->event != CMD_SELECT) continue; if (rule->attrno > 0) { /* per-attr rule; do we need it? */ if (! attribute_used((Node*) parsetree, rt_index, rule->attrno,0)) continue; } else { /* Rel-wide ON SELECT DOINSTEAD means this is a view. * Remove the view from the planner's join target set, * orwe'll get no rows out because view itself is empty! */ if (rule->isInstead) rte->inJoinSet = false; } locks = lappend(locks, rule); } /* * Check permissions */ checkLockPerms(locks, parsetree, rt_index); /* * Now apply them */ foreach(l, locks) { rule = lfirst(l); RIRonly.event = rule->event; RIRonly.attrno = rule->attrno; RIRonly.qual = rule->qual; RIRonly.actions = rule->actions; parsetree = ApplyRetrieveRule(parsetree, &RIRonly, rt_index, RIRonly.attrno == -1, rel, &modified); } heap_close(rel, AccessShareLock); }