Re: [HACKERS] Planner drops unreferenced tables --- bug, no? - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] Planner drops unreferenced tables --- bug, no? |
Date | |
Msg-id | 4083.939088008@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [HACKERS] Planner drops unreferenced tables --- bug, no? (wieck@debis.com (Jan Wieck)) |
Responses |
Re: [HACKERS] Planner drops unreferenced tables --- bug, no?
Re: [HACKERS] Planner drops unreferenced tables --- bug, no? |
List | pgsql-hackers |
I wrote: >>>>> It seems to me that the latter query must yield 9 rows (three >>>>> occurrences of each value) to satisfy the SQL spec. The spec defines >>>>> the result of a two-query FROM clause to be the Cartesian product of the >>>>> two tables, period. It doesn't say anything about "only if one or more >>>>> columns of each table are actually used somewhere". On further investigation, it turns out that there is actually code in the planner that tries to do this right! (and at one time probably did do it right...) add_missing_vars_to_tlist() in initsplan.c has the specific mission of making sure any tables that are mentioned only in the FROM clause get included into the planner's target relation set. Unfortunately, it's been dead code for a while, because there are two different upstream routines that do the wrong thing. 50% of the problem is in query_planner, which tries to short-circuit the whole planning process if it doesn't see any vars at all in the targetlist or qual. I think this code can simply be diked out as misguided optimization, although there might be a couple of small changes needed elsewhere to handle the case where the target relation set is truly empty. (But this error is not what's preventing "SELECT foo.f1 FROM foo, bar" from generating a join between foo and bar; add_missing_vars_to_tlist() could still fire, so long as at least one var appears in the query.) wieck@debis.com (Jan Wieck) writes: >>>> Caution here! >> >>>> After rewriting there can be many unused rangetable entries >>>> floating around. Especially if you SELECT from a view, the >>>> view's relation is still mentioned in the rangetable. The other 50% of the problem is that the rewriter is overly enthusiastic about clearing the inFromCl flag in order to prevent views from being taken as valid join targets. rewriteHandler.c has two different routines that will clear inFromCl flags, and they're both bogus: 1. fireRIRrules will zap a table's inFromCl flag if the table is not referenced by any Var in the parsetree, *whether or not the table actually has any rules*. This is why add_missing_vars_to_tlist() is currently dead code in all cases. It's wrong in another way too: if the table has no referencing vars, fireRIRrules doesn't look for rules applicable to the table. That's wrong for the same reasons that removing the table from the join set is wrong: it can still affect the results, so the lookup and substitution should still occur. 2. If ApplyRetrieveRule is fired, it resets the inFromCl flag on *all* tables in the query, not only the one being substituted for. This is just plain wrong. I believe the right thing to do is remove ApplyRetrieveRule's change of inFromCl entirely, and to modify fireRIRrules so that it only clears a table's inFromCl flag if it finds an ON SELECT DO INSTEAD rule for it. That will remove views from the join set without side-effects on the behavior for normal tables. (Also, fireRIRrules should try to apply rules for a table whether it finds references to it or not; which means that rangeTableEntry_used() isn't needed at all.) Jan, what do you think of this? In particular, what do you think should happen in the following cases: 1. Table has an ON SELECT *not* INSTEAD rule. 2. There is an ON SELECT (with or without INSTEAD)rule for one or more fields of the table, but not for the whole table. I'm not at all clear on the semantics of those kinds of rules, so I don't know if they should remove the original table from the join set or not. (I'm also confused about ON SELECT INSTEAD where the INSTEAD is not a select; is that even legal?) Also, would it be a good idea to propagate a source view's inFromCl flag into the substituted tables? (That is, when firing a select rule for a table that wasn't marked inFromCl to begin with, clear the inFromCl flags of all RTEs that it adds to the query.) I am not sure if this is appropriate or not. Actually, it would probably be cleanest if we split the functions of inFromCl into two separate flags, say "inFromCl" and "inJoinSet". The point of inFromCl is that when we add an implicit RTE using the Postquel extension, it shouldn't suddenly become part of the available namespace for unqualified column names processed later in the query. So inFromCl controls the use of the RTE to look up unqualified names. However, if we believe that the planner should subsequently treat that implicit RTE as if it were a normal join target, then we need a separate flag that carries that information. This didn't use to be an issue, because the implicit RTE could only be there if there were a Var reference to it; add_missing_vars_to_tlist() should never need to do anything for it, because the RTE would have been added to the join set already because of its Var, right? Well, that *used* to be true up till last week. Now that we have a constant- expression folder that understands about boolean and case expression short-circuiting, it is possible for Var nodes to disappear from the tree during optimization. It would be a bad thing if that changed the join semantics. So, I think we need a flag that can force RTEs to be included in the planner's join set regardless of whether their Vars survive the optimizer. And that can't be the same as inFromCl, or there's no such thing as an implicit RTE. With this split, inFromCl would be looked at only by the parser code that resolves unqualified names, and inJoinSet would be looked at by add_missing_vars_to_tlist(). The rewriter's machinations would only need to consider whether to set/clear inJoinSet or not. (Thomas, does any of this strike a chord with your inner/outer join stuff?) regards, tom lane
pgsql-hackers by date: