Re: [HACKERS] Planner drops unreferenced tables --- bug, no? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] Planner drops unreferenced tables --- bug, no?
Date
Msg-id 4083.939088008@sss.pgh.pa.us
Whole thread Raw
In response to Re: [HACKERS] Planner drops unreferenced tables --- bug, no?  (wieck@debis.com (Jan Wieck))
Responses Re: [HACKERS] Planner drops unreferenced tables --- bug, no?  (Bruce Momjian <maillist@candle.pha.pa.us>)
Re: [HACKERS] Planner drops unreferenced tables --- bug, no?  (wieck@debis.com (Jan Wieck))
List pgsql-hackers
I wrote:
>>>>> It seems to me that the latter query must yield 9 rows (three
>>>>> occurrences of each value) to satisfy the SQL spec.  The spec defines
>>>>> the result of a two-query FROM clause to be the Cartesian product of the
>>>>> two tables, period.  It doesn't say anything about "only if one or more
>>>>> columns of each table are actually used somewhere".

On further investigation, it turns out that there is actually code in
the planner that tries to do this right!  (and at one time probably
did do it right...)  add_missing_vars_to_tlist() in initsplan.c has the
specific mission of making sure any tables that are mentioned only in
the FROM clause get included into the planner's target relation set.
Unfortunately, it's been dead code for a while, because there are two
different upstream routines that do the wrong thing.

50% of the problem is in query_planner, which tries to short-circuit
the whole planning process if it doesn't see any vars at all in the
targetlist or qual.  I think this code can simply be diked out as
misguided optimization, although there might be a couple of small
changes needed elsewhere to handle the case where the target relation
set is truly empty.  (But this error is not what's preventing
"SELECT foo.f1 FROM foo, bar" from generating a join between foo and
bar; add_missing_vars_to_tlist() could still fire, so long as at least
one var appears in the query.)


wieck@debis.com (Jan Wieck) writes:
>>>> Caution here!
>> 
>>>> After  rewriting  there can be many unused rangetable entries
>>>> floating around. Especially if you SELECT from  a  view,  the
>>>> view's relation is still mentioned in the rangetable.

The other 50% of the problem is that the rewriter is overly enthusiastic
about clearing the inFromCl flag in order to prevent views from being
taken as valid join targets.  rewriteHandler.c has two different
routines that will clear inFromCl flags, and they're both bogus:

1. fireRIRrules will zap a table's inFromCl flag if the table is not
referenced by any Var in the parsetree, *whether or not the table
actually has any rules*.  This is why add_missing_vars_to_tlist() is
currently dead code in all cases.  It's wrong in another way too: if
the table has no referencing vars, fireRIRrules doesn't look for rules
applicable to the table.  That's wrong for the same reasons that
removing the table from the join set is wrong: it can still affect
the results, so the lookup and substitution should still occur.

2. If ApplyRetrieveRule is fired, it resets the inFromCl flag on *all*
tables in the query, not only the one being substituted for.  This is
just plain wrong.


I believe the right thing to do is remove ApplyRetrieveRule's change
of inFromCl entirely, and to modify fireRIRrules so that it only clears
a table's inFromCl flag if it finds an ON SELECT DO INSTEAD rule for it.
That will remove views from the join set without side-effects on the
behavior for normal tables.  (Also, fireRIRrules should try to apply
rules for a table whether it finds references to it or not; which means
that rangeTableEntry_used() isn't needed at all.)

Jan, what do you think of this?  In particular, what do you think should
happen in the following cases: 1. Table has an ON SELECT *not* INSTEAD rule. 2. There is an ON SELECT (with or without
INSTEAD)rule for one or    more fields of the table, but not for the whole table.
 

I'm not at all clear on the semantics of those kinds of rules, so I
don't know if they should remove the original table from the join set
or not.  (I'm also confused about ON SELECT INSTEAD where the INSTEAD
is not a select; is that even legal?)

Also, would it be a good idea to propagate a source view's inFromCl
flag into the substituted tables?  (That is, when firing a select rule
for a table that wasn't marked inFromCl to begin with, clear the
inFromCl flags of all RTEs that it adds to the query.)  I am not sure
if this is appropriate or not.


Actually, it would probably be cleanest if we split the functions of
inFromCl into two separate flags, say "inFromCl" and "inJoinSet".
The point of inFromCl is that when we add an implicit RTE using the
Postquel extension, it shouldn't suddenly become part of the available
namespace for unqualified column names processed later in the query.
So inFromCl controls the use of the RTE to look up unqualified names.
However, if we believe that the planner should subsequently treat that
implicit RTE as if it were a normal join target, then we need a separate
flag that carries that information.

This didn't use to be an issue, because the implicit RTE could only be
there if there were a Var reference to it; add_missing_vars_to_tlist()
should never need to do anything for it, because the RTE would have been
added to the join set already because of its Var, right?  Well, that
*used* to be true up till last week.  Now that we have a constant-
expression folder that understands about boolean and case expression
short-circuiting, it is possible for Var nodes to disappear from the
tree during optimization.  It would be a bad thing if that changed the
join semantics.  So, I think we need a flag that can force RTEs
to be included in the planner's join set regardless of whether their
Vars survive the optimizer.  And that can't be the same as inFromCl,
or there's no such thing as an implicit RTE.

With this split, inFromCl would be looked at only by the parser code
that resolves unqualified names, and inJoinSet would be looked at
by add_missing_vars_to_tlist().  The rewriter's machinations would
only need to consider whether to set/clear inJoinSet or not.

(Thomas, does any of this strike a chord with your inner/outer join
stuff?)
        regards, tom lane


pgsql-hackers by date:

Previous
From: "Frederick Cheeseborough"
Date:
Subject: unsubscribe
Next
From: Thomas Lockhart
Date:
Subject: Re: [HACKERS] MATLAB mex file for PostgreSQL