Thread: Table name scope (was Re: [BUGS] Outer joins aren't working with views)
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> It works for me: >> regression=# select t1.*, t2.* from t1 natural left outer join t2; >> id | id2 | id >> ----+-----+---- >> 1 | 1 | 1 > My recollection is that SQL9x requires that the join result lose the > link to the original table names. That is, > select id, id2 from t1 natural left outer join t2; > is legal, but > select t1.id, ... > is not. Hm. This is one of the areas that I had put down on my personal TODO list as needing a close look before release. So, let's get to it. My first scan of SQL92 looks like our current behavior is right. I find these paras that seem to be relevant to the scope of a <correlation name> (ie, a table alias): 5.4 Names and identifiers, syntax rule 12: 12)An <identifier> that is a <correlation name> is associated with a table within a particular scope. Thescope of a <correlation name> is either a <select statement: single row>, <subquery>, or <query specification>(see Subclause 6.3, "<table reference>"). Scopes may be nested. In different scopes, the same <correlation name> may be associated with different tables or with the same table. 6.3 <table reference>, syntax rule 2: 2) Case: a) If a <table reference> TR is contained in a <from clause> FC with no intervening <derived table>,then the scope clause SC of TR is the <select statement: single row> or innermost <query specification>that contains FC. The scope clause of the exposed <correlation name> or exposed <table name> ofTR is the <select list>, <where clause>, <group by clause>, and <having clause> of SC, togetherwith the <join condition> of all <joined table>s contained in SC that contains TR. b) Otherwise, the scope clause SC of TR is the outermost <joined table> that contains TR with no intervening<derived table>. The scope of the exposed <correlation name> or exposed <table name> ofTR is the <join condition> of SC and of all <joined table>s contained in SC that contain TR. (Note that <derived table> means subselect-in-FROM, cf 6.3 and 7.11.) The first and second items here seem to be perfectly clear that the names t1 and t2 have scope across the whole SELECT statement and are not hidden within the <joined table> formed by the OUTER JOIN clause. On the other hand, the third item leaves me confused again. I don't see how it applies at all, ie, when is the "If" of 2(a) ever false? How is it *possible* to have a <table reference> that's not directly contained in a <from clause>? The business about a <derived table> seems like horsepucky, because a table ref inside a subselect would be contained in the subselect's from-clause and its scope would be that subselect. Where in the spec does it allow a table reference that's not in a from-clause? (Our PostQuel extensions do not count ;-)) It'd be useful to check the above example against Oracle and other implementations, but the parts of the spec that I can follow seem to say that we've got the right behavior now. regards, tom lane
Re: Table name scope (was Re: [BUGS] Outer joins aren't working with views)
From
Thomas Lockhart
Date:
> The first and second items here seem to be perfectly clear that the > names t1 and t2 have scope across the whole SELECT statement and are not > hidden within the <joined table> formed by the OUTER JOIN clause. You are right. If there is a "correlation name", then those underlying table names become invisible, but that was not in the example here. Rereading my Date and Darwen clarified this for me. However, there are *some* columns for which this explicit table qualification is not allowed, including in the example of NATURAL JOIN. Date and Darwen, 4th ed, pp 142-144 discuss various aspects of join scope and behavior. For NATURAL JOIN, the columns with common names forming the join columns *lose* their underlying table name, since they can't be traced back to a column from a specific table (the table of origin is ambiguous). And for a NATURAL JOIN, it is impossible to get back two columns with the same name, since those columns were unified by the join process. The process is required to join on the columns with names in common, and to swallow one of each pair in the result. How should you refer to the column that remains? create table t1 (id int, id2 int); create table t2 (id int, name text); select * from t1 natural left outer join t2; must return something from the set of columns (id, id2, name), and two columns of name "id" will not be visible. Also, column "id" cannot be qualified with a table name. So select t1.id from t1 natural join t2; is not legal (though perhaps could be justified as an extension). The columns *not* involved in the join operation, id2 and name, *can* be qualified by the underlying table name, but the only way to get the same for "id" after the natural join is to use a correlation name. e.g. select tx.id from (t1 natural join t2) as tx; select t1.id2 from t1 natural join t2; are both legal. > It'd be useful to check the above example against Oracle and other > implementations, but the parts of the spec that I can follow seem > to say that we've got the right behavior now. Oracle does not support SQL9x join syntax, so we can't ask it for an example. Not sure about the others. Comments? - Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> The first and second items here seem to be perfectly clear that the >> names t1 and t2 have scope across the whole SELECT statement and are not >> hidden within the <joined table> formed by the OUTER JOIN clause. > You are right. If there is a "correlation name", then those underlying > table names become invisible, but that was not in the example here. Right, either the table's real name or its alias ("correlation name") is introduced into the query's scope, not both. AFAICT the scope rules are the same for either one, though. > Rereading my Date and Darwen clarified this for me. However, there are > *some* columns for which this explicit table qualification is not > allowed, including in the example of NATURAL JOIN. I disagree on that. The table's real/alias name is certainly supposed to be accessible, and I see nothing in the spec that says that only some of its columns are accessible via qualification. What the spec does say is that the *output* of the join has only one copy of the joined column. In other words, given table A with columns ID and CA, and table B with columns ID and CB, I believe the correct behavior is SELECT * FROM (A NATURAL JOIN B) J produces ID, CA, CB SELECT J.* FROM (A NATURAL JOIN B) J produces ID, CA, CB SELECT A.* FROM (A NATURAL JOIN B) J produces ID, CA SELECT B.* FROM (A NATURAL JOIN B) J produces ID, CB If it's an outer join then J.ID is subtly different from A.ID and/or B.ID --- the spec defines the output column as COALESCE(A.ID,B.ID) (cf SQL92 7.5 <joined table>, syntax rule 6.d) to get rid of introduced nulls. BTW, our implementation simplifies that to A.ID for an inner or left join, or B.ID for a right join, and only uses the full COALESCE expression for a full join. Anyway, I believe it's true that you can't get at A.ID or B.ID in this example except by qualifying the column name with the table name --- but I don't see where it says that you shouldn't be able to get at them at all. If that were true then the definition in 7.5.6.d wouldn't be legal, because that's exactly the syntax it uses to define the joined column. > Date and Darwen, 4th ed, pp 142-144 discuss various aspects of join > scope and behavior. For NATURAL JOIN, the columns with common names > forming the join columns *lose* their underlying table name, since they > can't be traced back to a column from a specific table (the table of > origin is ambiguous). My reading is that the output columns are qualified with the JOIN clause's correlation name, if any (J in my example). If you didn't bother to stick a correlation name on the join clause, you couldn't refer to them with a qualified name. In an example like SELECT * FROM (A NATURAL LEFT JOIN (B NATURAL FULL JOIN C)); supposing that all three tables have a column ID, then the output ID column of the B/C join has no qualified name, and it would indeed be impossible to refer to it from the SELECT list. The only IDs accessible from the SELECT list are the also-qualified-name-less output of the left join and A.ID, B.ID, C.ID, none of which are quite the same as the output of the full join. Perhaps what Date and Darwen are talking about is cases like this? regards, tom lane
Re: Table name scope (was Re: [BUGS] Outer joins aren't working with views)
From
Thomas Lockhart
Date:
> I disagree on that. The table's real/alias name is certainly supposed > to be accessible, and I see nothing in the spec that says that only some > of its columns are accessible via qualification. Date and Darwen disagree circa 1997, and I believe that SQL99 does not radically alter the spec in this regard. All of my interpretations below are based on D&D, not the draft spec we have available (though I look to that to support their interpretation, which imho it does). > What the spec does say > is that the *output* of the join has only one copy of the joined column. > In other words, given table A with columns ID and CA, and table B with > columns ID and CB, I believe the correct behavior is > > SELECT * FROM (A NATURAL JOIN B) J produces ID, CA, CB Yes. > SELECT J.* FROM (A NATURAL JOIN B) J produces ID, CA, CB Yes. > SELECT A.* FROM (A NATURAL JOIN B) J produces ID, CA No, since there is a range variable J, no columns explicitly qualified with A or B are visible. If the range variable J is omitted, then the result will produce only CA. See one of the D&D cases I include below. > SELECT B.* FROM (A NATURAL JOIN B) J produces ID, CB Same as for the previous case. B.* is not visible since a range variable is specified, and if J is not there then B.* produces CB only. > If it's an outer join then J.ID is subtly different from A.ID and/or > B.ID --- the spec defines the output column as COALESCE(A.ID,B.ID) > (cf SQL92 7.5 <joined table>, syntax rule 6.d) to get rid of introduced > nulls. BTW, our implementation simplifies that to A.ID for an inner or > left join, or B.ID for a right join, and only uses the full COALESCE > expression for a full join. Right, the result is the same for these cases. The only issue is the scoping on the name allowed for external reference. > Anyway, I believe it's true that you can't get at A.ID or B.ID in > this example except by qualifying the column name with the table name > --- but I don't see where it says that you shouldn't be able to get > at them at all. If that were true then the definition in 7.5.6.d > wouldn't be legal, because that's exactly the syntax it uses to define > the joined column. 7.7.7.d seems to define SLCC pretty clearly, without a table name qualification. I think that this is consistant with D&D's interpretation. > > Date and Darwen, 4th ed, pp 142-144 discuss various aspects of join > > scope and behavior. For NATURAL JOIN, the columns with common names > > forming the join columns *lose* their underlying table name, since they > > can't be traced back to a column from a specific table (the table of > > origin is ambiguous). > My reading is that the output columns are qualified with the JOIN > clause's correlation name, if any (J in my example). If you didn't > bother to stick a correlation name on the join clause, you couldn't > refer to them with a qualified name. Sure. But without a correlation name, you are not allowed to qualify with the underlying table name for "join columns" from NATURAL or JOIN ON joins. See below... > In an example like > > SELECT * FROM (A NATURAL LEFT JOIN (B NATURAL FULL JOIN C)); > > supposing that all three tables have a column ID, then the output ID > column of the B/C join has no qualified name, and it would indeed be > impossible to refer to it from the SELECT list. The only IDs accessible > from the SELECT list are the also-qualified-name-less output of the > left join and A.ID, B.ID, C.ID, none of which are quite the same as > the output of the full join. Perhaps what Date and Darwen are talking > about is cases like this? No, they are talking about simpler cases, and very clearly they disagree with the current behavior of the PostgreSQL parser. Now, it may be that SQL99 has changed the scoping rules for these cases, but instead I would look for support for Date and Darwen's interpretation in the spec, rather than reading the spec from first principles. Date and Darwen can explain it in a couple of pages, and give examples, where the spec is just way too convoluted for a clear reading istm. Anyway, here are two cases discussed by D&D -- note that table sp has columns (sno, pno, qty) and table s has columns (sno, sname, status, city, primary): (p142, after a discussion of other cases) "One very counterintuitive consequence of this unorthodox scoping rule is illustrated by the following example: The result of the expression select distinct sp.* from sp natural join s; will include columns PNO and QTY but *not* column SNO, because -- believe it or not -- there is no column "SP.SNO" in the result of the join expression (indeed specifying SP.SNO in the SELECT clause would be a syntax error)." The emphasis is D&D's, not mine ;) For natural joins, or other joins where two columns are subsumed into one (anything with a USING clause?) the scoping rules are clear, at least to D&D: it is not possible to reference one of these columns by qualifying with the name of an underlying table. Another case (p143-144, following some simpler cases which show how scoping progresses through more deeply nested joins): "Now let us modify the example once again to introduce an explicit range variable TC for the overall result: ( ( T1 JOIN T2 ON cond-1 ) AS TA JOIN ( T3 JOIN T4 ON cond-2 ) AS TB ON cond-3 ) AS TC The rules are now as follows: cond-1 can reference T1 and T2 but not T3, T4, TA, TB, or TC cond-2 can reference T3 and T4 but not T1, T2, TA, TB, or TC cond-3 can reference TA and TB but not T1, T2, T3, T4, or TC and (once again) if the overall expression appears as the operand of a FROM clause, then the associated SELECT clause, WHERE clause, etc. can reference TC but not T1, T2, T3, T4, TA, or TB." So the two D&D cases cited above illustrate the "with range variables" and "without range variables" expected behavior. Comments? - Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > (p142, after a discussion of other cases) > "One very counterintuitive consequence of this unorthodox scoping rule > is illustrated by the following example: The result of the expression > select distinct sp.* from sp natural join s; > will include columns PNO and QTY but *not* column SNO, because -- > believe it or not -- there is no column "SP.SNO" in the result of the > join expression (indeed specifying SP.SNO in the SELECT clause would be > a syntax error)." > The emphasis is D&D's, not mine ;) Hm. After further digging in the spec, it seems that their interpretation rests on SQL92's section 6.4 <column reference> syntax rule 2.b. Rule 2 in full is: 2) If CR contains a <qualifier> Q, then CR shall appear within the scope of one or more <table name>s or<correlation name>s that are equal to Q. If there is more than one such <table name> or <correlationname>, then the one with the most local scope is specified. Let T be the table associated with Q. a) T shall include a column whose <column name> is CN. b) If T is a <table reference> in a <joined table> J, then CN shall not be a common column name inJ. Note: Common column name is defined in Subclause 7.5, "<joined table>". 2.b strikes me as a completely unnecessary and counterintuitive restriction. Do D&D provide any justification for it? I'm not especially inclined to make our implementation substantially more complex in order to enforce what seems a bogus restriction. What's even more interesting is that I can find no equivalent text in SQL99. regards, tom lane
Re: Table name scope (was Re: [BUGS] Outer joins aren't working with views)
From
Thomas Lockhart
Date:
> > (p142, after a discussion of other cases) > > "One very counterintuitive consequence of this unorthodox scoping rule > > is illustrated by the following example: The result of the expression > > > select distinct sp.* from sp natural join s; > > > will include columns PNO and QTY but *not* column SNO, because -- > > believe it or not -- there is no column "SP.SNO" in the result of the > > join expression (indeed specifying SP.SNO in the SELECT clause would be > > a syntax error)." > > > The emphasis is D&D's, not mine ;) > > Hm. After further digging in the spec, it seems that their > interpretation rests on SQL92's section 6.4 <column reference> syntax > rule 2.b. Rule 2 in full is: > > 2) If CR contains a <qualifier> Q, then CR shall appear within the > scope of one or more <table name>s or <correlation name>s that > are equal to Q. If there is more than one such <table name> or > <correlation name>, then the one with the most local scope is > specified. Let T be the table associated with Q. > > a) T shall include a column whose <column name> is CN. > > b) If T is a <table reference> in a <joined table> J, then CN > shall not be a common column name in J. > > Note: Common column name is defined in Subclause 7.5, "<joined > table>". > > 2.b strikes me as a completely unnecessary and counterintuitive > restriction. Do D&D provide any justification for it? I'm not > especially inclined to make our implementation substantially more > complex in order to enforce what seems a bogus restriction. Hmm. istm that the D&D interpretation is entirely clear, and that for NATURAL and USING joins there is no other way to carry along join results as intermediate "tables". If select * from t1 natural join t2; produces, say, three columns, how can any other specification of the target list using only wildcards produce *more* columns? In particular, how can select t1.*, t2.* from t1 natural join t2; produce columns from t1 and t2 which are *not present* in the join "t1 natural join t2"? > What's even more interesting is that I can find no equivalent > text in SQL99. Of course. When they bloated the spec by a factor of three or four, they had to leave out the clear parts to save space ;) I'm pretty sure that the sections I quoted (in 7.7.7 in the draft document I have -- hopefully the same as what you have available?) cover this topic. In particular, NATURAL and USING joins are not the same as other inner or outer joins in the resulting set of available columns. So there are two issues here which I hope to clarify: scoping on joins, and NATURAL and USING join column sets. - Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > In particular, how can > select t1.*, t2.* from t1 natural join t2; > produce columns from t1 and t2 which are *not present* in the join "t1 > natural join t2"? Very easily ;-) >> What's even more interesting is that I can find no equivalent >> text in SQL99. > Of course. When they bloated the spec by a factor of three or four, they > had to leave out the clear parts to save space ;) Or they realized they blew it the first time. > I'm pretty sure that the sections I quoted (in 7.7.7 in the draft > document I have -- hopefully the same as what you have available?) > cover this topic. In particular, NATURAL and USING joins are not the > same as other inner or outer joins in the resulting set of available > columns. There's no question about what happens as far as the output of the join is concerned. However, 7.7.7 does not say word one about what is implied by direct access (ie, qualified-name access) to the component tables of the join. I've been through the SQL99 draft again, and there is quite clearly NOT any restriction corresponding to the old 6.4.2.b; so under SQL99 it is legal to refer to A.ID and B.ID. However, they do still have the idea that A.* should omit ID: 7.11 <query specification> syntax rule 7.g.i (concerning expansion of qualified asterisks) says i) If the basis is a <table or query name> or <correlation name>, then let TQ be the table associatedwith the basis. The <select sublist> is equivalent to a <value expression> sequencein which each <value expression> is a column reference CR that references a column of TQ that is not a common column of a <joined table>. Each column of TQ that is not a referenced common columnshall be referenced exactly once. The columns shall be referenced in the ascending sequenceof their ordinal positions within TQ. which is essentially taken from 7.9.4 of the old spec. This is a mess; I wonder if the discrepancy between qualified-name access and asterisk expansion is deliberate? (Perhaps they felt that allowing qualified name access was an extension that wouldn't break old code, but that they couldn't change the asterisk expansion rule without breaking backwards compatibility?) It'd be nice to see if this is still true in SQL99 final. > So there are two issues here which I hope to clarify: scoping > on joins, and NATURAL and USING join column sets. Two issues? I thought we were only arguing about the latter one. regards, tom lane
I said: > The <select sublist> is equivalent to a <value expression> > sequence in which each <value expression> is a column > reference CR that references a column of TQ that is not > a common column of a <joined table>. > which is essentially taken from 7.9.4 of the old spec. This is a mess; In fact, after looking at it again, I realize that the quoted text is *wrong*, because it does not say what they presumably intended. As written, it appears thatSELECT J.* FROM (A NATURAL JOIN B) J should omit the common column(s). They're common columns of a <joined table>, aren't they? A lawyer would probably point out that 7.7 does not define the phrase "common column". It defines "common column name". Common column name clearly applies to all three tables involved (both input tables and the output table), but it's not so clear whether "common column" is intended to do so. One could also wonder about the intended behavior of multi-level joins. Does a column of a base table become inaccessible if it is used as a common column several JOIN levels up? At best, this part of the spec is extremely poorly written. regards, tom lane
Re: Table name scope (was Re: [BUGS] Outer joins aren't working with views)
From
Thomas Lockhart
Date:
> > So there are two issues here which I hope to clarify: scoping > > on joins, and NATURAL and USING join column sets. > Two issues? I thought we were only arguing about the latter one. Well, I prefer to consider it "discussing" ;) And there are two issues. I'll bet lunch and dinner that SQL99 did *not* make radical changes in the scoping rules for join syntax vis a vis SQL92. Certainly something compatible with SQL92 should have a shot at being also compatible under SQL99, and scoping rules would fall into that category. On the second topic, NATURAL and USING join column sets, I believe that it *must* be true that the set of columns available in a natural join result (e.g. the result of A NATURAL JOIN B ) is the complete set of columns available to a SELECT target list, to a WHERE qualification, etc. D&D's description of the effects of this "interpretation" are consistant and clear (where the spec is not). I'm not sure how we can allow our interpretation to be at odds with the SQL92 spec or with a reading of the SQL99 draft I have available. In particular, the rules for forming join results seem to cover the cases we are discussing, and I read them as being consistant with D&D's SQL92 discussion. btw, their appendix on the upcoming "SQL3" does not bring up join results or join scoping as among the changes in the upcoming standard, though of course that is not a definitive point. Date and Darwen have imho a very clear description of the scoping allowed in join syntax. That scoping discussion says very clearly that a "range variable" (SQL9x "correlation name") becomes the only allowed qualification to a column name in SELECT target lists, WHERE clauses, etc etc. They have very specific examples to clarify the point. And they deem that necessary because the spec is a PITA to wade through. I'd rather leave it to them to do the wading ;) Let's look for counterexamples in our other texts if you are really uncomfortable with the SQL92 (and SQL99?) result in D&D. I have another book or two, and will look through them tonight. Does anyone else want to jump in, esp. if you have experience with the SQL9x conventions or have access to a db which already implements it? - Thomas
Re: Re: Table name scope (was Re: [BUGS] Outer joins aren't working with views)
From
Peter Eisentraut
Date:
Tom Lane writes: > SELECT * FROM (A NATURAL JOIN B) J produces ID, CA, CB > > SELECT J.* FROM (A NATURAL JOIN B) J produces ID, CA, CB > > SELECT A.* FROM (A NATURAL JOIN B) J produces ID, CA > > SELECT B.* FROM (A NATURAL JOIN B) J produces ID, CB ISTM that correlation names aren't allowed after joined tables in the first place. <table reference> ::= <table name> [ [ AS ] <correlation name> [ <left paren> <derivedcolumn list> <right paren> ] ] | <derived table> [ AS ] <correlation name> [ <leftparen> <derived column list> <right paren> ] | <joined table> <joined table> ::= <cross join> | <qualified join> | <left paren> <joined table><right paren> -- Peter Eisentraut peter_e@gmx.net http://yi.org/peter-e/
Peter Eisentraut <peter_e@gmx.net> writes: > ISTM that correlation names aren't allowed after joined tables in the > first place. > <table reference> ::= > <table name> [ [ AS ] <correlation name> > [ <left paren> <derived column list> <right paren> ] ] > | <derived table> [ AS ] <correlation name> > [ <left paren> <derived column list> <right paren> ] > | <joined table> > <joined table> ::= > <cross join> > | <qualified join> > | <left paren> <joined table> <right paren> Keep looking: <derived table> ::= <table subquery> <table subquery> ::= <subquery> <subquery> ::= <left paren> <query expression> <right paren> <query expression> ::= <non-join query expression> | <joined table> So you can writeSELECT A.* FROM (A NATURAL JOIN B) J but inSELECT A.* FROM A NATURAL JOIN B J the J will be taken as an alias for B not for the join. If they allowed an alias clause on an unparenthesized <joined table>, the grammar would be ambiguous... regards, tom lane
>>>> So there are two issues here which I hope to clarify: scoping >>>> on joins, and NATURAL and USING join column sets. I've been looking some more at this business, and I have found one of the reasons that I was confused. The SQL92 spec says (6.3 syntax rule 2) 2) Case: a) If a <table reference> TR is contained in a <from clause> FC with no intervening <derived table>,then the scope clause SC of TR is the <select statement: single row> or innermost <query specification>that contains FC. The scope clause of the exposed <correlation name> or exposed <table name> ofTR is the <select list>, <where clause>, <group by clause>, and <having clause> of SC, togetherwith the <join condition> of all <joined table>s contained in SC that contains TR. b) Otherwise, the scope clause SC of TR is the outermost <joined table> that contains TR with no intervening<derived table>. The scope of the exposed <correlation name> or exposed <table name> ofTR is the <join condition> of SC and of all <joined table>s contained in SC that contain TR. I mistakenly read this with the assumption that <derived table> means a sub-SELECT. It does mean that, but it also means a <joined table>, *if and only if* that joined table is labeled with a <correlation name>. The relevant productions are: <table reference> ::= <table name> [ [ AS ] <correlation name> [ <left paren> <derivedcolumn list> <right paren> ] ] | <derived table> [ AS ] <correlation name> [ <leftparen> <derived column list> <right paren> ] | <joined table> <derived table> ::= <table subquery> <table subquery> ::= <subquery> <subquery> ::= <left paren> <query expression> <right paren> <query expression> ::= <non-join query expression> | <joined table> So "(<joined table>) AS foo" has a <subquery> but "<joined table>" doesn't. AFAICT, this means that table references defined within the join are invisible outside "(<joined table>) AS foo", but they are visible outside a plain "<joined table>". This is more than a tad bizarre ... but it explains the examples you quoted from Date and Darwen. However, as long as a table reference is visible, I think that the set of qualified column names available from it should not depend on whether it came from inside a JOIN expression or not. Comments? regards, tom lane