Thread: Re: [BUGS] 7.3 GROUP BY differs from 7.2
Dan Langille <dan@langille.org> writes: > This is the query in question: > SELECT element_id as wle_element_id, COUNT(watch_list_id) > FROM watch_list JOIN watch_list_element > ON watch_list.id = watch_list_element.watch_list_id > WHERE watch_list.user_id = 1 > GROUP BY watch_list_element.element_id; > ERROR: Attribute unnamed_join.element_id must be GROUPed or used in an > aggregate function The parser currently considers an output column of a JOIN to be a different variable from the corresponding column of the input table. Thus the above error message. While the distinction is without content in this example, it is extremely real in some nearby cases --- in particular, in NATURAL or USING full outer joins it's possible for one to be null when the other isn't. (And no, I don't think 7.2 got this right.) I'm having a hard time finding anything in the SQL spec that addresses this point specifically --- but I also cannot find anything that suggests that the name scope rules differ between outer and inner joins. So it would be difficult for them to assert that element_id and watch_list_element.element_id must be treated as equivalent here, when they are clearly not equivalent in related cases. Anyone care to offer a gloss on the spec to prove that this behavior is correct or not correct? regards, tom lane
On Fri, 21 Feb 2003, Tom Lane wrote: > Dan Langille <dan@langille.org> writes: > > This is the query in question: > > > SELECT element_id as wle_element_id, COUNT(watch_list_id) > > FROM watch_list JOIN watch_list_element > > ON watch_list.id = watch_list_element.watch_list_id > > WHERE watch_list.user_id = 1 > > GROUP BY watch_list_element.element_id; > > > ERROR: Attribute unnamed_join.element_id must be GROUPed or used in an > > aggregate function > > The parser currently considers an output column of a JOIN to be a > different variable from the corresponding column of the input table. > Thus the above error message. While the distinction is without content > in this example, it is extremely real in some nearby cases --- in > particular, in NATURAL or USING full outer joins it's possible for one > to be null when the other isn't. (And no, I don't think 7.2 got this > right.) > > I'm having a hard time finding anything in the SQL spec that addresses > this point specifically --- but I also cannot find anything that > suggests that the name scope rules differ between outer and inner joins. > So it would be difficult for them to assert that element_id and > watch_list_element.element_id must be treated as equivalent here, > when they are clearly not equivalent in related cases. > > Anyone care to offer a gloss on the spec to prove that this behavior > is correct or not correct? Hmm, I'd read SQL92 Section 7.5 (joined tables) Syntax Rules as saying that the non natural/using case is separate from the other cases. Syntax rule 5 seems to say that A JOIN B ON (...) should have the same column descriptors as A,B and it explicitly doesn't cover NATURAL or USING (covered by rule 6).
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > On Fri, 21 Feb 2003, Tom Lane wrote: >> Anyone care to offer a gloss on the spec to prove that this behavior >> is correct or not correct? > Hmm, I'd read SQL92 Section 7.5 (joined tables) Syntax Rules as saying > that the non natural/using case is separate from the other cases. > Syntax rule 5 seems to say that A JOIN B ON (...) should have the same > column descriptors as A,B and it explicitly doesn't cover NATURAL or > USING (covered by rule 6). Yeah, but those rules only define the names and types of the JOIN's output columns. They don't say anything about the visibility of those names, nor the visibility of the underlying-table column names, nor particularly about semantic equivalence of the two sets of names. I spent some time digging around in the verbiage about name scopes, but didn't find any joy. It does seem clear that if you stick an alias on the JOIN as a whole, that that hides the individual table names/aliases, which would render the issue moot. But Dan didn't do that in his example, so he is allowed to access both the join columns and the underlying columns. Question is, are they equivalent for the purposes of the grouped-column-reference rules, or not? regards, tom lane
On Sat, 22 Feb 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > On Fri, 21 Feb 2003, Tom Lane wrote: > >> Anyone care to offer a gloss on the spec to prove that this behavior > >> is correct or not correct? > > > Hmm, I'd read SQL92 Section 7.5 (joined tables) Syntax Rules as saying > > that the non natural/using case is separate from the other cases. > > > Syntax rule 5 seems to say that A JOIN B ON (...) should have the same > > column descriptors as A,B and it explicitly doesn't cover NATURAL or > > USING (covered by rule 6). > > Yeah, but those rules only define the names and types of the JOIN's > output columns. They don't say anything about the visibility of those > names, nor the visibility of the underlying-table column names, nor > particularly about semantic equivalence of the two sets of names. > > I spent some time digging around in the verbiage about name scopes, > but didn't find any joy. It does seem clear that if you stick an > alias on the JOIN as a whole, that that hides the individual table > names/aliases, which would render the issue moot. But Dan didn't > do that in his example, so he is allowed to access both the join > columns and the underlying columns. Question is, are they equivalent > for the purposes of the grouped-column-reference rules, or not? Well, I mentioned it because I thought that it might mean that the semantic equivalence for columns would be the same as the A,B case (barring aliases, using, etc) since the only rule about the visibility on the names in that case I could find was that you can't qualify to a table within a joined table on a "common column name" of the joined table. I'd think the rule would have to be the same for the A,B case as well. And, if say A.A and <output of join> A are ambiguous on whether they're the same, wouldn't the rules on group by say that A.A is invalid grouping column reference in this case whether or not the select list has A or A.A? I'm thinking that it wouldn't be an unambiguous reference to a column of the preceding FROM/WHERE clause.
Guys, SQL spec aside, thinking about this from a strictly implementation/user point of view: (an keeping in mind that I think it's very important that we work out the spec-correct behaviour for 7.4 and/or 7.3.3) The particular case that Dan has raised is an issue for four reasons: 1) It looks to a human like it *should* work, and I think given a long weekend of relational calculus someone (not me) could define the cases where it is OK as opposed to the cases (probably the majority) where it is not. 2) That syntax *did* work in previous versions of PostgreSQL. 3) That syntax will be accepted by some other SQL databases. 4) The error message is rather confusing and could cause a developer to spend an hour or more hunting for the wrong error. I propose that, should we decide not to change the behaviour of the parser, that we do the following: 1) add the following to the FAQ or elsewhere: Q. I just got the message "ERROR: Attribute unnamed_join.[column name] must be GROUPed or used in an aggregate function" and my GROUP BY query won't run, even though all of the columns are in the GROUP BY clause. This query may have worked in PostgreSQL 7.2, or on another SQL database. What do I do? A. You are probably qualifying a column name differently in the SELECT clause than in the GROUP BY clause, causing the parser to be confused about what you really mean. For example, you may be referring to a column by its simple column name ("element_id") in the SELECT clause, and by its table-qualified name ("table1.element_id") in the GROUP BY clause, or you may be using an alias in one place but not the other. Please make sure that all columns in the SELECT clause match *exactly* the columns in the GROUP BY clause. 2) That we add a warning in the 7.3 release notes about the breaking of backward compatibility. Thoughts? -- Josh Berkus Aglio Database Solutions San Francisco