Thread: 7.3 GROUP BY differs from 7.2

7.3 GROUP BY differs from 7.2

From
Dan Langille
Date:
I notice this today when migrating an application from 7.2 to 7.3.  The
column name is not being recognized.

See also: http://archives.postgresql.org/pgsql-sql/2003-02/msg00480.php

This is the query in question:

SELECT element_id as wle_element_id, COUNT(watch_list_id)
  FROM watch_list JOIN watch_list_element
       ON watch_list.id   = watch_list_element.watch_list_id
 WHERE watch_list.user_id = 1
 GROUP BY watch_list_element.element_id;

ERROR:  Attribute unnamed_join.element_id must be GROUPed or used in an
aggregate function

Of note is the column watch_list_element.element_id. The following
variation works:

SELECT element_id as wle_element_id, COUNT(watch_list_id)
  FROM watch_list JOIN watch_list_element
       ON watch_list.id   = watch_list_element.watch_list_id
 WHERE watch_list.user_id = 1
 GROUP BY element_id;

i.e. remove the table name from the GROUP BY field.  Similar success is
obtained if you add the table name to element_id in both the SELECT and
the GROUP BY.

Similarly, this works:

SELECT element_id as wle_element_id, COUNT(watch_list_id)
  FROM watch_list JOIN watch_list_element
       ON watch_list.id   = watch_list_element.watch_list_id
 WHERE watch_list.user_id = 1
 GROUP BY wle_element_id;

i.e. use the column alias.

Under 7.2.3, all of the above queries work.

cheers

Re: 7.3 GROUP BY differs from 7.2

From
Tom Lane
Date:
Dan Langille <dan@langille.org> writes:
> This is the query in question:

> SELECT element_id as wle_element_id, COUNT(watch_list_id)
>   FROM watch_list JOIN watch_list_element
>        ON watch_list.id   = watch_list_element.watch_list_id
>  WHERE watch_list.user_id = 1
>  GROUP BY watch_list_element.element_id;

> ERROR:  Attribute unnamed_join.element_id must be GROUPed or used in an
> aggregate function

The parser currently considers an output column of a JOIN to be a
different variable from the corresponding column of the input table.
Thus the above error message.  While the distinction is without content
in this example, it is extremely real in some nearby cases --- in
particular, in NATURAL or USING full outer joins it's possible for one
to be null when the other isn't.  (And no, I don't think 7.2 got this
right.)

I'm having a hard time finding anything in the SQL spec that addresses
this point specifically --- but I also cannot find anything that
suggests that the name scope rules differ between outer and inner joins.
So it would be difficult for them to assert that element_id and
watch_list_element.element_id must be treated as equivalent here,
when they are clearly not equivalent in related cases.

Anyone care to offer a gloss on the spec to prove that this behavior
is correct or not correct?
        regards, tom lane


Re: [SQL] 7.3 GROUP BY differs from 7.2

From
Stephan Szabo
Date:
On Fri, 21 Feb 2003, Tom Lane wrote:

> Dan Langille <dan@langille.org> writes:
> > This is the query in question:
>
> > SELECT element_id as wle_element_id, COUNT(watch_list_id)
> >   FROM watch_list JOIN watch_list_element
> >        ON watch_list.id   = watch_list_element.watch_list_id
> >  WHERE watch_list.user_id = 1
> >  GROUP BY watch_list_element.element_id;
>
> > ERROR:  Attribute unnamed_join.element_id must be GROUPed or used in an
> > aggregate function
>
> The parser currently considers an output column of a JOIN to be a
> different variable from the corresponding column of the input table.
> Thus the above error message.  While the distinction is without content
> in this example, it is extremely real in some nearby cases --- in
> particular, in NATURAL or USING full outer joins it's possible for one
> to be null when the other isn't.  (And no, I don't think 7.2 got this
> right.)
>
> I'm having a hard time finding anything in the SQL spec that addresses
> this point specifically --- but I also cannot find anything that
> suggests that the name scope rules differ between outer and inner joins.
> So it would be difficult for them to assert that element_id and
> watch_list_element.element_id must be treated as equivalent here,
> when they are clearly not equivalent in related cases.
>
> Anyone care to offer a gloss on the spec to prove that this behavior
> is correct or not correct?

Hmm, I'd read SQL92 Section 7.5 (joined tables) Syntax Rules as saying
that the non natural/using case is separate from the other cases.

Syntax rule 5 seems to say that A JOIN B ON (...) should have the same
column descriptors as A,B and it explicitly doesn't cover NATURAL or
USING (covered by rule 6).



Re: [SQL] 7.3 GROUP BY differs from 7.2

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> On Fri, 21 Feb 2003, Tom Lane wrote:
>> Anyone care to offer a gloss on the spec to prove that this behavior
>> is correct or not correct?

> Hmm, I'd read SQL92 Section 7.5 (joined tables) Syntax Rules as saying
> that the non natural/using case is separate from the other cases.

> Syntax rule 5 seems to say that A JOIN B ON (...) should have the same
> column descriptors as A,B and it explicitly doesn't cover NATURAL or
> USING (covered by rule 6).

Yeah, but those rules only define the names and types of the JOIN's
output columns.  They don't say anything about the visibility of those
names, nor the visibility of the underlying-table column names, nor
particularly about semantic equivalence of the two sets of names.

I spent some time digging around in the verbiage about name scopes,
but didn't find any joy.  It does seem clear that if you stick an
alias on the JOIN as a whole, that that hides the individual table
names/aliases, which would render the issue moot.  But Dan didn't
do that in his example, so he is allowed to access both the join
columns and the underlying columns.  Question is, are they equivalent
for the purposes of the grouped-column-reference rules, or not?
        regards, tom lane


Re: [SQL] 7.3 GROUP BY differs from 7.2

From
Stephan Szabo
Date:
On Sat, 22 Feb 2003, Tom Lane wrote:

> Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> > On Fri, 21 Feb 2003, Tom Lane wrote:
> >> Anyone care to offer a gloss on the spec to prove that this behavior
> >> is correct or not correct?
>
> > Hmm, I'd read SQL92 Section 7.5 (joined tables) Syntax Rules as saying
> > that the non natural/using case is separate from the other cases.
>
> > Syntax rule 5 seems to say that A JOIN B ON (...) should have the same
> > column descriptors as A,B and it explicitly doesn't cover NATURAL or
> > USING (covered by rule 6).
>
> Yeah, but those rules only define the names and types of the JOIN's
> output columns.  They don't say anything about the visibility of those
> names, nor the visibility of the underlying-table column names, nor
> particularly about semantic equivalence of the two sets of names.
>
> I spent some time digging around in the verbiage about name scopes,
> but didn't find any joy.  It does seem clear that if you stick an
> alias on the JOIN as a whole, that that hides the individual table
> names/aliases, which would render the issue moot.  But Dan didn't
> do that in his example, so he is allowed to access both the join
> columns and the underlying columns.  Question is, are they equivalent
> for the purposes of the grouped-column-reference rules, or not?

Well, I mentioned it because I thought that it might mean that the
semantic equivalence for columns would be the same as the A,B case
(barring aliases, using, etc) since the only rule about the visibility on
the names in that case I could find was that you can't qualify to a table
within a joined table on a "common column name" of the joined table.
I'd think the rule would have to be the same for the A,B case as well.

And, if say A.A and <output of join> A are ambiguous on whether they're
the same, wouldn't the rules on group by say that A.A is invalid grouping
column reference in this case whether or not the select list has A or A.A?
I'm thinking that it wouldn't be an unambiguous reference to a column of
the preceding FROM/WHERE clause.



Re: [SQL] 7.3 GROUP BY differs from 7.2

From
Josh Berkus
Date:
Guys,

SQL spec aside, thinking about this from a strictly implementation/user point
of view:
(an keeping in mind that I think it's very important that we work out the
spec-correct behaviour for 7.4 and/or 7.3.3)

The particular case that Dan has raised is an issue for four reasons:
1) It looks to a human like it *should* work, and I think given a long weekend
of relational calculus someone (not me) could define the cases where it is OK
as opposed to the cases (probably the majority) where it is not.
2) That syntax *did* work in previous versions of PostgreSQL.
3) That syntax will be accepted by some other SQL databases.
4) The error message is rather confusing and could cause a developer to spend
an hour or more hunting for the wrong error.

I propose that, should we decide not to change the behaviour of the parser,
that we do the following:

1) add the following to the FAQ or elsewhere:

Q.  I just got the message "ERROR:  Attribute unnamed_join.[column name] must
be GROUPed or used in an aggregate function" and my GROUP BY query won't run,
even though all of the columns are in the GROUP BY clause. This query may
have worked in PostgreSQL 7.2, or on another SQL database.  What do I do?

A. You are probably qualifying a column name differently in the SELECT clause
than in the GROUP BY clause, causing the parser to be confused about what you
really mean.   For example, you may be referring to a column by its simple
column name ("element_id") in the SELECT clause, and by its table-qualified
name ("table1.element_id") in the GROUP BY clause, or you may be using an
alias in one place but not the other.   Please make sure that all columns in
the SELECT clause match *exactly* the columns in the GROUP BY clause.

2) That we add a warning in the 7.3 release notes about the breaking of
backward compatibility.

Thoughts?

--
Josh Berkus
Aglio Database Solutions
San Francisco