Thread: GROUP BY + join regression in 7.3

GROUP BY + join regression in 7.3

From
Tom Lane
Date:
I have just noticed that 7.3 and CVS tip reject a query that was
accepted in earlier releases:

regression=# create table t1(f1 int, f2 int);
CREATE TABLE
regression=# create table t2(f2 int, f3 int);
CREATE TABLE
regression=# select t1.f1 from t1 join t2 on (t1.f2=t2.f2) group by f1;
ERROR:  Attribute t1.f1 must be GROUPed or used in an aggregate function

The difficulty here is that in 7.3, the unqualified name "f1" is treated
as a Var of the unnamed JOIN relation, which is not equal() to "t1.f1".

It's possible that we could avoid associating Vars with the JOIN
relation in the case of JOINs that have no alias, but I'm not convinced
this would work; it took a couple of iterations to get to the present
representation of JOIN variables and I no longer recall all the reasons
why simpler representations failed :-(

What I'm thinking of doing is applying flatten_join_alias_vars() in
parse_agg.c before checking for invalid grouping --- that replaces "f1"
with "t1.f1" so that the above works.

But first, I thought I'd ask if anyone thinks the above error message is
correct behavior?  The SQL spec seems quite unspecific about this ---
it says that the SELECT-list item must be "a reference to a grouping
column", but I couldn't find anywhere that specifically said whether
"f1" and "t1.f1" are to be considered the same column in this scenario.

There are related cases in which they are definitely not the same thing,
egselect ... from t1 right join t2 using (f1)
"f1" will never be NULL in the result of this FROM clause, but "t1.f1"
could be NULL.  ("f1" would really be "t2.f1" here.)

In the plain inner-join case, there isn't any semantic difference
between "f1" and "t1.f1", but does that mean we should treat them
as the same grouping column?  I'm not sure.  Anyone want to try these
cases on other DBMSes?
        regards, tom lane



Re: GROUP BY + join regression in 7.3

From
Joe Conway
Date:
Tom Lane wrote:
> I have just noticed that 7.3 and CVS tip reject a query that was
> accepted in earlier releases:
> 
> regression=# create table t1(f1 int, f2 int);
> CREATE TABLE
> regression=# create table t2(f2 int, f3 int);
> CREATE TABLE
> regression=# select t1.f1 from t1 join t2 on (t1.f2=t2.f2) group by f1;
> ERROR:  Attribute t1.f1 must be GROUPed or used in an aggregate function

[...snip...]

> In the plain inner-join case, there isn't any semantic difference
> between "f1" and "t1.f1", but does that mean we should treat them
> as the same grouping column?  I'm not sure.  Anyone want to try these
> cases on other DBMSes?
> 

I did the above on MSSQL2000 -- it works with no error. I haven't been 
able to get at a development Oracle db, but I'll try to tomorrow.

Joe



Re: GROUP BY + join regression in 7.3

From
Tom Lane
Date:
Joe Conway <mail@joeconway.com> writes:
> I did the above on MSSQL2000 -- it works with no error.

Just out of curiosity --- does MSSQL treat "f1" and "t1.f1" as different
in the RIGHT JOIN variant case I mentioned?  Previous reports have led
me to have a very low estimate of their [understanding of/willingness to
enforce] subtle distinctions ...
        regards, tom lane



Re: GROUP BY + join regression in 7.3

From
Joe Conway
Date:
Tom Lane wrote:
> Just out of curiosity --- 

Here's what I get:

select * from t1
f1          f2
----------- -----------
1           1
2           2

(2 row(s) affected)

select * from t2
f2          f3
----------- -----------
1           3

(1 row(s) affected)

select t1.f2 from t2 right join t1 on t1.f2 = t2.f2 group by f2
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'f2'.

select f2 from t2 right join t1 on t1.f2 = t2.f2 group by t1.f2
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'f2'.

select f2 from t2 right join t1 on t1.f2 = t2.f2 group by f2
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'f2'.
Server: Msg 209, Level 16, State 1, Line 1
Ambiguous column name 'f2'.

select t1.f2 from t2 right join t1 on t1.f2 = t2.f2 group by t1.f2
f2
-----------
1
2

(2 row(s) affected)

Any other variations you want to see?

Joe



Re: GROUP BY + join regression in 7.3

From
Manfred Koizar
Date:
On Tue, 01 Apr 2003 00:29:46 -0500, Tom Lane <tgl@sss.pgh.pa.us>
wrote:
>Just out of curiosity --- does MSSQL treat "f1" and "t1.f1" as different
>in the RIGHT JOIN variant case I mentioned?

MSSQL7: SELECT t1.f1 FROM t1 {INNER | LEFT | RIGHT} JOIN t2 ON (t1.f2=t2.f2) GROUP BY f1
all run without an error.

ORACLE7: JOIN syntax not available, but SELECT t1.f1 FROM t1, t2 WHERE (t1.f2=t2.f2) GROUP BY f1; SELECT t1.f1 FROM t1,
t2WHERE t1.f2=t2.f2(+) GROUP BY f1; SELECT t1.f1 FROM t1, t2 WHERE t1.f2(+)=t2.f2 GROUP BY f1;
 
all work.

ServusManfred



Re: GROUP BY + join regression in 7.3

From
Tom Lane
Date:
Manfred Koizar <mkoi-pg@aon.at> writes:
> MSSQL7:
>   SELECT t1.f1
>   FROM t1 {INNER | LEFT | RIGHT} JOIN t2 ON (t1.f2=t2.f2)
>   GROUP BY f1
> all run without an error.

> ORACLE7: JOIN syntax not available, but
>   SELECT t1.f1 FROM t1, t2 WHERE (t1.f2=t2.f2) GROUP BY f1;
>   SELECT t1.f1 FROM t1, t2 WHERE t1.f2=t2.f2(+) GROUP BY f1;
>   SELECT t1.f1 FROM t1, t2 WHERE t1.f2(+)=t2.f2 GROUP BY f1;
> all work.

Okay, that seems to pretty much prove the point.  I'll fix parse_agg.
        regards, tom lane