At 12:05 PM 12/15/99 -0500, Tom Lane wrote:
>Who's up for a little language-lawyering discussion?
>
>I have just noticed that our parser is probably in error in treating
>GROUP BY and ORDER BY expressions similarly. This came up while
>checking whether we were doing the right thing in rejecting
>
>SELECT complicated-expression AS foo FROM table WHERE foo < 42;
FWIW, here's what Oracle does:
SQL> select * from foo;
I J
---------- ---------- 1 2
SQL> select i as ii from foo where ii=3;
select i as ii from foo where ii=3 *
ERROR at line 1:
ORA-00904: invalid column name
This seems in agreement with PostgreSQL's rejection of the query.
>
>Our parser will accept AS-names in ORDER BY and GROUP BY clauses,
Oracle, again:
SQL> select i as ii from foo where i=1 group by ii;
select i as ii from foo where i=1 group by ii *
ERROR at line 1:
ORA-00904: invalid column name
BTW, at times at least it seems that PostgreSQL REQUIRES use of the
"as" name in group by, at least I've had queries I've been unable to move
from Oracle to PostgreSQL unless I've done so. It's not consistent,
though, i.e. there's some kind of bug that pops up for some queries.
I've not had time to attempt to figure out exactly what differentiates
queries that work from queries that fail if I don't use the "as" name.
Here's Oracle's pronouncement on order by:
SQL> select i as ii from foo where i=1 order by ii;
II
---------- 1
>but not in WHERE or HAVING. But eyeballing the spec makes it look like
>AS-names should *only* be recognized in ORDER BY, nowhere else.
And this jives with Oracle. I offer this as supporting evidence only,
of course, I'm sure Oracle violates the standard in some ways as well
so we can't take their implementation as being definitive in regard
to standards issues.
> <direct select statement: multiple rows> ::=
> <query expression> [ <order by clause> ]
>
> <query specification> ::=
> SELECT [ <set quantifier> ] <select list> <table expression>
>
> <table expression> ::=
> <from clause>
> [ <where clause> ]
> [ <group by clause> ]
> [ <having clause> ]
>
>(<query expression> reduces to <query specification>s combined by
>UNION/INTERSECT/EXCEPT, which are not of interest here).
>
>Now the interesting thing about this is that WHERE, GROUP BY, and HAVING
>are all defined to use column names that are defined by the <table
>expression> they're in.
Well, really it's a scoping issue, not a syntax issue. What is the scope
of an identifier defined by an "as" identifier? Of course, this is simple
enough that they might've been able to encapsulate the scope in the syntax.
Do you have the syntax for the various clauses available? For instance,
two kinds of identifiers might be defined, say a column_id which must
really be the name of a real column and a more general id which is the
union of real column ids and "as" names.
I just looked at the grammar in Date's book, and it says:
order-item ::= { column | integer } [ ASC | DESC ]
and GROUP BY is followed by a "column-ref-commalist"
which would leave me to think that the issue of where an "as" identifier
can be used is addressed semantically, not syntactically, since both
simply refer to "column" identifiers. I don't have time at the
moment to dig into Date's book further to see what he says, I can look
later if you want.
Keep in mind this is the very first time I've looked at the formal
syntax for SQL. I have a background in language-lawyering, though.
>Next question is, do we want to leave the code as-is, or tighten up
>the parser to reject AS-names and column numbers in GROUP BY?
My personal feeling is that minor extensions, including accidental
ones, work against the goal of standards which is of course to make
it easier to move stuff from one implementation to another. From my
current perspective of moving nearly 9,000 lines of Oracle SQL (just
in the data model, with thousands more in the code that uses it) examples
like this where postgres implements a superset of the standard is a lot
easier to deal with than those areasa where postgres implements a subset
(no outer joins, for instance)!
But philosophically I'm a believer in standards and in making it
as easy as possible to move code back and forth between various
SQL engines.
Can it silently break a query, i.e. are there examples where an
identifier might refer to different columns in the two cases? If not,
I wouldn't worry about it too much though if it were up to me I'd probably
adhere to the standard. Silent breakage (i.e. "working" but returning an
incorrect result compared to the result you'd get with a standard
implementation)
is more insidious as such queries can be hard to uncover when porting
something.
>It seems to me we should change it, because there are cases where the
>existing code will do the wrong thing according to the SQL spec.
>If "foo" is a column name and also an AS-name for something else,
>"GROUP BY foo" should group on the raw column according to the spec,
>but right now we will pick the SELECT result value instead.
Oops...silent breakage, all right. Bad. Yeah, it should be fixed, I
don't think there should be any question about it - assuming that a
closer reading of the standard verifies that it should work as you
and Oracle both seem to think it should.
- Don Baccus, Portland OR <dhogaza@pacifier.com> Nature photos, on-line guides, Pacific Northwest Rare Bird Alert
Serviceand other goodies at http://donb.photo.net.