Thread: SQL feature requests
Hopefully this is the right place for a few feature requests that would address some of the things that I've noticed in postgres. 1. Just a minor annoyance, but why must subqueries in FROM clauses have an alias? For instance suppose that I have an orders table, and one of the fields is userid. The following is unambiguous and is legal in Oracle: SELECT order_count , count(*) as people FROM ( SELECT count(*) as order_count FROM orders GROUP BY userid ) GROUP BY order_count It annoys me that it isn't legal in postgres. (Yes, I know how to fix the query. But it still is an annoyance, and it comes up fairly often in reporting purposes.) 2. Why is 'non-integer constant in GROUP BY' an error? I find it inconvenient. For reporting purposes I often have to dynamically build queries in code. An easy way to do that is just interpolate in a set of possible statements which will either be empty strings or have trailing commas. But then I need this (possibly empty) list to have a valid group by statement at the end. In Oracle I used to just write it like this: SELECT ... GROUP BY $possible_term_1 $possible_term_2 $possible_term_3 'end of possible groupings' In postgres I either have to use a different strategy to build up these strings, or else use a more complicated term to finish that off.This works for now: case when true then true end but I don't know whether some future version of postgres might break my code by banning that as well. 3. How hard would it be to have postgres ignore aliases in group by clauses? Per my comments above, I often build complex queries in code. I can't easily use the shortcut of referring to the select column by number because the position is hard to determine. So my code has to copy the select terms. But I can't copy them exactly because the select terms include lots of "...as foo" clauses that are not allowed in a group by. So I have to store very similar terms to use twice. It would be nice if I could just make the group by look like the select, and have the (obviously irrelevant) aliases just be ignored. 4) Items 2 and 3 would both be made irrelevant if postgres did something that I'd really, really would like. Which is to assume that a query without a group by clause, but with an aggregate function in the select, should have an implicit group by clause where you group by all non-aggregate functions in the select. For example SELECT foo, count(*) FROM bar would be processed as: SELECT foo, count(*) FROM bar GROUP BY foo If I write a query with an aggregate function in the select, better than 95% of the time this is the group by clause that I want. (This email has one of the few exceptions.) In the remaining cases I could easily add the extra stuff in the group by to the select without problems. Therefore if postgres could just insert the obvious group by clause in, I would never again write the words "group by" when working with postgres. And I predict that many other people would do the same. But it doesn't. So when working with postgres, just like every other database that I've used, I have to constantly type in group by clauses with entirely redundant information. (But they're not EXACTLY the same as the select clauses that they are redundant with...) Cheers, Ben
Ben, pgsql-sql is probably the appropriate list for future queries of this nature. Note that the below is my personal opinion; each PG developer has their own. > 1. Just a minor annoyance, but why must subqueries in FROM clauses > have an alias? For instance suppose that I have an orders table, and > one of the fields is userid. The following is unambiguous and is > legal in Oracle: I *think* the alias is a requirement of the SQL standard. Yes/No? > 2. Why is 'non-integer constant in GROUP BY' an error? Again, this needs to reference one of the SQL standards if you're interested in a change of behavior. If we're out of standards compliance, that's a strong argument. If we're in compliance, you have a pretty steep hurdle to justify new syntax. > 3. How hard would it be to have postgres ignore aliases in group by > clauses? Unfortunately, I think this is also a SQL compliance issue. However, I'd be more liable to support your arguments for it; it's much more obviously useful functionality. > 4) Items 2 and 3 would both be made irrelevant if postgres did > something that I'd really, really would like. Which is to assume that > a query without a group by clause, but with an aggregate function in > the select, should have an implicit group by clause where you group by > all non-aggregate functions in the select. In addition to SQL compliance issues, we're reluctant to do anything which makes implicit assumptions which could easily be wrong in PostgreSQL. Such shortcutting all to often leads to runaway queries or wrong data when the assumptions are incorrect. MySQL gives us lots of examples of what can happen if you do too many things for convenience and compromise correctness. -- --Josh Josh Berkus PostgreSQL @ Sun San Francisco
On Aug 22, 2007, at 18:45 , Ben Tilly wrote: > 1. Just a minor annoyance, but why must subqueries in FROM clauses > have an alias? It's required by the SQL standard, AIUI. I wonder what EnterpriseDB does? > 2. Why is 'non-integer constant in GROUP BY' an error? > This works for now: > > case when true then true end > > but I don't know whether some future version of postgres might break > my code by banning that as well. The PostgreSQL developers generally tries hard to preserve backwards compatibility, so I doubt the case expression as you have it would go away (though I'm kind of surprised it's allowed). Am I wrong in thinking that Oracle would accept the same format PostgreSQL does? In that case, couldn't you use whatever method works in PostgreSQL in Oracle? I haven't checked the SQL standard, but it seems unlikely it'd allow something like GROUP BY , , , ; AIUI, Integers are only allowed because the SQL standard explicitly allows you to refer to columns by the order they appear in the SELECT list. Otherwise the GROUP BY items need to be column names. Both 1 and 2 seem to me to be places where Oracle is likely deviating from the standard. If you're targeting Oracle, then using Oracle- specific syntax might be warranted. If you're hoping to target more than one possible backend, I'd think it be better to use more portable syntax (e.g., SQL-standard syntax) than expecting other DBMSs to follow another's deviations. That's not to say PostgreSQL does not have non-standard syntax: in places, it does. But it does try to hew very closely to the standard. Again, I wonder what EnterpriseDB does in this case? > 3. How hard would it be to have postgres ignore aliases in group by > clauses? Per my comments above, I often build complex queries in > code. I can't easily use the shortcut of referring to the select > column by number because the position is hard to determine. So my > code has to copy the select terms. But I can't copy them exactly > because the select terms include lots of "...as foo" clauses that are > not allowed in a group by. So I have to store very similar terms to > use twice. Perhaps someone else knows what you're referring to here, but I'm having a hard time without an example. Here's what I *think* you're trying to say: test=# select * from observation; observation_id | record_id | score_id ----------------+-----------+---------- 3240 | 1 | 1 3239 | 1 | 1 3238 | 1 | 2 3237 | 1 | 1 2872 | 2 | 1 2869| 2 | 2 2870 | 2 | 1 2871 | 2 | 1 3218 | 3 | 2 3217 | 3 | 1 (10 rows) test=# select record_id as foo, count(observation_id) from observation group by record_id; foo | count -----+------- 3 | 2 2 | 4 1 | 4 (3 rows) test=# select record_id as foo, count(observation_id) from observation group by foo; foo | count -----+------- 3 | 2 2 | 4 1 | 4 (3 rows) test=# select record_id as foo, count(observation_id) as bar from observation group by foo; foo | bar -----+----- 3 | 2 2 | 4 1 | 4 (3 rows) test=# select record_id as foo, count(observation_id) as bar from observation group by record_id; foo | bar -----+----- 3 | 2 2 | 4 1 | 4 (3 rows) test=# select version(); version ------------------------------------------------------------------------ ---------------------------------------------------------------------- PostgreSQL 8.2.4 on powerpc-apple-darwin8.9.0, compiled by GCC powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5367) (1 row) I'm not getting an error in any permutation that I can think of. What am I missing? > Which is to assume that > a query without a group by clause, but with an aggregate function in > the select, should have an implicit group by clause where you group by > all non-aggregate functions in the select. > > For example > > SELECT foo, count(*) > FROM bar > > would be processed as: > > SELECT foo, count(*) > FROM bar > GROUP BY foo It's been discussed before. I don't believe it's been rejected out of hand (though you can check the archives), just that no one's gotten around to it. (Don't know what the SQL-spec says on this point.) I'm not trying to dismiss your points, just trying to address them. I'm interested to hear what others have to say. Michael Glaesemann grzm seespotcode net
"Ben Tilly" <btilly@gmail.com> writes: > Hopefully this is the right place for a few feature requests that > would address some of the things that I've noticed in postgres. > > 1. Just a minor annoyance, but why must subqueries in FROM clauses > have an alias? For instance suppose that I have an orders table, and > one of the fields is userid. The following is unambiguous and is > legal in Oracle: Thank you, this is one of my top pet peeves but when I proposed changing it I was told nobody's complained. Now we have at least one user complaint, any others out there? > 2. Why is 'non-integer constant in GROUP BY' an error? Hm... I was a bit surprised by this warning myself. IIRC there was an implementation convenience issue. > 3. How hard would it be to have postgres ignore aliases in group by > clauses? That sounds like a strange idea. > 4) Items 2 and 3 would both be made irrelevant if postgres did > something that I'd really, really would like. Which is to assume that > a query without a group by clause, but with an aggregate function in > the select, should have an implicit group by clause where you group by > all non-aggregate functions in the select. > > For example > > SELECT foo, count(*) > FROM bar > > would be processed as: > > SELECT foo, count(*) > FROM bar > GROUP BY foo I agree this would be convenient but it seems too scary to actually go anywhere. What would you group by in the case of: SELECT a+b, count(*) FROM bar Should it group by a,b or a+b ? Also, this might be a bit shocking for MySQL users who are accustomed to MySQL's non-standard extension for the same syntax. There it's treated as an assertion that the columns are equal for all records in a group or at least that it doesn't matter which such value is returned, effectively equivalent to our DISTINCT ON feature. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote: > > On Aug 22, 2007, at 18:45 , Ben Tilly wrote: > > > 1. Just a minor annoyance, but why must subqueries in FROM clauses > > have an alias? > > It's required by the SQL standard, AIUI. I wonder what EnterpriseDB > does? I can well believe that the standard says that you must accept subqueries with aliases. But does it say you must reject subqueries without aliases? I strongly doubt that. I have no clue what EnterpriseDB does. > > 2. Why is 'non-integer constant in GROUP BY' an error? > > > This works for now: > > > > case when true then true end > > > > but I don't know whether some future version of postgres might break > > my code by banning that as well. > > The PostgreSQL developers generally tries hard to preserve backwards > compatibility, so I doubt the case expression as you have it would go > away (though I'm kind of surprised it's allowed). Am I wrong in > thinking that Oracle would accept the same format PostgreSQL does? In The reason for my comparing to Oracle is that I used to work at an Oracle shop. I now work at a postgres shop. Portability is not my issue, just the annoyances that I experienced moving from one to the other. As for whether that case expression would go away, that it is allowed is such an obscure feature that I doubt anyone changing that code would notice if it was removed. > that case, couldn't you use whatever method works in PostgreSQL in > Oracle? I haven't checked the SQL standard, but it seems unlikely > it'd allow something like > > GROUP BY , , , ; That's not what Oracle accepts that postgres does not. What Oracle accepts is: ... GROUP BY 'foo'; > AIUI, Integers are only allowed because the SQL standard explicitly > allows you to refer to columns by the order they appear in the SELECT > list. Otherwise the GROUP BY items need to be column names. Need to be? The SQL-92 standard is clear that you must accept a list of column names. It is also clear that a column name must be be of the form field or table.field. In no way, shape or form does that allow having terms like trim(foo.bar) in a group by. But every reasonable database that I know - including postgres - allows that. The standard very wisely does not forbid extensions. Every database has extensions. In some cases, such as allowing trim(foo.bar) in a group by clause, some extensions are so common as to be a standard. (I don't have a copy of any later standards so I don't know whether that has since been explicitly allowed.) Therefore the real question is how much farther than the standard you go. Postgres explicitly disallows a constant character expression. But it allows the constant case expression that I gave. It would be nice for me to not have to remember that very obscure and convoluted case. > Both 1 and 2 seem to me to be places where Oracle is likely deviating > from the standard. If you're targeting Oracle, then using Oracle- > specific syntax might be warranted. If you're hoping to target more > than one possible backend, I'd think it be better to use more > portable syntax (e.g., SQL-standard syntax) than expecting other > DBMSs to follow another's deviations. That's not to say PostgreSQL > does not have non-standard syntax: in places, it does. But it does > try to hew very closely to the standard. The queries that I'm writing are not hoping to target more than one database at one company. > Again, I wonder what EnterpriseDB does in this case? No clue. > > 3. How hard would it be to have postgres ignore aliases in group by > > clauses? Per my comments above, I often build complex queries in > > code. I can't easily use the shortcut of referring to the select > > column by number because the position is hard to determine. So my > > code has to copy the select terms. But I can't copy them exactly > > because the select terms include lots of "...as foo" clauses that are > > not allowed in a group by. So I have to store very similar terms to > > use twice. > > Perhaps someone else knows what you're referring to here, but I'm > having a hard time without an example. Here's what I *think* you're > trying to say: What I'm trying to say is that it would be convenient for me to be able to write: select bar as "baz" , count(*) as "some count" from foo group by bar as "baz" That's not allowed right now because as is not allowed in a group by statement. [...] > > Which is to assume that > > a query without a group by clause, but with an aggregate function in > > the select, should have an implicit group by clause where you group by > > all non-aggregate functions in the select. > > > > For example > > > > SELECT foo, count(*) > > FROM bar > > > > would be processed as: > > > > SELECT foo, count(*) > > FROM bar > > GROUP BY foo > > It's been discussed before. I don't believe it's been rejected out of > hand (though you can check the archives), just that no one's gotten > around to it. (Don't know what the SQL-spec says on this point.) I don't know what the SQL spec says, but I know (having talked to other developers) that many people would find it very nice. > I'm not trying to dismiss your points, just trying to address them. > I'm interested to hear what others have to say. Cheers, Ben
On 8/22/07, Gregory Stark <stark@enterprisedb.com> wrote: > > "Ben Tilly" <btilly@gmail.com> writes: > > > Hopefully this is the right place for a few feature requests that > > would address some of the things that I've noticed in postgres. > > > > 1. Just a minor annoyance, but why must subqueries in FROM clauses > > have an alias? For instance suppose that I have an orders table, and > > one of the fields is userid. The following is unambiguous and is > > legal in Oracle: > > Thank you, this is one of my top pet peeves but when I proposed changing it I > was told nobody's complained. Now we have at least one user complaint, any > others out there? Always happy to complain. :-) > > 2. Why is 'non-integer constant in GROUP BY' an error? > > Hm... I was a bit surprised by this warning myself. IIRC there was an > implementation convenience issue. If your implementation accepts: group by case when true then 'foo' end how much harder can it be to accept: group by 'foo' ? > > 3. How hard would it be to have postgres ignore aliases in group by > > clauses? > > That sounds like a strange idea. It is a strange idea, but it makes dynamically building queries easier. Right now I'm following a strategy of storing what I'm going to insert in the select clause in one variable, and the group by clause in another. So I need 2 variables for each dynamic field that I might choose to group by and want to have a custom name for. With this change I would only need one variable. > > 4) Items 2 and 3 would both be made irrelevant if postgres did > > something that I'd really, really would like. Which is to assume that > > a query without a group by clause, but with an aggregate function in > > the select, should have an implicit group by clause where you group by > > all non-aggregate functions in the select. > > > > For example > > > > SELECT foo, count(*) > > FROM bar > > > > would be processed as: > > > > SELECT foo, count(*) > > FROM bar > > GROUP BY foo > > I agree this would be convenient but it seems too scary to actually go > anywhere. What would you group by in the case of: > > SELECT a+b, count(*) FROM bar > > Should it group by a,b or a+b ? It should group by a+b. Which is to say, every field in the select clause that currently triggers an error because it isn't in the group by clause. > Also, this might be a bit shocking for MySQL users who are accustomed to > MySQL's non-standard extension for the same syntax. There it's treated as an > assertion that the columns are equal for all records in a group or at least > that it doesn't matter which such value is returned, effectively equivalent to > our DISTINCT ON feature. I don't mind shocking MySQL users. ;-) But seriously, if that objection is the barrier then I'd be happy to see it be something that is explicitly turned on in the query. For instance: select autogroup bar, count(*) from foo If that was available then I for one would type autogroup a lot more often than group by. After all autogroup is about as hard to type, and I don't have to type the redundant list of fields in the group by. Cheers, Ben
Ben Tilly wrote: > On 8/22/07, Gregory Stark <stark@enterprisedb.com> wrote: >> "Ben Tilly" <btilly@gmail.com> writes: >>> 2. Why is 'non-integer constant in GROUP BY' an error? >> Hm... I was a bit surprised by this warning myself. IIRC there was an >> implementation convenience issue. > > If your implementation accepts: > > group by case when true then 'foo' end > > how much harder can it be to accept: > > group by 'foo' This is not about hardness of the implementation, but rather about non-confusing behaviour I think. AFAIK, "group by 1" means "group by the first selected column", not "group all rows together". But "group by 'foo'" would carry the second meaning - "group all rows together". This is so totally counter-intuitive, that it's not even funny... "group by case when true then 'foo' end" looks different enough compared to "group by 1" to make this less of a footgun. Seems that the "group by <integer>" syntax predates the appearance of aliases in the sql standard... greetings, Florian flug
On Aug 22, 2007, at 20:36 , Ben Tilly wrote: > On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote: >> >> On Aug 22, 2007, at 18:45 , Ben Tilly wrote: >> >>> 1. Just a minor annoyance, but why must subqueries in FROM clauses >>> have an alias? >> >> It's required by the SQL standard, AIUI. I wonder what EnterpriseDB >> does? > > I can well believe that the standard says that you must accept > subqueries with aliases. But does it say you must reject subqueries > without aliases? I strongly doubt that. If I'm reading my draft copy of the SQL:2003 spec right (and there's a good chance that I'm not, as it's not the easiest document for me to parse), aliases *are* required. From 5WD-02-Foundation-2003-09 (the fifth working draft) > 7.5 <from clause> > > <from clause> ::= FROM <table reference list> > <table reference list> ::= > <table reference> [ { <comma> <table reference> }... ] > 7.6 <table reference> > > <table reference> ::= > <table factor> > | <joined table> > <table factor> ::= <table primary> [ <sample clause> ] > <table primary> ::= > <table or query name> [ [ AS ] <correlation name> > [ <left paren> <derived column list><right paren> ] ] > | <derived table> [ AS ] <correlation name> > [ <left paren> <derived column list><right paren> ] > | <lateral derived table> [ AS ] <correlation name> > [ <left paren> <derived column list><right paren> ] > | <collection derived table> [ AS ] <correlation name> > [ <left paren> <derived column list><right paren> ] > | <table function derived table> [ AS ] <correlation name> > [ <left paren> <derived column list><right paren> ] > | <only spec> [ [ AS ] <correlation name> > [ <left paren> <derived column list><right paren> ] ] > | <left paren> <joined table> <right paren> > <derived table> ::= <table subquery> <correlation name> I believe is what we commonly refer to as an alias. I think <table or query name> is a table or view name, and <derived table> is a subquery. <correlation name> is optional for <table or query name> but not for <derived table>. The fact that the standard clearly makes it optional in some cases and required in others is pretty clear that <derived table> without a <correlation name> is to be rejected, don't you think? > I have no clue what EnterpriseDB does. In case it wasn't clear, the reason I bring it up is that EnterpriseDB, while working from a PostgreSQL base, strives for Oracle compatibility. >> that case, couldn't you use whatever method works in PostgreSQL in >> Oracle? I haven't checked the SQL standard, but it seems unlikely >> it'd allow something like >> >> GROUP BY , , , ; > > That's not what Oracle accepts that postgres does not. What > Oracle accepts is: > > ... > GROUP BY 'foo'; Thanks for clarifying. If it wasn't obvious by now, you know I'm not familiar with Oracle :) >> AIUI, Integers are only allowed because the SQL standard explicitly >> allows you to refer to columns by the order they appear in the SELECT >> list. Otherwise the GROUP BY items need to be column names. > > Need to be? > > The SQL-92 standard is clear that you must accept a list of column > names. It is also clear that a column name must be be of the form > field or table.field. The 2003 draft (same as above) seems to agree with the SQL92 standard: > 7.9 <group by clause> > Format > <group by clause> ::= > GROUPBY [ <set quantifier> ] <grouping element list> > <grouping element list> ::= > <grouping element> [ { <comma> <grouping element> }... ] > <grouping element> ::= > <ordinary grouping set> > | <rollup list> > | <cube list> > | <grouping sets specification> > | <empty grouping set> > <ordinary grouping set> ::= > <grouping column reference> > | <left paren> <grouping column reference list><right paren> > <grouping column reference> ::= > <column reference> [ <collate clause> ] > <grouping column reference list> ::= > <grouping column reference> [ { <comma><grouping column > reference> }... ] > <rollup list> ::= > ROLLUP<left paren> <ordinary grouping set list><right paren> > <ordinary grouping set list> ::= > <ordinary grouping set> [ { <comma> <ordinary grouping set> }... ] > <cube list> ::= > CUBE<left paren> <ordinary grouping set list><right paren> > <grouping sets specification> ::= > GROUPINGSETS <left paren> <grouping set list><right paren> > <grouping set list> ::= > <grouping set> [ { <comma> <grouping set> }... ] > <grouping set> ::= > <ordinary grouping set> > | <rollup list> > | <cube list> > | <grouping sets specification> > | <empty grouping set> > <empty grouping set> ::= <left paren><right paren> > 6.7 <column reference> > Format > <column reference> ::= > <basic identifier chain> > | MODULE<period> <qualified identifier><period> <column name> There'd have to be a pretty strong reason to extend this, more than just a convenience, I should think. > In no way, shape or form does that allow having > terms like trim(foo.bar) in a group by. > > But every reasonable database that I know - including postgres - > allows that. Can you give an example of something like this working in PostgreSQL? I get an error when I try to use a text value in a GROUP BY clause. (Or are you referring specifically to the CASE expression corner case?) test=# select record_id as foo, count(observation_id) as bar from observation group by trim(observation.record_id::text); ERROR: column "observation.record_id" must appear in the GROUP BY clause or be used in an aggregate function test=# select record_id as foo, count(observation_id) as bar from observation group by trim(observation.record_id); ERROR: column "observation.record_id" must appear in the GROUP BY clause or be used in an aggregate function > Postgres explicitly disallows a constant character expression. But it > allows the constant case expression that I gave. It would be nice for > me to not have to remember that very obscure and convoluted case. I agree, and would move that it should be disallowed if there isn't a reason for it to be maintained, for exactly the reason you give: there shouldn't be such convoluted, and obscure corner case. test=# select record_id as foo, count(observation_id) as bar from observation group by record_id, true; ERROR: non-integer constant in GROUP BY test=# select record_id as foo, count(observation_id) as bar from observation group by record_id, case when true then true end; > What I'm trying to say is that it would be convenient for me to be > able to write: > > select bar as "baz" > , count(*) as "some count" > from foo > group by bar as "baz" > > That's not allowed right now because as is not allowed in a group > by statement. Ah, I see. Thanks for clarifying. >>> Which is to assume that >>> a query without a group by clause, but with an aggregate function in >>> the select, should have an implicit group by clause where you >>> group by >>> all non-aggregate functions in the select. >>> >>> For example >>> >>> SELECT foo, count(*) >>> FROM bar >>> >>> would be processed as: >>> >>> SELECT foo, count(*) >>> FROM bar >>> GROUP BY foo >> >> It's been discussed before. I don't believe it's been rejected out of >> hand (though you can check the archives), just that no one's gotten >> around to it. (Don't know what the SQL-spec says on this point.) > > I don't know what the SQL spec says, but I know (having talked to > other developers) that many people would find it very nice. Since I had the spec open, I tried to look at this as well, though I must admit I found it very tough going. I think this is the key section: > 10.9 <aggregate function> > > ... > > Syntax Rules > > ... > > 4) The argument source of an <aggregate function> is > Case: > a) If AF is immediately contained in a <set function > specification>, then a table or group of a grouped table as > specified in Subclause 7.10, “<having clause>”, and Subclause 7.12, > “<query specification>”. > > b) Otherwise, the collection of rows in the current row's window > frame defined by the window structure descriptor identified by the > <window function> that simply contains AF, as defined in Subclause > 7.11, “<window clause>”. Now the <query specification> stuff is pretty dense, and the <window clause> is currently impenetrable for me, so I just looked at the <having clause> and <query specification> sections referenced above. > 7.10 <having clause> > > ... > > Syntax Rules > > 1) Let HC be the <having clause>. Let TE be the <table expression> > that immediately contains HC. If TE does not immediately contain a > <group by clause>, then “GROUPBY ()” is implicit. Let T be the > descriptor of the table defined by the <group by clause>GBC > immediately contained in TE and let R be the result of GBC. So it looks like if there's no explicit GROUP BY, we're to assume GROUP BY (), at least in the case of HAVING. > 7.12 <query specification> > > ... > > Syntax Rules > > ... > > 14) If <table expression> does not immediately contain a <group by > clause> and <table expression> is simply contained in a <query > expression> that is the aggregation query of some <set function > specification>, then GROUP BY () is implicit. Again, assume GROUP BY () if no explicit GROUP BY with a <set function specification>, which I understand to be some form of aggregation. > 15) If T is a grouped table, then let G be the set of grouping > columns of T. In each <value expression> contained in <select > list> , each column reference that references a column of T shall > reference some column C that is functionally dependent on G or > shall be contained in an aggregated argument of a <set function > specification> whose aggregation query is QS. And a definition of grouped table is found: > 4.14.2 Types of tables > > ... > > A grouped table is a set of groups derived during the evaluation of > a <group by clause>. A group G is a collection of rows in which, > for every grouping column GC, if the value of GC in some row is not > distinct from GV, then the value of GC in every row is GV; > moreover, if R1 is a row in group G1 of grouped table GT and R2 is > a row in GT such that for every grouping column GC the value of GC > in R1 is not distinct from the value of GC in R2, then R2 is in G1. > Every row in GT is in exactly one group. A group may be considered > as a table. Set functions operate on groups. So we're already talking about an explicit GROUP BY clause, and it doesn't seem to leave much wiggle room grouping columns that aren't explicitly listed. I'll readily admit I very easily could be reading this wrong or have missed a section of the spec that discusses this in more detail. (And that goes for any of my interpretations of the spec!) One place you and I differ is that I think you'd be in favor of many more extensions in cases where the SQL spec doesn't explicitly say "Thou shalt not do X". I'm a bit more conservative here: in my opinion, if the SQL speaks to a particular issue (for example, a WHERE clause), then you should pretty much keep to the letter of what it says. Extensions are more for areas where the standard is silent. For example, full text search will be added to PostgreSQL in 8.3. AFAIK, the SQL spec doesn't talk about full text search at all, so this is an area where extensions are a natural fit. Others, such as PostgreSQL's DISTINCT ON, is a much more difficult issue, and I'd suspect if it were proposed as a new feature now it would be rejected. However, it's been a part of PostgreSQL for a long time, so for historical reasons it remains. Likewise, I doubt "text" as a synonym for "varchar" would be accepted. Anyway, enough excursions into SpecLand for me. I'd welcome anyone else taking a crack at it. Michael Glaesemann grzm seespotcode net
On Aug 22, 2007, at 20:49 , Ben Tilly wrote: > If your implementation accepts: > > group by case when true then 'foo' end What would that mean? Regardless of whether or not it's accepted, it should have *some* meaning. It's not equivalent to GROUP BY "foo" test=# select record_id as foo , count(observation_id) as bar from observation group by case when true then 'foo' end; ERROR: column "observation.record_id" must appear in the GROUP BY clause or be used in an aggregate function test=# select record_id , count(observation_id) as bar from observation group by case when true then 'record_id' end; ERROR: column "observation.record_id" must appear in the GROUP BY clause or be used in an aggregate function *This* seems like a bug: test=# select record_id , count(observation_id) as bar from observation group by record_id ,case when true then 'foo' end; record_id | bar -----------+----- 1 | 4 2 | 4 3 | 2 (3 rows) And for good measure: test=# select record_id , count(observation_id) as bar from observation group by case when true then record_id end; ERROR: column "observation.record_id" must appear in the GROUP BY clause or be used in an aggregate function Michael Glaesemann grzm seespotcode net
Michael Glaesemann <grzm@seespotcode.net> writes: > On Aug 22, 2007, at 20:36 , Ben Tilly wrote: >> I can well believe that the standard says that you must accept >> subqueries with aliases. But does it say you must reject subqueries >> without aliases? I strongly doubt that. > If I'm reading my draft copy of the SQL:2003 spec right (and there's > a good chance that I'm not, as it's not the easiest document for me > to parse), aliases *are* required. Yes, they are. It's been like that since SQL92 if not before (I've never seen a copy of SQL89). I was meaning to go look up whether SQL2003 had gotten any laxer, but thanks for saving me the trouble. The question for us is whether we should exceed the spec by allowing something it does not. The fundamental reason why not is that we would have to invent an alias for the unnamed sub-select. That has several drawbacks: more code to try to pick an unused alias, more bug reports from people whose queries conflicted with whatever we picked. In return for which, we're providing absolutely zip real increase in functionality, and we're encouraging people to write unportable SQL-spec-incompatible code. There is a future-proofing argument too: if the committee ever decides this is a good idea (which may be unlikely after 15 years, but you never know), they could well define the default alias in a different way than we had done, and then we are stuck behind the eight ball. One could also assume that if the SQL committee has not made this oh-so-obvious improvement, there is some darn good reason why not. I'm not privy to their minutes so I don't know what arguments were made there, but I think we can eliminate "they never thought of it" as an explanation. Three generations of the spec have been written specifically to exclude this. In short, lots of downsides here, and not a whole lot of upside. regards, tom lane
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Michael Glaesemann > Sent: Wednesday, August 22, 2007 5:17 PM > To: Ben Tilly > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] SQL feature requests > > > On Aug 22, 2007, at 18:45 , Ben Tilly wrote: > > > 1. Just a minor annoyance, but why must subqueries in FROM clauses > > have an alias? > > It's required by the SQL standard, AIUI. I wonder what EnterpriseDB > does? > > > 2. Why is 'non-integer constant in GROUP BY' an error? > > > This works for now: > > > > case when true then true end > > > > but I don't know whether some future version of postgres might break > > my code by banning that as well. 1. The SQL standard requires an alias for the subquery, but many real-world SQL implementations relax this requirement in the case where it is unambiguous. The standard doesn't say you have to reject statements without the alias, it says only that you must accept the ones that do. PostgreSQL has many things in its SQL where we accept things that the standard doesn't require, and I don't see a good argument why it shouldn't allow this. 2. The SQL standard only allows column names in group by lists, NOT expressions. PostgreSQL extends the standard by allowing some, but not all, expressions in the group by list (somewhat inconsistent, in my view). Expressions in the group by list is actually a quite common extension. But PostgreSQL also extends the standard by attaching special meaning to simple integer constants in the group by list, and treating them as column numbers from the select list. As far as I remember, the SQL standard only allows that on ORDER BY clauses. Anyway, it seems reasonable to allow expressions in the group by, such as: Select a*10, sum(b) from x group by a*10; But once you allow that, it seems like you should allow any expression, even degenerate ones such as select 'foo',sum(b) from x group by 'foo';
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Chuck McDevitt > Sent: Wednesday, August 22, 2007 11:53 PM > To: Michael Glaesemann; Ben Tilly > Cc: pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] SQL feature requests > > > -----Original Message----- > > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > > owner@postgresql.org] On Behalf Of Michael Glaesemann > > Sent: Wednesday, August 22, 2007 5:17 PM > > To: Ben Tilly > > Cc: pgsql-hackers@postgresql.org > > Subject: Re: [HACKERS] SQL feature requests > > > > > > On Aug 22, 2007, at 18:45 , Ben Tilly wrote: > > > > > 1. Just a minor annoyance, but why must subqueries in FROM clauses > > > have an alias? > > > > It's required by the SQL standard, AIUI. I wonder what EnterpriseDB > > does? > > > > > 2. Why is 'non-integer constant in GROUP BY' an error? > > > > > This works for now: > > > > > > case when true then true end > > > > > > but I don't know whether some future version of postgres might > break > > > my code by banning that as well. > > 1. The SQL standard requires an alias for the subquery, but many > real-world SQL implementations relax this requirement in the case where > it is unambiguous. The standard doesn't say you have to reject > statements without the alias, it says only that you must accept the > ones > that do. PostgreSQL has many things in its SQL where we accept things > that the standard doesn't require, and I don't see a good argument why > it shouldn't allow this. > > 2. The SQL standard only allows column names in group by lists, NOT > expressions. PostgreSQL extends the standard by allowing some, but not > all, expressions in the group by list (somewhat inconsistent, in my > view). Expressions in the group by list is actually a quite common > extension. But PostgreSQL also extends the standard by attaching > special meaning to simple integer constants in the group by list, and > treating them as column numbers from the select list. As far as I > remember, the SQL standard only allows that on ORDER BY clauses. > Anyway, it seems reasonable to allow expressions in the group by, such > as: > > Select a*10, sum(b) from x group by a*10; > > But once you allow that, it seems like you should allow any expression, > even degenerate ones such as > select 'foo',sum(b) from x group by 'foo'; > Just wanted to point out that the group by thing is really just syntactic sugar. You can always get a SQL standard compliant system to accept the constants this way: Select z,sum(b) from (Select 'foo',b from x) a1 (z,b) Group by z; This will work (although with PostgreSQL, you might have to cast the string constant to text, because it is "unknown" datatype).
> > If your implementation accepts: > > > > group by case when true then 'foo' end I think he meant: group by case when true then "foo" end > > What would that mean? Regardless of whether or not it's accepted, it > should have *some* meaning. > > It's not equivalent to GROUP BY "foo" Yea, but only because 'foo' is an identifier, and not a string constant. > test=# select record_id as foo > , count(observation_id) as bar > from observation > group by case when true > then 'foo' > end; > ERROR: column "observation.record_id" must appear in the GROUP BY > clause or be used in an aggregate function I think your example would be easier to understand if you removed the quotes. We don't detect the correctness of the above query. You can hardly say that this is a feature, but I am not inclined to see it as a troublesome bug eighter. Andreas
> > how much harder can it be to accept: > > > > group by 'foo' Presumably you meant group by "foo". Imho pg should accept group by "foo". It could be part of a constant removal, that also takes burden off the sort. e.g. in "select x, count(*) from bar where x=5 group by x", x could be removed since it is constant. > This is not about hardness of the implementation, but rather about > non-confusing behaviour I think. > > AFAIK, "group by 1" means "group by the first selected column", not > "group all rows together". But "group by 'foo'" would carry the second > meaning - "group all rows together". Yes. I don't see the issue. 1 is imho sufficiently different even from "1". Pg is not alone in allowing column number in group by. Andreas
"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes: >> > how much harder can it be to accept: >> > >> > group by 'foo' > > Presumably you meant group by "foo". No that's the whole point. He meant the constant string 'foo' not the column identifier "foo" which certainly should work now. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > In short, lots of downsides here, and not a whole lot of upside. I highly doubt the spec would ever conflict with allowing the user to elide the aliases given that Oracle (and others?) have always allowed this. Moreover if it's been 15 years without them adding it surely that argues we can be pretty sure they won't add them? This seems like a particularly petty case compared to a lot of other extensions we do allow. Surely allowing arbitrary expressions in GROUP BY is far more likely to conflict in the future given how it constrains our grammar. And in theory that provides no added functionality over aside from programmer convenience as well. There are tons of extensions to the spec in the Postgres grammar. This would be one of the simplest safest ones. The upside is the convenience which after all is the same upside as most of our spec grammar extensions. Many many programmers are accustomed to entering ad-hoc queries of this form and forcing them to enter an alias for no purpose is just silly pedanticism from their point of view. The portability of ad-hoc queries is meaningless and if you don't refer to the alias in the query then it's truly pointless. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Gregory Stark escribió: > The upside is the convenience which after all is the same upside as most of > our spec grammar extensions. Many many programmers are accustomed to entering > ad-hoc queries of this form and forcing them to enter an alias for no purpose > is just silly pedanticism from their point of view. The portability of ad-hoc > queries is meaningless and if you don't refer to the alias in the query then > it's truly pointless. So there's the compromise: allow not specifying an alias only if it's not used in the rest of the query at all, so the subquery would be effectively anonymous. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote: > > On Aug 22, 2007, at 20:49 , Ben Tilly wrote: > > > If your implementation accepts: > > > > group by case when true then 'foo' end > > What would that mean? Regardless of whether or not it's accepted, it > should have *some* meaning. To my eyes it has a very clear meaning, we're grouping on an expression that happens to be a constant. Which happens to be the same for all rows. Which is a spectacularly useless thing to actually do, but the ability to do it happens to be convenient when I'm looking for something to terminate a series of commas in a dynamically built query. > It's not equivalent to GROUP BY "foo" I wouldn't want it to be. Strings and identifiers are very different things. [...] > *This* seems like a bug: > test=# select record_id > , count(observation_id) as bar > from observation > group by record_id > , case when true > then 'foo' > end; > record_id | bar > -----------+----- > 1 | 4 > 2 | 4 > 3 | 2 > (3 rows) Why does it seem like a bug to you? Turn it around, and tell me in what way is its behaviour surprising to someone who knows SQL. You asked to group on something that is the same for all rows. That group by condition did nothing. (Except rendered the syntax valid when it might not have been.) As I would expect. Furthermore ask yourself whether anyone who wrote that would likely have written it by accident. Cheers, Ben
"Ben Tilly" <btilly@gmail.com> writes: > ... But then I need this (possibly > empty) list to have a valid group by statement at the end. In Oracle > I used to just write it like this: > SELECT ... > GROUP BY $possible_term_1 > $possible_term_2 > $possible_term_3 > 'end of possible groupings' FWIW, that will work if you write it as 'end of possible groupings'::text I'm disinclined to accept it as-is because (a) it's highly likely to be a mistake, and (b) there's no principled way to assign a datatype to the expression, if we are to interpret it as an expression. Basically anything but an unadorned constant will work there. regards, tom lane
"Alvaro Herrera" <alvherre@commandprompt.com> writes: > Gregory Stark escribió: > >> The upside is the convenience which after all is the same upside as most of >> our spec grammar extensions. Many many programmers are accustomed to entering >> ad-hoc queries of this form and forcing them to enter an alias for no purpose >> is just silly pedanticism from their point of view. The portability of ad-hoc >> queries is meaningless and if you don't refer to the alias in the query then >> it's truly pointless. > > So there's the compromise: allow not specifying an alias only if it's > not used in the rest of the query at all, so the subquery would be > effectively anonymous. If what's not used in the rest of the query? If you haven't specified the alias what would you use? Surely even if we did generate an alias name nobody would think the generated name was guaranteed to be stable and reference it? I think the compromise is to instead of generating aliases at all just use an alias like "*Anonymous Subquery*" and add a boolean flag indicating that that range table is anonymous and not a valid target for references. I started doing that a while back but got distracted (and discouraged since it seemed not to have widespread support). IMHO even generating non-anonymous aliases like "*Anonymous Subquery*1" would be fine but I'm happy to have a flag hiding them too. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Aug 23, 2007, at 10:47 , Ben Tilly wrote: > On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote: >> >> *This* seems like a bug: >> test=# select record_id >> , count(observation_id) as bar >> from observation >> group by record_id >> , case when true >> then 'foo' >> end; >> record_id | bar >> -----------+----- >> 1 | 4 >> 2 | 4 >> 3 | 2 >> (3 rows) > > Why does it seem like a bug to you? > > Turn it around, and tell me in what way is its behaviour surprising to > someone who knows SQL. You asked to group on something that is the > same for all rows. That group by condition did nothing. (Except > rendered the syntax valid when it might not have been.) As I would > expect. Considering that I expect the GROUP BY clause to include only column references (or other groupings of column references), not expressions. Whether or not the value is the same, it surprises me that something other than a column reference is accepted at all. I hadn't realized this behavior was accepted in PostgreSQL, but I learn something new every day. My non-rigorous way of thinking about GROUP BY is that it groups this listed columns when the values of the listed columns are the same. An expression that evaluates to anything other than a column name doesn't provide any information about which column to consider grouped, and expressions don't evaluate to column names, or identifiers in general. If I understand you correctly, a GROUP BY item that isn't a column name would be a value that's applied to all columns, and the actual value is irrelevant—different values don't change the result. So the only purpose it would serve would be to prevent a trailing comma from raising a syntax error: you'd still need to explicitly list the other columns (unless the implementation behavior is changed to extend the spec there as well). What this does is allow you to use something like this (psuedocode): group_columns = [ 'foo', 'bar', 'baz' ] group_column_list = '' for col in group_columns { group_column_list += col + ',' } # group_column_list = "foo,bar,baz," group_by_clause = "GROUP BY $group_column_list CASE WHEN TRUE THEN 'quux' END" rather than group_column_list = join group_columns, ',' # group_column_list = "foo,bar,baz" group_by_clause = "GROUP BY $group_column_list" I still feel I'm missing something. If that's it, it seems like something easy enough to handle in middleware. Sorry if it appears I'm being dense. I've definitely learned things in this thread. > Furthermore ask yourself whether anyone who wrote that would likely > have written it by accident. I don't see what that has to do with anything. There are plenty of things I can write on purpose that would be nonsense. You might even consider my posts as prime examples :) Michael Glaesemann grzm seespotcode net
Gregory Stark <stark@enterprisedb.com> writes: > I highly doubt the spec would ever conflict with allowing the user to elide > the aliases given that Oracle (and others?) have always allowed this. Moreover > if it's been 15 years without them adding it surely that argues we can be > pretty sure they won't add them? The $64 question in my mind is exactly why hasn't the spec allowed this? It's clear that they have gone out of their way to not allow it, and I think it's unwise to say "oh let's do it" without understanding why not. > This seems like a particularly petty case compared to a lot of other > extensions we do allow. That's exactly the problem. Most of our other extensions are justified by some significant capability gain. This isn't --- it provides zero new functionality, and the "convenience" factor amounts to the saving of one keystroke (ok, maybe two if you insist on a space before the alias). Pretty weak argument... regards, tom lane
Gregory Stark escribió: > "Alvaro Herrera" <alvherre@commandprompt.com> writes: > > > Gregory Stark escribió: > > > >> The upside is the convenience which after all is the same upside as most of > >> our spec grammar extensions. Many many programmers are accustomed to entering > >> ad-hoc queries of this form and forcing them to enter an alias for no purpose > >> is just silly pedanticism from their point of view. The portability of ad-hoc > >> queries is meaningless and if you don't refer to the alias in the query then > >> it's truly pointless. > > > > So there's the compromise: allow not specifying an alias only if it's > > not used in the rest of the query at all, so the subquery would be > > effectively anonymous. > > If what's not used in the rest of the query? The alias, of course. If you reread Tom's argument, his problem is that the alias we choose may collide with what an hypotetical future standard may define, so the users who start to depend on the names with choose automatically would be screwed. > I think the compromise is to instead of generating aliases at all just use an > alias like "*Anonymous Subquery*" and add a boolean flag indicating that that > range table is anonymous and not a valid target for references. I started > doing that a while back but got distracted (and discouraged since it seemed > not to have widespread support). Hey, you are describing an implementation of my suggestion. -- Alvaro Herrera http://www.advogato.org/person/alvherre A male gynecologist is like an auto mechanic who never owned a car. (Carrie Snow)
On 8/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Ben Tilly" <btilly@gmail.com> writes: > > ... But then I need this (possibly > > empty) list to have a valid group by statement at the end. In Oracle > > I used to just write it like this: > > > SELECT ... > > GROUP BY $possible_term_1 > > $possible_term_2 > > $possible_term_3 > > 'end of possible groupings' > > FWIW, that will work if you write it as > > 'end of possible groupings'::text Ah, that is the solution that I'll move to. > I'm disinclined to accept it as-is because (a) it's highly likely to be > a mistake, and (b) there's no principled way to assign a datatype to the > expression, if we are to interpret it as an expression. > > Basically anything but an unadorned constant will work there. As long as that behaviour is guaranteed, I'm OK on this one. Ben
> > > This seems like a particularly petty case compared to a lot of other > > extensions we do allow. > > That's exactly the problem. Most of our other extensions are justified > by some significant capability gain. This isn't --- it provides zero > new functionality, and the "convenience" factor amounts to the saving > of > one keystroke (ok, maybe two if you insist on a space before the > alias). > Pretty weak argument... > > regards, tom lane > Tom, it isn't just a case of "convenience". When we are trying to convert users from another database (say Oracle for example) to PostgeSQL, one of the big stumbling blocks that slows down the work is all the little query changes that people have to make (people who might not have written the query as well), and it can require the review of hundreds or thousands of SQL scripts and applications. The harder it is, the more reluctant they are to convert. Sometimes supporting "de-facto" standards as well as official standards makes sense.
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> This seems like a particularly petty case compared to a lot of other >> extensions we do allow. > > That's exactly the problem. Most of our other extensions are justified > by some significant capability gain. This isn't --- it provides zero > new functionality, and the "convenience" factor amounts to the saving of > one keystroke (ok, maybe two if you insist on a space before the alias). > Pretty weak argument... I think we're talking about two different things. I'm referring to this: postgres=# select * from (select * from test); ERROR: subquery in FROM must have an alias HINT: For example, FROM (SELECT ...) [AS] foo. I get this about once a day because I frequently type this ad-hoc query and always forget to put the alias in the first time: postgres=# select count(*),n from (select count(*) as n from test group by i) group by n; ERROR: subquery in FROM must have an alias HINT: For example, FROM (SELECT ...) [AS] foo. Not to say that four is a lot larger than one but if we're talking about two different things then we're not likely to reach any kind of agreement... -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Chuck McDevitt wrote: > Tom, it isn't just a case of "convenience". When we are trying to > convert users from another database (say Oracle for example) to > PostgeSQL, one of the big stumbling blocks that slows down the work is > all the little query changes that people have to make (people who might > not have written the query as well), and it can require the review of > hundreds or thousands of SQL scripts and applications. The harder it > is, the more reluctant they are to convert. > > Sometimes supporting "de-facto" standards as well as official standards > makes sense. That is not the goal of the project. Sincerely, Joshua D. Drake > > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > - -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 24x7/Emergency: +1.800.492.2240 PostgreSQL solutions since 1997 http://www.commandprompt.com/ UNIQUE NOT NULL Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFGzcMGATb/zqfZUUQRAugDAJsF4WZXNRWVFzaeae7WfvGxxkt6eACeOkpL 3BpeYUSL9FNxpIzfioFxxbw= =MVNX -----END PGP SIGNATURE-----
On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote: > > On Aug 22, 2007, at 20:36 , Ben Tilly wrote: [...] > > I can well believe that the standard says that you must accept > > subqueries with aliases. But does it say you must reject subqueries > > without aliases? I strongly doubt that. > > If I'm reading my draft copy of the SQL:2003 spec right (and there's > a good chance that I'm not, as it's not the easiest document for me > to parse), aliases *are* required. Again, it lays out very carefully the syntax that must be valid. It does not say that extensions to that syntax are not valid. Every database implements such extensions. [...] > > I have no clue what EnterpriseDB does. > > In case it wasn't clear, the reason I bring it up is that > EnterpriseDB, while working from a PostgreSQL base, strives for > Oracle compatibility. I got the reference. But I don't know what EnterpriseDB does - I've never used it. [...] > >> AIUI, Integers are only allowed because the SQL standard explicitly > >> allows you to refer to columns by the order they appear in the SELECT > >> list. Otherwise the GROUP BY items need to be column names. > > > > Need to be? > > > > The SQL-92 standard is clear that you must accept a list of column > > names. It is also clear that a column name must be be of the form > > field or table.field. > > The 2003 draft (same as above) seems to agree with the SQL92 standard: > [ large snippet of the draft elided ] > > There'd have to be a pretty strong reason to extend this, more than > just a convenience, I should think. It is already extended in postgres. For pretty good reasons. > > In no way, shape or form does that allow having > > terms like trim(foo.bar) in a group by. > > > > But every reasonable database that I know - including postgres - > > allows that. > > Can you give an example of something like this working in PostgreSQL? > I get an error when I try to use a text value in a GROUP BY clause. > (Or are you referring specifically to the CASE expression corner case?) This works in every reasonable database that I have tried it in: select trim(foo.bar), count(*) from foo group by trim(foo.bar) And yes, I have tried it in postgres. [...] > > Postgres explicitly disallows a constant character expression. But it > > allows the constant case expression that I gave. It would be nice for > > me to not have to remember that very obscure and convoluted case. > > I agree, and would move that it should be disallowed if there isn't a > reason for it to be maintained, for exactly the reason you give: > there shouldn't be such convoluted, and obscure corner case. There is an excellent reason to generally allow complex expressions in group by statements, and that reason is that many useful and reasonable queries won't work if you don't. Such as the one I gave above. [...] > > I don't know what the SQL spec says, but I know (having talked to > > other developers) that many people would find it very nice. > > Since I had the spec open, I tried to look at this as well, though I > must admit I found it very tough going. > > I think this is the key section: > > > 10.9 <aggregate function> > > > > ... > > > > Syntax Rules > > > > ... > > > > 4) The argument source of an <aggregate function> is > > Case: > > a) If AF is immediately contained in a <set function > > specification>, then a table or group of a grouped table as > > specified in Subclause 7.10, "<having clause>", and Subclause 7.12, > > "<query specification>". > > > > b) Otherwise, the collection of rows in the current row's window > > frame defined by the window structure descriptor identified by the > > <window function> that simply contains AF, as defined in Subclause > > 7.11, "<window clause>". > > Now the <query specification> stuff is pretty dense, and the <window > clause> is currently impenetrable for me, so I just looked at the > <having clause> and <query specification> sections referenced above. I'm not surprised that the window clause section is impenetrable to you. Window clauses are part of the definition of analytic functions, which postgres does NOT yet implement. However they are on the todo list. Speaking personally, analytic functions are the single feature from Oracle that I've most missed when moving to postgres. I would be happy to explain what they are and how they should work either on the list or off to anyone who is interested in implementing them. However they are a topic for another thread, and probably for another week. (I'm out next week, and am loathe to open that can of worms just yet.) However I will note as a practical matter that implementing analytic functions will increase how often people will want to have anonymous subqueries in the from statement. (Because for a certain set of problems a useful technique is to do a group by query of an analytic query.) [...] > > A grouped table is a set of groups derived during the evaluation of > > a <group by clause>. A group G is a collection of rows in which, > > for every grouping column GC, if the value of GC in some row is not > > distinct from GV, then the value of GC in every row is GV; > > moreover, if R1 is a row in group G1 of grouped table GT and R2 is > > a row in GT such that for every grouping column GC the value of GC > > in R1 is not distinct from the value of GC in R2, then R2 is in G1. > > Every row in GT is in exactly one group. A group may be considered > > as a table. Set functions operate on groups. > > So we're already talking about an explicit GROUP BY clause, and it > doesn't seem to leave much wiggle room grouping columns that aren't > explicitly listed. I'll readily admit I very easily could be reading > this wrong or have missed a section of the spec that discusses this > in more detail. (And that goes for any of my interpretations of the > spec!) Again, the spec specifies what must be accepted. It does not ban extensions. Postgres already implements the useful extension of accepting complicated expressions in a group by. It could implement more. > One place you and I differ is that I think you'd be in favor of many > more extensions in cases where the SQL spec doesn't explicitly say > "Thou shalt not do X". I'm a bit more conservative here: in my > opinion, if the SQL speaks to a particular issue (for example, a > WHERE clause), then you should pretty much keep to the letter of what > it says. Extensions are more for areas where the standard is silent. We definitely differ. To my eyes an extension that is well defined and clearly useful which is currently illegal deserves consideration. Utility is not the only concern, of course, you have to think about how well it fits with other features, its likelyhood of breaking in the future, etc. Inferring the group by statement that is likely to be meant the vast majority of the time is clearly useful. It would get rid of a lot of useless redundancy in a lot of people's queries. It may be that it is deemed too much work or too risky to implement. > For example, full text search will be added to PostgreSQL in 8.3. > AFAIK, the SQL spec doesn't talk about full text search at all, so > this is an area where extensions are a natural fit. Others, such as > PostgreSQL's DISTINCT ON, is a much more difficult issue, and I'd > suspect if it were proposed as a new feature now it would be > rejected. However, it's been a part of PostgreSQL for a long time, so > for historical reasons it remains. Likewise, I doubt "text" as a > synonym for "varchar" would be accepted. If we had this conversation in 15 years, I'm willing to bet that full text search would be in the spec and you'd be able to complain that something about how it was implemented was not quite standard but would have to remain for historical reasons... > Anyway, enough excursions into SpecLand for me. I'd welcome anyone > else taking a crack at it. Cheers, Ben
Chuck McDevitt wrote: > Sometimes supporting "de-facto" standards as well as official standards > makes sense. > > > On that basis we would support huge pieces of stuff that emulates MySQL too. Or perhaps you'd like us to follow Oracle's treatment of NULL. Who gets to choose what is the de facto standard we follow? cheers andrew
"Chuck McDevitt" <cmcdevitt@greenplum.com> writes: > Tom, it isn't just a case of "convenience". When we are trying to > convert users from another database (say Oracle for example) to > PostgeSQL, one of the big stumbling blocks that slows down the work is > all the little query changes that people have to make Well, if you're trying to sell it on the grounds of Oracle compatibility, then it should actually *be* Oracle compatible. What exactly do they do about the default-alias problem? regards, tom lane
Gregory Stark escribió: > I get this about once a day because I frequently type this ad-hoc query and > always forget to put the alias in the first time: > > postgres=# select count(*),n from (select count(*) as n from test group by i) group by n; > ERROR: subquery in FROM must have an alias > HINT: For example, FROM (SELECT ...) [AS] foo. Exactly one extra keystroke gets you a non-error answer: alvherre=# select count(*),n from (select count(*) as n from test group by i)a group by n; -- Alvaro Herrera http://www.PlanetPostgreSQL.org/ "Si quieres ser creativo, aprende el arte de perder el tiempo"
Chuck McDevitt wrote: >Tom, it isn't just a case of "convenience". When we are trying to >convert users from another database (say Oracle for example) to >PostgeSQL, one of the big stumbling blocks that slows down the work is >all the little query changes that people have to make (people who might >not have written the query as well), and it can require the review of >hundreds or thousands of SQL scripts and applications. The harder it >is, the more reluctant they are to convert. > >Sometimes supporting "de-facto" standards as well as official standards >makes sense. > > One of the ways "de-facto" standards are even better than standards is that there are even more of them. Which de-facto standard should we follow: Oracle, MySQL, or MS SQL Server? Note that following all of them is a bad idea, as one persons convience is another persons stumbling block. Brian
On Aug 23, 2007, at 12:25 , Ben Tilly wrote: > It is already extended in postgres. For pretty good reasons. Yes, this particular point is moot. Whether or not the reasons are good is another thing and a difference of opinion. > I'm not surprised that the window clause section is impenetrable to > you. Window clauses are part of the definition of analytic functions, > which postgres does NOT yet implement. However they are on the todo > list. And being worked on, from what I gather. Just not for 8.3. So hopefully you won't have to wait too much longer. Michael Glaesemann grzm seespotcode net
On 8/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Chuck McDevitt" <cmcdevitt@greenplum.com> writes: > > Tom, it isn't just a case of "convenience". When we are trying to > > convert users from another database (say Oracle for example) to > > PostgeSQL, one of the big stumbling blocks that slows down the work is > > all the little query changes that people have to make > > Well, if you're trying to sell it on the grounds of Oracle > compatibility, then it should actually *be* Oracle compatible. > What exactly do they do about the default-alias problem? To the best of my knowledge such subqueries are completely anonymous. There is no way to explicitly refer to them unless you provide an alias. Which is exactly the solution that was proposed twice in this thread, and has the further benefit of being forwards compatible with any reasonable future standard. As verification I asked a certified Oracle DBA. His understanding is that Oracle may choose to rewrite the query for you or not. If it does not rewrite the query, then it has an internal identifier but there is no way you can get to it. Cheers, Ben
On 8/23/07, Michael Glaesemann <grzm@seespotcode.net> wrote: > On Aug 23, 2007, at 10:47 , Ben Tilly wrote: [...] > > Why does it seem like a bug to you? > > > > Turn it around, and tell me in what way is its behaviour surprising to > > someone who knows SQL. You asked to group on something that is the > > same for all rows. That group by condition did nothing. (Except > > rendered the syntax valid when it might not have been.) As I would > > expect. > > Considering that I expect the GROUP BY clause to include only column > references (or other groupings of column references), not > expressions. Whether or not the value is the same, it surprises me > that something other than a column reference is accepted at all. I > hadn't realized this behavior was accepted in PostgreSQL, but I learn > something new every day. That's obviously a very different point of view than mine. Over the last decade of using relational databases I've always assumed that any non-aggregate expression that is valid in a select is valid in a group by, and I've only rarely been surprised by this. (The only exceptions that I can list off of the top of my head are that integer constants can refer to columns by position, and text constants are not allowed by postgres.) > My non-rigorous way of thinking about GROUP BY is that it groups this > listed columns when the values of the listed columns are the same. An > expression that evaluates to anything other than a column name > doesn't provide any information about which column to consider > grouped, and expressions don't evaluate to column names, or > identifiers in general. If I understand you correctly, a GROUP BY > item that isn't a column name would be a value that's applied to all > columns, and the actual value is irrelevant—different values don't > change the result. That's not quite how it works. The better non-rigorous way of thinking about it is that any non-aggregate function you can put in a select is allowed in a group by. So if I group by trim(foo.bar), I will be grouping rows based on the distinct values of trim(foo.bar). So the values 'baz', ' baz', 'baz ' and ' baz ' would all be rolled up into one row in the group by query. But the value 'blat' would wind up in another row. The case of a constant expression is the logical (if normally useless) extension of this. > So the only purpose it would serve would be to prevent a trailing > comma from raising a syntax error: you'd still need to explicitly > list the other columns (unless the implementation behavior is changed > to extend the spec there as well). What this does is allow you to use > something like this (psuedocode): That is the only purpose of a constant expression is that. > group_columns = [ 'foo', 'bar', 'baz' ] > > group_column_list = '' > for col in group_columns { group_column_list += col + ',' } # > group_column_list = "foo,bar,baz," > > group_by_clause = "GROUP BY $group_column_list CASE WHEN TRUE THEN > 'quux' END" > > rather than > > group_column_list = join group_columns, ',' # group_column_list = > "foo,bar,baz" > group_by_clause = "GROUP BY $group_column_list" > > I still feel I'm missing something. If that's it, it seems like > something easy enough to handle in middleware. Sorry if it appears > I'm being dense. I've definitely learned things in this thread. That's mostly right. However don't forget the group_columns might be an empty list, and in that case you need to optionally not have a group by clause at all. (Yes, in some of my queries this is a very real possibility.) Plus a bit of context. This comes up for me in reports which are basically implemented as a series of queries using temp tables. So making the generation of SQL more convoluted significantly increases the complexity of the code. (Writing reports is most of my job, so I get to write lots and lots of these.) > > Furthermore ask yourself whether anyone who wrote that would likely > > have written it by accident. > > I don't see what that has to do with anything. There are plenty of > things I can write on purpose that would be nonsense. You might even > consider my posts as prime examples :) When you have a boundary case, sometimes you really want to answer the question, "Is this case likely to be a result of confusion?" Which question is particularly relevant in this case because my strong suspicion is that constant text expressions are banned in PostgreSQL explicitly because of fears that they are a result of confusion. Cheers, Ben
Ben Tilly wrote: > On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote: >> On Aug 22, 2007, at 20:49 , Ben Tilly wrote: >> >>> If your implementation accepts: >>> >>> group by case when true then 'foo' end >> What would that mean? Regardless of whether or not it's accepted, it >> should have *some* meaning. > > To my eyes it has a very clear meaning, we're grouping on an > expression that happens to be a constant. Which happens to be the > same for all rows. Which is a spectacularly useless thing to actually > do, but the ability to do it happens to be convenient when I'm looking > for something to terminate a series of commas in a dynamically built > query. Which is the same very clear meaning that "group by 1" has - we're grouping on a expression which happens to be the constant 1. Hey, wait a second. This isn't what "group by 1" means at all - it rather means group by whatever the fist column in the select list is. So, yes, "group by 'foo'" *seems* to have a very clear meaning - but that clearness vanishes as soon as you take into account what "group by 1" means. greetings, Florian Pflug
On Thursday 23 August 2007 09:55, Alvaro Herrera wrote: > > I think the compromise is to instead of generating aliases at all just > > use an alias like "*Anonymous Subquery*" and add a boolean flag > > indicating that that range table is anonymous and not a valid target for > > references. I started doing that a while back but got distracted (and > > discouraged since it seemed not to have widespread support). > > Hey, you are describing an implementation of my suggestion. I just don't really see the gain in doing this. Any violations of the SQL standard are bad things in and of themselves; they encourage people to write non-spec applications which are then hard to port/support alongside other databases. It's also bad PR for us. So if we're going to support a non-spec feature (gods know we have) then it's going to be because it offers *significant* extra functionality. SELECT DISTINCT ON (), LIMIT/OFFSET, custom data types, CREATE OPERATOR, RULES, etc. all give us stuff that would be *very* hard to do -- or even impossible -- without these features. I just don't see the ability to omit the alias in a query with only one subquery (the only circumstances under which it would be safe to do so) as any significant gain in fuctionality. -- Josh Berkus PostgreSQL @ Sun San Francisco
> -----Original Message----- > From: Andrew Dunstan [mailto:andrew@dunslane.net] > Sent: Thursday, August 23, 2007 10:26 AM > To: Chuck McDevitt > Cc: Tom Lane; Gregory Stark; Michael Glaesemann; Ben Tilly; pgsql- > hackers@postgresql.org > Subject: Re: [HACKERS] SQL feature requests > > > > Chuck McDevitt wrote: > > Sometimes supporting "de-facto" standards as well as official > standards > > makes sense. > > > > > > > > On that basis we would support huge pieces of stuff that emulates MySQL > too. Or perhaps you'd like us to follow Oracle's treatment of NULL. Who > gets to choose what is the de facto standard we follow? > > cheers > > Andrew You must be joking... PostgreSQL already has a huge amount of "non-standard" syntax and semantics (perhaps "extensions" is a better word?). Everything from non-standard cast operator, non-standard substr, non-standard trim, non standard group by semantics (allowing simple ints to mean column number)... Given a day, we could probably write down several pages of "non-standard" features of PGSQL. Who decides what de facto standards to support, and which not? The PostgreSQL community of course. In general, we wouldn't want to support any de facto standard that: 1. Is supported only by one vendor 2. Causes any standard SQL statement to fail, or return a different answer from the standard. The proposed change doesn't fail either of these.
On 8/23/07, Florian G. Pflug <fgp@phlo.org> wrote: > Ben Tilly wrote: > > On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote: > >> On Aug 22, 2007, at 20:49 , Ben Tilly wrote: > >> > >>> If your implementation accepts: > >>> > >>> group by case when true then 'foo' end > >> What would that mean? Regardless of whether or not it's accepted, it > >> should have *some* meaning. > > > > To my eyes it has a very clear meaning, we're grouping on an > > expression that happens to be a constant. Which happens to be the > > same for all rows. Which is a spectacularly useless thing to actually > > do, but the ability to do it happens to be convenient when I'm looking > > for something to terminate a series of commas in a dynamically built > > query. > > Which is the same very clear meaning that "group by 1" has - we're > grouping on a expression which happens to be the constant 1. Hey, > wait a second. This isn't what "group by 1" means at all - it > rather means group by whatever the fist column in the select list is. Which feature shocked me when I first saw it in Oracle. It violated every expectation that I have. I also deliberately do NOT use that feature. Because it is not safe if someone else is possibly going to edit your query. Add a field in a natural place and, oops, your query just broke. Also I hate referring to things by position in code. Particularly when they are far away from each other as they may be in a large query. (I've written queries that are over 1000 lines long in the past.) > So, yes, "group by 'foo'" *seems* to have a very clear meaning - but > that clearness vanishes as soon as you take into account what "group by 1" > means. I'm happy to use "group by 'foo'::text" instead. Anyone else in my position will have to stumble on their own solution, but I don't think there are that many in my position. Cheers, Ben
> -----Original Message----- > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers- > owner@postgresql.org] On Behalf Of Florian G. Pflug > Sent: Thursday, August 23, 2007 11:04 AM > To: Ben Tilly > Cc: Michael Glaesemann; Gregory Stark; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] SQL feature requests > > Ben Tilly wrote: > > On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote: > >> On Aug 22, 2007, at 20:49 , Ben Tilly wrote: > >> > >>> If your implementation accepts: > >>> > >>> group by case when true then 'foo' end > >> What would that mean? Regardless of whether or not it's accepted, it > >> should have *some* meaning. > > > > To my eyes it has a very clear meaning, we're grouping on an > > expression that happens to be a constant. Which happens to be the > > same for all rows. Which is a spectacularly useless thing to > actually > > do, but the ability to do it happens to be convenient when I'm > looking > > for something to terminate a series of commas in a dynamically built > > query. > > Which is the same very clear meaning that "group by 1" has - we're > grouping on a expression which happens to be the constant 1. Hey, > wait a second. This isn't what "group by 1" means at all - it > rather means group by whatever the fist column in the select list is. > > So, yes, "group by 'foo'" *seems* to have a very clear meaning - but > that clearness vanishes as soon as you take into account what "group by > 1" > means. > > greetings, Florian Pflug > Except "group by 1" meaning "group by column 1" is a PostgreSQL extension, not a SQL standard feature, if I recall. Anyway, I suppose this should work like ORDER BY... For some reason, we allow all expressions in ORDER BY *except* the degeneratecase of a constant (ugly). Expressions in ORDER BY are a PostgreSQL extension also... Not sure why we disallow the case of a constant, except somebodywas worried that it would confuse users, because simple integer constants are treated special. But it seems strange that this is legal in PostgreSQL: Select * from x order by trim('foo'); But this is illegal: Select * from x order by 'foo'; And this is accepted, but orders on the constant "1" rather than on column 1: select * from x order by 1::int;
Chuck McDevitt wrote: > PostgreSQL already has a huge amount of > "non-standard" syntax and semantics (perhaps "extensions" is a better > word?). > Everything from non-standard cast operator, non-standard substr, > non-standard trim, non standard group by semantics (allowing simple ints > to mean column number)... Given a day, we could probably write down > several pages of "non-standard" features of PGSQL. > > > Quite so, and I've perpetrated a few myself. But for the most part they are either there for legacy reasons or add significant extra functionality. I rather like Alvaro's compromise suggestion re aliases in fact. At least there seems to be a better case for that than for "group by 'blurfl'". But the argument that convinces me is not that it follows some de facto standard, but that it would add to clarity. Requiring an alias where it isn't used seems to me a piece of less than excellent design. cheers andrew
Josh Berkus <josh@agliodbs.com> writes: > I just don't see the ability to omit the alias in a query with only one > subquery (the only circumstances under which it would be safe to do so) as > any significant gain in fuctionality. Why do you think it'd be restricted to only one subquery? As long as you take care that the subquery's column names don't match any other ones in the query, you don't *need* an alias for it --- there'll be no need to qualify the column names. This extends just fine to multiple subqueries. (But I'm still bugged by the idea that there's some deep reason for the SQL committee to have disallowed this syntax. There's certainly not any obvious reason why they did so. Conceivably it was just politics --- some Big Vendor using a parser that couldn't deal with it --- but maybe there is something else, like a planned future extension that'd break this syntax.) regards, tom lane
On Aug 23, 2007, at 14:25 , Tom Lane wrote: > Josh Berkus <josh@agliodbs.com> writes: >> I just don't see the ability to omit the alias in a query with >> only one >> subquery (the only circumstances under which it would be safe to >> do so) as >> any significant gain in fuctionality. > > Why do you think it'd be restricted to only one subquery? > > As long as you take care that the subquery's column names don't match > any other ones in the query, you don't *need* an alias for it --- > there'll be no need to qualify the column names. This extends just > fine to multiple subqueries. How about something like gensym? One alias you could always use and be guaranteed it would give a unique value. Still provide the alias, but don't have to think about name collisions. Michael Glaesemann grzm seespotcode net
Tom, > As long as you take care that the subquery's column names don't match > any other ones in the query, you don't *need* an alias for it --- > there'll be no need to qualify the column names. This extends just > fine to multiple subqueries. Oh, hmmm. I was thinking this would somehow be more of an issue with anonymous subqueries, but it isn't really. You can still duplicate column names in named subqueries. So, never mind that. I still don't see it as terrifically useful functionality, given that it's just saving you 4 keystrokes ... -- Josh Berkus PostgreSQL @ Sun San Francisco
"Chuck McDevitt" <cmcdevitt@greenplum.com> writes: > Except "group by 1" meaning "group by column 1" is a PostgreSQL extension, not a SQL standard feature, if I recall. Correct. Part of the reason for being conservative about changing here is that we've got a mix of standard and nonstandard behaviors with to-some-extent conflicting behavior. Rejecting cases that are on the borderline between the behaviors seems like a safer course than accepting them and maybe doing something different than the user expects. A lot of this is legacy behavior that would never have passed muster if it had been newly proposed in the last few years --- we have gotten *far* stricter about SQL compliance than we used to be. But at this point backwards compatibility also has to weigh heavily. > Expressions in ORDER BY are a PostgreSQL extension also... Nyet --- they are required by SQL99 and later. SQL92 and before had "ORDER BY output-column-name-or-number" (and nothing else). SQL99 replaced that with ORDER BY <expression>, which they then bastardized so that it could include output column names, allowing them to claim that they'd only eliminated the output-column-number variant. What we support is a rather unholy combination of the two generations of the spec. People are quite used to ORDER BY 1 and so I doubt we'll ever want to eliminate the special case for it. regards, tom lane
Michael Glaesemann wrote: > > On Aug 23, 2007, at 14:25 , Tom Lane wrote: > >> Josh Berkus <josh@agliodbs.com> writes: >>> I just don't see the ability to omit the alias in a query with only one >>> subquery (the only circumstances under which it would be safe to do so) >>> as >>> any significant gain in fuctionality. >> >> Why do you think it'd be restricted to only one subquery? >> >> As long as you take care that the subquery's column names don't match >> any other ones in the query, you don't *need* an alias for it --- >> there'll be no need to qualify the column names. This extends just >> fine to multiple subqueries. > > How about something like gensym? One alias you could always use and be > guaranteed it would give a unique value. Still provide the alias, but don't > have to think about name collisions. It is dangerous to provide a synthetic name; if the standard ever gets modified to support alias-less subqueries, they would likely choose a different name-generating algorithm, and we would have a backward-compatibility problem. Or is that a backwards-compatibility problem? I remain unsure. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Josh Berkus <josh@agliodbs.com> writes: > I still don't see it as terrifically useful functionality, given that it's > just saving you 4 keystrokes ... Less than that, because the AS is optional. The only argument I've heard that carries much weight with me is that it eases porting from other DBMS's that allow this. Are there any others besides Oracle? In a quick check, mysql 5.0.45 hews to the straight and narrow path: mysql> create table t(f1 int); Query OK, 0 rows affected (0.01 sec) mysql> select * from (select f1 from t) x; Empty set (0.00 sec) mysql> select * from (select f1 from t) ; ERROR 1248 (42000): Every derived table must have its own alias mysql> so you don't get to point to them as one that we'd improve compatibility with. regards, tom lane
Tom, > so you don't get to point to them as one that we'd improve > compatibility with. Derby & DB2 are also strict. -- Josh Berkus PostgreSQL @ Sun San Francisco
Alvaro Herrera <alvherre@commandprompt.com> writes: > Michael Glaesemann wrote: >> How about something like gensym? > It is dangerous to provide a synthetic name; if the standard ever gets > modified to support alias-less subqueries, they would likely choose a > different name-generating algorithm, and we would have a > backward-compatibility problem. I concur. If we do this, the thing to do is have *no* alias, ie, there is no way to qualify the names of the subquery output columns. (IIRC unaliased joins already work approximately like that.) Then there's not anything to have a backward compatibility problem with. regards, tom lane
>>> On Thu, Aug 23, 2007 at 3:01 PM, in message <11856.1187899268@sss.pgh.pa.us>, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The only argument I've > heard that carries much weight with me is that it eases porting from > other DBMS's that allow this. Are there any others besides Oracle? > select * from (select f1 from t) In Sybase: com.sybase.jdbc2.jdbc.SybSQLException: The derived table expression is missing a correlation name. Check derived table syntaxin the Reference Manual. Error code: 11753 SQL state: ZZZZZ
Kevin Grittner wrote: >>>> On Thu, Aug 23, 2007 at 3:01 PM, in message <11856.1187899268@sss.pgh.pa.us>, > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The only argument I've >> heard that carries much weight with me is that it eases porting from >> other DBMS's that allow this. Are there any others besides Oracle? > >> select * from (select f1 from t) > > In Sybase: > > com.sybase.jdbc2.jdbc.SybSQLException: The derived table expression is missing a correlation name. Check derived tablesyntax in the Reference Manual. > Error code: 11753 > SQL state: ZZZZZ The really funny thing is that pgsql, mysql and at least sybase *explicitly* dissallow the no-alias case. Which shows that .) This seems to be common source of confusion and errors. .)Aliasless-Subqueries wouldn't lead to ambigous grammras in those databases. Otherwise, you'd expect to get some moregeneric syntax error, and not the very explicit "No alias, but expected one". I agree with Tom - knowing *why* the standard committee disallows that syntax - and why everybody except oracle chose to agree with it would be quite interesting. greetings, Florian Pflug
Florian G. Pflug wrote: > knowing *why* the standard committee disallows that syntax - > and why everybody except oracle chose to agree with it would be quite > interesting. Consider that we rightfully disallow SELECT * FROM tab1, tab1; because there are two tables with the same name involved. This hints to the fact that the internal query processing needs to have a unique name for each table source. You could argue why that would have to be, but it seems quite useful. Otherwise you would also have to allow SELECT * FROM tab1 AS a, tab2 AS a; and that cannot possibly be good. If you believe that, then a table source without any name at all, such as one produced by subqueries, should also be disallowed. Requiring a unique name appears to be a stronger condition than requiring a name at all. This is actually false if you consider that SQL itself does distinguish between unique constraints and not-null constraints. But you don't really want to invent a logic by which nameless table sources are managed. Strict logic would probably require the results of all such queries to be unknown. Note that Oracle is notorious for mishandling null values in various ways. So it's not surprising that they mishandle "null" table names in what appears to be pretty much the same way. -- Peter Eisentraut http://developer.postgresql.org/~petere/
"Florian G. Pflug" <fgp@phlo.org> writes: > The really funny thing is that pgsql, mysql and at least sybase > *explicitly* dissallow the no-alias case. I knew that pgsql does that, because I made sure it did ;-). But it is pretty interesting that these other DBMSes also go out of their way to produce a specific error. As you say, that suggests that they could have accepted it, if they'd wanted to, at least from the standpoint of not having a grammar structure problem. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > Florian G. Pflug wrote: >> knowing *why* the standard committee disallows that syntax - >> and why everybody except oracle chose to agree with it would be quite >> interesting. > ... > If you believe that, then a table source without any name at all, such > as one produced by subqueries, should also be disallowed. Good point. However, what I would kind of expect the standards committee to do is to specify that each such subquery has an automatically generated name assigned to it. Compare what SQL92 says about assigning aliases for SELECT output columns: a) If the i-th <derived column> in the <select list> specifies an <as clause> that contains a <columnname> C, then the <column name> of the i-th column of the result is C. b) If the i-th <derived column> in the <select list> does not specify an <as clause> and the <valueexpression> of that <derived column> is a single <column reference>, then the <column name>of the i-th column of the result is C. c) Otherwise, the <column name> of the i-th column of the <query specification> is implementation-dependentand different from the <column name> of any column, other than itself, of a table referenced by any <table reference> contained in the SQL-statement. Implementation-dependent is a term of art here: it means that each DBMS can do whatever the heck it wants. (But note that we fail to meet (c) exactly, since we don't bother to generate names that are distinct --- but in practice no one seems to care about that.) If we could expect that any future relaxation of the spec would likewise specify generating implementation-dependent unique table aliases for subqueries, then we'd not be having this discussion. What troubles me most is why the spec hasn't had identical verbiage for table aliases as it does for column aliases, since day one. There's *something* going on there that we don't know about. (BTW, your argument would seem to also disallow alias-less JOIN tables, but the spec has allowed those all along.) regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > c) Otherwise, the <column name> of the i-th column of the <query > specification> is implementation-dependent and different > from the <column name> of any column, other than itself, of > a table referenced by any <table reference> contained in the > SQL-statement. > > Implementation-dependent is a term of art here: it means that each DBMS > can do whatever the heck it wants. (But note that we fail to meet (c) > exactly, since we don't bother to generate names that are distinct --- > but in practice no one seems to care about that.) Actually I suspect there are people who get annoyed by it when they try to reference a column by name in a client driver like DBI which allows that. Note that if you use something like fetchrow_hashref it will actually condense out duplicate column names since it loads the row into a hash. So if you you're writing a program which just wants to dump the record without understanding it you probably load it into a hash and then dump the hash in key=>value form. And that will cause some columns to be dropped in the output. But those people probably just figure it was their own fault and put in aliases in their queries. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Thu, Aug 23, 2007 at 02:06:16PM -0400, Chuck McDevitt wrote: > In general, we wouldn't want to support any de facto standard that: > > 1. Is supported only by one vendor > 2. Causes any standard SQL statement to fail, or return a different > answer from the standard. > > The proposed change doesn't fail either of these. From what I can see upthread, it fails 1 and possibly 2. Given that we don't seem to know _why_ it is forbidden, there could well be a case under 2 is a problem, and we haven't thought of it. A -- Andrew Sullivan | ajs@crankycanuck.ca Everything that happens in the world happens at some place. --Jane Jacobs
Gregory Stark <stark@enterprisedb.com> writes: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: >> ... note that we fail to meet (c) >> exactly, since we don't bother to generate names that are distinct --- >> but in practice no one seems to care about that.) > Actually I suspect there are people who get annoyed by it when they try to > reference a column by name in a client driver like DBI which allows that. > Note that if you use something like fetchrow_hashref it will actually condense > out duplicate column names since it loads the row into a hash. So if you > you're writing a program which just wants to dump the record without > understanding it you probably load it into a hash and then dump the hash in > key=>value form. And that will cause some columns to be dropped in the output. > But those people probably just figure it was their own fault and put in > aliases in their queries. Well, if you're using client-side code that depends on access by name rather than field position, you definitely have to put in AS clauses. Even if we did generate distinct names, a client couldn't rely on knowing in advance what they'd be. regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: > >> Note that if you use something like fetchrow_hashref it will actually condense >> out duplicate column names since it loads the row into a hash. So if you >> you're writing a program which just wants to dump the record without >> understanding it you probably load it into a hash and then dump the hash in >> key=>value form. And that will cause some columns to be dropped in the output. > >> But those people probably just figure it was their own fault and put in >> aliases in their queries. > > Well, if you're using client-side code that depends on access by name > rather than field position, you definitely have to put in AS clauses. > Even if we did generate distinct names, a client couldn't rely on > knowing in advance what they'd be. That's why I got tied up trying to describe a scenario where you wouldn't have to rely on knowing in advance what they would be. If you're running some kind of reporting tool it could let the user type in arbitrary queries and then look at what names are returned from the describe message to put in its column headings. If such a tool was written in perl using fetchrow_hashref I think it would end up receiving only one of each distinct column name. Whereas it should be able to depend on receiving all of them with distinct names, even if it won't know what those names will be in advance. If it's written to use arrays and column positions then it would still work properly though. And we haven't seen any real complaints. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
Tom Lane wrote: > Part of the reason for being conservative about changing here > is that we've got a mix of standard and nonstandard behaviors > > A lot of this is legacy behavior that would never have passed muster > if it had been newly proposed in the last few years --- we have gotten > *far* stricter about SQL compliance than we used to be. But at this > point backwards compatibility also has to weigh heavily. Has there been any thought to eventually phasing them out? Perhaps a GUC to give warnings in the log file when they're encountered. I guess we'd need 3 levels of warnings, "off, "reasonable" and "pedantic". When set to the reasonable level it could only give smart warning messages like Warning: Use of frivolous nonstandard behavior XXX. Hint: Use the standard YYY instead. and when set to pedantic it would point out every non-standard SQL statement - useful only for someone to be aware of how much postgresql dependent behavior they might have. Then a farther future release could deprecate the frivolous non-standard pieces presumably leading to simpler code in the long run.
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes: > Warning: Use of frivolous nonstandard behavior XXX. > Hint: Use the standard YYY instead. If these behaviors were really "frivolous" this might fly, but I don't think the majority of users will hold still for the removal of either GROUP BY 1 or ORDER BY 1. They're embedded in popular usage, and at least one of them was in the standard not so long ago. Moreover, despite the request that started this thread, the "correct" behavior for these syntaxes is utterly useless. regards, tom lane
> > I still don't see it as terrifically useful functionality, given that it's > > just saving you 4 keystrokes ... > > Less than that, because the AS is optional. The only argument I've > heard that carries much weight with me is that it eases porting from > other DBMS's that allow this. Are there any others besides Oracle? FWIW IBM implemented this syntax in Informix 11.1 Cheetah released in July 2007. They also allow to omit the alias, and it imho makes sense. Andreas
This has been saved for the 8.4 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Andrew Dunstan wrote: > > > Chuck McDevitt wrote: > > PostgreSQL already has a huge amount of > > "non-standard" syntax and semantics (perhaps "extensions" is a better > > word?). > > Everything from non-standard cast operator, non-standard substr, > > non-standard trim, non standard group by semantics (allowing simple ints > > to mean column number)... Given a day, we could probably write down > > several pages of "non-standard" features of PGSQL. > > > > > > > > Quite so, and I've perpetrated a few myself. But for the most part they > are either there for legacy reasons or add significant extra functionality. > > I rather like Alvaro's compromise suggestion re aliases in fact. At > least there seems to be a better case for that than for "group by 'blurfl'". > > But the argument that convinces me is not that it follows some de facto > standard, but that it would add to clarity. Requiring an alias where it > isn't used seems to me a piece of less than excellent design. > > cheers > > andrew > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://www.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +
FYI, we decided we didn't want this additional capability. --------------------------------------------------------------------------- Florian G. Pflug wrote: > Kevin Grittner wrote: > >>>> On Thu, Aug 23, 2007 at 3:01 PM, in message <11856.1187899268@sss.pgh.pa.us>, > > Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> The only argument I've > >> heard that carries much weight with me is that it eases porting from > >> other DBMS's that allow this. Are there any others besides Oracle? > > > >> select * from (select f1 from t) > > > > In Sybase: > > > > com.sybase.jdbc2.jdbc.SybSQLException: The derived table expression is missing a correlation name. Check derived tablesyntax in the Reference Manual. > > Error code: 11753 > > SQL state: ZZZZZ > > The really funny thing is that pgsql, mysql and at least sybase > *explicitly* dissallow the no-alias case. Which shows that > .) This seems to be common source of confusion and errors. > .) Aliasless-Subqueries wouldn't lead to ambigous grammras in those databases. > Otherwise, you'd expect to get some more generic syntax error, and not > the very explicit "No alias, but expected one". > > I agree with Tom - knowing *why* the standard committee disallows that syntax - > and why everybody except oracle chose to agree with it would be quite interesting. > > greetings, Florian Pflug > > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +