Thread: SQL feature requests

SQL feature requests

From
"Ben Tilly"
Date:
Hopefully this is the right place for a few feature requests that
would address some of the things that I've noticed in postgres.

1. Just a minor annoyance, but why must subqueries in FROM clauses
have an alias?  For instance suppose that I have an orders table, and
one of the fields is userid.  The following is unambiguous and is
legal in Oracle:
 SELECT order_count   , count(*) as people FROM (     SELECT count(*) as order_count     FROM orders     GROUP BY
userid  ) GROUP BY order_count
 

It annoys me that it isn't legal in postgres.  (Yes, I know how to fix
the query.  But it still is an annoyance, and it comes up fairly often
in reporting purposes.)

2. Why is 'non-integer constant in GROUP BY' an error?

I find it inconvenient.  For reporting purposes I often have to
dynamically build queries in code.  An easy way to do that is just
interpolate in a set of possible statements which will either be empty
strings or have trailing commas.  But then I need this (possibly
empty) list to have a valid group by statement at the end.  In Oracle
I used to just write it like this:
 SELECT ... GROUP BY $possible_term_1   $possible_term_2   $possible_term_3   'end of possible groupings'

In postgres I either have to use a different strategy to build up
these strings, or else use a more complicated term to finish that off.This works for now:
 case when true then true end

but I don't know whether some future version of postgres might break
my code by banning that as well.

3. How hard would it be to have postgres ignore aliases in group by
clauses?  Per my comments above, I often build complex queries in
code.  I can't easily use the shortcut of referring to the select
column by number because the position is hard to determine.  So my
code has to copy the select terms.  But I can't copy them exactly
because the select terms include lots of "...as foo" clauses that are
not allowed in a group by.  So I have to store very similar terms to
use twice.

It would be nice if I could just make the group by look like the
select, and have the (obviously irrelevant) aliases just be ignored.

4) Items 2 and 3 would both be made irrelevant if postgres did
something that I'd really, really would like.  Which is to assume that
a query without a group by clause, but with an aggregate function in
the select, should have an implicit group by clause where you group by
all non-aggregate functions in the select.

For example
 SELECT foo, count(*) FROM bar

would be processed as:
 SELECT foo, count(*) FROM bar GROUP BY foo

If I write a query with an aggregate function in the select, better
than 95% of the time this is the group by clause that I want.  (This
email has one of the few exceptions.)  In the remaining cases I could
easily add the extra stuff in the group by to the select without
problems.  Therefore if postgres could just insert the obvious group
by clause in, I would never again write the words "group by" when
working with postgres.  And I predict that many other people would do
the same.

But it doesn't.  So when working with postgres, just like every other
database that I've used, I have to constantly type in group by clauses
with entirely redundant information.  (But they're not EXACTLY the
same as the select clauses that they are redundant with...)

Cheers,
Ben


Re: SQL feature requests

From
Josh Berkus
Date:
Ben,

pgsql-sql is probably the appropriate list for future queries of this 
nature.

Note that the below is my personal opinion; each PG developer has their 
own.

> 1. Just a minor annoyance, but why must subqueries in FROM clauses
> have an alias?  For instance suppose that I have an orders table, and
> one of the fields is userid.  The following is unambiguous and is
> legal in Oracle:

I *think* the alias is a requirement of the SQL standard.  Yes/No?

> 2. Why is 'non-integer constant in GROUP BY' an error?

Again, this needs to reference one of the SQL standards if you're 
interested in a change of behavior.  If we're out of standards compliance, 
that's a strong argument.  If we're in compliance, you have a pretty steep 
hurdle to justify new syntax.

> 3. How hard would it be to have postgres ignore aliases in group by
> clauses? 

Unfortunately, I think this is also a SQL compliance issue.  However, I'd 
be more liable to support your arguments for it; it's much more obviously 
useful functionality.

> 4) Items 2 and 3 would both be made irrelevant if postgres did
> something that I'd really, really would like.  Which is to assume that
> a query without a group by clause, but with an aggregate function in
> the select, should have an implicit group by clause where you group by
> all non-aggregate functions in the select.

In addition to SQL compliance issues, we're reluctant to do anything which 
makes implicit assumptions which could easily be wrong in PostgreSQL.  
Such shortcutting all to often leads to runaway queries or wrong data when 
the assumptions are incorrect.  MySQL gives us lots of examples of what 
can happen if you do too many things for convenience and compromise 
correctness.

-- 
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: SQL feature requests

From
Michael Glaesemann
Date:
On Aug 22, 2007, at 18:45 , Ben Tilly wrote:

> 1. Just a minor annoyance, but why must subqueries in FROM clauses
> have an alias?

It's required by the SQL standard, AIUI. I wonder what EnterpriseDB  
does?

> 2. Why is 'non-integer constant in GROUP BY' an error?

>  This works for now:
>
>   case when true then true end
>
> but I don't know whether some future version of postgres might break
> my code by banning that as well.

The PostgreSQL developers generally tries hard to preserve backwards  
compatibility, so I doubt the case expression as you have it would go  
away (though I'm kind of surprised it's allowed). Am I wrong in  
thinking that Oracle would accept the same format PostgreSQL does? In  
that case, couldn't you use whatever method works in PostgreSQL in  
Oracle? I haven't checked the SQL standard, but it seems unlikely  
it'd allow something like

GROUP BY , , , ;

AIUI, Integers are only allowed because the SQL standard explicitly  
allows you to refer to columns by the order they appear in the SELECT  
list. Otherwise the GROUP BY items need to be column names.

Both 1 and 2 seem to me to be places where Oracle is likely deviating  
from the standard. If you're targeting Oracle, then using Oracle- 
specific syntax might be warranted. If you're hoping to target more  
than one possible backend, I'd think it be better to use more  
portable syntax (e.g., SQL-standard syntax) than expecting other  
DBMSs to follow another's deviations. That's not to say PostgreSQL  
does not have non-standard syntax: in places, it does. But it does  
try to hew very closely to the standard.

Again, I wonder what EnterpriseDB does in this case?

> 3. How hard would it be to have postgres ignore aliases in group by
> clauses?  Per my comments above, I often build complex queries in
> code.  I can't easily use the shortcut of referring to the select
> column by number because the position is hard to determine.  So my
> code has to copy the select terms.  But I can't copy them exactly
> because the select terms include lots of "...as foo" clauses that are
> not allowed in a group by.  So I have to store very similar terms to
> use twice.

Perhaps someone else knows what you're referring to here, but I'm  
having a hard time without an example. Here's what I *think* you're  
trying to say:

test=# select * from observation;
observation_id | record_id | score_id
----------------+-----------+----------           3240 |         1 |        1           3239 |         1 |        1
     3238 |         1 |        2           3237 |         1 |        1           2872 |         2 |        1
2869|         2 |        2           2870 |         2 |        1           2871 |         2 |        1           3218 |
       3 |        2           3217 |         3 |        1
 
(10 rows)

test=# select record_id as foo, count(observation_id) from  
observation group by record_id;
foo | count
-----+-------   3 |     2   2 |     4   1 |     4
(3 rows)

test=# select record_id as foo, count(observation_id) from  
observation group by foo;
foo | count
-----+-------   3 |     2   2 |     4   1 |     4
(3 rows)

test=# select record_id as foo, count(observation_id) as bar from  
observation group by foo;
foo | bar
-----+-----   3 |   2   2 |   4   1 |   4
(3 rows)

test=# select record_id as foo, count(observation_id) as bar from  
observation group by record_id;
foo | bar
-----+-----   3 |   2   2 |   4   1 |   4
(3 rows)

test=# select version();                                                                    
version
------------------------------------------------------------------------ 
----------------------------------------------------------------------
PostgreSQL 8.2.4 on powerpc-apple-darwin8.9.0, compiled by GCC  
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.  
build 5367)
(1 row)

I'm not getting an error in any permutation that I can think of. What  
am I missing?

> Which is to assume that
> a query without a group by clause, but with an aggregate function in
> the select, should have an implicit group by clause where you group by
> all non-aggregate functions in the select.
>
> For example
>
>   SELECT foo, count(*)
>   FROM bar
>
> would be processed as:
>
>   SELECT foo, count(*)
>   FROM bar
>   GROUP BY foo

It's been discussed before. I don't believe it's been rejected out of  
hand (though you can check the archives), just that no one's gotten  
around to it. (Don't know what the SQL-spec says on this point.)

I'm not trying to dismiss your points, just trying to address them.  
I'm interested to hear what others have to say.

Michael Glaesemann
grzm seespotcode net




Re: SQL feature requests

From
Gregory Stark
Date:
"Ben Tilly" <btilly@gmail.com> writes:

> Hopefully this is the right place for a few feature requests that
> would address some of the things that I've noticed in postgres.
>
> 1. Just a minor annoyance, but why must subqueries in FROM clauses
> have an alias?  For instance suppose that I have an orders table, and
> one of the fields is userid.  The following is unambiguous and is
> legal in Oracle:

Thank you, this is one of my top pet peeves but when I proposed changing it I
was told nobody's complained. Now we have at least one user complaint, any
others out there?

> 2. Why is 'non-integer constant in GROUP BY' an error?

Hm... I was a bit surprised by this warning myself. IIRC there was an
implementation convenience issue.

> 3. How hard would it be to have postgres ignore aliases in group by
> clauses?  

That sounds like a strange idea.

> 4) Items 2 and 3 would both be made irrelevant if postgres did
> something that I'd really, really would like.  Which is to assume that
> a query without a group by clause, but with an aggregate function in
> the select, should have an implicit group by clause where you group by
> all non-aggregate functions in the select.
>
> For example
>
>   SELECT foo, count(*)
>   FROM bar
>
> would be processed as:
>
>   SELECT foo, count(*)
>   FROM bar
>   GROUP BY foo

I agree this would be convenient but it seems too scary to actually go
anywhere. What would you group by in the case of:

SELECT a+b, count(*) FROM bar

Should it group by a,b or a+b ?

Also, this might be a bit shocking for MySQL users who are accustomed to
MySQL's non-standard extension for the same syntax. There it's treated as an
assertion that the columns are equal for all records in a group or at least
that it doesn't matter which such value is returned, effectively equivalent to
our DISTINCT ON feature.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: SQL feature requests

From
"Ben Tilly"
Date:
On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
>
> On Aug 22, 2007, at 18:45 , Ben Tilly wrote:
>
> > 1. Just a minor annoyance, but why must subqueries in FROM clauses
> > have an alias?
>
> It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
> does?

I can well believe that the standard says that you must accept
subqueries with aliases.  But does it say you must reject subqueries
without aliases?  I strongly doubt that.

I have no clue what EnterpriseDB does.

> > 2. Why is 'non-integer constant in GROUP BY' an error?
>
> >  This works for now:
> >
> >   case when true then true end
> >
> > but I don't know whether some future version of postgres might break
> > my code by banning that as well.
>
> The PostgreSQL developers generally tries hard to preserve backwards
> compatibility, so I doubt the case expression as you have it would go
> away (though I'm kind of surprised it's allowed). Am I wrong in
> thinking that Oracle would accept the same format PostgreSQL does? In

The reason for my comparing to Oracle is that I used to work at an
Oracle shop.  I now work at a postgres shop.  Portability is not my
issue, just the annoyances that I experienced moving from one to the
other.

As for whether that case expression would go away, that it is allowed
is such an obscure feature that I doubt anyone changing that code
would notice if it was removed.

> that case, couldn't you use whatever method works in PostgreSQL in
> Oracle? I haven't checked the SQL standard, but it seems unlikely
> it'd allow something like
>
> GROUP BY , , , ;

That's not what Oracle accepts  that postgres does not.  What Oracle accepts is:
 ... GROUP BY 'foo';

> AIUI, Integers are only allowed because the SQL standard explicitly
> allows you to refer to columns by the order they appear in the SELECT
> list. Otherwise the GROUP BY items need to be column names.

Need to be?

The SQL-92 standard is clear that you must accept a list of column
names.  It is also clear that a column name must be be of the form
field or table.field.  In no way, shape or form does that allow having
terms like trim(foo.bar) in a group by.

But every reasonable database that I know - including postgres - allows that.

The standard very wisely does not forbid extensions.  Every database
has extensions.  In some cases, such as allowing trim(foo.bar) in a
group by clause, some extensions are so common as to be a standard.
(I don't have a copy of any later standards so I don't know whether
that has since been explicitly allowed.)  Therefore the real question
is how much farther than the standard you go.

Postgres explicitly disallows a constant character expression.  But it
allows the constant case expression that I gave.  It would be nice for
me to not have to remember that very obscure and convoluted case.

> Both 1 and 2 seem to me to be places where Oracle is likely deviating
> from the standard. If you're targeting Oracle, then using Oracle-
> specific syntax might be warranted. If you're hoping to target more
> than one possible backend, I'd think it be better to use more
> portable syntax (e.g., SQL-standard syntax) than expecting other
> DBMSs to follow another's deviations. That's not to say PostgreSQL
> does not have non-standard syntax: in places, it does. But it does
> try to hew very closely to the standard.

The queries that I'm writing are not hoping to target more than one
database at one company.

> Again, I wonder what EnterpriseDB does in this case?

No clue.

> > 3. How hard would it be to have postgres ignore aliases in group by
> > clauses?  Per my comments above, I often build complex queries in
> > code.  I can't easily use the shortcut of referring to the select
> > column by number because the position is hard to determine.  So my
> > code has to copy the select terms.  But I can't copy them exactly
> > because the select terms include lots of "...as foo" clauses that are
> > not allowed in a group by.  So I have to store very similar terms to
> > use twice.
>
> Perhaps someone else knows what you're referring to here, but I'm
> having a hard time without an example. Here's what I *think* you're
> trying to say:

What I'm trying to say is that it would be convenient for me to be
able to write:
 select bar as "baz"   , count(*) as "some count" from foo group by bar as "baz"

That's not allowed right now because as is not allowed in a group by statement.

[...]
> > Which is to assume that
> > a query without a group by clause, but with an aggregate function in
> > the select, should have an implicit group by clause where you group by
> > all non-aggregate functions in the select.
> >
> > For example
> >
> >   SELECT foo, count(*)
> >   FROM bar
> >
> > would be processed as:
> >
> >   SELECT foo, count(*)
> >   FROM bar
> >   GROUP BY foo
>
> It's been discussed before. I don't believe it's been rejected out of
> hand (though you can check the archives), just that no one's gotten
> around to it. (Don't know what the SQL-spec says on this point.)

I don't know what the SQL spec says, but I know (having talked to
other developers) that many people would find it very nice.

> I'm not trying to dismiss your points, just trying to address them.
> I'm interested to hear what others have to say.

Cheers,
Ben


Re: SQL feature requests

From
"Ben Tilly"
Date:
On 8/22/07, Gregory Stark <stark@enterprisedb.com> wrote:
>
> "Ben Tilly" <btilly@gmail.com> writes:
>
> > Hopefully this is the right place for a few feature requests that
> > would address some of the things that I've noticed in postgres.
> >
> > 1. Just a minor annoyance, but why must subqueries in FROM clauses
> > have an alias?  For instance suppose that I have an orders table, and
> > one of the fields is userid.  The following is unambiguous and is
> > legal in Oracle:
>
> Thank you, this is one of my top pet peeves but when I proposed changing it I
> was told nobody's complained. Now we have at least one user complaint, any
> others out there?

Always happy to complain. :-)

> > 2. Why is 'non-integer constant in GROUP BY' an error?
>
> Hm... I was a bit surprised by this warning myself. IIRC there was an
> implementation convenience issue.

If your implementation accepts:
 group by case when true then 'foo' end

how much harder can it be to accept:
 group by 'foo'

?

> > 3. How hard would it be to have postgres ignore aliases in group by
> > clauses?
>
> That sounds like a strange idea.

It is a strange idea, but it makes dynamically building queries
easier.  Right now I'm following a strategy of storing what I'm going
to insert in the select clause in one variable, and the group by
clause in another.  So I need 2 variables for each dynamic field that
I might choose to group by and want to have a custom name for.  With
this change I would only need one variable.

> > 4) Items 2 and 3 would both be made irrelevant if postgres did
> > something that I'd really, really would like.  Which is to assume that
> > a query without a group by clause, but with an aggregate function in
> > the select, should have an implicit group by clause where you group by
> > all non-aggregate functions in the select.
> >
> > For example
> >
> >   SELECT foo, count(*)
> >   FROM bar
> >
> > would be processed as:
> >
> >   SELECT foo, count(*)
> >   FROM bar
> >   GROUP BY foo
>
> I agree this would be convenient but it seems too scary to actually go
> anywhere. What would you group by in the case of:
>
> SELECT a+b, count(*) FROM bar
>
> Should it group by a,b or a+b ?

It should group by a+b.  Which is to say, every field in the select
clause that currently triggers an error because it isn't in the group
by clause.

> Also, this might be a bit shocking for MySQL users who are accustomed to
> MySQL's non-standard extension for the same syntax. There it's treated as an
> assertion that the columns are equal for all records in a group or at least
> that it doesn't matter which such value is returned, effectively equivalent to
> our DISTINCT ON feature.

I don't mind shocking MySQL users. ;-)

But seriously, if that objection is the barrier then I'd be happy to
see it be something that is explicitly turned on in the query.  For
instance:
 select autogroup bar, count(*) from foo

If that was available then I for one would type autogroup a lot more
often than group by.  After all autogroup is about as hard to type,
and I don't have to type the redundant list of fields in the group by.

Cheers,
Ben


Re: SQL feature requests

From
"Florian G. Pflug"
Date:
Ben Tilly wrote:
> On 8/22/07, Gregory Stark <stark@enterprisedb.com> wrote:
>> "Ben Tilly" <btilly@gmail.com> writes:
>>> 2. Why is 'non-integer constant in GROUP BY' an error?
>> Hm... I was a bit surprised by this warning myself. IIRC there was an
>> implementation convenience issue.
> 
> If your implementation accepts:
> 
>   group by case when true then 'foo' end
> 
> how much harder can it be to accept:
> 
>   group by 'foo'

This is not about hardness of the implementation, but rather about
non-confusing behaviour I think.

AFAIK, "group by 1" means "group by the first selected column", not
"group all rows together". But "group by 'foo'" would carry the second
meaning - "group all rows together". This is so totally counter-intuitive,
that it's not even funny...

"group by case when true then 'foo' end" looks different enough compared to
"group by 1" to make this less of a footgun.

Seems that the "group by <integer>" syntax predates the appearance of
aliases in the sql standard...

greetings, Florian flug


Re: SQL feature requests

From
Michael Glaesemann
Date:
On Aug 22, 2007, at 20:36 , Ben Tilly wrote:

> On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
>>
>> On Aug 22, 2007, at 18:45 , Ben Tilly wrote:
>>
>>> 1. Just a minor annoyance, but why must subqueries in FROM clauses
>>> have an alias?
>>
>> It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
>> does?
>
> I can well believe that the standard says that you must accept
> subqueries with aliases.  But does it say you must reject subqueries
> without aliases?  I strongly doubt that.

If I'm reading my draft copy of the SQL:2003 spec right (and there's
a good chance that I'm not, as it's not the easiest document for me
to parse), aliases *are* required.
From 5WD-02-Foundation-2003-09 (the fifth working draft)

> 7.5 <from clause>
>
> <from clause> ::= FROM <table reference list>
> <table reference list> ::=
>     <table reference> [ { <comma> <table reference> }... ]


> 7.6 <table reference>
>
> <table reference> ::=
>     <table factor>
>   | <joined table>


> <table factor> ::= <table primary> [ <sample clause> ]


> <table primary> ::=
>     <table or query name> [ [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ] ]
>   | <derived table> [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ]
>   | <lateral derived table> [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ]
>   | <collection derived table> [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ]
>   | <table function derived table> [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ]
>   | <only spec> [ [ AS ] <correlation name>
>     [ <left paren> <derived column list><right paren> ] ]
>   | <left paren> <joined table> <right paren>

> <derived table> ::= <table subquery>

<correlation name> I believe is what we commonly refer to as an
alias. I think <table or query name> is a table or view name, and
<derived table> is a subquery. <correlation name> is optional for
<table or query name> but not for <derived table>. The fact that the
standard clearly makes it optional in some cases and required in
others is pretty clear that <derived table> without a <correlation
name> is to be rejected, don't you think?

> I have no clue what EnterpriseDB does.

In case it wasn't clear, the reason I bring it up is that
EnterpriseDB, while working from a PostgreSQL base, strives for
Oracle compatibility.

>> that case, couldn't you use whatever method works in PostgreSQL in
>> Oracle? I haven't checked the SQL standard, but it seems unlikely
>> it'd allow something like
>>
>> GROUP BY , , , ;
>
> That's not what Oracle accepts  that postgres does not.  What
> Oracle accepts is:
>
>   ...
>   GROUP BY 'foo';

Thanks for clarifying. If it wasn't obvious by now, you know I'm not
familiar with Oracle :)

>> AIUI, Integers are only allowed because the SQL standard explicitly
>> allows you to refer to columns by the order they appear in the SELECT
>> list. Otherwise the GROUP BY items need to be column names.
>
> Need to be?
>
> The SQL-92 standard is clear that you must accept a list of column
> names.  It is also clear that a column name must be be of the form
> field or table.field.

The 2003 draft (same as above) seems to agree with the SQL92 standard:

> 7.9 <group by clause>

> Format
> <group by clause> ::=
>     GROUPBY [ <set quantifier> ] <grouping element list>
> <grouping element list> ::=
>     <grouping element> [ { <comma> <grouping element> }... ]
> <grouping element> ::=
>     <ordinary grouping set>
>   | <rollup list>
>   | <cube list>
>   | <grouping sets specification>
>   | <empty grouping set>
> <ordinary grouping set> ::=
>     <grouping column reference>
>   | <left paren> <grouping column reference list><right paren>
> <grouping column reference> ::=
>     <column reference> [ <collate clause> ]
> <grouping column reference list> ::=
>     <grouping column reference> [ { <comma><grouping column
> reference> }... ]
> <rollup list> ::=
>     ROLLUP<left paren> <ordinary grouping set list><right paren>
> <ordinary grouping set list> ::=
>     <ordinary grouping set> [ { <comma> <ordinary grouping set> }... ]
> <cube list> ::=
>     CUBE<left paren> <ordinary grouping set list><right paren>
> <grouping sets specification> ::=
>     GROUPINGSETS <left paren> <grouping set list><right paren>
> <grouping set list> ::=
>     <grouping set> [ { <comma> <grouping set> }... ]
> <grouping set> ::=
>     <ordinary grouping set>
>   | <rollup list>
>   | <cube list>
>   | <grouping sets specification>
>   | <empty grouping set>


> <empty grouping set> ::= <left paren><right paren>

> 6.7 <column reference>

> Format
> <column reference> ::=
>     <basic identifier chain>
>   | MODULE<period> <qualified identifier><period> <column name>

There'd have to be a pretty strong reason to extend this, more than
just a convenience, I should think.

>   In no way, shape or form does that allow having
> terms like trim(foo.bar) in a group by.
>
> But every reasonable database that I know - including postgres -
> allows that.

Can you give an example of something like this working in PostgreSQL?
I get an error when I try to use a text value in a GROUP BY clause.
(Or are you referring specifically to the CASE expression corner case?)

test=# select record_id as foo, count(observation_id) as bar from
observation group by trim(observation.record_id::text);
ERROR:  column "observation.record_id" must appear in the GROUP BY
clause or be used in an aggregate function
test=# select record_id as foo, count(observation_id) as bar from
observation group by trim(observation.record_id);
ERROR:  column "observation.record_id" must appear in the GROUP BY
clause or be used in an aggregate function

> Postgres explicitly disallows a constant character expression.  But it
> allows the constant case expression that I gave.  It would be nice for
> me to not have to remember that very obscure and convoluted case.

I agree, and would move that it should be disallowed if there isn't a
reason for it to be maintained, for exactly the reason you give:
there shouldn't be such convoluted, and obscure corner case.

test=# select record_id as foo, count(observation_id) as bar from
observation group by record_id, true;
ERROR:  non-integer constant in GROUP BY
test=# select record_id as foo, count(observation_id) as bar from
observation group by record_id, case when true then true end;

> What I'm trying to say is that it would be convenient for me to be
> able to write:
>
>   select bar as "baz"
>     , count(*) as "some count"
>   from foo
>   group by bar as "baz"
>
> That's not allowed right now because as is not allowed in a group
> by statement.

Ah, I see. Thanks for clarifying.

>>> Which is to assume that
>>> a query without a group by clause, but with an aggregate function in
>>> the select, should have an implicit group by clause where you
>>> group by
>>> all non-aggregate functions in the select.
>>>
>>> For example
>>>
>>>   SELECT foo, count(*)
>>>   FROM bar
>>>
>>> would be processed as:
>>>
>>>   SELECT foo, count(*)
>>>   FROM bar
>>>   GROUP BY foo
>>
>> It's been discussed before. I don't believe it's been rejected out of
>> hand (though you can check the archives), just that no one's gotten
>> around to it. (Don't know what the SQL-spec says on this point.)
>
> I don't know what the SQL spec says, but I know (having talked to
> other developers) that many people would find it very nice.

Since I had the spec open, I tried to look at this as well, though I
must admit I found it very tough going.

I think this is the key section:

> 10.9 <aggregate function>
>
> ...
>
> Syntax Rules
>
> ...
>
> 4) The argument source of an <aggregate function> is
> Case:
> a) If AF is immediately contained in a <set function
> specification>, then a table or group of a grouped table as
> specified in Subclause 7.10, “<having clause>”, and Subclause 7.12,
> “<query specification>”.
>
> b) Otherwise, the collection of rows in the current row's window
> frame defined by the window structure  descriptor identified by the
> <window function> that simply contains AF, as defined in Subclause
> 7.11, “<window clause>”.

Now the <query specification> stuff is pretty dense, and the <window
clause> is currently impenetrable for me, so I just looked at the
<having clause> and <query specification> sections referenced above.

> 7.10 <having clause>
>
> ...
>
> Syntax Rules
>
> 1) Let HC be the <having clause>. Let TE be the <table expression>
> that immediately contains HC. If TE does not immediately contain a
> <group by clause>, then “GROUPBY ()” is implicit. Let T be the
> descriptor of the table defined by the <group by clause>GBC
> immediately contained in TE and let R be the result of GBC.

So it looks like if there's no explicit GROUP BY, we're to assume
GROUP BY (), at least in the case of HAVING.

> 7.12 <query specification>
>
> ...
>
> Syntax Rules
>
> ...
>
> 14) If <table expression> does not immediately contain a <group by
> clause> and <table expression> is simply contained in a <query
> expression> that is the aggregation query of some <set function
> specification>, then GROUP BY () is implicit.

Again, assume GROUP BY () if no explicit GROUP BY with a <set
function specification>, which I understand to be some form of
aggregation.

> 15) If T is a grouped table, then let G be the set of grouping
> columns of T. In each <value expression> contained in <select
> list> , each column reference that references a column of T shall
> reference some column C that is functionally dependent on G or
> shall be contained in an aggregated argument of a <set function
> specification> whose aggregation query is QS.

And a definition of grouped table is found:

> 4.14.2 Types of tables
>
> ...
>
> A grouped table is a set of groups derived during the evaluation of
> a <group by clause>. A group G is a collection of rows in which,
> for every grouping column GC, if the value of GC in some row is not
> distinct from GV, then the value of GC in every row is GV;
> moreover, if R1 is a row in group G1 of grouped table GT and R2 is
> a row in GT such that for every grouping column GC the value of GC
> in R1 is not distinct from the value of GC in R2, then R2 is in G1.
> Every row in GT is in exactly one group. A group may be considered
> as a table. Set functions operate on groups.

So we're already talking about an explicit GROUP BY clause, and it
doesn't seem to leave much wiggle room grouping columns that aren't
explicitly listed. I'll readily admit I very easily could be reading
this wrong or have missed a section of the spec that discusses this
in more detail. (And that goes for any of my interpretations of the
spec!)

One place you and I differ is that I think you'd be in favor of many
more extensions in cases where the SQL spec doesn't explicitly say
"Thou shalt not do X". I'm a bit more conservative here: in my
opinion, if the SQL speaks to a particular issue (for example, a
WHERE clause), then you should pretty much keep to the letter of what
it says. Extensions are more for areas where the standard is silent.
For example, full text search will be added to PostgreSQL in 8.3.
AFAIK, the SQL spec doesn't talk about full text search at all, so
this is an area where extensions are a natural fit. Others, such as
PostgreSQL's DISTINCT ON, is a much more difficult issue, and I'd
suspect if it were proposed as a new feature now it would be
rejected. However, it's been a part of PostgreSQL for a long time, so
for historical reasons it remains. Likewise, I doubt "text" as a
synonym for "varchar" would be accepted.

Anyway, enough excursions into SpecLand for me. I'd welcome anyone
else taking a crack at it.

Michael Glaesemann
grzm seespotcode net




Re: SQL feature requests

From
Michael Glaesemann
Date:
On Aug 22, 2007, at 20:49 , Ben Tilly wrote:

> If your implementation accepts:
>
>   group by case when true then 'foo' end

What would that mean? Regardless of whether or not it's accepted, it  
should have *some* meaning.

It's not equivalent to GROUP BY "foo"

test=# select record_id as foo    , count(observation_id) as bar    from observation    group by case when true
        then 'foo'             end;
 
ERROR:  column "observation.record_id" must appear in the GROUP BY  
clause or be used in an aggregate function
test=# select record_id       , count(observation_id) as bar       from observation       group by case when true
             then 'record_id'                end;
 
ERROR:  column "observation.record_id" must appear in the GROUP BY  
clause or be used in an aggregate function

*This* seems like a bug:
test=# select record_id       , count(observation_id) as bar       from observation       group by record_id
,case when true                  then 'foo'             end;
 
record_id | bar
-----------+-----         1 |   4         2 |   4         3 |   2
(3 rows)

And for good measure:

test=# select record_id       , count(observation_id) as bar       from observation       group by case when true
             then record_id                end;
 
ERROR:  column "observation.record_id" must appear in the GROUP BY  
clause or be used in an aggregate function

Michael Glaesemann
grzm seespotcode net




Re: SQL feature requests

From
Tom Lane
Date:
Michael Glaesemann <grzm@seespotcode.net> writes:
> On Aug 22, 2007, at 20:36 , Ben Tilly wrote:
>> I can well believe that the standard says that you must accept
>> subqueries with aliases.  But does it say you must reject subqueries
>> without aliases?  I strongly doubt that.

> If I'm reading my draft copy of the SQL:2003 spec right (and there's
> a good chance that I'm not, as it's not the easiest document for me
> to parse), aliases *are* required.

Yes, they are.  It's been like that since SQL92 if not before (I've
never seen a copy of SQL89).  I was meaning to go look up whether
SQL2003 had gotten any laxer, but thanks for saving me the trouble.

The question for us is whether we should exceed the spec by allowing
something it does not.  The fundamental reason why not is that we would
have to invent an alias for the unnamed sub-select.  That has several
drawbacks: more code to try to pick an unused alias, more bug reports
from people whose queries conflicted with whatever we picked.  In return
for which, we're providing absolutely zip real increase in
functionality, and we're encouraging people to write unportable
SQL-spec-incompatible code.  There is a future-proofing argument too:
if the committee ever decides this is a good idea (which may be unlikely
after 15 years, but you never know), they could well define the default
alias in a different way than we had done, and then we are stuck behind
the eight ball.

One could also assume that if the SQL committee has not made this
oh-so-obvious improvement, there is some darn good reason why not.
I'm not privy to their minutes so I don't know what arguments were
made there, but I think we can eliminate "they never thought of it"
as an explanation.  Three generations of the spec have been written
specifically to exclude this.

In short, lots of downsides here, and not a whole lot of upside.
        regards, tom lane


Re: SQL feature requests

From
"Chuck McDevitt"
Date:
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Michael Glaesemann
> Sent: Wednesday, August 22, 2007 5:17 PM
> To: Ben Tilly
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] SQL feature requests
>
>
> On Aug 22, 2007, at 18:45 , Ben Tilly wrote:
>
> > 1. Just a minor annoyance, but why must subqueries in FROM clauses
> > have an alias?
>
> It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
> does?
>
> > 2. Why is 'non-integer constant in GROUP BY' an error?
>
> >  This works for now:
> >
> >   case when true then true end
> >
> > but I don't know whether some future version of postgres might break
> > my code by banning that as well.

1.  The SQL standard requires an alias for the subquery, but many
real-world SQL implementations relax this requirement in the case where
it is unambiguous.  The standard doesn't say you have to reject
statements without the alias, it says only that you must accept the ones
that do.  PostgreSQL has many things in its SQL where we accept things
that the standard doesn't require, and I don't see a good argument why
it shouldn't allow this.

2.  The SQL standard only allows column names in group by lists, NOT
expressions.  PostgreSQL extends the standard by allowing some, but not
all, expressions in the group by list (somewhat inconsistent, in my
view).  Expressions in the group by list is actually a quite common
extension.  But PostgreSQL also extends the standard by attaching
special meaning to simple integer constants in the group by list, and
treating them as column numbers from the select list.  As far as I
remember, the SQL standard only allows that on ORDER BY clauses.
Anyway, it seems reasonable to allow expressions in the group by, such
as:
Select a*10, sum(b) from x group by a*10;

But once you allow that, it seems like you should allow any expression,
even degenerate ones such as select 'foo',sum(b) from x group by 'foo';




Re: SQL feature requests

From
"Chuck McDevitt"
Date:

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Chuck McDevitt
> Sent: Wednesday, August 22, 2007 11:53 PM
> To: Michael Glaesemann; Ben Tilly
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] SQL feature requests
>
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> > owner@postgresql.org] On Behalf Of Michael Glaesemann
> > Sent: Wednesday, August 22, 2007 5:17 PM
> > To: Ben Tilly
> > Cc: pgsql-hackers@postgresql.org
> > Subject: Re: [HACKERS] SQL feature requests
> >
> >
> > On Aug 22, 2007, at 18:45 , Ben Tilly wrote:
> >
> > > 1. Just a minor annoyance, but why must subqueries in FROM clauses
> > > have an alias?
> >
> > It's required by the SQL standard, AIUI. I wonder what EnterpriseDB
> > does?
> >
> > > 2. Why is 'non-integer constant in GROUP BY' an error?
> >
> > >  This works for now:
> > >
> > >   case when true then true end
> > >
> > > but I don't know whether some future version of postgres might
> break
> > > my code by banning that as well.
>
> 1.  The SQL standard requires an alias for the subquery, but many
> real-world SQL implementations relax this requirement in the case
where
> it is unambiguous.  The standard doesn't say you have to reject
> statements without the alias, it says only that you must accept the
> ones
> that do.  PostgreSQL has many things in its SQL where we accept things
> that the standard doesn't require, and I don't see a good argument why
> it shouldn't allow this.
>
> 2.  The SQL standard only allows column names in group by lists, NOT
> expressions.  PostgreSQL extends the standard by allowing some, but
not
> all, expressions in the group by list (somewhat inconsistent, in my
> view).  Expressions in the group by list is actually a quite common
> extension.  But PostgreSQL also extends the standard by attaching
> special meaning to simple integer constants in the group by list, and
> treating them as column numbers from the select list.  As far as I
> remember, the SQL standard only allows that on ORDER BY clauses.
> Anyway, it seems reasonable to allow expressions in the group by, such
> as:
>
>     Select a*10, sum(b) from x group by a*10;
>
> But once you allow that, it seems like you should allow any
expression,
> even degenerate ones such as
>     select 'foo',sum(b) from x group by 'foo';
>

Just wanted to point out that the group by thing is really just
syntactic sugar.
You can always get a SQL standard compliant system to accept the
constants this way:

Select z,sum(b) from (Select 'foo',b from x) a1 (z,b)
Group by z;

This will work (although with PostgreSQL, you might have to cast the
string constant to text, because it is "unknown" datatype).




Re: SQL feature requests

From
"Zeugswetter Andreas ADI SD"
Date:
> > If your implementation accepts:
> >
> >   group by case when true then 'foo' end

I think he meant: group by case when true then "foo" end

>
> What would that mean? Regardless of whether or not it's accepted, it
> should have *some* meaning.
>
> It's not equivalent to GROUP BY "foo"

Yea, but only because 'foo' is an identifier, and not a string constant.

> test=# select record_id as foo
>      , count(observation_id) as bar
>      from observation
>      group by case when true
>                    then 'foo'
>               end;
> ERROR:  column "observation.record_id" must appear in the GROUP BY
> clause or be used in an aggregate function

I think your example would be easier to understand if you removed the
quotes.
We don't detect the correctness of the above query. You can hardly say
that
this is a feature, but I am not inclined to see it as a troublesome bug
eighter.

Andreas


Re: SQL feature requests

From
"Zeugswetter Andreas ADI SD"
Date:
> > how much harder can it be to accept:
> >
> >   group by 'foo'

Presumably you meant group by "foo".

Imho pg should accept group by "foo". It could be part of a constant
removal, that also takes burden off the sort.

e.g. in "select x, count(*) from bar where x=5 group by x", x could be
removed since it is constant.

> This is not about hardness of the implementation, but rather about
> non-confusing behaviour I think.
>
> AFAIK, "group by 1" means "group by the first selected column", not
> "group all rows together". But "group by 'foo'" would carry the second
> meaning - "group all rows together".

Yes. I don't see the issue. 1 is imho sufficiently different even from
"1".
Pg is not alone in allowing column number in group by.

Andreas


Re: SQL feature requests

From
Gregory Stark
Date:
"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes:

>> > how much harder can it be to accept:
>> > 
>> >   group by 'foo'
>
> Presumably you meant group by "foo".

No that's the whole point. He meant the constant string 'foo' not the column
identifier "foo" which certainly should work now.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: SQL feature requests

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> In short, lots of downsides here, and not a whole lot of upside.

I highly doubt the spec would ever conflict with allowing the user to elide
the aliases given that Oracle (and others?) have always allowed this. Moreover
if it's been 15 years without them adding it surely that argues we can be
pretty sure they won't add them?

This seems like a particularly petty case compared to a lot of other
extensions we do allow. Surely allowing arbitrary expressions in GROUP BY is
far more likely to conflict in the future given how it constrains our grammar.
And in theory that provides no added functionality over aside from programmer
convenience as well. There are tons of extensions to the spec in the Postgres
grammar. This would be one of the simplest safest ones.

The upside is the convenience which after all is the same upside as most of
our spec grammar extensions. Many many programmers are accustomed to entering
ad-hoc queries of this form and forcing them to enter an alias for no purpose
is just silly pedanticism from their point of view. The portability of ad-hoc
queries is meaningless and if you don't refer to the alias in the query then
it's truly pointless.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: SQL feature requests

From
Alvaro Herrera
Date:
Gregory Stark escribió:

> The upside is the convenience which after all is the same upside as most of
> our spec grammar extensions. Many many programmers are accustomed to entering
> ad-hoc queries of this form and forcing them to enter an alias for no purpose
> is just silly pedanticism from their point of view. The portability of ad-hoc
> queries is meaningless and if you don't refer to the alias in the query then
> it's truly pointless.

So there's the compromise: allow not specifying an alias only if it's
not used in the rest of the query at all, so the subquery would be
effectively anonymous.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: SQL feature requests

From
"Ben Tilly"
Date:
On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
>
> On Aug 22, 2007, at 20:49 , Ben Tilly wrote:
>
> > If your implementation accepts:
> >
> >   group by case when true then 'foo' end
>
> What would that mean? Regardless of whether or not it's accepted, it
> should have *some* meaning.

To my eyes it has a very clear meaning, we're grouping on an
expression that happens to be a constant.  Which happens to be the
same for all rows.  Which is a spectacularly useless thing to actually
do, but the ability to do it happens to be convenient when I'm looking
for something to terminate a series of commas in a dynamically built
query.

> It's not equivalent to GROUP BY "foo"

I wouldn't want it to be.  Strings and identifiers are very different things.

[...]
> *This* seems like a bug:
> test=# select record_id
>         , count(observation_id) as bar
>         from observation
>         group by record_id
>             , case when true
>                    then 'foo'
>               end;
> record_id | bar
> -----------+-----
>           1 |   4
>           2 |   4
>           3 |   2
> (3 rows)

Why does it seem like a bug to you?

Turn it around, and tell me in what way is its behaviour surprising to
someone who knows SQL.  You asked to group on something that is the
same for all rows.  That group by condition did nothing.  (Except
rendered the syntax valid when it might not have been.)  As I would
expect.

Furthermore ask yourself whether anyone who wrote that would likely
have written it by accident.

Cheers,
Ben


Re: SQL feature requests

From
Tom Lane
Date:
"Ben Tilly" <btilly@gmail.com> writes:
> ... But then I need this (possibly
> empty) list to have a valid group by statement at the end.  In Oracle
> I used to just write it like this:

>   SELECT ...
>   GROUP BY $possible_term_1
>     $possible_term_2
>     $possible_term_3
>     'end of possible groupings'

FWIW, that will work if you write it as
'end of possible groupings'::text

I'm disinclined to accept it as-is because (a) it's highly likely to be
a mistake, and (b) there's no principled way to assign a datatype to the
expression, if we are to interpret it as an expression.

Basically anything but an unadorned constant will work there.
        regards, tom lane


Re: SQL feature requests

From
Gregory Stark
Date:
"Alvaro Herrera" <alvherre@commandprompt.com> writes:

> Gregory Stark escribió:
>
>> The upside is the convenience which after all is the same upside as most of
>> our spec grammar extensions. Many many programmers are accustomed to entering
>> ad-hoc queries of this form and forcing them to enter an alias for no purpose
>> is just silly pedanticism from their point of view. The portability of ad-hoc
>> queries is meaningless and if you don't refer to the alias in the query then
>> it's truly pointless.
>
> So there's the compromise: allow not specifying an alias only if it's
> not used in the rest of the query at all, so the subquery would be
> effectively anonymous.

If what's not used in the rest of the query? If you haven't specified the
alias what would you use? Surely even if we did generate an alias name nobody
would think the generated name was guaranteed to be stable and reference it?

I think the compromise is to instead of generating aliases at all just use an
alias like "*Anonymous Subquery*" and add a boolean flag indicating that that
range table is anonymous and not a valid target for references. I started
doing that a while back but got distracted (and discouraged since it seemed
not to have widespread support).

IMHO even generating non-anonymous aliases like "*Anonymous Subquery*1" would
be fine but I'm happy to have a flag hiding them too.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: SQL feature requests

From
Michael Glaesemann
Date:
On Aug 23, 2007, at 10:47 , Ben Tilly wrote:

> On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
>>
>> *This* seems like a bug:
>> test=# select record_id
>>         , count(observation_id) as bar
>>         from observation
>>         group by record_id
>>             , case when true
>>                    then 'foo'
>>               end;
>> record_id | bar
>> -----------+-----
>>           1 |   4
>>           2 |   4
>>           3 |   2
>> (3 rows)
>
> Why does it seem like a bug to you?
>
> Turn it around, and tell me in what way is its behaviour surprising to
> someone who knows SQL.  You asked to group on something that is the
> same for all rows.  That group by condition did nothing.  (Except
> rendered the syntax valid when it might not have been.)  As I would
> expect.

Considering that I expect the GROUP BY clause to include only column
references (or other groupings of column references), not
expressions. Whether or not the value is the same, it surprises me
that something other than a column reference is accepted at all. I
hadn't realized this behavior was accepted in PostgreSQL, but I learn
something new every day.

My non-rigorous way of thinking about GROUP BY is that it groups this
listed columns when the values of the listed columns are the same. An
expression that evaluates to anything other than a column name
doesn't provide any information about which column to consider
grouped, and expressions don't evaluate to column names, or
identifiers in general. If I understand you correctly, a GROUP BY
item that isn't a column name would be a value that's applied to all
columns, and the actual value is irrelevant—different values don't
change the result.

So the only purpose it would serve would be to prevent a trailing
comma from raising a syntax error: you'd still need to explicitly
list the other columns (unless the implementation behavior is changed
to extend the spec there as well). What this does is allow you to use
something like this (psuedocode):

group_columns = [ 'foo', 'bar', 'baz' ]

group_column_list = ''
for col in group_columns { group_column_list += col + ',' } #
group_column_list = "foo,bar,baz,"

group_by_clause = "GROUP BY $group_column_list CASE WHEN TRUE THEN
'quux' END"

rather than

group_column_list = join group_columns, ',' # group_column_list =
"foo,bar,baz"
group_by_clause = "GROUP BY $group_column_list"

I still feel I'm missing something. If that's it, it seems like
something easy enough to handle in middleware. Sorry if it appears
I'm being dense. I've definitely learned things in this thread.

> Furthermore ask yourself whether anyone who wrote that would likely
> have written it by accident.

I don't see what that has to do with anything. There are plenty of
things I can write on purpose that would be nonsense. You might even
consider my posts as prime examples :)

Michael Glaesemann
grzm seespotcode net




Re: SQL feature requests

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> I highly doubt the spec would ever conflict with allowing the user to elide
> the aliases given that Oracle (and others?) have always allowed this. Moreover
> if it's been 15 years without them adding it surely that argues we can be
> pretty sure they won't add them?

The $64 question in my mind is exactly why hasn't the spec allowed this?
It's clear that they have gone out of their way to not allow it, and
I think it's unwise to say "oh let's do it" without understanding why not.

> This seems like a particularly petty case compared to a lot of other
> extensions we do allow.

That's exactly the problem.  Most of our other extensions are justified
by some significant capability gain.  This isn't --- it provides zero
new functionality, and the "convenience" factor amounts to the saving of
one keystroke (ok, maybe two if you insist on a space before the alias).
Pretty weak argument...
        regards, tom lane


Re: SQL feature requests

From
Alvaro Herrera
Date:
Gregory Stark escribió:
> "Alvaro Herrera" <alvherre@commandprompt.com> writes:
> 
> > Gregory Stark escribió:
> >
> >> The upside is the convenience which after all is the same upside as most of
> >> our spec grammar extensions. Many many programmers are accustomed to entering
> >> ad-hoc queries of this form and forcing them to enter an alias for no purpose
> >> is just silly pedanticism from their point of view. The portability of ad-hoc
> >> queries is meaningless and if you don't refer to the alias in the query then
> >> it's truly pointless.
> >
> > So there's the compromise: allow not specifying an alias only if it's
> > not used in the rest of the query at all, so the subquery would be
> > effectively anonymous.
> 
> If what's not used in the rest of the query?

The alias, of course.  If you reread Tom's argument, his problem is that
the alias we choose may collide with what an hypotetical future standard
may define, so the users who start to depend on the names with choose
automatically would be screwed.

> I think the compromise is to instead of generating aliases at all just use an
> alias like "*Anonymous Subquery*" and add a boolean flag indicating that that
> range table is anonymous and not a valid target for references. I started
> doing that a while back but got distracted (and discouraged since it seemed
> not to have widespread support).

Hey, you are describing an implementation of my suggestion.

-- 
Alvaro Herrera                        http://www.advogato.org/person/alvherre
A male gynecologist is like an auto mechanic who never owned a car.
(Carrie Snow)


Re: SQL feature requests

From
"Ben Tilly"
Date:
On 8/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Ben Tilly" <btilly@gmail.com> writes:
> > ... But then I need this (possibly
> > empty) list to have a valid group by statement at the end.  In Oracle
> > I used to just write it like this:
>
> >   SELECT ...
> >   GROUP BY $possible_term_1
> >     $possible_term_2
> >     $possible_term_3
> >     'end of possible groupings'
>
> FWIW, that will work if you write it as
>
>         'end of possible groupings'::text

Ah, that is the solution that I'll move to.

> I'm disinclined to accept it as-is because (a) it's highly likely to be
> a mistake, and (b) there's no principled way to assign a datatype to the
> expression, if we are to interpret it as an expression.
>
> Basically anything but an unadorned constant will work there.

As long as that behaviour is guaranteed, I'm OK on this one.

Ben


Re: SQL feature requests

From
"Chuck McDevitt"
Date:
>
> > This seems like a particularly petty case compared to a lot of other
> > extensions we do allow.
>
> That's exactly the problem.  Most of our other extensions are
justified
> by some significant capability gain.  This isn't --- it provides zero
> new functionality, and the "convenience" factor amounts to the saving
> of
> one keystroke (ok, maybe two if you insist on a space before the
> alias).
> Pretty weak argument...
>
>             regards, tom lane
>

Tom, it isn't just a case of "convenience".  When we are trying to
convert users from another database (say Oracle for example) to
PostgeSQL, one of the big stumbling blocks that slows down the work is
all the little query changes that people have to make (people who might
not have written the query as well), and it can require the review of
hundreds or thousands of SQL scripts and applications.  The harder it
is, the more reluctant they are to convert.

Sometimes supporting "de-facto" standards as well as official standards
makes sense.




Re: SQL feature requests

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> This seems like a particularly petty case compared to a lot of other
>> extensions we do allow.
>
> That's exactly the problem.  Most of our other extensions are justified
> by some significant capability gain.  This isn't --- it provides zero
> new functionality, and the "convenience" factor amounts to the saving of
> one keystroke (ok, maybe two if you insist on a space before the alias).
> Pretty weak argument...

I think we're talking about two different things. I'm referring to this:

postgres=# select * from (select * from test);
ERROR:  subquery in FROM must have an alias
HINT:  For example, FROM (SELECT ...) [AS] foo.

I get this about once a day because I frequently type this ad-hoc query and
always forget to put the alias in the first time:

postgres=# select count(*),n from (select count(*) as n from test group by i) group by n;
ERROR:  subquery in FROM must have an alias
HINT:  For example, FROM (SELECT ...) [AS] foo.


Not to say that four is a lot larger than one but if we're talking about two
different things then we're not likely to reach any kind of agreement...

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: SQL feature requests

From
"Joshua D. Drake"
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Chuck McDevitt wrote:

> Tom, it isn't just a case of "convenience".  When we are trying to
> convert users from another database (say Oracle for example) to
> PostgeSQL, one of the big stumbling blocks that slows down the work is
> all the little query changes that people have to make (people who might
> not have written the query as well), and it can require the review of
> hundreds or thousands of SQL scripts and applications.  The harder it
> is, the more reluctant they are to convert.
> 
> Sometimes supporting "de-facto" standards as well as official standards
> makes sense.

That is not the goal of the project.

Sincerely,

Joshua D. Drake

> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
> 
>                http://archives.postgresql.org
> 


- --
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/        UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGzcMGATb/zqfZUUQRAugDAJsF4WZXNRWVFzaeae7WfvGxxkt6eACeOkpL
3BpeYUSL9FNxpIzfioFxxbw=
=MVNX
-----END PGP SIGNATURE-----


Re: SQL feature requests

From
"Ben Tilly"
Date:
On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
>
> On Aug 22, 2007, at 20:36 , Ben Tilly wrote:
[...]
> > I can well believe that the standard says that you must accept
> > subqueries with aliases.  But does it say you must reject subqueries
> > without aliases?  I strongly doubt that.
>
> If I'm reading my draft copy of the SQL:2003 spec right (and there's
> a good chance that I'm not, as it's not the easiest document for me
> to parse), aliases *are* required.

Again, it lays out very carefully the syntax that must be valid.  It
does not say that extensions to that syntax are not valid.  Every
database implements such extensions.

[...]
> > I have no clue what EnterpriseDB does.
>
> In case it wasn't clear, the reason I bring it up is that
> EnterpriseDB, while working from a PostgreSQL base, strives for
> Oracle compatibility.

I got the reference.  But I don't know what EnterpriseDB does - I've
never used it.

[...]
> >> AIUI, Integers are only allowed because the SQL standard explicitly
> >> allows you to refer to columns by the order they appear in the SELECT
> >> list. Otherwise the GROUP BY items need to be column names.
> >
> > Need to be?
> >
> > The SQL-92 standard is clear that you must accept a list of column
> > names.  It is also clear that a column name must be be of the form
> > field or table.field.
>
> The 2003 draft (same as above) seems to agree with the SQL92 standard:
>
[ large snippet of the draft elided ]
>
> There'd have to be a pretty strong reason to extend this, more than
> just a convenience, I should think.

It is already extended in postgres.  For pretty good reasons.

> >   In no way, shape or form does that allow having
> > terms like trim(foo.bar) in a group by.
> >
> > But every reasonable database that I know - including postgres -
> > allows that.
>
> Can you give an example of something like this working in PostgreSQL?
> I get an error when I try to use a text value in a GROUP BY clause.
> (Or are you referring specifically to the CASE expression corner case?)

This works in every reasonable database that I have tried it in:
 select trim(foo.bar), count(*) from foo group by trim(foo.bar)

And yes, I have tried it in postgres.

[...]
> > Postgres explicitly disallows a constant character expression.  But it
> > allows the constant case expression that I gave.  It would be nice for
> > me to not have to remember that very obscure and convoluted case.
>
> I agree, and would move that it should be disallowed if there isn't a
> reason for it to be maintained, for exactly the reason you give:
> there shouldn't be such convoluted, and obscure corner case.

There is an excellent reason to generally allow complex expressions in
group by statements, and that reason is that many useful and
reasonable queries won't work if you don't.  Such as the one I gave
above.

[...]
> > I don't know what the SQL spec says, but I know (having talked to
> > other developers) that many people would find it very nice.
>
> Since I had the spec open, I tried to look at this as well, though I
> must admit I found it very tough going.
>
> I think this is the key section:
>
> > 10.9 <aggregate function>
> >
> > ...
> >
> > Syntax Rules
> >
> > ...
> >
> > 4) The argument source of an <aggregate function> is
> > Case:
> > a) If AF is immediately contained in a <set function
> > specification>, then a table or group of a grouped table as
> > specified in Subclause 7.10, "<having clause>", and Subclause 7.12,
> > "<query specification>".
> >
> > b) Otherwise, the collection of rows in the current row's window
> > frame defined by the window structure  descriptor identified by the
> > <window function> that simply contains AF, as defined in Subclause
> > 7.11, "<window clause>".
>
> Now the <query specification> stuff is pretty dense, and the <window
> clause> is currently impenetrable for me, so I just looked at the
> <having clause> and <query specification> sections referenced above.

I'm not surprised that the window clause section is impenetrable to
you.  Window clauses are part of the definition of analytic functions,
which postgres does NOT yet implement.  However they are on the todo
list.

Speaking personally, analytic functions are the single feature from
Oracle that I've most missed when moving to postgres.  I would be
happy to explain what they are and how they should work either on the
list or off to anyone who is interested in implementing them.  However
they are a topic for another thread, and probably for another week.
(I'm out next week, and am loathe to open that can of worms just yet.)

However I will note as a practical matter that implementing analytic
functions will increase how often people will want to have anonymous
subqueries in the from statement.  (Because for a certain set of
problems a useful technique is to do a group by query of an analytic
query.)

[...]
> > A grouped table is a set of groups derived during the evaluation of
> > a <group by clause>. A group G is a collection of rows in which,
> > for every grouping column GC, if the value of GC in some row is not
> > distinct from GV, then the value of GC in every row is GV;
> > moreover, if R1 is a row in group G1 of grouped table GT and R2 is
> > a row in GT such that for every grouping column GC the value of GC
> > in R1 is not distinct from the value of GC in R2, then R2 is in G1.
> > Every row in GT is in exactly one group. A group may be considered
> > as a table. Set functions operate on groups.
>
> So we're already talking about an explicit GROUP BY clause, and it
> doesn't seem to leave much wiggle room grouping columns that aren't
> explicitly listed. I'll readily admit I very easily could be reading
> this wrong or have missed a section of the spec that discusses this
> in more detail. (And that goes for any of my interpretations of the
> spec!)

Again, the spec specifies what must be accepted.  It does not ban
extensions.  Postgres already implements the useful extension of
accepting complicated expressions in a group by.  It could implement
more.

> One place you and I differ is that I think you'd be in favor of many
> more extensions in cases where the SQL spec doesn't explicitly say
> "Thou shalt not do X". I'm a bit more conservative here: in my
> opinion, if the SQL speaks to a particular issue (for example, a
> WHERE clause), then you should pretty much keep to the letter of what
> it says. Extensions are more for areas where the standard is silent.

We definitely differ.  To my eyes an extension that is well defined
and clearly useful which is currently illegal deserves consideration.
Utility is not the only concern, of course, you have to think about
how well it fits with other features, its likelyhood of breaking in
the future, etc.

Inferring the group by statement that is likely to be meant the vast
majority of the time is clearly useful.  It would get rid of a lot of
useless redundancy in a lot of people's queries.  It may be that it is
deemed too much work or too risky to implement.

> For example, full text search will be added to PostgreSQL in 8.3.
> AFAIK, the SQL spec doesn't talk about full text search at all, so
> this is an area where extensions are a natural fit. Others, such as
> PostgreSQL's DISTINCT ON, is a much more difficult issue, and I'd
> suspect if it were proposed as a new feature now it would be
> rejected. However, it's been a part of PostgreSQL for a long time, so
> for historical reasons it remains. Likewise, I doubt "text" as a
> synonym for "varchar" would be accepted.

If we had this conversation in 15 years, I'm willing to bet that full
text search would be in the spec and you'd be able to complain that
something about how it was implemented was not quite standard but
would have to remain for historical reasons...

> Anyway, enough excursions into SpecLand for me. I'd welcome anyone
> else taking a crack at it.

Cheers,
Ben


Re: SQL feature requests

From
Andrew Dunstan
Date:

Chuck McDevitt wrote:
> Sometimes supporting "de-facto" standards as well as official standards
> makes sense.
>
>
>   

On that basis we would support huge pieces of stuff that emulates MySQL 
too. Or perhaps you'd like us to follow Oracle's treatment of NULL. Who 
gets to choose what is the de facto standard we follow?

cheers

andrew


Re: SQL feature requests

From
Tom Lane
Date:
"Chuck McDevitt" <cmcdevitt@greenplum.com> writes:
> Tom, it isn't just a case of "convenience".  When we are trying to
> convert users from another database (say Oracle for example) to
> PostgeSQL, one of the big stumbling blocks that slows down the work is
> all the little query changes that people have to make

Well, if you're trying to sell it on the grounds of Oracle
compatibility, then it should actually *be* Oracle compatible.
What exactly do they do about the default-alias problem?
        regards, tom lane


Re: SQL feature requests

From
Alvaro Herrera
Date:
Gregory Stark escribió:

> I get this about once a day because I frequently type this ad-hoc query and
> always forget to put the alias in the first time:
> 
> postgres=# select count(*),n from (select count(*) as n from test group by i) group by n;
> ERROR:  subquery in FROM must have an alias
> HINT:  For example, FROM (SELECT ...) [AS] foo.

Exactly one extra keystroke gets you a non-error answer:

alvherre=# select count(*),n from (select count(*) as n from test group by i)a group by n;

-- 
Alvaro Herrera                               http://www.PlanetPostgreSQL.org/
"Si quieres ser creativo, aprende el arte de perder el tiempo"


Re: SQL feature requests

From
Brian Hurt
Date:
Chuck McDevitt wrote:

>Tom, it isn't just a case of "convenience".  When we are trying to
>convert users from another database (say Oracle for example) to
>PostgeSQL, one of the big stumbling blocks that slows down the work is
>all the little query changes that people have to make (people who might
>not have written the query as well), and it can require the review of
>hundreds or thousands of SQL scripts and applications.  The harder it
>is, the more reluctant they are to convert.
>
>Sometimes supporting "de-facto" standards as well as official standards
>makes sense.
>  
>

One of the ways "de-facto" standards are even better than standards is 
that there are even more of them.  Which de-facto standard should we 
follow: Oracle, MySQL, or MS SQL Server?  Note that following all of 
them is a bad idea, as one persons convience is another persons 
stumbling block.

Brian




Re: SQL feature requests

From
Michael Glaesemann
Date:
On Aug 23, 2007, at 12:25 , Ben Tilly wrote:

> It is already extended in postgres.  For pretty good reasons.

Yes, this particular point is moot. Whether or not the reasons are  
good is another thing and a difference of opinion.

> I'm not surprised that the window clause section is impenetrable to
> you.  Window clauses are part of the definition of analytic functions,
> which postgres does NOT yet implement.  However they are on the todo
> list.

And being worked on, from what I gather. Just not for 8.3. So  
hopefully you won't have to wait too much longer.

Michael Glaesemann
grzm seespotcode net




Re: SQL feature requests

From
"Ben Tilly"
Date:
On 8/23/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Chuck McDevitt" <cmcdevitt@greenplum.com> writes:
> > Tom, it isn't just a case of "convenience".  When we are trying to
> > convert users from another database (say Oracle for example) to
> > PostgeSQL, one of the big stumbling blocks that slows down the work is
> > all the little query changes that people have to make
>
> Well, if you're trying to sell it on the grounds of Oracle
> compatibility, then it should actually *be* Oracle compatible.
> What exactly do they do about the default-alias problem?

To the best of my knowledge such subqueries are completely anonymous.
There is no way to explicitly refer to them unless you provide an
alias.  Which is exactly the solution that was proposed twice in this
thread, and has the further benefit of being forwards compatible with
any reasonable future standard.

As verification I asked a certified Oracle DBA.  His understanding is
that Oracle may choose to rewrite the query for you or not.  If it
does not rewrite the query, then it has an internal identifier but
there is no way you can get to it.

Cheers,
Ben


Re: SQL feature requests

From
"Ben Tilly"
Date:
On 8/23/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
> On Aug 23, 2007, at 10:47 , Ben Tilly wrote:
[...]
> > Why does it seem like a bug to you?
> >
> > Turn it around, and tell me in what way is its behaviour surprising to
> > someone who knows SQL.  You asked to group on something that is the
> > same for all rows.  That group by condition did nothing.  (Except
> > rendered the syntax valid when it might not have been.)  As I would
> > expect.
>
> Considering that I expect the GROUP BY clause to include only column
> references (or other groupings of column references), not
> expressions. Whether or not the value is the same, it surprises me
> that something other than a column reference is accepted at all. I
> hadn't realized this behavior was accepted in PostgreSQL, but I learn
> something new every day.

That's obviously a very different point of view than mine.  Over the
last decade of using relational databases I've always assumed that any
non-aggregate expression that is valid in a select is valid in a group
by, and I've only rarely been surprised by this.  (The only exceptions
that I can list off of the top of my head are that integer constants
can refer to columns by position, and text constants are not allowed
by postgres.)

> My non-rigorous way of thinking about GROUP BY is that it groups this
> listed columns when the values of the listed columns are the same. An
> expression that evaluates to anything other than a column name
> doesn't provide any information about which column to consider
> grouped, and expressions don't evaluate to column names, or
> identifiers in general. If I understand you correctly, a GROUP BY
> item that isn't a column name would be a value that's applied to all
> columns, and the actual value is irrelevant—different values don't
> change the result.

That's not quite how it works.  The better non-rigorous way of
thinking about it is that any non-aggregate function you can put in a
select is allowed in a group by.  So if I group by trim(foo.bar), I
will be grouping rows based on the distinct values of trim(foo.bar).
So the values 'baz', ' baz', 'baz ' and ' baz ' would all be rolled up
into one row in the group by query.  But the value 'blat' would wind
up in another row.

The case of a constant expression is the logical (if normally useless)
extension of this.

> So the only purpose it would serve would be to prevent a trailing
> comma from raising a syntax error: you'd still need to explicitly
> list the other columns (unless the implementation behavior is changed
> to extend the spec there as well). What this does is allow you to use
> something like this (psuedocode):

That is the only purpose of a constant expression is that.

> group_columns = [ 'foo', 'bar', 'baz' ]
>
> group_column_list = ''
> for col in group_columns { group_column_list += col + ',' } #
> group_column_list = "foo,bar,baz,"
>
> group_by_clause = "GROUP BY $group_column_list CASE WHEN TRUE THEN
> 'quux' END"
>
> rather than
>
> group_column_list = join group_columns, ',' # group_column_list =
> "foo,bar,baz"
> group_by_clause = "GROUP BY $group_column_list"
>
> I still feel I'm missing something. If that's it, it seems like
> something easy enough to handle in middleware. Sorry if it appears
> I'm being dense. I've definitely learned things in this thread.

That's mostly right.  However don't forget the group_columns might be
an empty list, and in that case you need to optionally not have a
group by clause at all.  (Yes, in some of my queries this is a very
real possibility.)

Plus a bit of context.  This comes up for me in reports which are
basically implemented as a series of queries using temp tables.  So
making the generation of SQL more convoluted significantly increases
the complexity of the code.  (Writing reports is most of my job, so I
get to write lots and lots of these.)

> > Furthermore ask yourself whether anyone who wrote that would likely
> > have written it by accident.
>
> I don't see what that has to do with anything. There are plenty of
> things I can write on purpose that would be nonsense. You might even
> consider my posts as prime examples :)

When you have a boundary case, sometimes you really want to answer the
question, "Is this case likely to be a result of confusion?"  Which
question is particularly relevant in this case because my strong
suspicion is that constant text expressions are banned in PostgreSQL
explicitly because of fears that they are a result of confusion.

Cheers,
Ben


Re: SQL feature requests

From
"Florian G. Pflug"
Date:
Ben Tilly wrote:
> On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
>> On Aug 22, 2007, at 20:49 , Ben Tilly wrote:
>>
>>> If your implementation accepts:
>>>
>>>   group by case when true then 'foo' end
>> What would that mean? Regardless of whether or not it's accepted, it
>> should have *some* meaning.
> 
> To my eyes it has a very clear meaning, we're grouping on an
> expression that happens to be a constant.  Which happens to be the
> same for all rows.  Which is a spectacularly useless thing to actually
> do, but the ability to do it happens to be convenient when I'm looking
> for something to terminate a series of commas in a dynamically built
> query.

Which is the same very clear meaning that "group by 1" has - we're
grouping on a expression which happens to be the constant 1. Hey,
wait a second. This isn't what "group by 1" means at all - it
rather means group by whatever the fist column in the select list is.

So, yes, "group by 'foo'" *seems* to have a very clear meaning - but
that clearness vanishes as soon as you take into account what "group by 1"
means.

greetings, Florian Pflug


Re: SQL feature requests

From
Josh Berkus
Date:
On Thursday 23 August 2007 09:55, Alvaro Herrera wrote:
> > I think the compromise is to instead of generating aliases at all just
> > use an alias like "*Anonymous Subquery*" and add a boolean flag
> > indicating that that range table is anonymous and not a valid target for
> > references. I started doing that a while back but got distracted (and
> > discouraged since it seemed not to have widespread support).
>
> Hey, you are describing an implementation of my suggestion.

I just don't really see the gain in doing this.  Any violations of the SQL 
standard are bad things in and of themselves; they encourage people to write 
non-spec applications which are then hard to port/support alongside other 
databases.  It's also bad PR for us.

So if we're going to support a non-spec feature (gods know we have) then it's 
going to be because it offers *significant* extra functionality.  SELECT 
DISTINCT ON (), LIMIT/OFFSET, custom data types, CREATE OPERATOR, RULES, etc. 
all give us stuff that would be *very* hard to do -- or even impossible -- 
without these features.

I just don't see the ability to omit the alias in a query with only one 
subquery (the only circumstances under which it would be safe to do so) as 
any significant gain in fuctionality.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: SQL feature requests

From
"Chuck McDevitt"
Date:

> -----Original Message-----
> From: Andrew Dunstan [mailto:andrew@dunslane.net]
> Sent: Thursday, August 23, 2007 10:26 AM
> To: Chuck McDevitt
> Cc: Tom Lane; Gregory Stark; Michael Glaesemann; Ben Tilly; pgsql-
> hackers@postgresql.org
> Subject: Re: [HACKERS] SQL feature requests
>
>
>
> Chuck McDevitt wrote:
> > Sometimes supporting "de-facto" standards as well as official
> standards
> > makes sense.
> >
> >
> >
>
> On that basis we would support huge pieces of stuff that emulates
MySQL
> too. Or perhaps you'd like us to follow Oracle's treatment of NULL.
Who
> gets to choose what is the de facto standard we follow?
>
> cheers
>
> Andrew

You must be joking... PostgreSQL already has a huge amount of
"non-standard" syntax and semantics (perhaps "extensions" is a better
word?).
Everything from non-standard cast operator, non-standard substr,
non-standard trim, non standard group by semantics (allowing simple ints
to mean column number)... Given a day, we could probably write down
several pages of "non-standard" features of PGSQL.

Who decides what de facto standards to support, and which not?  The
PostgreSQL community of course.

In general, we wouldn't want to support any de facto standard that:
 1.  Is supported only by one vendor 2.  Causes any standard SQL statement to fail, or return a different
answer from the standard.

The proposed change doesn't fail either of these.




Re: SQL feature requests

From
"Ben Tilly"
Date:
On 8/23/07, Florian G. Pflug <fgp@phlo.org> wrote:
> Ben Tilly wrote:
> > On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
> >> On Aug 22, 2007, at 20:49 , Ben Tilly wrote:
> >>
> >>> If your implementation accepts:
> >>>
> >>>   group by case when true then 'foo' end
> >> What would that mean? Regardless of whether or not it's accepted, it
> >> should have *some* meaning.
> >
> > To my eyes it has a very clear meaning, we're grouping on an
> > expression that happens to be a constant.  Which happens to be the
> > same for all rows.  Which is a spectacularly useless thing to actually
> > do, but the ability to do it happens to be convenient when I'm looking
> > for something to terminate a series of commas in a dynamically built
> > query.
>
> Which is the same very clear meaning that "group by 1" has - we're
> grouping on a expression which happens to be the constant 1. Hey,
> wait a second. This isn't what "group by 1" means at all - it
> rather means group by whatever the fist column in the select list is.

Which feature shocked me when I first saw it in Oracle.  It violated
every expectation that I have.

I also deliberately do NOT use that feature.  Because it is not safe
if someone else is possibly going to edit your query.  Add a field in
a natural place and, oops, your query just broke.  Also I hate
referring to things by position in code.  Particularly when they are
far away from each other as they may be in a large query.  (I've
written queries that are over 1000 lines long in the past.)

> So, yes, "group by 'foo'" *seems* to have a very clear meaning - but
> that clearness vanishes as soon as you take into account what "group by 1"
> means.

I'm happy to use "group by 'foo'::text" instead.  Anyone else in my
position will have to stumble on their own solution, but I don't think
there are that many in my position.

Cheers,
Ben


Re: SQL feature requests

From
"Chuck McDevitt"
Date:

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Florian G. Pflug
> Sent: Thursday, August 23, 2007 11:04 AM
> To: Ben Tilly
> Cc: Michael Glaesemann; Gregory Stark; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] SQL feature requests
> 
> Ben Tilly wrote:
> > On 8/22/07, Michael Glaesemann <grzm@seespotcode.net> wrote:
> >> On Aug 22, 2007, at 20:49 , Ben Tilly wrote:
> >>
> >>> If your implementation accepts:
> >>>
> >>>   group by case when true then 'foo' end
> >> What would that mean? Regardless of whether or not it's accepted, it
> >> should have *some* meaning.
> >
> > To my eyes it has a very clear meaning, we're grouping on an
> > expression that happens to be a constant.  Which happens to be the
> > same for all rows.  Which is a spectacularly useless thing to
> actually
> > do, but the ability to do it happens to be convenient when I'm
> looking
> > for something to terminate a series of commas in a dynamically built
> > query.
> 
> Which is the same very clear meaning that "group by 1" has - we're
> grouping on a expression which happens to be the constant 1. Hey,
> wait a second. This isn't what "group by 1" means at all - it
> rather means group by whatever the fist column in the select list is.
> 
> So, yes, "group by 'foo'" *seems* to have a very clear meaning - but
> that clearness vanishes as soon as you take into account what "group by
> 1"
> means.
> 
> greetings, Florian Pflug
> 

Except "group by 1" meaning "group by column 1" is a PostgreSQL extension, not a SQL standard feature, if I recall.

Anyway, I suppose this should work like ORDER BY... For some reason, we allow all expressions in ORDER BY *except* the
degeneratecase of a constant (ugly).
 

Expressions in ORDER BY are a PostgreSQL extension also... Not sure why we disallow the case of a constant, except
somebodywas worried that it would confuse users, because simple integer constants are treated special.
 

But it seems strange that this is legal in PostgreSQL:

Select * from x order by trim('foo');

But this is illegal:

Select * from x order by 'foo';

And this is accepted, but orders on the constant "1" rather than on column 1:

select * from x order by 1::int;


Re: SQL feature requests

From
Andrew Dunstan
Date:

Chuck McDevitt wrote:
> PostgreSQL already has a huge amount of
> "non-standard" syntax and semantics (perhaps "extensions" is a better
> word?).
> Everything from non-standard cast operator, non-standard substr,
> non-standard trim, non standard group by semantics (allowing simple ints
> to mean column number)... Given a day, we could probably write down
> several pages of "non-standard" features of PGSQL. 
>
>
>   

Quite so, and I've perpetrated a few myself. But for the most part they 
are either there for legacy reasons or add significant extra functionality.

I rather like Alvaro's compromise suggestion re aliases in fact. At 
least there seems to be a better case for that than for "group by 'blurfl'".

But the argument that convinces me is not that it follows some de facto 
standard, but that it would add to clarity. Requiring an alias where it 
isn't used seems to me a piece of less than excellent design.

cheers

andrew


Re: SQL feature requests

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> I just don't see the ability to omit the alias in a query with only one 
> subquery (the only circumstances under which it would be safe to do so) as 
> any significant gain in fuctionality.

Why do you think it'd be restricted to only one subquery?

As long as you take care that the subquery's column names don't match
any other ones in the query, you don't *need* an alias for it ---
there'll be no need to qualify the column names.  This extends just
fine to multiple subqueries.

(But I'm still bugged by the idea that there's some deep reason for the
SQL committee to have disallowed this syntax.  There's certainly not any
obvious reason why they did so.  Conceivably it was just politics ---
some Big Vendor using a parser that couldn't deal with it --- but maybe
there is something else, like a planned future extension that'd break
this syntax.)
        regards, tom lane


Re: SQL feature requests

From
Michael Glaesemann
Date:
On Aug 23, 2007, at 14:25 , Tom Lane wrote:

> Josh Berkus <josh@agliodbs.com> writes:
>> I just don't see the ability to omit the alias in a query with  
>> only one
>> subquery (the only circumstances under which it would be safe to  
>> do so) as
>> any significant gain in fuctionality.
>
> Why do you think it'd be restricted to only one subquery?
>
> As long as you take care that the subquery's column names don't match
> any other ones in the query, you don't *need* an alias for it ---
> there'll be no need to qualify the column names.  This extends just
> fine to multiple subqueries.

How about something like gensym? One alias you could always use and  
be guaranteed it would give a unique value. Still provide the alias,  
but don't have to think about name collisions.

Michael Glaesemann
grzm seespotcode net




Re: SQL feature requests

From
Josh Berkus
Date:
Tom,

> As long as you take care that the subquery's column names don't match
> any other ones in the query, you don't *need* an alias for it ---
> there'll be no need to qualify the column names.  This extends just
> fine to multiple subqueries.

Oh, hmmm.  I was thinking this would somehow be more of an issue with 
anonymous subqueries, but it isn't really.  You can still duplicate column 
names in named subqueries.

So, never mind that. 

I still don't see it as terrifically useful functionality, given that it's 
just saving you 4 keystrokes ...

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: SQL feature requests

From
Tom Lane
Date:
"Chuck McDevitt" <cmcdevitt@greenplum.com> writes:
> Except "group by 1" meaning "group by column 1" is a PostgreSQL extension, not a SQL standard feature, if I recall.

Correct.  Part of the reason for being conservative about changing here
is that we've got a mix of standard and nonstandard behaviors with
to-some-extent conflicting behavior.  Rejecting cases that are on the
borderline between the behaviors seems like a safer course than
accepting them and maybe doing something different than the user
expects.

A lot of this is legacy behavior that would never have passed muster
if it had been newly proposed in the last few years --- we have gotten
*far* stricter about SQL compliance than we used to be.  But at this
point backwards compatibility also has to weigh heavily.

> Expressions in ORDER BY are a PostgreSQL extension also...

Nyet --- they are required by SQL99 and later.  SQL92 and before
had "ORDER BY output-column-name-or-number" (and nothing else).
SQL99 replaced that with ORDER BY <expression>, which they then
bastardized so that it could include output column names, allowing
them to claim that they'd only eliminated the output-column-number
variant.  What we support is a rather unholy combination of the two
generations of the spec.  People are quite used to ORDER BY 1 and
so I doubt we'll ever want to eliminate the special case for it.
        regards, tom lane


Re: SQL feature requests

From
Alvaro Herrera
Date:
Michael Glaesemann wrote:
>
> On Aug 23, 2007, at 14:25 , Tom Lane wrote:
>
>> Josh Berkus <josh@agliodbs.com> writes:
>>> I just don't see the ability to omit the alias in a query with only one
>>> subquery (the only circumstances under which it would be safe to do so) 
>>> as
>>> any significant gain in fuctionality.
>>
>> Why do you think it'd be restricted to only one subquery?
>>
>> As long as you take care that the subquery's column names don't match
>> any other ones in the query, you don't *need* an alias for it ---
>> there'll be no need to qualify the column names.  This extends just
>> fine to multiple subqueries.
>
> How about something like gensym? One alias you could always use and be 
> guaranteed it would give a unique value. Still provide the alias, but don't 
> have to think about name collisions.

It is dangerous to provide a synthetic name; if the standard ever gets
modified to support alias-less subqueries, they would likely choose a
different name-generating algorithm, and we would have a
backward-compatibility problem.

Or is that a backwards-compatibility problem?  I remain unsure.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: SQL feature requests

From
Tom Lane
Date:
Josh Berkus <josh@agliodbs.com> writes:
> I still don't see it as terrifically useful functionality, given that it's 
> just saving you 4 keystrokes ...

Less than that, because the AS is optional.  The only argument I've
heard that carries much weight with me is that it eases porting from
other DBMS's that allow this.  Are there any others besides Oracle?

In a quick check, mysql 5.0.45 hews to the straight and narrow path:

mysql> create table t(f1 int);
Query OK, 0 rows affected (0.01 sec)

mysql> select * from (select f1 from t) x;
Empty set (0.00 sec)

mysql> select * from (select f1 from t) ;  
ERROR 1248 (42000): Every derived table must have its own alias
mysql> 

so you don't get to point to them as one that we'd improve
compatibility with.
        regards, tom lane


Re: SQL feature requests

From
Josh Berkus
Date:
Tom,

> so you don't get to point to them as one that we'd improve
> compatibility with.

Derby & DB2 are also strict.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco


Re: SQL feature requests

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Michael Glaesemann wrote:
>> How about something like gensym?

> It is dangerous to provide a synthetic name; if the standard ever gets
> modified to support alias-less subqueries, they would likely choose a
> different name-generating algorithm, and we would have a
> backward-compatibility problem.

I concur.  If we do this, the thing to do is have *no* alias, ie, there
is no way to qualify the names of the subquery output columns.  (IIRC
unaliased joins already work approximately like that.)  Then there's
not anything to have a backward compatibility problem with.
        regards, tom lane


Re: SQL feature requests

From
"Kevin Grittner"
Date:
>>> On Thu, Aug 23, 2007 at  3:01 PM, in message <11856.1187899268@sss.pgh.pa.us>,
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The only argument I've
> heard that carries much weight with me is that it eases porting from
> other DBMS's that allow this.  Are there any others besides Oracle?
> select * from (select f1 from t)
In Sybase:
com.sybase.jdbc2.jdbc.SybSQLException: The derived table expression is missing a correlation name. Check derived table
syntaxin the Reference Manual. 
Error code: 11753
SQL state: ZZZZZ




Re: SQL feature requests

From
"Florian G. Pflug"
Date:
Kevin Grittner wrote:
>>>> On Thu, Aug 23, 2007 at  3:01 PM, in message <11856.1187899268@sss.pgh.pa.us>,
> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
>> The only argument I've
>> heard that carries much weight with me is that it eases porting from
>> other DBMS's that allow this.  Are there any others besides Oracle?
>  
>> select * from (select f1 from t) 
>  
> In Sybase:
>  
> com.sybase.jdbc2.jdbc.SybSQLException: The derived table expression is missing a correlation name. Check derived
tablesyntax in the Reference Manual.
 
> Error code: 11753
> SQL state: ZZZZZ

The really funny thing is that pgsql, mysql and at least sybase
*explicitly* dissallow the no-alias case. Which shows that  .) This seems to be common source of confusion and errors.
.)Aliasless-Subqueries wouldn't lead to ambigous grammras in those databases.     Otherwise, you'd expect to get some
moregeneric syntax error, and not     the very explicit "No alias, but expected one".
 

I agree with Tom - knowing *why* the standard committee disallows that syntax -
and why everybody except oracle chose to agree with it would be quite interesting.

greetings, Florian Pflug



Re: SQL feature requests

From
Peter Eisentraut
Date:
Florian G. Pflug wrote:
> knowing *why* the standard committee disallows that syntax -
> and why everybody except oracle chose to agree with it would be quite
> interesting.

Consider that we rightfully disallow

SELECT * FROM tab1, tab1;

because there are two tables with the same name involved.  This hints to 
the fact that the internal query processing needs to have a unique name 
for each table source.  You could argue why that would have to be, but 
it seems quite useful.  Otherwise you would also have to allow

SELECT * FROM tab1 AS a, tab2 AS a;

and that cannot possibly be good.

If you believe that, then a table source without any name at all, such 
as one produced by subqueries, should also be disallowed.  Requiring a 
unique name appears to be a stronger condition than requiring a name at 
all.  This is actually false if you consider that SQL itself does 
distinguish between unique constraints and not-null constraints.  But 
you don't really want to invent a logic by which nameless table sources 
are managed.  Strict logic would probably require the results of all 
such queries to be unknown.

Note that Oracle is notorious for mishandling null values in various 
ways.  So it's not surprising that they mishandle "null" table names in 
what appears to be pretty much the same way.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/


Re: SQL feature requests

From
Tom Lane
Date:
"Florian G. Pflug" <fgp@phlo.org> writes:
> The really funny thing is that pgsql, mysql and at least sybase
> *explicitly* dissallow the no-alias case.

I knew that pgsql does that, because I made sure it did ;-).  But it is
pretty interesting that these other DBMSes also go out of their way to
produce a specific error.  As you say, that suggests that they could
have accepted it, if they'd wanted to, at least from the standpoint of
not having a grammar structure problem.
        regards, tom lane


Re: SQL feature requests

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Florian G. Pflug wrote:
>> knowing *why* the standard committee disallows that syntax -
>> and why everybody except oracle chose to agree with it would be quite
>> interesting.

> ...
> If you believe that, then a table source without any name at all, such 
> as one produced by subqueries, should also be disallowed.

Good point.  However, what I would kind of expect the standards
committee to do is to specify that each such subquery has an
automatically generated name assigned to it.  Compare what SQL92 says
about assigning aliases for SELECT output columns:
           a) If the i-th <derived column> in the <select list> specifies             an <as clause> that contains a
<columnname> C, then the             <column name> of the i-th column of the result is C.
 
           b) If the i-th <derived column> in the <select list> does not             specify an <as clause> and the
<valueexpression> of that             <derived column> is a single <column reference>, then the             <column
name>of the i-th column of the result is C.
 
           c) Otherwise, the <column name> of the i-th column of the <query             specification> is
implementation-dependentand different             from the <column name> of any column, other than itself, of
 a table referenced by any <table reference> contained in the             SQL-statement.
 

Implementation-dependent is a term of art here: it means that each DBMS
can do whatever the heck it wants.  (But note that we fail to meet (c)
exactly, since we don't bother to generate names that are distinct ---
but in practice no one seems to care about that.)

If we could expect that any future relaxation of the spec would likewise
specify generating implementation-dependent unique table aliases for
subqueries, then we'd not be having this discussion.  What troubles me
most is why the spec hasn't had identical verbiage for table aliases as
it does for column aliases, since day one.  There's *something* going on
there that we don't know about.

(BTW, your argument would seem to also disallow alias-less JOIN tables,
but the spec has allowed those all along.)
        regards, tom lane


Re: SQL feature requests

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

>             c) Otherwise, the <column name> of the i-th column of the <query
>               specification> is implementation-dependent and different
>               from the <column name> of any column, other than itself, of
>               a table referenced by any <table reference> contained in the
>               SQL-statement.
>
> Implementation-dependent is a term of art here: it means that each DBMS
> can do whatever the heck it wants.  (But note that we fail to meet (c)
> exactly, since we don't bother to generate names that are distinct ---
> but in practice no one seems to care about that.)

Actually I suspect there are people who get annoyed by it when they try to
reference a column by name in a client driver like DBI which allows that. 

Note that if you use something like fetchrow_hashref it will actually condense
out duplicate column names since it loads the row into a hash. So if you
you're writing a program which just wants to dump the record without
understanding it you probably load it into a hash and then dump the hash in
key=>value form. And that will cause some columns to be dropped in the output.

But those people probably just figure it was their own fault and put in
aliases in their queries.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: SQL feature requests

From
Andrew Sullivan
Date:
On Thu, Aug 23, 2007 at 02:06:16PM -0400, Chuck McDevitt wrote:
> In general, we wouldn't want to support any de facto standard that:
> 
>   1.  Is supported only by one vendor
>   2.  Causes any standard SQL statement to fail, or return a different
> answer from the standard.
> 
> The proposed change doesn't fail either of these.

From what I can see upthread, it fails 1 and possibly 2.  Given that
we don't seem to know _why_ it is forbidden, there could well be a
case under 2 is a problem, and we haven't thought of it.

A

-- 
Andrew Sullivan  | ajs@crankycanuck.ca
Everything that happens in the world happens at some place.    --Jane Jacobs 


Re: SQL feature requests

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> ... note that we fail to meet (c)
>> exactly, since we don't bother to generate names that are distinct ---
>> but in practice no one seems to care about that.)

> Actually I suspect there are people who get annoyed by it when they try to
> reference a column by name in a client driver like DBI which allows that. 

> Note that if you use something like fetchrow_hashref it will actually condense
> out duplicate column names since it loads the row into a hash. So if you
> you're writing a program which just wants to dump the record without
> understanding it you probably load it into a hash and then dump the hash in
> key=>value form. And that will cause some columns to be dropped in the output.

> But those people probably just figure it was their own fault and put in
> aliases in their queries.

Well, if you're using client-side code that depends on access by name
rather than field position, you definitely have to put in AS clauses.
Even if we did generate distinct names, a client couldn't rely on
knowing in advance what they'd be.
        regards, tom lane


Re: SQL feature requests

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>
>> Note that if you use something like fetchrow_hashref it will actually condense
>> out duplicate column names since it loads the row into a hash. So if you
>> you're writing a program which just wants to dump the record without
>> understanding it you probably load it into a hash and then dump the hash in
>> key=>value form. And that will cause some columns to be dropped in the output.
>
>> But those people probably just figure it was their own fault and put in
>> aliases in their queries.
>
> Well, if you're using client-side code that depends on access by name
> rather than field position, you definitely have to put in AS clauses.
> Even if we did generate distinct names, a client couldn't rely on
> knowing in advance what they'd be.

That's why I got tied up trying to describe a scenario where you wouldn't have
to rely on knowing in advance what they would be. If you're running some kind
of reporting tool it could let the user type in arbitrary queries and then
look at what names are returned from the describe message to put in its column
headings.

If such a tool was written in perl using fetchrow_hashref I think it would end
up receiving only one of each distinct column name. Whereas it should be able
to depend on receiving all of them with distinct names, even if it won't know
what those names will be in advance.

If it's written to use arrays and column positions then it would still work
properly though. And we haven't seen any real complaints. 

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: SQL feature requests

From
Ron Mayer
Date:
Tom Lane wrote:
> Part of the reason for being conservative about changing here
> is that we've got a mix of standard and nonstandard behaviors
> 
> A lot of this is legacy behavior that would never have passed muster
> if it had been newly proposed in the last few years --- we have gotten
> *far* stricter about SQL compliance than we used to be.  But at this
> point backwards compatibility also has to weigh heavily.

Has there been any thought to eventually phasing them out?

Perhaps a GUC to give warnings in the log file when
they're encountered.   I guess we'd need 3 levels of
warnings, "off, "reasonable" and "pedantic".   When set
to the reasonable level it could only give smart warning
messages like Warning: Use of frivolous nonstandard behavior XXX. Hint: Use the standard YYY instead.
and when set to pedantic it would point out every
non-standard SQL statement - useful only for someone
to be aware of how much postgresql dependent behavior
they might have.

Then a farther future release could deprecate the
frivolous non-standard pieces presumably leading to
simpler code in the long run.


Re: SQL feature requests

From
Tom Lane
Date:
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:
>   Warning: Use of frivolous nonstandard behavior XXX.
>   Hint: Use the standard YYY instead.

If these behaviors were really "frivolous" this might fly, but I don't
think the majority of users will hold still for the removal of either
GROUP BY 1 or ORDER BY 1.  They're embedded in popular usage, and at
least one of them was in the standard not so long ago.  Moreover,
despite the request that started this thread, the "correct" behavior
for these syntaxes is utterly useless.
        regards, tom lane


Re: SQL feature requests

From
"Zeugswetter Andreas ADI SD"
Date:
> > I still don't see it as terrifically useful functionality, given
that it's
> > just saving you 4 keystrokes ...
>
> Less than that, because the AS is optional.  The only argument I've
> heard that carries much weight with me is that it eases porting from
> other DBMS's that allow this.  Are there any others besides Oracle?

FWIW IBM implemented this syntax in Informix 11.1 Cheetah released in
July 2007.
They also allow to omit the alias, and it imho makes sense.

Andreas


Re: SQL feature requests

From
Bruce Momjian
Date:
This has been saved for the 8.4 release:
http://momjian.postgresql.org/cgi-bin/pgpatches_hold

---------------------------------------------------------------------------

Andrew Dunstan wrote:
> 
> 
> Chuck McDevitt wrote:
> > PostgreSQL already has a huge amount of
> > "non-standard" syntax and semantics (perhaps "extensions" is a better
> > word?).
> > Everything from non-standard cast operator, non-standard substr,
> > non-standard trim, non standard group by semantics (allowing simple ints
> > to mean column number)... Given a day, we could probably write down
> > several pages of "non-standard" features of PGSQL. 
> >
> >
> >   
> 
> Quite so, and I've perpetrated a few myself. But for the most part they 
> are either there for legacy reasons or add significant extra functionality.
> 
> I rather like Alvaro's compromise suggestion re aliases in fact. At 
> least there seems to be a better case for that than for "group by 'blurfl'".
> 
> But the argument that convinces me is not that it follows some de facto 
> standard, but that it would add to clarity. Requiring an alias where it 
> isn't used seems to me a piece of less than excellent design.
> 
> cheers
> 
> andrew
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: SQL feature requests

From
Bruce Momjian
Date:
FYI, we decided we didn't want this additional capability.

---------------------------------------------------------------------------

Florian G. Pflug wrote:
> Kevin Grittner wrote:
> >>>> On Thu, Aug 23, 2007 at  3:01 PM, in message <11856.1187899268@sss.pgh.pa.us>,
> > Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> >> The only argument I've
> >> heard that carries much weight with me is that it eases porting from
> >> other DBMS's that allow this.  Are there any others besides Oracle?
> >  
> >> select * from (select f1 from t) 
> >  
> > In Sybase:
> >  
> > com.sybase.jdbc2.jdbc.SybSQLException: The derived table expression is missing a correlation name. Check derived
tablesyntax in the Reference Manual.
 
> > Error code: 11753
> > SQL state: ZZZZZ
> 
> The really funny thing is that pgsql, mysql and at least sybase
> *explicitly* dissallow the no-alias case. Which shows that
>    .) This seems to be common source of confusion and errors.
>    .) Aliasless-Subqueries wouldn't lead to ambigous grammras in those databases.
>       Otherwise, you'd expect to get some more generic syntax error, and not
>       the very explicit "No alias, but expected one".
> 
> I agree with Tom - knowing *why* the standard committee disallows that syntax -
> and why everybody except oracle chose to agree with it would be quite interesting.
> 
> greetings, Florian Pflug
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://postgres.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +