Thread: GROUP BY column alias?

GROUP BY column alias?

From
"Eric B. Ridge"
Date:
Maybe I'm getting too old to SQL anymore, but I ran across something yesterday in a machine generated query that took
meover an hour to figure out.   

Here's a little testcase.  Maybe somebody can explain why the last "Not Expected" case does what it does.

select version();
PostgreSQL 8.4.1 on i386-apple-darwin10.0.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc.
build5646), 64-bit 

create table foo(day timestamp);
insert into foo values (now());
insert into foo values (now());

Expected:
select day, count(*) from foo group by day;
            day             | count
----------------------------+-------
 2010-02-18 15:41:37.335357 |     1
 2010-02-18 15:41:39.471746 |     1
(2 rows)

Expected:
select day::date, count(*) from foo group by day;
    day     | count
------------+-------
 2010-02-18 |     1
 2010-02-18 |     1
(2 rows)

Expected:
select day::date, count(*) from foo group by day::date;
    day     | count
------------+-------
 2010-02-18 |     2
(1 row)

Expected:
select day::date as bar, count(*) from foo group by bar;
    bar     | count
------------+-------
 2010-02-18 |     2
(1 row)

Not Expected:
select day::date as day, count(*) from foo group by day;
    day     | count
------------+-------
 2010-02-18 |     1
 2010-02-18 |     1
(2 rows)

Note in the last case, the "day" column is aliased as "day", but the group by using the physical "day" column, not the
alias. That boggles my mind, especially when you consider the case above it, where "day" is aliased as "bar" and
groupingby "bar" works as expected. 

eric

Re: GROUP BY column alias?

From
Scott Bailey
Date:
Eric B. Ridge wrote:
> Maybe I'm getting too old to SQL anymore, but I ran across something yesterday in a machine generated query that took
meover an hour to figure out.   
>
> Here's a little testcase.  Maybe somebody can explain why the last "Not Expected" case does what it does.
>
> select version();
> PostgreSQL 8.4.1 on i386-apple-darwin10.0.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc.
build5646), 64-bit 
>
> create table foo(day timestamp);
> insert into foo values (now());
> insert into foo values (now());
>
> Expected:
> select day, count(*) from foo group by day;
>             day             | count
> ----------------------------+-------
>  2010-02-18 15:41:37.335357 |     1
>  2010-02-18 15:41:39.471746 |     1
> (2 rows)
>
> Expected:
> select day::date, count(*) from foo group by day;
>     day     | count
> ------------+-------
>  2010-02-18 |     1
>  2010-02-18 |     1
> (2 rows)
>
> Expected:
> select day::date, count(*) from foo group by day::date;
>     day     | count
> ------------+-------
>  2010-02-18 |     2
> (1 row)
>
> Expected:
> select day::date as bar, count(*) from foo group by bar;
>     bar     | count
> ------------+-------
>  2010-02-18 |     2
> (1 row)
>
> Not Expected:
> select day::date as day, count(*) from foo group by day;
>     day     | count
> ------------+-------
>  2010-02-18 |     1
>  2010-02-18 |     1
> (2 rows)
>
> Note in the last case, the "day" column is aliased as "day", but the group by using the physical "day" column, not
thealias.  That boggles my mind, especially when you consider the case above it, where "day" is aliased as "bar" and
groupingby "bar" works as expected. 
>
> eric

I'm not sure why you would be surprised by that behavior. You are
grouping by a timestamp, so any microsecond difference will be a new group.

If you want to make that work try:
SELECT day::date, --no need to alias as same name
COUNT(*) FROM foo
GROUP BY day::date;

Scott

Re: GROUP BY column alias?

From
"Eric B. Ridge"
Date:
On Feb 18, 2010, at 4:31 PM, Scott Bailey wrote:
> I'm not sure why you would be surprised by that behavior. You are grouping by a timestamp, so any microsecond
differencewill be a new group. 

I get that. ;)  Hence the ::date.  This is what doesn't make sense:

Expected:     select day::date as bar, count(*) from foo group by bar;
Not Expected: select day::date as day, count(*) from foo group by day;

If I alias it to something other than the actual column name, it does what I expect.  But if I alias it to the column
name,it doesn't. 

I would have thought that the precedence rules would resolve the alias first, then the column name, but that doesn't
seemto be the case. 

> If you want to make that work try:
> SELECT day::date, --no need to alias as same name

The "no need to alias as same name" isn't true in my case because the queries I'm dealing with are machine generated,
andthat's what the generator does, in all cases. 

eric

Re: GROUP BY column alias?

From
Scott Bailey
Date:
Eric B. Ridge wrote:
> On Feb 18, 2010, at 4:31 PM, Scott Bailey wrote:
>> I'm not sure why you would be surprised by that behavior. You are grouping by a timestamp, so any microsecond
differencewill be a new group. 
>
> I get that. ;)  Hence the ::date.  This is what doesn't make sense:
>
> Expected:     select day::date as bar, count(*) from foo group by bar;
> Not Expected: select day::date as day, count(*) from foo group by day;
>
> If I alias it to something other than the actual column name, it does what I expect.  But if I alias it to the column
name,it doesn't. 
>
> I would have thought that the precedence rules would resolve the alias first, then the column name, but that doesn't
seemto be the case. 

SQL name resolution rules are that column names have higher precedence
than aliases and variables. So it will always bind to the column not the
alias.

Scott

Re: GROUP BY column alias?

From
"Eric B. Ridge"
Date:
On Feb 18, 2010, at 5:52 PM, Scott Bailey wrote:

> SQL name resolution rules are that column names have higher precedence than aliases and variables. So it will always
bindto the column not the alias. 

That explains it.  Thanks.  Breaks the rule of least surprise, but it is SQL.

eric

Re: GROUP BY column alias?

From
Lew
Date:
Scott Bailey wrote:
>> SQL name resolution rules are that column names
>> have higher precedence than aliases and variables.
>> So it will always bind to the column not the alias.


Eric B. Ridge wrote:
> That explains it.  Thanks.  Breaks the rule of least surprise, but it is SQL.

I don't think it does break the rule of least surprise.  How would one expect
the column or the alias to have precedence without knowledge of the rule from
documentation?  The only thing I would expect if I were unaware of the rule is
that there might be a conflict and that I'd have to look up the precedence.

--
Lew

Re: GROUP BY column alias?

From
Tom Lane
Date:
Lew <noone@lwsc.ehost-services.com> writes:
> Eric B. Ridge wrote:
>> That explains it.  Thanks.  Breaks the rule of least surprise, but it is SQL.

> I don't think it does break the rule of least surprise.  How would one
> expect the column or the alias to have precedence without knowledge of
> the rule from documentation?  The only thing I would expect if I were
> unaware of the rule is that there might be a conflict and that I'd
> have to look up the precedence.

There's actually a good deal more there than meets the eye.  Our SELECT
reference page spells it out, if you read the fine print.  In particular
it's worth noting this bit in the Compatibility section:

Namespace Available to GROUP BY and ORDER BY

In the SQL-92 standard, an ORDER BY clause can only use output column
names or numbers, while a GROUP BY clause can only use expressions based
on input column names. PostgreSQL extends each of these clauses to allow
the other choice as well (but it uses the standard's interpretation if
there is ambiguity). PostgreSQL also allows both clauses to specify
arbitrary expressions. Note that names appearing in an expression will
always be taken as input-column names, not as output-column names.

SQL:1999 and later use a slightly different definition which is not
entirely upward compatible with SQL-92. In most cases, however,
PostgreSQL will interpret an ORDER BY or GROUP BY expression the same
way SQL:1999 does.

            regards, tom lane

Re: GROUP BY column alias?

From
David Fetter
Date:
On Fri, Feb 19, 2010 at 12:07:42AM -0500, Tom Lane wrote:
> Lew <noone@lwsc.ehost-services.com> writes:
> > Eric B. Ridge wrote:
> >> That explains it.  Thanks.  Breaks the rule of least surprise,
> >> but it is SQL.
>
> SQL:1999 and later use a slightly different definition which is not
> entirely upward compatible with SQL-92. In most cases, however,
> PostgreSQL will interpret an ORDER BY or GROUP BY expression the
> same way SQL:1999 does.

The current SQL standard *supersedes* all previous ones.  There isn't
a hierarchy in the sense of "higher levels of compliance" that our
docs implicitly and falsely assume in many spots, and we need to make
them stop including this idea.

The only standard actually worth citing today is SQL:2008, and the day
the next one comes out, we need to change all our references to cite
it.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: GROUP BY column alias?

From
Tom Lane
Date:
David Fetter <david@fetter.org> writes:
> On Fri, Feb 19, 2010 at 12:07:42AM -0500, Tom Lane wrote:
>> SQL:1999 and later use a slightly different definition which is not
>> entirely upward compatible with SQL-92. In most cases, however,
>> PostgreSQL will interpret an ORDER BY or GROUP BY expression the
>> same way SQL:1999 does.

> The current SQL standard *supersedes* all previous ones.

That is the opinion of the SQL committee, all right, but it has got
precious little to do with the real world.

            regards, tom lane