Thread: group by

group by

From
"tjk@tksoft.com"
Date:
Is there something amiss with the "GROUP BY" clause?

I never had any problems with it, but I was just doing
a query trying to eliminate duplicates from a list
using GROUP BY on one of the fields, and I got
an "Illegal use of aggregates or non-group column in target list"
error.

E.g. "select email,username from emails group by email"
     produces the error.
     "select email from emails group by email"
      works, and so does
      "select distinct on email email,username from emails"

My PG version is 6.5.3.

"distinct on columnname" works and does the job, but
I am curious if I am brainfarting or if this is a
real change.

Thanks,

Troy

Re: [SQL] group by

From
Tom Lane
Date:
"tjk@tksoft.com" <tjk@tksoft.com> writes:
> Is there something amiss with the "GROUP BY" clause?

No.  Not in this example, anyway.

> E.g. "select email,username from emails group by email" 
>      produces
>      "Illegal use of aggregates or non-group column in target list"

As it should.  Which username are you expecting to retrieve from
each group?  The query is not well-defined.  You could do something
like
select email,min(username) from emails group by email

and get a well-defined result.  Basically, if you use GROUP BY
then any non-grouped columns can only appear as the arguments of
aggregate functions.

Postgres versions before 6.5 were rather lax about catching this
class of error, but 6.5 is more careful about it.

I'm not very happy with the wording of the error message; it's both
vague and confusing.  But I'm not sure how to do better, either.
The routine that checks for this is looking at a very low-level
condition that could cover a multitude of sins, and I don't see any
easy way to deliver a more-specific error message that wouldn't be
wrong (or at least misleading) as often as not :-(

>       ... and so does 
>       "select distinct on email email,username from emails"

SELECT DISTINCT ON is a non-standard abomination that does not
give predictable results.  It's basically the same problem: if
you take only one tuple out of each group with the same value
of "email", which tuple do you get?  The results are totally
dependent on unspecified implementation choices.  I've suggested
several times that we ought to remove SELECT DISTINCT ON, and unless
there's a real groundswell of discontent I'm probably going to
rip it out of 7.0...
        regards, tom lane