Thread: field must appear in the GROUP BY clause or be used in an aggregate function?

field must appear in the GROUP BY clause or be used in an aggregate function?

From

Bill Moran

Date:

27 February 2004, 12:11:41

Hey all.

I've hit an SQL problem that I'm a bit mystified by.  I have two different
questions regarding this problem: why?  and how do I work around it?

The following query:

SELECT     GCP.id,
    GCP.Name
     FROM    Gov_Capital_Project GCP,
     WHERE TLM.TLI_ID = $2
     group by GCP.id
     ORDER BY gcp.name;

Produces the following error:

ERROR:  column "gcp.name" must appear in the GROUP BY clause or be used in an aggregate function

That field is a CHAR, so I'm not sure what kind of aggregate to use,
or (more important to my understanding) why one is necessary.

As I said, I'm not sure I understand why this occurs.  I'm assuming that I
don't understand "group by" as well as I thought I did ;)

This isn't my query, I'm translating a system prototyped in MSSQL to
Postgres.  This query _does_ work in MSSQL.  Does that constitute a
bug in MSSQL, or a shortcomming of Postgres, or just a difference of
interpretation?

--
Bill Moran
Potential Technologies
http://www.potentialtech.com

Re: field must appear in the GROUP BY clause or be used

From

"John Sidney-Woollett"

Date:

27 February 2004, 12:22:20

Bill Moran said:
> I've hit an SQL problem that I'm a bit mystified by.  I have two different
> questions regarding this problem: why?  and how do I work around it?
>
> The following query:
>
> SELECT     GCP.id,
>     GCP.Name
>      FROM    Gov_Capital_Project GCP,
>      WHERE TLM.TLI_ID = $2
>      group by GCP.id
>      ORDER BY gcp.name;
>
> Produces the following error:
>
> ERROR:  column "gcp.name" must appear in the GROUP BY clause or be used in
> an aggregate function

Since you're not agregating data, can't you use a select distinct instead?

SELECT  distinct GCP.id, GCP.Name
FROM Gov_Capital_Project GCP, {?something missing here?}
WHERE TLM.TLI_ID = $2
ORDER BY gcp.name;

(BTW, I wasn't clear if the where clause trying to join to another table?)

Doesn't answer your original question, but hope it helps anyway.

John Sidney-Woollett

Re: field must appear in the GROUP BY clause or be used

From

Bill Moran

Date:

27 February 2004, 12:40:11

John Sidney-Woollett wrote:
> Bill Moran said:
>
>>I've hit an SQL problem that I'm a bit mystified by.  I have two different
>>questions regarding this problem: why?  and how do I work around it?
>>
>>The following query:
>>
>>SELECT     GCP.id,
>>    GCP.Name
>>     FROM    Gov_Capital_Project GCP,
>>     WHERE TLM.TLI_ID = $2
>>     group by GCP.id
>>     ORDER BY gcp.name;
>>
>>Produces the following error:
>>
>>ERROR:  column "gcp.name" must appear in the GROUP BY clause or be used in
>>an aggregate function
>
> Since you're not agregating data, can't you use a select distinct instead?

Not sure.  I'll have to get back to the programmer who wrote the orignal
SELECT and find out what kind of data he is actually trying to acquire.

> SELECT  distinct GCP.id, GCP.Name
> FROM Gov_Capital_Project GCP, {?something missing here?}
> WHERE TLM.TLI_ID = $2
> ORDER BY gcp.name;
>
> (BTW, I wasn't clear if the where clause trying to join to another table?)

Yes, my bad.  The actual query causing the problem is a bit longer with about
6 joins to it.  I did test:

select id, name from gov_capital_project group by id order by name;

and it causes the same error, so I thought I'd make the question simpler by
removing the parts that obviously weren't contributing to the problem.

> Doesn't answer your original question, but hope it helps anyway.

It may, thanks for the input!

Like I said, the most important part (to me) is to understand why
Postgres refuses to run this.  The fact that I don't know why points
to an obvious lack of understanding on my account, and I'd like to
remedy that :D

To that effect, if anyone can point me to a doc that will help me
gain a better understanding of why this error occurs, I'd be happy
to read it!

--
Bill Moran
Potential Technologies
http://www.potentialtech.com

Re: field must appear in the GROUP BY clause or be used

From

Mike Mascari

Date:

27 February 2004, 12:54:59

Bill Moran wrote:

> Hey all.
>
> I've hit an SQL problem that I'm a bit mystified by.  I have two different
> questions regarding this problem: why?  and how do I work around it?
>
> The following query:
>
> SELECT     GCP.id,
>     GCP.Name
>     FROM    Gov_Capital_Project GCP,
>     WHERE TLM.TLI_ID = $2
>     group by GCP.id
>     ORDER BY gcp.name;
>
> Produces the following error:
>
> ERROR:  column "gcp.name" must appear in the GROUP BY clause or be used
> in an aggregate function

The reason the grouping requires either an attribute to be
aggregated or apart of the group by list is that if it were not, an
arbitrary value would have to be selected:

[test@lexus] select * from projects;
    dept    |   project
-----------+--------------
  Finance   | Y2K
  Corporate | Y2K
  Corporate | Annual Audit
(3 rows)


[test@lexus] select dept, project from projects group by dept;
ERROR:  column "projects.project" must appear in the GROUP BY clause
or be used in an aggregate function

If this were to be permitted, which project should be selected,
'Y2K' or 'Annual Audit'?

[test@lexus] select dept, project from projects group by dept, project;
    dept    |   project
-----------+--------------
  Corporate | Y2K
  Corporate | Annual Audit
  Finance   | Y2K
(3 rows)


Of course, this has little meaning without an aggregate. All you're
doing is leveraging GROUP BY's sort. You might as well use DISTINCT.
More useful would be:

[test@lexus] select dept, count(project) from projects group by dept;
    dept    | count
-----------+-------
  Finance   |     1
  Corporate |     2
(2 rows)

or perhaps:

[test@lexus] select count(dept), project from projects group by project;
  count |   project
-------+--------------
      2 | Y2K
      1 | Annual Audit

> This isn't my query, I'm translating a system prototyped in MSSQL to
> Postgres.  This query _does_ work in MSSQL.  Does that constitute a
> bug in MSSQL, or a shortcomming of Postgres, or just a difference of
> interpretation?

If MSSQL picks an arbitrary value for the non-group by attribute, it
is violating spec.

Mike Mascari

Re: field must appear in the GROUP BY clause or be used

From

"John Sidney-Woollett"

Date:

27 February 2004, 12:55:18

Bill Moran said:
> Like I said, the most important part (to me) is to understand why
> Postgres refuses to run this.  The fact that I don't know why points
> to an obvious lack of understanding on my account, and I'd like to
> remedy that :D

I have always assumed that you had to place all (non aggregated) columns
in your select in the "group by" clause as well. I suspect that the other
database isn't so picky (or is incorrect?).

Presumably changing the query to:
  select id, name from gov_capital_project group by id, name order by name;
works fine?

> To that effect, if anyone can point me to a doc that will help me
> gain a better understanding of why this error occurs, I'd be happy
> to read it!

Have a look at:
http://www.postgresql.org/docs/7.4/static/sql-select.html#SQL-GROUPBY

[excerpted text]

GROUP BY Clause

The optional GROUP BY clause has the general form

GROUP BY expression [, ...]

GROUP BY will condense into a single row all selected rows that share the
same values for the grouped expressions. expression can be an input column
name, or the name or ordinal number of an output column (SELECT list
item), or an arbitrary expression formed from input-column values. In case
of ambiguity, a GROUP BY name will be interpreted as an input-column name
rather than an output column name.

Aggregate functions, if any are used, are computed across all rows making
up each group, producing a separate value for each group (whereas without
GROUP BY, an aggregate produces a single value computed across all the
selected rows). When GROUP BY is present, it is not valid for the SELECT
list expressions to refer to ungrouped columns except within aggregate
functions, since there would be more than one possible value to return for
an ungrouped column.

John Sidney-Woollett

Re: field must appear in the GROUP BY clause or be used

From

"scott.marlowe"

Date:

27 February 2004, 12:55:33

On Fri, 27 Feb 2004, Bill Moran wrote:

> Hey all.
>
> I've hit an SQL problem that I'm a bit mystified by.  I have two different
> questions regarding this problem: why?  and how do I work around it?
>
> The following query:
>
> SELECT     GCP.id,
>     GCP.Name
>      FROM    Gov_Capital_Project GCP,
>      WHERE TLM.TLI_ID = $2
>      group by GCP.id
>      ORDER BY gcp.name;
>
> Produces the following error:
>
> ERROR:  column "gcp.name" must appear in the GROUP BY clause or be used in an aggregate function

OK, let's look at a test table:

id | data
---------
0 | 'abc'
0 | 'def'
1 | 'ghi'

Now, let's use this query:

select id, data from test_table group by id;

what results should I get back?

I have two possible results for the data column, abc and def.  But I only
get one row with a 0 in it, so which one of those do I pick?

If I use an aggregate I can be sure to get the first or last one:

select id, max(data) from test_table group by id;

Also, you may want to look at postgresql's extension, "distinct on":

http://www.postgresql.org/docs/7.4/static/queries-select-lists.html#QUERIES-DISTINCT

It can give you the kind of results you want.

select distinct on (id) id, data from test_table;

But is know to be indeterminate, so you may get different results each
time.

Re: field must appear in the GROUP BY clause or be used

From

Richard Huxton

Date:

27 February 2004, 13:07:16

On Friday 27 February 2004 16:39, Bill Moran wrote:
> John Sidney-Woollett wrote:
> > Bill Moran said:
> >>
> >>SELECT     GCP.id,
> >>    GCP.Name
> >>     FROM    Gov_Capital_Project GCP,
> >>     WHERE TLM.TLI_ID = $2
> >>     group by GCP.id
> >>     ORDER BY gcp.name;

> >>ERROR:  column "gcp.name" must appear in the GROUP BY clause or be used
> >> in an aggregate function

> Like I said, the most important part (to me) is to understand why
> Postgres refuses to run this.  The fact that I don't know why points
> to an obvious lack of understanding on my account, and I'd like to
> remedy that :D

Like the error message says, if you're using GROUP BY everything in the SELECT
list must be an aggregate SUM(...) or used in the GROUP BY.

So, this is OK:
  SELECT dept, week, SUM(amt_sold)
  FROM weekly_sales
  GROUP BY dept,week;
This isn't:
  SELECT dept, week, SUM(amt_sold)
  FROM weekly_sales
  GROUP BY dept;

Ask yourself which "week" should be returned in the second case.

--
  Richard Huxton
  Archonet Ltd

Re: field must appear in the GROUP BY clause or be used in an aggregate function?

From

Michael Chaney

Date:

27 February 2004, 14:01:58

On Fri, Feb 27, 2004 at 11:11:28AM -0500, Bill Moran wrote:
> Hey all.
>
> I've hit an SQL problem that I'm a bit mystified by.  I have two different
> questions regarding this problem: why?  and how do I work around it?
>
> The following query:
>
> SELECT     GCP.id,
>     GCP.Name
>     FROM    Gov_Capital_Project GCP,
>     WHERE TLM.TLI_ID = $2
>     group by GCP.id
>     ORDER BY gcp.name;
>
> Produces the following error:
>
> ERROR:  column "gcp.name" must appear in the GROUP BY clause or be used in
> an aggregate function
>
> That field is a CHAR, so I'm not sure what kind of aggregate to use,
> or (more important to my understanding) why one is necessary.
>
> As I said, I'm not sure I understand why this occurs.  I'm assuming that I
> don't understand "group by" as well as I thought I did ;)
>
> This isn't my query, I'm translating a system prototyped in MSSQL to
> Postgres.  This query _does_ work in MSSQL.  Does that constitute a
> bug in MSSQL, or a shortcomming of Postgres, or just a difference of
> interpretation?

Well, if "non-standard" == "bug", then it's a bug in mssql.  Your query
doesn't make any sense.  What value for "Name" should be chosen if
there's more than one?

Michael
--
Michael Darrin Chaney
mdchaney@michaelchaney.com
http://www.michaelchaney.com/

Re: field must appear in the GROUP BY clause or be used

From

Tom Lane

Date:

27 February 2004, 14:09:47

Mike Mascari <mascarm@mascari.com> writes:
> Bill Moran wrote:
>> SELECT     GCP.id,
>> GCP.Name
>> FROM    Gov_Capital_Project GCP,
>> WHERE TLM.TLI_ID = $2
>> group by GCP.id
>> ORDER BY gcp.name;
>>
>> ERROR:  column "gcp.name" must appear in the GROUP BY clause or be used
>> in an aggregate function
>>
>> This isn't my query, I'm translating a system prototyped in MSSQL to
>> Postgres.  This query _does_ work in MSSQL.  Does that constitute a
>> bug in MSSQL, or a shortcomming of Postgres, or just a difference of
>> interpretation?

> If MSSQL picks an arbitrary value for the non-group by attribute, it
> is violating spec.

They might be operating per spec.  If "id" is a primary or unique key
for the table, then SQL99 (but not SQL92) says that it's sufficient to
group by the id column; the database is supposed to realize that the
other columns can't have more than one value per group, and allow direct
references to them.  Or at least that's my interpretation of the pages
and pages in SQL99 about functional dependency.  It seems like a pretty
useless frammish ... if you know that id is unique, why are you
bothering with GROUP BY at all?

Anyway, Postgres currently implements the SQL92 definition, which is
that you can't refer to an ungrouped column except within an aggregate
function call.  So you need to call out all the columns to be referenced
in GROUP BY.

            regards, tom lane

Re: field must appear in the GROUP BY clause or be used

From

Bill Moran

Date:

27 February 2004, 14:19:37

Mike Mascari wrote:
> Bill Moran wrote:
>
>> Hey all.
>>
>> I've hit an SQL problem that I'm a bit mystified by.  I have two
>> different
>> questions regarding this problem: why?  and how do I work around it?
>>
>> The following query:
>>
>> SELECT     GCP.id,
>>     GCP.Name
>>     FROM    Gov_Capital_Project GCP,
>>     WHERE TLM.TLI_ID = $2
>>     group by GCP.id
>>     ORDER BY gcp.name;
>>
>> Produces the following error:
>>
>> ERROR:  column "gcp.name" must appear in the GROUP BY clause or be
>> used in an aggregate function
>
> The reason the grouping requires either an attribute to be aggregated or
> apart of the group by list is that if it were not, an arbitrary value
> would have to be selected:

Thanks to everyone who responded.  All the replies have been very helpful.

Talking with the originator of the SQL statement, I came up with this:

select id, max(name) from gov_capital_project group by id order by name;
ERROR:  column "gov_capital_project.name" must appear in the GROUP BY clause or be used in an aggregate function

I turned that over in my head a little and tried this:
select id, max(name) from gov_capital_project group by id order by MAX(name);

Which finally works!  As far as I understand it, that query will supply the
same results as they were getting from MSSQL on the previous query.

A little more playing around shows that this also works:
select id, max(name) as name from gov_capital_project group by id order by name;

Which will probably be a little faster since MAX() is evaluated less.

Now I'm starting to see (maybe) why the query worked under MSSQL.  the
MSSQL version had:

SELECT id as [ID], max(name) as [Name] from gov_capital_project group by id order by name;

I'm guessing that MSSQL is fuzzy enought to figure that "group by name" actually
means "group by [Name]"?

--
Bill Moran
Potential Technologies
http://www.potentialtech.com

Re: field must appear in the GROUP BY clause or be used

From

Bill Moran

Date:

27 February 2004, 14:42:32

Tom Lane wrote:
> Mike Mascari <mascarm@mascari.com> writes:
>
>>Bill Moran wrote:
>>
>>>SELECT     GCP.id,
>>>GCP.Name
>>>FROM    Gov_Capital_Project GCP,
>>>WHERE TLM.TLI_ID = $2
>>>group by GCP.id
>>>ORDER BY gcp.name;
>>>
>>>ERROR:  column "gcp.name" must appear in the GROUP BY clause or be used
>>>in an aggregate function
>>>
>>>This isn't my query, I'm translating a system prototyped in MSSQL to
>>>Postgres.  This query _does_ work in MSSQL.  Does that constitute a
>>>bug in MSSQL, or a shortcomming of Postgres, or just a difference of
>>>interpretation?
>
>>If MSSQL picks an arbitrary value for the non-group by attribute, it
>>is violating spec.
>
> They might be operating per spec.  If "id" is a primary or unique key
> for the table, then SQL99 (but not SQL92) says that it's sufficient to
> group by the id column; the database is supposed to realize that the
> other columns can't have more than one value per group, and allow direct
> references to them.  Or at least that's my interpretation of the pages
> and pages in SQL99 about functional dependency.  It seems like a pretty
> useless frammish ... if you know that id is unique, why are you
> bothering with GROUP BY at all?

It's possible that you're right about MSSQL, the column in question _is_
unique.  I also had another theory (see other post).

As for why I'm using a GROUP BY: it's not my decision, I'm converting
SQL that someone else wrote, and (honestly) I don't understand the
data well enough to say whether it's required in this query or not.

Also, the _actual_ query that I'm converting here is more complex than
this (it's a join of 5 tables) but in my experimenting/testing, I found
that the query that I had minimized down to had the exact same behaviour.
So I posted the simplified query instead of the actual query, to make it
easier on those who would reply.

If you think it would help with Postgres' development, I'll give you
access to my development machine and the actual query involved.  I'm
sure the client won't mind, since their banking their future on the
reliability of Postgres anyway ;)

> Anyway, Postgres currently implements the SQL92 definition, which is
> that you can't refer to an ungrouped column except within an aggregate
> function call.  So you need to call out all the columns to be referenced
> in GROUP BY.

To me, that seems the most likely explanation (i.e. id is a primary key,
and MSSQL is SQL99 compliant)

--
Bill Moran
Potential Technologies
http://www.potentialtech.com

Re: field must appear in the GROUP BY clause or be used

From

Date:

27 February 2004, 14:53:00

Hey guys

I have dealt with this before.

And there is a simple solution:  If the value really is unique, just wrap it
in a max().  Since it's unique, it has *ZERO* effect on your output, but it
then complies to PostgreSQL's GROUP BY implementation, and hence will run...



Terry Fielder
Manager Software Development and Deployment
Great Gulf Homes / Ashton Woods Homes
terry@greatgulfhomes.com
Fax: (416) 441-9085


> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Tom Lane
> Sent: Friday, February 27, 2004 1:09 PM
> To: Mike Mascari
> Cc: Bill Moran; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] field must appear in the GROUP BY clause or be
> used
>
>
> Mike Mascari <mascarm@mascari.com> writes:
> > Bill Moran wrote:
> >> SELECT     GCP.id,
> >> GCP.Name
> >> FROM    Gov_Capital_Project GCP,
> >> WHERE TLM.TLI_ID = $2
> >> group by GCP.id
> >> ORDER BY gcp.name;
> >>
> >> ERROR:  column "gcp.name" must appear in the GROUP BY
> clause or be used
> >> in an aggregate function
> >>
> >> This isn't my query, I'm translating a system prototyped
> in MSSQL to
> >> Postgres.  This query _does_ work in MSSQL.  Does that constitute a
> >> bug in MSSQL, or a shortcomming of Postgres, or just a
> difference of
> >> interpretation?
>
> > If MSSQL picks an arbitrary value for the non-group by
> attribute, it
> > is violating spec.
>
> They might be operating per spec.  If "id" is a primary or unique key
> for the table, then SQL99 (but not SQL92) says that it's sufficient to
> group by the id column; the database is supposed to realize that the
> other columns can't have more than one value per group, and
> allow direct
> references to them.  Or at least that's my interpretation of the pages
> and pages in SQL99 about functional dependency.  It seems
> like a pretty
> useless frammish ... if you know that id is unique, why are you
> bothering with GROUP BY at all?
>
> Anyway, Postgres currently implements the SQL92 definition, which is
> that you can't refer to an ungrouped column except within an aggregate
> function call.  So you need to call out all the columns to be
> referenced
> in GROUP BY.
>
>             regards, tom lane
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster
>

Re: field must appear in the GROUP BY clause or be used

From

Tom Lane

Date:

27 February 2004, 18:15:02

[ drifting a bit off the thread topic, but just for completeness... ]

Bill Moran <wmoran@potentialtech.com> writes:
> I turned that over in my head a little and tried this:
> select id, max(name) from gov_capital_project group by id order by MAX(name);
> ...
> A little more playing around shows that this also works:
> select id, max(name) as name from gov_capital_project group by id order by name;

> Which will probably be a little faster since MAX() is evaluated less.

Actually I believe you'll get the exact same plan either way.  GROUP and
ORDER BY expressions are merged with any matching SELECT-list entries
during parsing.

In fact, as of (I think) 7.4, the executor detects and eliminates
duplicate aggregate-function calls even when the parser didn't.
So for instance this:
    SELECT max(x), max(x) + 1 FROM ...
will only run the MAX() aggregate once.

            regards, tom lane