Thread: field must appear in the GROUP BY clause or be used in an aggregate function?
Hey all. I've hit an SQL problem that I'm a bit mystified by. I have two different questions regarding this problem: why? and how do I work around it? The following query: SELECT GCP.id, GCP.Name FROM Gov_Capital_Project GCP, WHERE TLM.TLI_ID = $2 group by GCP.id ORDER BY gcp.name; Produces the following error: ERROR: column "gcp.name" must appear in the GROUP BY clause or be used in an aggregate function That field is a CHAR, so I'm not sure what kind of aggregate to use, or (more important to my understanding) why one is necessary. As I said, I'm not sure I understand why this occurs. I'm assuming that I don't understand "group by" as well as I thought I did ;) This isn't my query, I'm translating a system prototyped in MSSQL to Postgres. This query _does_ work in MSSQL. Does that constitute a bug in MSSQL, or a shortcomming of Postgres, or just a difference of interpretation? -- Bill Moran Potential Technologies http://www.potentialtech.com
Bill Moran said: > I've hit an SQL problem that I'm a bit mystified by. I have two different > questions regarding this problem: why? and how do I work around it? > > The following query: > > SELECT GCP.id, > GCP.Name > FROM Gov_Capital_Project GCP, > WHERE TLM.TLI_ID = $2 > group by GCP.id > ORDER BY gcp.name; > > Produces the following error: > > ERROR: column "gcp.name" must appear in the GROUP BY clause or be used in > an aggregate function Since you're not agregating data, can't you use a select distinct instead? SELECT distinct GCP.id, GCP.Name FROM Gov_Capital_Project GCP, {?something missing here?} WHERE TLM.TLI_ID = $2 ORDER BY gcp.name; (BTW, I wasn't clear if the where clause trying to join to another table?) Doesn't answer your original question, but hope it helps anyway. John Sidney-Woollett
John Sidney-Woollett wrote: > Bill Moran said: > >>I've hit an SQL problem that I'm a bit mystified by. I have two different >>questions regarding this problem: why? and how do I work around it? >> >>The following query: >> >>SELECT GCP.id, >> GCP.Name >> FROM Gov_Capital_Project GCP, >> WHERE TLM.TLI_ID = $2 >> group by GCP.id >> ORDER BY gcp.name; >> >>Produces the following error: >> >>ERROR: column "gcp.name" must appear in the GROUP BY clause or be used in >>an aggregate function > > Since you're not agregating data, can't you use a select distinct instead? Not sure. I'll have to get back to the programmer who wrote the orignal SELECT and find out what kind of data he is actually trying to acquire. > SELECT distinct GCP.id, GCP.Name > FROM Gov_Capital_Project GCP, {?something missing here?} > WHERE TLM.TLI_ID = $2 > ORDER BY gcp.name; > > (BTW, I wasn't clear if the where clause trying to join to another table?) Yes, my bad. The actual query causing the problem is a bit longer with about 6 joins to it. I did test: select id, name from gov_capital_project group by id order by name; and it causes the same error, so I thought I'd make the question simpler by removing the parts that obviously weren't contributing to the problem. > Doesn't answer your original question, but hope it helps anyway. It may, thanks for the input! Like I said, the most important part (to me) is to understand why Postgres refuses to run this. The fact that I don't know why points to an obvious lack of understanding on my account, and I'd like to remedy that :D To that effect, if anyone can point me to a doc that will help me gain a better understanding of why this error occurs, I'd be happy to read it! -- Bill Moran Potential Technologies http://www.potentialtech.com
Bill Moran wrote: > Hey all. > > I've hit an SQL problem that I'm a bit mystified by. I have two different > questions regarding this problem: why? and how do I work around it? > > The following query: > > SELECT GCP.id, > GCP.Name > FROM Gov_Capital_Project GCP, > WHERE TLM.TLI_ID = $2 > group by GCP.id > ORDER BY gcp.name; > > Produces the following error: > > ERROR: column "gcp.name" must appear in the GROUP BY clause or be used > in an aggregate function The reason the grouping requires either an attribute to be aggregated or apart of the group by list is that if it were not, an arbitrary value would have to be selected: [test@lexus] select * from projects; dept | project -----------+-------------- Finance | Y2K Corporate | Y2K Corporate | Annual Audit (3 rows) [test@lexus] select dept, project from projects group by dept; ERROR: column "projects.project" must appear in the GROUP BY clause or be used in an aggregate function If this were to be permitted, which project should be selected, 'Y2K' or 'Annual Audit'? [test@lexus] select dept, project from projects group by dept, project; dept | project -----------+-------------- Corporate | Y2K Corporate | Annual Audit Finance | Y2K (3 rows) Of course, this has little meaning without an aggregate. All you're doing is leveraging GROUP BY's sort. You might as well use DISTINCT. More useful would be: [test@lexus] select dept, count(project) from projects group by dept; dept | count -----------+------- Finance | 1 Corporate | 2 (2 rows) or perhaps: [test@lexus] select count(dept), project from projects group by project; count | project -------+-------------- 2 | Y2K 1 | Annual Audit > This isn't my query, I'm translating a system prototyped in MSSQL to > Postgres. This query _does_ work in MSSQL. Does that constitute a > bug in MSSQL, or a shortcomming of Postgres, or just a difference of > interpretation? If MSSQL picks an arbitrary value for the non-group by attribute, it is violating spec. Mike Mascari
Bill Moran said: > Like I said, the most important part (to me) is to understand why > Postgres refuses to run this. The fact that I don't know why points > to an obvious lack of understanding on my account, and I'd like to > remedy that :D I have always assumed that you had to place all (non aggregated) columns in your select in the "group by" clause as well. I suspect that the other database isn't so picky (or is incorrect?). Presumably changing the query to: select id, name from gov_capital_project group by id, name order by name; works fine? > To that effect, if anyone can point me to a doc that will help me > gain a better understanding of why this error occurs, I'd be happy > to read it! Have a look at: http://www.postgresql.org/docs/7.4/static/sql-select.html#SQL-GROUPBY [excerpted text] GROUP BY Clause The optional GROUP BY clause has the general form GROUP BY expression [, ...] GROUP BY will condense into a single row all selected rows that share the same values for the grouped expressions. expression can be an input column name, or the name or ordinal number of an output column (SELECT list item), or an arbitrary expression formed from input-column values. In case of ambiguity, a GROUP BY name will be interpreted as an input-column name rather than an output column name. Aggregate functions, if any are used, are computed across all rows making up each group, producing a separate value for each group (whereas without GROUP BY, an aggregate produces a single value computed across all the selected rows). When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions, since there would be more than one possible value to return for an ungrouped column. John Sidney-Woollett
On Fri, 27 Feb 2004, Bill Moran wrote: > Hey all. > > I've hit an SQL problem that I'm a bit mystified by. I have two different > questions regarding this problem: why? and how do I work around it? > > The following query: > > SELECT GCP.id, > GCP.Name > FROM Gov_Capital_Project GCP, > WHERE TLM.TLI_ID = $2 > group by GCP.id > ORDER BY gcp.name; > > Produces the following error: > > ERROR: column "gcp.name" must appear in the GROUP BY clause or be used in an aggregate function OK, let's look at a test table: id | data --------- 0 | 'abc' 0 | 'def' 1 | 'ghi' Now, let's use this query: select id, data from test_table group by id; what results should I get back? I have two possible results for the data column, abc and def. But I only get one row with a 0 in it, so which one of those do I pick? If I use an aggregate I can be sure to get the first or last one: select id, max(data) from test_table group by id; Also, you may want to look at postgresql's extension, "distinct on": http://www.postgresql.org/docs/7.4/static/queries-select-lists.html#QUERIES-DISTINCT It can give you the kind of results you want. select distinct on (id) id, data from test_table; But is know to be indeterminate, so you may get different results each time.
On Friday 27 February 2004 16:39, Bill Moran wrote: > John Sidney-Woollett wrote: > > Bill Moran said: > >> > >>SELECT GCP.id, > >> GCP.Name > >> FROM Gov_Capital_Project GCP, > >> WHERE TLM.TLI_ID = $2 > >> group by GCP.id > >> ORDER BY gcp.name; > >>ERROR: column "gcp.name" must appear in the GROUP BY clause or be used > >> in an aggregate function > Like I said, the most important part (to me) is to understand why > Postgres refuses to run this. The fact that I don't know why points > to an obvious lack of understanding on my account, and I'd like to > remedy that :D Like the error message says, if you're using GROUP BY everything in the SELECT list must be an aggregate SUM(...) or used in the GROUP BY. So, this is OK: SELECT dept, week, SUM(amt_sold) FROM weekly_sales GROUP BY dept,week; This isn't: SELECT dept, week, SUM(amt_sold) FROM weekly_sales GROUP BY dept; Ask yourself which "week" should be returned in the second case. -- Richard Huxton Archonet Ltd
Re: field must appear in the GROUP BY clause or be used in an aggregate function?
From
Michael Chaney
Date:
On Fri, Feb 27, 2004 at 11:11:28AM -0500, Bill Moran wrote: > Hey all. > > I've hit an SQL problem that I'm a bit mystified by. I have two different > questions regarding this problem: why? and how do I work around it? > > The following query: > > SELECT GCP.id, > GCP.Name > FROM Gov_Capital_Project GCP, > WHERE TLM.TLI_ID = $2 > group by GCP.id > ORDER BY gcp.name; > > Produces the following error: > > ERROR: column "gcp.name" must appear in the GROUP BY clause or be used in > an aggregate function > > That field is a CHAR, so I'm not sure what kind of aggregate to use, > or (more important to my understanding) why one is necessary. > > As I said, I'm not sure I understand why this occurs. I'm assuming that I > don't understand "group by" as well as I thought I did ;) > > This isn't my query, I'm translating a system prototyped in MSSQL to > Postgres. This query _does_ work in MSSQL. Does that constitute a > bug in MSSQL, or a shortcomming of Postgres, or just a difference of > interpretation? Well, if "non-standard" == "bug", then it's a bug in mssql. Your query doesn't make any sense. What value for "Name" should be chosen if there's more than one? Michael -- Michael Darrin Chaney mdchaney@michaelchaney.com http://www.michaelchaney.com/
Mike Mascari <mascarm@mascari.com> writes: > Bill Moran wrote: >> SELECT GCP.id, >> GCP.Name >> FROM Gov_Capital_Project GCP, >> WHERE TLM.TLI_ID = $2 >> group by GCP.id >> ORDER BY gcp.name; >> >> ERROR: column "gcp.name" must appear in the GROUP BY clause or be used >> in an aggregate function >> >> This isn't my query, I'm translating a system prototyped in MSSQL to >> Postgres. This query _does_ work in MSSQL. Does that constitute a >> bug in MSSQL, or a shortcomming of Postgres, or just a difference of >> interpretation? > If MSSQL picks an arbitrary value for the non-group by attribute, it > is violating spec. They might be operating per spec. If "id" is a primary or unique key for the table, then SQL99 (but not SQL92) says that it's sufficient to group by the id column; the database is supposed to realize that the other columns can't have more than one value per group, and allow direct references to them. Or at least that's my interpretation of the pages and pages in SQL99 about functional dependency. It seems like a pretty useless frammish ... if you know that id is unique, why are you bothering with GROUP BY at all? Anyway, Postgres currently implements the SQL92 definition, which is that you can't refer to an ungrouped column except within an aggregate function call. So you need to call out all the columns to be referenced in GROUP BY. regards, tom lane
Mike Mascari wrote: > Bill Moran wrote: > >> Hey all. >> >> I've hit an SQL problem that I'm a bit mystified by. I have two >> different >> questions regarding this problem: why? and how do I work around it? >> >> The following query: >> >> SELECT GCP.id, >> GCP.Name >> FROM Gov_Capital_Project GCP, >> WHERE TLM.TLI_ID = $2 >> group by GCP.id >> ORDER BY gcp.name; >> >> Produces the following error: >> >> ERROR: column "gcp.name" must appear in the GROUP BY clause or be >> used in an aggregate function > > The reason the grouping requires either an attribute to be aggregated or > apart of the group by list is that if it were not, an arbitrary value > would have to be selected: Thanks to everyone who responded. All the replies have been very helpful. Talking with the originator of the SQL statement, I came up with this: select id, max(name) from gov_capital_project group by id order by name; ERROR: column "gov_capital_project.name" must appear in the GROUP BY clause or be used in an aggregate function I turned that over in my head a little and tried this: select id, max(name) from gov_capital_project group by id order by MAX(name); Which finally works! As far as I understand it, that query will supply the same results as they were getting from MSSQL on the previous query. A little more playing around shows that this also works: select id, max(name) as name from gov_capital_project group by id order by name; Which will probably be a little faster since MAX() is evaluated less. Now I'm starting to see (maybe) why the query worked under MSSQL. the MSSQL version had: SELECT id as [ID], max(name) as [Name] from gov_capital_project group by id order by name; I'm guessing that MSSQL is fuzzy enought to figure that "group by name" actually means "group by [Name]"? -- Bill Moran Potential Technologies http://www.potentialtech.com
Tom Lane wrote: > Mike Mascari <mascarm@mascari.com> writes: > >>Bill Moran wrote: >> >>>SELECT GCP.id, >>>GCP.Name >>>FROM Gov_Capital_Project GCP, >>>WHERE TLM.TLI_ID = $2 >>>group by GCP.id >>>ORDER BY gcp.name; >>> >>>ERROR: column "gcp.name" must appear in the GROUP BY clause or be used >>>in an aggregate function >>> >>>This isn't my query, I'm translating a system prototyped in MSSQL to >>>Postgres. This query _does_ work in MSSQL. Does that constitute a >>>bug in MSSQL, or a shortcomming of Postgres, or just a difference of >>>interpretation? > >>If MSSQL picks an arbitrary value for the non-group by attribute, it >>is violating spec. > > They might be operating per spec. If "id" is a primary or unique key > for the table, then SQL99 (but not SQL92) says that it's sufficient to > group by the id column; the database is supposed to realize that the > other columns can't have more than one value per group, and allow direct > references to them. Or at least that's my interpretation of the pages > and pages in SQL99 about functional dependency. It seems like a pretty > useless frammish ... if you know that id is unique, why are you > bothering with GROUP BY at all? It's possible that you're right about MSSQL, the column in question _is_ unique. I also had another theory (see other post). As for why I'm using a GROUP BY: it's not my decision, I'm converting SQL that someone else wrote, and (honestly) I don't understand the data well enough to say whether it's required in this query or not. Also, the _actual_ query that I'm converting here is more complex than this (it's a join of 5 tables) but in my experimenting/testing, I found that the query that I had minimized down to had the exact same behaviour. So I posted the simplified query instead of the actual query, to make it easier on those who would reply. If you think it would help with Postgres' development, I'll give you access to my development machine and the actual query involved. I'm sure the client won't mind, since their banking their future on the reliability of Postgres anyway ;) > Anyway, Postgres currently implements the SQL92 definition, which is > that you can't refer to an ungrouped column except within an aggregate > function call. So you need to call out all the columns to be referenced > in GROUP BY. To me, that seems the most likely explanation (i.e. id is a primary key, and MSSQL is SQL99 compliant) -- Bill Moran Potential Technologies http://www.potentialtech.com
Hey guys I have dealt with this before. And there is a simple solution: If the value really is unique, just wrap it in a max(). Since it's unique, it has *ZERO* effect on your output, but it then complies to PostgreSQL's GROUP BY implementation, and hence will run... Terry Fielder Manager Software Development and Deployment Great Gulf Homes / Ashton Woods Homes terry@greatgulfhomes.com Fax: (416) 441-9085 > -----Original Message----- > From: pgsql-general-owner@postgresql.org > [mailto:pgsql-general-owner@postgresql.org]On Behalf Of Tom Lane > Sent: Friday, February 27, 2004 1:09 PM > To: Mike Mascari > Cc: Bill Moran; pgsql-general@postgresql.org > Subject: Re: [GENERAL] field must appear in the GROUP BY clause or be > used > > > Mike Mascari <mascarm@mascari.com> writes: > > Bill Moran wrote: > >> SELECT GCP.id, > >> GCP.Name > >> FROM Gov_Capital_Project GCP, > >> WHERE TLM.TLI_ID = $2 > >> group by GCP.id > >> ORDER BY gcp.name; > >> > >> ERROR: column "gcp.name" must appear in the GROUP BY > clause or be used > >> in an aggregate function > >> > >> This isn't my query, I'm translating a system prototyped > in MSSQL to > >> Postgres. This query _does_ work in MSSQL. Does that constitute a > >> bug in MSSQL, or a shortcomming of Postgres, or just a > difference of > >> interpretation? > > > If MSSQL picks an arbitrary value for the non-group by > attribute, it > > is violating spec. > > They might be operating per spec. If "id" is a primary or unique key > for the table, then SQL99 (but not SQL92) says that it's sufficient to > group by the id column; the database is supposed to realize that the > other columns can't have more than one value per group, and > allow direct > references to them. Or at least that's my interpretation of the pages > and pages in SQL99 about functional dependency. It seems > like a pretty > useless frammish ... if you know that id is unique, why are you > bothering with GROUP BY at all? > > Anyway, Postgres currently implements the SQL92 definition, which is > that you can't refer to an ungrouped column except within an aggregate > function call. So you need to call out all the columns to be > referenced > in GROUP BY. > > regards, tom lane > > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Don't 'kill -9' the postmaster >
[ drifting a bit off the thread topic, but just for completeness... ] Bill Moran <wmoran@potentialtech.com> writes: > I turned that over in my head a little and tried this: > select id, max(name) from gov_capital_project group by id order by MAX(name); > ... > A little more playing around shows that this also works: > select id, max(name) as name from gov_capital_project group by id order by name; > Which will probably be a little faster since MAX() is evaluated less. Actually I believe you'll get the exact same plan either way. GROUP and ORDER BY expressions are merged with any matching SELECT-list entries during parsing. In fact, as of (I think) 7.4, the executor detects and eliminates duplicate aggregate-function calls even when the parser didn't. So for instance this: SELECT max(x), max(x) + 1 FROM ... will only run the MAX() aggregate once. regards, tom lane