Thread: must appear in the GROUP BY clause or be used in an aggregate function problem
must appear in the GROUP BY clause or be used in an aggregate function problem
Attempting to get a list of items grouped by id and date, with a detail column.

Table a is a typical user table; id, first name, and last name are all that I am using. Table b is a tracking table, combining data from 8+ tables into one for ease of use for reporting purposes; id, userid, company, and invoice number are all that I am using. Table c is an exception table; it has relations with the other two by both the tracking and user ids, but all I need from it is the exception date.

So the data required is the user's first and last name, the company, the invoice number, and the exception date, for those invoices that are in the exception table. The expected structure for the report is:

User Name: Company: Date: Invoice Number

The SQL I can get to work is:

SELECT MAX(a.name_first) AS name_first, MAX(a.name_last) AS name_last,
       b.company, MAX(c.report_date) AS rDate, b.invoicenum
FROM resources a
JOIN tracking b ON (a.id = b.resource_id)
JOIN except_detail c ON (b.id = c.tracking_id)
WHERE b.region = 'NE' AND b.state = 1
GROUP BY a.id, b.company, b.invoicenum
ORDER BY name_last, name_first, rDate

And in most cases this works fine. The problem arises when invoices get added to the exception table because they have no invoice number. Even though we join on the tracking id, the GROUP BY on invoicenum lumps the different blank invoices into a single line if the same user has more than one. What we want is for each individual blank invoicenum entry to have a separate line in the result.

If I remove b.invoicenum from the GROUP BY, I get the error in the subject line. If I try to use an aggregate function (like I used MAX on the names), it's even worse. MAX works on the names because they are all the same. MAX on the date doesn't seem to affect the results that I can see, other than when an invoice went into exception more than once, and in that case we only want the latest one anyway.

Any hints as to how to get this to not lump all of the empty invoicenums for a user into a single line? I have tried putting in a dummy value (like the date) for the invoicenum, but that works as well as I expected it would (not at all).

Edward W. Rouse
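The collapsing itself is standard SQL behavior: GROUP BY treats all NULLs as equal, so every row with a missing invoicenum lands in one group. A minimal, self-contained illustration:

SELECT invoicenum, count(*)
FROM (VALUES ('A-1'), (NULL), (NULL)) AS t(invoicenum)
GROUP BY invoicenum;

-- invoicenum | count
-- -----------+-------
--  A-1       |     1
--            |     2   <- both NULL rows collapse onto a single line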
Re: must appear in the GROUP BY clause or be used in an aggregate function problem
Edward,

I would suggest first processing all exceptions EXCEPT "missing invoice number" in one query, then constructing a second query that deals only with missing invoice numbers, and finally UNION-ing the two results.

Also, I dislike the "MAX(varchar)" semantics. You should be able to write the query without it, but without more details I cannot readily specify how. Generally you would simply operate off of "id" and then join in the corresponding first/last names at the outermost layer of the query. Since you are grouping on "a.id" anyway, I would suggest you try it.

David J.
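A minimal sketch of the two-query UNION David suggests, reusing the table and column names from Edward's post; treating a blank invoicenum as NULL or an empty string is an assumption:

SELECT a.name_first, a.name_last, b.company,
       MAX(c.report_date) AS rdate, b.invoicenum
FROM resources a
JOIN tracking b ON a.id = b.resource_id
JOIN except_detail c ON b.id = c.tracking_id
WHERE b.region = 'NE' AND b.state = 1
  AND b.invoicenum <> ''              -- real invoice numbers only (also excludes NULL)
GROUP BY a.id, a.name_first, a.name_last, b.company, b.invoicenum

UNION ALL

SELECT a.name_first, a.name_last, b.company,
       MAX(c.report_date) AS rdate, b.invoicenum
FROM resources a
JOIN tracking b ON a.id = b.resource_id
JOIN except_detail c ON b.id = c.tracking_id
WHERE b.region = 'NE' AND b.state = 1
  AND (b.invoicenum IS NULL OR b.invoicenum = '')
GROUP BY b.id, a.name_first, a.name_last, b.company, b.invoicenum  -- group blanks by tracking id: one line per entry
ORDER BY name_last, name_first, rdate;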
Re: must appear in the GROUP BY clause or be used in an aggregate function problem
On Tue, Jan 31, 2012 at 12:27 PM, Edward W. Rouse <erouse@comsquared.com> wrote:

> Any hints as to how to get this to not lump all of the empty invoicenums
> for a user into a single line?

It seems like the aggregation functions in your example aren't absolutely necessary, though perhaps the date one is. The best solution would be to get rid of the aggregate columns so that you don't have this issue.

If that isn't possible, there are definitely some kludges you can use to get it to work if you really don't want to union two queries together, one with and one without valid invoice ids. Assuming invoice ids are generated via a sequence, you can do the following to guarantee a unique 'id' for each empty invoice, at the expense of incrementing your sequence unnecessarily:

COALESCE(invoiceid, nextval(invoiceid_sequence))

A better version of that kludge would be to create a sequence just for this purpose and start it at a very negative number. All of your generated fake ids will then be negative numbers (easily identified by whatever consumes the query results), and you can reset the sequence back to the most negative possible value whenever you get concerned about running out of ids, since you won't be sharing the sequence with the invoice table itself.

There are lots of variations on that general concept. You could write a window function which remembers the ids already seen or generated for each row and produces an arbitrary id to use in place of NULL for grouping, though you'd have to worry about generating an id that hasn't appeared yet but later does. Assuming you can carve out a range of ids guaranteed not to conflict with valid invoice ids, you could make it work. I don't know if you can have "select max(x), invoiceid ... group by func(invoiceid)", so you may have to structure it as "select max(x), func(invoiceid) ... group by 2", which would require that whatever consumes your query results can deal with faked invoiceids; negative numbers again provide a way to identify them.

Doesn't postgres allow operator overloading? Perhaps you could override the equality operator for that type so that a single value (the max value, for example) always compares as unequal to itself, and then use COALESCE(invoiceid, 9223372036854775807). All rows without a valid value would then have that value, but the overloaded equality operator would keep them from grouping together (I'm guessing; I haven't tried this). That one makes me nervous because of the potential for nasty side effects should something legitimately have that value, but one could argue you have a bigger problem if a column legitimately holds max bigint.

Restructuring the query to separate valid invoice ids from invalid ones, and/or getting rid of the aggregation, does seem like the best solution, though.
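For reference, a sketch of the dedicated-sequence kludge with hypothetical names (missing_invoice_seq, invoice_key), assuming invoicenum is a text column; the COALESCE runs once per row inside the subquery, so each blank invoice gets its own negative key and therefore its own group. Untested:

-- One-time setup: a sequence that only hands out negative numbers.
CREATE SEQUENCE missing_invoice_seq
    MINVALUE -9000000000000000000 MAXVALUE -1
    START WITH -9000000000000000000;

SELECT name_first, name_last, company,
       MAX(report_date) AS rdate, invoice_key
FROM (
    SELECT a.name_first, a.name_last, b.company, c.report_date,
           -- NULLIF folds empty strings into NULL; the cast keeps both
           -- COALESCE arguments the same type
           COALESCE(NULLIF(b.invoicenum, ''),
                    nextval('missing_invoice_seq')::text) AS invoice_key
    FROM resources a
    JOIN tracking b ON a.id = b.resource_id
    JOIN except_detail c ON b.id = c.tracking_id
    WHERE b.region = 'NE' AND b.state = 1
) sub
GROUP BY name_first, name_last, company, invoice_key
ORDER BY name_last, name_first, rdate;

-- Caveat: if one blank entry has several exception rows, each row gets its
-- own fake key and therefore its own line, rather than just the latest date.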
Re: must appear in the GROUP BY clause or be used in an aggregate function problem
I would love to remove most of the aggregate functions. I am trying to update an existing query to provide better data and started with the existing one (which is where the MAX on the names came from). I originally tried to remove those, but I get the group by/aggregate function error if I do. I guess I don't understand enough to know why these columns are being forced into that category. I have written queries in the past that had several columns that were not considered to require aggregation. Can someone enlighten me on why these are being marked as requiring grouping/aggregation?

Something with the way the tables are joined, perhaps? Can I remove the joins, put the a.id = c.id in the WHERE clause, and get rid of this?
Re: must appear in the GROUP BY clause or be used in an aggregate function problem
NM, I figured it out. The mere presence of an aggregate function and/or the group by clause is what’s causing all the hate for me. I will take a whack at getting this to work without them. Thanks all.
Re: must appear in the GROUP BY clause or be used in an aggregate function problem
Yeah, that's what I did: no more GROUP BY, and the MAX is in a subquery. One of the reasons I hate coming in behind someone else and updating their stuff is that too many people try to get cute with the code. I try to make everything as plain and simple as possible unless performance issues require otherwise. My code is boring, but easy to understand and maintain ;)
On Tue, Jan 31, 2012 at 4:52 PM, Samuel Gendler wrote:
All columns that are not in an aggregate function MUST be in the GROUP BY clause if any column is in an aggregate function. If you can get rid of all aggregation, you won't need the GROUP BY either. You could use a correlated subquery to get the most recent report date, rather than using max; that would let you ditch all of the other aggregation, I suspect. And unless there is any chance of different versions of the name fields for a given id, it is harmless to drop the MAX() calls and add those columns to the GROUP BY clause instead. You get the same effect.
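For the record, a sketch of the aggregate-free shape Samuel describes and Edward settled on, with the MAX pushed into a correlated subquery; names come from the original query, and the EXISTS test is an assumed way to keep the report limited to invoices in the exception table:

SELECT a.name_first, a.name_last, b.company,
       (SELECT MAX(c.report_date)            -- latest exception date for this tracking row
          FROM except_detail c
         WHERE c.tracking_id = b.id) AS rdate,
       b.invoicenum
FROM resources a
JOIN tracking b ON a.id = b.resource_id
WHERE b.region = 'NE' AND b.state = 1
  AND EXISTS (SELECT 1 FROM except_detail c2
               WHERE c2.tracking_id = b.id)   -- only rows present in the exception table
ORDER BY a.name_last, a.name_first, rdate;

With no GROUP BY there is nothing to collapse the blank invoicenums: each tracking row keeps its own line.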