Thread: GROUP BY or alternative means to group
Hi,
the following statement worked on mysql but gives me an error on postgres:
column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function
I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:
SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;
What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.
I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.
Is there any way to do this?
Thanks
Alex
Hi,
You can use one of windowing function:
this could be rank() in subquery or first_value(vale any), but there could be performance issue
another solution could be boolean flag "default" in table address_reference which should be unique for single company, I mean value true should be unique - this could be reached by unique partial index on column refid_companies with condition default = true
hope Your pg version supports windowing functions (as I remember 8.4 and above)
Of course there is a solution with subquery which finds min id in table addresses of each refid_companies in table addresses_reference and this subquery is joined with companies table, but I am afraid this is not the best one.
Regards,
Bartek
Bartek
2012/3/12 Alexander Reichstadt <lxr@mac.com>
Hi,the following statement worked on mysql but gives me an error on postgres:column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate functionI guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.Is there any way to do this?ThanksAlex
So the mysql way for group by seems to be non-standard.
What works for postgres is the DISTINCT ON (fieldname) approach.
Thanks
Am 12.03.2012 um 20:35 schrieb Alexander Reichstadt:
Hi,the following statement worked on mysql but gives me an error on postgres:column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate functionI guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.Is there any way to do this?ThanksAlex
Instead of the joins you can use a subquery to get the first address.
Or you can do the joins without the group by and use row_number() over(partition by companies.id) on the select list to label each company address with a number starting at 1. You can just keep rows that have row_number = 1. See http://www.postgresql.org/docs/current/static/tutorial-window.html to get the feeling how window functions work.
Kiriakos
On Mar 12, 2012, at 3:35 PM, Alexander Reichstadt wrote:
Hi,the following statement worked on mysql but gives me an error on postgres:column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate functionI guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.Is there any way to do this?ThanksAlex
Thanks, I just posted my response to my own question for the archives. I take it also that group by is faster than distinct on. If it is a substantial performance gain I have to work on this some more. A subquery I would expect would be much of a drag, so for all keystroke-updated list-tables this would not be suitable I think.
Am 12.03.2012 um 21:57 schrieb Bartosz Dmytrak:
Hi,You can use one of windowing function:this could be rank() in subquery or first_value(vale any), but there could be performance issueanother solution could be boolean flag "default" in table address_reference which should be unique for single company, I mean value true should be unique - this could be reached by unique partial index on column refid_companies with condition default = truehope Your pg version supports windowing functions (as I remember 8.4 and above)Of course there is a solution with subquery which finds min id in table addresses of each refid_companies in table addresses_reference and this subquery is joined with companies table, but I am afraid this is not the best one.Regards,
Bartek2012/3/12 Alexander Reichstadt <lxr@mac.com>Hi,the following statement worked on mysql but gives me an error on postgres:column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate functionI guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.Is there any way to do this?ThanksAlex
On Mon, Mar 12, 2012 at 1:35 PM, Alexander Reichstadt <lxr@mac.com> wrote: > Hi, > > the following statement worked on mysql but gives me an error on postgres: > > column "addresses.address1" must appear in the GROUP BY clause or be used in > an aggregate function > > I guess I am doing something wrong. I read the web answers, but none of them > seem to meet my needs: > > SELECT > companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip > FROM companies JOIN addresses_reference ON > companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON > addresses_reference.refid_addresses=addresses.id GROUP BY companies.id; > > > What I did now was create a view based on above statement but without > grouping. This returns a list with non-distinct values for all companies > that have more than one address, which is correct. But in some cases I only > need one address and the problem is that I cannot use distinct. > > I wanted to have some way to display a companies list that only gives me the > first stored addresses related, and disregard any further addresses. > > Is there any way to do this? If you don't care which address you get, you can use max(address) or min(address).
I guess I lack the knowledge to integrate your answer in my query....Actually I'd prefer to always see the first addressentered unless there is a where-clause added. Not sure how this works out then and haven't tested. But given the initialquery extended by distinct on it would be like so: >> SELECT distinct on (companies.id) >> companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip >> FROM companies JOIN addresses_reference ON >> companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON >> addresses_reference.refid_addresses=addresses.id GROUP BY companies.id where addresses.city ILIKE '%bla%' I guess postgres would make sense and deliver the entry with bla with the where-clause, and disregard the bla entry returningrandom addresses associated with the company without the where-clause. But where would I insert the max(address) piece? Am 12.03.2012 um 22:09 schrieb Scott Marlowe: > On Mon, Mar 12, 2012 at 1:35 PM, Alexander Reichstadt <lxr@mac.com> wrote: >> Hi, >> >> the following statement worked on mysql but gives me an error on postgres: >> >> column "addresses.address1" must appear in the GROUP BY clause or be used in >> an aggregate function >> >> I guess I am doing something wrong. I read the web answers, but none of them >> seem to meet my needs: >> >> SELECT >> companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip >> FROM companies JOIN addresses_reference ON >> companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON >> addresses_reference.refid_addresses=addresses.id GROUP BY companies.id; >> >> >> What I did now was create a view based on above statement but without >> grouping. This returns a list with non-distinct values for all companies >> that have more than one address, which is correct. But in some cases I only >> need one address and the problem is that I cannot use distinct. >> >> I wanted to have some way to display a companies list that only gives me the >> first stored addresses related, and disregard any further addresses. >> >> Is there any way to do this? > > If you don't care which address you get, you can use max(address) or > min(address). > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general
On Mon, Mar 12, 2012 at 3:19 PM, Alexander Reichstadt <lxr@mac.com> wrote: > But where would I insert the max(address) piece? > Just put max() or min() around any field in the select list that's not in the group by clause
You need to include all columns that are not aggregrative columns in the group by. Even though that is the standard it isa pain to list all columns even if you don't need them Best Regards Michael Gould Sent from Samsung mobile Alexander Reichstadt <lxr@mac.com> wrote: >Hi, > >the following statement worked on mysql but gives me an error on postgres: > >column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function > >I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs: > >SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROMcompanies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.idGROUP BY companies.id; > > >What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct valuesfor all companies that have more than one address, which is correct. But in some cases I only need one address andthe problem is that I cannot use distinct. > >I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregardany further addresses. > >Is there any way to do this? > >Thanks >Alex
On Mon, Mar 12, 2012 at 16:18:05 -0400, Michael Gould <mgould@isstrucksoftware.net> wrote: >You need to include all columns that are not aggregrative columns in the group by. Even though that is the standard itis a pain to list all columns even if you don't need them In later versions of postgres this is relaxed a bit. If you are grouping by a primary key, you don't need to group by columns that are fixed by that key. For example the following query is accepted in 9.1 as gameid is a key for games and hence we don't need to also group by ga,es.title. SELECT games.gameid, games.title FROM games, crate WHERE games.gameid = crate.gameid AND games.contact = 'BOB' AND crate.touched >= current_timestamp + '4 year ago' GROUP BY games.gameid HAVING count(1) < 30 ORDER BY games.gameid ;
Thanks that is a help. I would be nice if any key could be used as those are normally the things I would do group by's
Bruno Wolff III <bruno@wolff.to> wrote:
On Mon, Mar 12, 2012 at 16:18:05 -0400,
Michael Gould <mgould@isstrucksoftware.net> wrote:
>You need to include all columns that are not aggregrative columns in the group by. Even though that is the standard it is a pain to list all columns even if you don't need them
In later versions of postgres this is relaxed a bit. If you are grouping
by a primary key, you don't need to group by columns that are fixed
by that key. For example the following query is accepted in 9.1 as gameid
is a key for games and hence we don't need to also group by ga,es.title.
SELECT games.gameid, games.title
FROM games, crate
WHERE
games.gameid = crate.gameid
AND
games.contact = 'BOB'
AND
crate.touched >= current_timestamp + '4 year ago'
GROUP BY games.gameid
HAVING count(1) < 30
ORDER BY games.gameid
;
Regards
Mike Gould
From my Samsung Android tablet on T-Mobile. The first nationwide 4G network
From my Samsung Android tablet on T-Mobile. The first nationwide 4G network
Bruno Wolff III <bruno@wolff.to> wrote:
On Mon, Mar 12, 2012 at 16:18:05 -0400,
Michael Gould <mgould@isstrucksoftware.net> wrote:
>You need to include all columns that are not aggregrative columns in the group by. Even though that is the standard it is a pain to list all columns even if you don't need them
In later versions of postgres this is relaxed a bit. If you are grouping
by a primary key, you don't need to group by columns that are fixed
by that key. For example the following query is accepted in 9.1 as gameid
is a key for games and hence we don't need to also group by ga,es.title.
SELECT games.gameid, games.title
FROM games, crate
WHERE
games.gameid = crate.gameid
AND
games.contact = 'BOB'
AND
crate.touched >= current_timestamp + '4 year ago'
GROUP BY games.gameid
HAVING count(1) < 30
ORDER BY games.gameid
;
On Mon, Apr 09, 2012 at 13:55:04 -0400, Michael Gould <mgould@isstrucksoftware.net> wrote: >Thanks that is a help. I would be nice if any key could be used as those are normally the things I would do group by's This is what the 9.1 documentation says: "When GROUP BY is present, it is not valid for the SELECT list expressions to refer to ungrouped columns except within aggregate functions or if the ungrouped column is functionally dependent on the grouped columns, since there would otherwise be more than one possible value to return for an ungrouped column. A functional dependency exists if the grouped columns (or a subset thereof) are the primary key of the table containing the ungrouped column." That implies you need to group by a primary key. I haven't tested if that (other keys can't provide this) is actually the case.