Thread: GROUP BY or alternative means to group

GROUP BY or alternative means to group

From
Alexander Reichstadt
Date:
Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:

SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;


What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.

Is there any way to do this?

Thanks
Alex

Re: GROUP BY or alternative means to group

From
Bartosz Dmytrak
Date:
Hi,
You can use one of windowing function:
this could be rank() in subquery or first_value(vale any), but there could be performance issue

another solution could be boolean flag "default" in table address_reference which should be unique for single company, I mean value true should be unique - this could be reached by unique partial index on column refid_companies with condition default = true

hope Your pg version supports windowing functions (as I remember 8.4 and above)

Of course there is a solution with subquery which finds min id in table addresses of each refid_companies in table addresses_reference and this subquery is joined with companies table, but I am afraid this is not the best one.

Regards,
Bartek


2012/3/12 Alexander Reichstadt <lxr@mac.com>
Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:

SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;


What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.

Is there any way to do this?

Thanks
Alex

Solved [Re: GROUP BY or alternative means to group]

From
Alexander Reichstadt
Date:
So the mysql way for group by seems to be non-standard.

What works for postgres is the DISTINCT ON (fieldname) approach.

Thanks



Am 12.03.2012 um 20:35 schrieb Alexander Reichstadt:

Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:

SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;


What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.

Is there any way to do this?

Thanks
Alex

Re: GROUP BY or alternative means to group

From
Kiriakos Georgiou
Date:
Instead of the joins you can use a subquery to get the first address.

Or you can do the joins without the group by and use row_number() over(partition by companies.id) on the select list to label each company address with a number starting at 1.  You can just keep rows that have row_number = 1.  See http://www.postgresql.org/docs/current/static/tutorial-window.html to get the feeling how window functions work.

Kiriakos

On Mar 12, 2012, at 3:35 PM, Alexander Reichstadt wrote:

Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:

SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;


What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.

Is there any way to do this?

Thanks
Alex

Re: GROUP BY or alternative means to group

From
Alexander Reichstadt
Date:
Thanks, I just posted my response to my own question for the archives. I take it also that group by is faster than distinct on. If it is a substantial performance gain I have to work on this some more. A subquery I would expect would be much of a drag, so for all keystroke-updated list-tables this would not be suitable I think.



Am 12.03.2012 um 21:57 schrieb Bartosz Dmytrak:

Hi,
You can use one of windowing function:
this could be rank() in subquery or first_value(vale any), but there could be performance issue

another solution could be boolean flag "default" in table address_reference which should be unique for single company, I mean value true should be unique - this could be reached by unique partial index on column refid_companies with condition default = true

hope Your pg version supports windowing functions (as I remember 8.4 and above)

Of course there is a solution with subquery which finds min id in table addresses of each refid_companies in table addresses_reference and this subquery is joined with companies table, but I am afraid this is not the best one.

Regards,
Bartek


2012/3/12 Alexander Reichstadt <lxr@mac.com>
Hi,

the following statement worked on mysql but gives me an error on postgres:

column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function

I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:

SELECT companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip FROM companies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;


What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct values for all companies that have more than one address, which is correct. But in some cases I only need one address and the problem is that I cannot use distinct.

I wanted to have some way to display a companies list that only gives me the first stored addresses related, and disregard any further addresses.

Is there any way to do this?

Thanks
Alex


Re: GROUP BY or alternative means to group

From
Scott Marlowe
Date:
On Mon, Mar 12, 2012 at 1:35 PM, Alexander Reichstadt <lxr@mac.com> wrote:
> Hi,
>
> the following statement worked on mysql but gives me an error on postgres:
>
> column "addresses.address1" must appear in the GROUP BY clause or be used in
> an aggregate function
>
> I guess I am doing something wrong. I read the web answers, but none of them
> seem to meet my needs:
>
> SELECT
>
companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
> FROM companies JOIN addresses_reference ON
> companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON
> addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;
>
>
> What I did now was create a view based on above statement but without
> grouping. This returns a list with non-distinct values for all companies
> that have more than one address, which is correct. But in some cases I only
> need one address and the problem is that I cannot use distinct.
>
> I wanted to have some way to display a companies list that only gives me the
> first stored addresses related, and disregard any further addresses.
>
> Is there any way to do this?

If you don't care which address you get, you can use max(address) or
min(address).

Re: GROUP BY or alternative means to group

From
Alexander Reichstadt
Date:
I guess I lack the knowledge to integrate your answer in my query....Actually I'd prefer to always see the first
addressentered unless there is a where-clause added. Not sure how this works out then and haven't tested. But given the
initialquery extended by distinct on it would be like so: 

>> SELECT distinct on (companies.id)
>>
companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
>> FROM companies JOIN addresses_reference ON
>> companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON
>> addresses_reference.refid_addresses=addresses.id GROUP BY companies.id
where addresses.city ILIKE '%bla%'

I guess postgres would make sense and deliver the entry with bla with the where-clause, and disregard the bla entry
returningrandom addresses associated with the company without the where-clause. 

But where would I insert the max(address) piece?




Am 12.03.2012 um 22:09 schrieb Scott Marlowe:

> On Mon, Mar 12, 2012 at 1:35 PM, Alexander Reichstadt <lxr@mac.com> wrote:
>> Hi,
>>
>> the following statement worked on mysql but gives me an error on postgres:
>>
>> column "addresses.address1" must appear in the GROUP BY clause or be used in
>> an aggregate function
>>
>> I guess I am doing something wrong. I read the web answers, but none of them
>> seem to meet my needs:
>>
>> SELECT
>>
companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
>> FROM companies JOIN addresses_reference ON
>> companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON
>> addresses_reference.refid_addresses=addresses.id GROUP BY companies.id;
>>
>>
>> What I did now was create a view based on above statement but without
>> grouping. This returns a list with non-distinct values for all companies
>> that have more than one address, which is correct. But in some cases I only
>> need one address and the problem is that I cannot use distinct.
>>
>> I wanted to have some way to display a companies list that only gives me the
>> first stored addresses related, and disregard any further addresses.
>>
>> Is there any way to do this?
>
> If you don't care which address you get, you can use max(address) or
> min(address).
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general


Re: GROUP BY or alternative means to group

From
Scott Marlowe
Date:
On Mon, Mar 12, 2012 at 3:19 PM, Alexander Reichstadt <lxr@mac.com> wrote:
> But where would I insert the max(address) piece?
>

Just put max() or min() around any field in the select list that's not
in the group by clause

Re: GROUP BY or alternative means to group

From
Michael Gould
Date:
You need to include all columns that are not aggregrative columns in the group by.  Even though that is the standard it
isa pain to list all columns even if you don't need them 

Best Regards

Michael Gould

Sent from Samsung mobile

Alexander Reichstadt <lxr@mac.com> wrote:

>Hi,
>
>the following statement worked on mysql but gives me an error on postgres:
>
>column "addresses.address1" must appear in the GROUP BY clause or be used in an aggregate function
>
>I guess I am doing something wrong. I read the web answers, but none of them seem to meet my needs:
>
>SELECT
companies.id,companies.name,companies.organizationkind,addresses.address1,addresses.address2,addresses.city,addresses.zip
FROMcompanies JOIN addresses_reference ON companies.id=addresses_reference.refid_companies LEFT JOIN addresses ON
addresses_reference.refid_addresses=addresses.idGROUP BY companies.id; 
>
>
>What I did now was create a view based on above statement but without grouping. This returns a list with non-distinct
valuesfor all companies that have more than one address, which is correct. But in some cases I only need one address
andthe problem is that I cannot use distinct. 
>
>I wanted to have some way to display a companies list that only gives me the first stored addresses related, and
disregardany further addresses. 
>
>Is there any way to do this?
>
>Thanks
>Alex

Re: GROUP BY or alternative means to group

From
Bruno Wolff III
Date:
On Mon, Mar 12, 2012 at 16:18:05 -0400,
   Michael Gould <mgould@isstrucksoftware.net> wrote:
>You need to include all columns that are not aggregrative columns in the group by.  Even though that is the standard
itis a pain to list all columns even if you don't need them 

In later versions of postgres this is relaxed a bit. If you are grouping
by a primary key, you don't need to group by columns that are fixed
by that key. For example the following query is accepted in 9.1 as gameid
is a key for games and hence we don't need to also group by ga,es.title.

SELECT games.gameid, games.title
   FROM games, crate
     WHERE
       games.gameid = crate.gameid
       AND
       games.contact = 'BOB'
       AND
       crate.touched >= current_timestamp + '4 year ago'
   GROUP BY games.gameid
   HAVING count(1) < 30
   ORDER BY games.gameid
;

Re: GROUP BY or alternative means to group

From
Michael Gould
Date:
Thanks that is a help. I would be nice if any key could be used as those are normally the things I would do group by's

Regards

Mike Gould




From my Samsung Android tablet on T-Mobile. The first nationwide 4G network

Bruno Wolff III <bruno@wolff.to> wrote:
On Mon, Mar 12, 2012 at 16:18:05 -0400,
   Michael Gould <mgould@isstrucksoftware.net> wrote:
>You need to include all columns that are not aggregrative columns in the group by.  Even though that is the standard it is a pain to list all columns even if you don't need them

In later versions of postgres this is relaxed a bit. If you are grouping
by a primary key, you don't need to group by columns that are fixed
by that key. For example the following query is accepted in 9.1 as gameid
is a key for games and hence we don't need to also group by ga,es.title.

SELECT games.gameid, games.title
   FROM games, crate
     WHERE
       games.gameid = crate.gameid
       AND
       games.contact = 'BOB'
       AND
       crate.touched >= current_timestamp + '4 year ago'
   GROUP BY games.gameid
   HAVING count(1) < 30
   ORDER BY games.gameid
;

Re: GROUP BY or alternative means to group

From
Bruno Wolff III
Date:
On Mon, Apr 09, 2012 at 13:55:04 -0400,
   Michael Gould <mgould@isstrucksoftware.net> wrote:
>Thanks that is a help. I would be nice if any key could be used as those are normally the things I would do group by's

This is what the 9.1 documentation says:
"When GROUP BY is present, it is not valid for the SELECT list expressions to
refer to ungrouped columns except within aggregate functions or if the
ungrouped column is functionally dependent on the grouped columns, since
there would otherwise be more than one possible value to return for an
ungrouped column. A functional dependency exists if the grouped columns (or
a subset thereof) are the primary key of the table containing the ungrouped
column."

That implies you need to group by a primary key. I haven't tested if
that (other keys can't provide this) is actually the case.