Home > mailing lists

Re: pg, mysql comparison with "group by" clause - Mailing list pgsql-sql

From	Chris Travers
Subject	Re: pg, mysql comparison with "group by" clause
Date	October 20, 2005 18:33:12
Msg-id	4357E28C.3070401@travelamericas.com Whole thread Raw
In response to	Re: pg, mysql comparison with "group by" clause (Greg Stark <gsstark@mit.edu>)
List	pgsql-sql

Tree view

Greg Stark wrote:

>"Anthony Molinaro" <amolinaro@wgen.net> writes:
>
>  
>
>>Greg,
>>  You'll have to pardon me...
>> 
>>I saw this comment:
>>
>>"I don't see why you think people stumble on this by accident. 
>>I think it's actually an extremely common need."
>>
>>Which, if referring to the ability to have items in the select that do not
>>need to be included in the group, (excluding constants and the like) is just
>>silly.
>>    
>>
>
>Well the "constants and the like" are precisely the point. There are plenty of
>cases where adding the column to the GROUP BY is unnecessary and since
>Postgres makes no attempt to prune them out, inefficient. And constants aren't
>the only such case. The most common case is columns that are coming from a
>table where the primary key is already included in the GROUP BY list.
>  
>
I sort of see what you are saying.... but you have yet to convince me....

>In the case of columns coming from a table where the primary key is already in
>the GROUP BY list it's possible for the database to deduce that it's
>unnecessary to group on that column. 
>  
>
Well....  The question is really whether two things should be true:
1)  whether you want to assume that the programmer is going to know 
about Single/Multi Value Dependency issues per column.  IMO, this is 
more of a DB design issue than a client app issue.  And I would *not* 
want to make that assumption because for higher normal forms where this 
is likely to be a consideration, you are likely to have denormalized 
access via VIEWs anyway.

2)  Whether you are willing to rely on looking at the data first to 
determine whether the query is valid

Alternatively we are back to the ability to get the wrong answer with 
ease and in very difficult to debug ways.  I suspect that MySQL places 
an implicit MIN() around columns not included in the group by 
statement.  I fail to see why this is not an appropriate answer to his 
concern.

>But it's also possible to have cases where the programmer has out of band
>knowledge that it's unnecessary but the database doesn't have that knowledge.
>The most obvious case that comes to mind is a denormalized data model that
>includes a redundant column.
>
>  select dept_id, dept_name, count(*) from employee_list
>  
>
Ok.  You have a few choices:
SELECT MIN(dept_id), dept_name, count(*) FROM employee_list GROUP BY 
dept_name;
SELECT dept_id, dept_name, count(*) FROM employee_list GROUP BY 
dept_name, dept_id;
SELECT dept_id, MIN(dept_name), count(*) FROM employee_list GROUP BY 
dept_id;

And yes, it is bad design in every case I can think of.//
Why is this a problem?

Best Wishes,
Chris Travers
Metatron Technology Consulting

pgsql-sql by date:

From: "Bryce Nesbitt (mailing list account)"
Date: 20 October 2005, 15:59:01
Subject: Re: casting character varying to integer - order by numeric

From: Mario Splivalo
Date: 21 October 2005, 11:13:49
Subject: Postgres 7.4.9 slow!

Re: pg, mysql comparison with "group by" clause - Mailing list pgsql-sql

Previous

Next