Re: pg, mysql comparison with "group by" clause - Mailing list pgsql-sql

From Chris Travers
Subject Re: pg, mysql comparison with "group by" clause
Date
Msg-id 4357E28C.3070401@travelamericas.com
Whole thread Raw
In response to Re: pg, mysql comparison with "group by" clause  (Greg Stark <gsstark@mit.edu>)
List pgsql-sql
Greg Stark wrote:

>"Anthony Molinaro" <amolinaro@wgen.net> writes:
>
>  
>
>>Greg,
>>  You'll have to pardon me...
>> 
>>I saw this comment:
>>
>>"I don't see why you think people stumble on this by accident. 
>>I think it's actually an extremely common need."
>>
>>Which, if referring to the ability to have items in the select that do not
>>need to be included in the group, (excluding constants and the like) is just
>>silly.
>>    
>>
>
>Well the "constants and the like" are precisely the point. There are plenty of
>cases where adding the column to the GROUP BY is unnecessary and since
>Postgres makes no attempt to prune them out, inefficient. And constants aren't
>the only such case. The most common case is columns that are coming from a
>table where the primary key is already included in the GROUP BY list.
>  
>
I sort of see what you are saying.... but you have yet to convince me....

>In the case of columns coming from a table where the primary key is already in
>the GROUP BY list it's possible for the database to deduce that it's
>unnecessary to group on that column. 
>  
>
Well....  The question is really whether two things should be true:
1)  whether you want to assume that the programmer is going to know 
about Single/Multi Value Dependency issues per column.  IMO, this is 
more of a DB design issue than a client app issue.  And I would *not* 
want to make that assumption because for higher normal forms where this 
is likely to be a consideration, you are likely to have denormalized 
access via VIEWs anyway.

2)  Whether you are willing to rely on looking at the data first to 
determine whether the query is valid

Alternatively we are back to the ability to get the wrong answer with 
ease and in very difficult to debug ways.  I suspect that MySQL places 
an implicit MIN() around columns not included in the group by 
statement.  I fail to see why this is not an appropriate answer to his 
concern.

>But it's also possible to have cases where the programmer has out of band
>knowledge that it's unnecessary but the database doesn't have that knowledge.
>The most obvious case that comes to mind is a denormalized data model that
>includes a redundant column.
>
>  select dept_id, dept_name, count(*) from employee_list
>  
>
Ok.  You have a few choices:
SELECT MIN(dept_id), dept_name, count(*) FROM employee_list GROUP BY 
dept_name;
SELECT dept_id, dept_name, count(*) FROM employee_list GROUP BY 
dept_name, dept_id;
SELECT dept_id, MIN(dept_name), count(*) FROM employee_list GROUP BY 
dept_id;

And yes, it is bad design in every case I can think of.//
Why is this a problem?

Best Wishes,
Chris Travers
Metatron Technology Consulting


pgsql-sql by date:

Previous
From: "Bryce Nesbitt (mailing list account)"
Date:
Subject: Re: casting character varying to integer - order by numeric
Next
From: Mario Splivalo
Date:
Subject: Postgres 7.4.9 slow!