Re: pg, mysql comparison with "group by" clause - Mailing list pgsql-sql
From | Anthony Molinaro |
---|---|
Subject | Re: pg, mysql comparison with "group by" clause |
Date | |
Msg-id | 3C6C2B281FD3E74C9F7C9D5B1EDA4582182615@wgexch01.wgenhq.net Whole thread Raw |
In response to | pg, mysql comparison with "group by" clause ("Rick Schumeyer" <rschumeyer@ieee.org>) |
List | pgsql-sql |
Greg, Ok, I think I see where you're going (I don't agree, but I think I get you now). So, using your example of: "dept_name is guaranteed to be the same for all records with the same dept_id." Here: select d.deptno,d.dname from emp e, dept d where e.deptno=d.deptno DEPTNO DNAME ------ -------------- 10 ACCOUNTING 10 ACCOUNTING 10 ACCOUNTING 20 RESEARCH 20 RESEARCH 20 RESEARCH 20 RESEARCH 20 RESEARCH 30 SALES 30 SALES 30 SALES 30 SALES 30 SALES 30 SALES ok, so there's your scenario. And you're suggesting that one should be able to Do the following query? select d.deptno,d.dname,count(*) from emp e, dept d where e.deptno=d.deptno if that's what you suggest, then we'll just have to agree to disagree. That query needs a group by. What you're suggesting is, imo, a wholly unnecessary shortcut (almost as bad as that ridiculous "natural join" - whoever came up with that should be tarred and feathered). I think I see your point now, I just disagree. Your depending on syntax to work based on data integrity? Hmmm.... don't think I like that idea What performance improvement do you get from leaving group by out? Look at the query above, doesn't a count of distinct deptno,dname pairs have to take place anyway? What do you save by excluding group by? Are you suggesting COUNT be computed for each row (windowed) or that COUNT is computed for each group? If you want repeating rows, then you want windowing. For example: select d.deptno,d.dname,count(*)over(partition by d.deptno,d.dname) cnt from emp e, dept d where e.deptno=d.deptno DEPTNO DNAME CNT ------ -------------- --- 10 ACCOUNTING 3 10 ACCOUNTING 3 10 ACCOUNTING 3 20 RESEARCH 5 20 RESEARCH 5 20 RESEARCH 5 20 RESEARCH 5 20 RESEARCH 5 30 SALES 6 30 SALES 6 30 SALES 6 30 SALES 6 30 SALES 6 30 SALES 6 if you want "groups", then use group by: select d.deptno,d.dname,count(*) cnt from emp e, dept dwhere e.deptno=d.deptno group by d.deptno,d.dname DEPTNO DNAME CNT ------ -------------- --- 10 ACCOUNTING 3 20 RESEARCH 5 30 SALES 6 what your suggesting doesn't seem to fit in at all, particularly when pg implements window functions. If you're suggesting the pg optimizer isn't doing the right thing with group by queries, then this is an optimizer issue and that should be hacked, not group by. If you're suggesting certain rows be ditched or shortcuts be taken, then the optimizer should do that, not the programmer writing sql. Db2 and oracle have no problem doing these queries, I don't see why pg should have a problem. imo, the only items that should not be listed in the group by are: 1. constants and deterministic functions 2. scalar subqueries 3. window functions 1 - because the value is same for each row 2&3 - because they are evaluated after the grouping takes place regards, Anthony -----Original Message----- From: gsstark@mit.edu [mailto:gsstark@mit.edu] Sent: Thursday, October 13, 2005 12:25 AM To: Anthony Molinaro Cc: gsstark@mit.edu; Tom Lane; Scott Marlowe; Stephan Szabo; Rick Schumeyer; pgsql-sql@postgresql.org Subject: Re: [SQL] pg, mysql comparison with "group by" clause "Anthony Molinaro" <amolinaro@wgen.net> writes: > Greg, > You'll have to pardon me... > > I saw this comment: > > "I don't see why you think people stumble on this by accident. > I think it's actually an extremely common need." > > Which, if referring to the ability to have items in the select that do not > need to be included in the group, (excluding constants and the like) is just > silly. Well the "constants and the like" are precisely the point. There are plenty of cases where adding the column to the GROUP BY is unnecessary and since Postgres makes no attempt to prune them out, inefficient. And constants aren't the only such case. The most common case is columns that are coming from a table where the primary key is already included in the GROUP BY list. In the case of columns coming from a table where the primary key is already in the GROUP BY list it's possible for the database to deduce that it's unnecessary to group on that column. But it's also possible to have cases where the programmer has out of band knowledge that it's unnecessary but the database doesn't have that knowledge. The most obvious case that comes to mind is a denormalized data model that includes a redundant column. select dept_id, dept_name, count(*) from employee_list For example if dept_name is guaranteed to be the same for all records with the same dept_id. Of course that's generally considered poor design but it doesn't mean there aren't thousands of databases out there with data models like that. -- greg