Re: pg, mysql comparison with "group by" clause - Mailing list pgsql-sql

From Anthony Molinaro
Subject Re: pg, mysql comparison with "group by" clause
Date
Msg-id 3C6C2B281FD3E74C9F7C9D5B1EDA4582182612@wgexch01.wgenhq.net
Whole thread Raw
In response to pg, mysql comparison with "group by" clause  ("Rick Schumeyer" <rschumeyer@ieee.org>)
List pgsql-sql
> I don't see why you think people stumble on this by accident. I think
it's
> actually an extremely common need.

I can't imagine how that's a common need at all.

It makes no sense.

When you add an additional column in the select, it must be included in
the group by as it changes the meaning of the query.

Consider:

select deptno, count(*) from emp group by deptno;
   DEPTNO   COUNT(*)
---------- ----------       10          3       20          5       30          6


the query above counts the number of employees in each department.


Now consider the following query:

select deptno,job,count(*) from emp group by deptno,job;
   DEPTNO JOB         COUNT(*)
---------- --------- ----------       10 CLERK              1       10 MANAGER            1       10 PRESIDENT
1      20 CLERK              2       20 ANALYST            2       20 MANAGER            1       30 CLERK
1      30 MANAGER            1       30 SALESMAN           4 


the query above counts the number of different job types in each
department.

In mysql, you would be allowed to do the following:

select deptno,job,count(*) from emp group by deptno;

but it makes no sense. What value would it return and what does it mean?
How can that possibly represent reliable data? What would the result set
above look like? It would be meaningless in a production system.
honestly.


It's a silly bug that mysql has touted has a feature
and I can't imagine why people think it's useful.


> I think most MySQL users don't stumble on it, they learn it as the way
to
> handle the common use case when you join a master table against a
detail
> table and then want to aggregate all the detail records.

Huh? I don't follow that at all....
Perhaps your confusing the concept with window functions that neither pg
nor mysql have implemented yet?
For example, using window functions allows you to return aggregate and
detailed info simultaneously:

select ename,      deptno,      job,      count(*)over(partition by deptno) as emps_per_dept,
count(*)over(partitionby deptno,job) as job_per_dept,      count(*)over() as total  from emp 

ENAME  DEPTNO JOB       EMPS_PER_DEPT JOB_PER_DEPT TOTAL
------ ------ --------- ------------- ------------ -----
MILLER     10 CLERK                 3            1    14
CLARK      10 MANAGER               3            1    14
KING       10 PRESIDENT             3            1    14
SCOTT      20 ANALYST               5            2    14
FORD       20 ANALYST               5            2    14
SMITH      20 CLERK                 5            2    14
ADAMS      20 CLERK                 5            2    14
JONES      20 MANAGER               5            1    14
JAMES      30 CLERK                 6            1    14
BLAKE      30 MANAGER               6            1    14
ALLEN      30 SALESMAN              6            4    14
MARTIN     30 SALESMAN              6            4    14
TURNER     30 SALESMAN              6            4    14
WARD       30 SALESMAN              6            4    14


But this is not a group by, this is aggregating and windowing,
which is quite different from mysql adding that nasty little bug
and calling it a feature.
- a

-----Original Message-----
From: pgsql-sql-owner@postgresql.org
[mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Greg Stark
Sent: Wednesday, October 12, 2005 9:13 PM
To: Scott Marlowe
Cc: Greg Stark; Stephan Szabo; Rick Schumeyer; pgsql-sql@postgresql.org
Subject: Re: [SQL] pg, mysql comparison with "group by" clause

Scott Marlowe <smarlowe@g2switchworks.com> writes:

> Hehe.  When I turn on my windshield wipers and my airbag deploys, is
it
> a documented "feature" if the dealership told me about this behaviour
> ahead of time?

Well it's more like my car where the dashboard dims when I turn on my
headlights which annoys me to no end since I learned to always put my
headlights on even in the day.

> In much the same way, while this behaviour may be documented by MySQL,
I
> can't imagine it really being called a feature. But at least this
> misbehaviour is documented. However, I think most people in the MySQL
> universe just stumble onto it by accident when they try it and it
works. I'd
> at least prefer it to throw a warning or notice or something.

I don't see why you think people stumble on this by accident. I think
it's
actually an extremely common need. So common that Postgres has the same
feature (though less general) and invented a whole syntax to handle it.

I think most MySQL users don't stumble on it, they learn it as the way
to
handle the common use case when you join a master table against a detail
table
and then want to aggregate all the detail records. In standard SQL you
have to
write GROUP BY ... and list every single column you need from the master
table. Forcing the database to do a lot of redundant comparisons and
sort on
uselessly long keys where in fact you only really need it to sort and
group by
the primary key.

Remember, most MySQL users learn MySQL first, and only later learn what
is
standard SQL and what isn't.

> A Subselect would let you do such a thing as well, and while it's more
> complicated to write, it is likely to be easier to tell just what it's
> doing.

Subselects have their own problems here. Mainly Postgres's optimizer, as
good
as it is, doesn't treat them with the same code paths as joins and can't
find
all the same plans for them. But in any case you cannot always write a
subselect that's equivalent to an arbitrary join.


--
greg


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?
              http://archives.postgresql.org


pgsql-sql by date:

Previous
From: "Stewart Ben (RBAU/EQS4) *"
Date:
Subject: Re: Update timestamp on update
Next
From: Michael Glaesemann
Date:
Subject: Re: Update timestamp on update