Thread: A GROUP BY question

A GROUP BY question

From

stan

Date:

13 August 2019, 09:54:03

I am trying to write, what is for me, a fairly complex query. It uses JOINS,
and also GROUP BY. I have this working with the exception of adding the
GROUP BY clause. 

Is there some reason I cannot add a GROUP BY function to a JOIN?

Here is what I have:


CREATE OR REPLACE view tasks_view as 
select 
    project.proj_no ,
    employee.first_name ,
    employee.last_name ,
    employee.id ,
    task_instance.hours , 
    work_type.type,
    work_type.descrip,
    rate.rate,
    employee.hourly_rate ,
    rate.rate * task_instance.hours as result ,
    SUM(rate.rate * task_instance.hours) 
    ^^^^^^^^^^^^^^
from 
    task_instance
GROUP BY 
^^^^^^^^^^^^^^^^^^
    project.project_key 
^^^^^^^^^^^^^^^^^^^^^^
join rate on 
    rate.employee_key = task_instance.employee_key
    AND
    rate.work_type_key = task_instance.work_type_key
inner join employee on
    rate.employee_key = employee.employee_key
inner join work_type on
    rate.work_type_key = work_type.work_type_key
inner join project on
    project.project_key = task_instance.project_key
ORDER BY 
    project.proj_no ,
    employee.id
    ;

The underlined lines are what I added, and I get a syntax error pointing to
the join. This works fine without the added lines.

-- 
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
                        -- Benjamin Franklin

Re: A GROUP BY question

From

stan

Date:

13 August 2019, 11:10:18

On Tue, Aug 13, 2019 at 05:54:03AM -0400, stan wrote:
> I am trying to write, what is for me, a fairly complex query. It uses JOINS,
> and also GROUP BY. I have this working with the exception of adding the
> GROUP BY clause. 
> 
> Is there some reason I cannot add a GROUP BY function to a JOIN?
> 
> Here is what I have:
> 
> 
> CREATE OR REPLACE view tasks_view as 
> select 
>     project.proj_no ,
>     employee.first_name ,
>     employee.last_name ,
>     employee.id ,
>     task_instance.hours , 
>     work_type.type,
>     work_type.descrip,
>     rate.rate,
>     employee.hourly_rate ,
>     rate.rate * task_instance.hours as result ,
>     SUM(rate.rate * task_instance.hours) 
>     ^^^^^^^^^^^^^^
> from 
>     task_instance
> GROUP BY 
> ^^^^^^^^^^^^^^^^^^
>     project.project_key 
> ^^^^^^^^^^^^^^^^^^^^^^
> join rate on 
>     rate.employee_key = task_instance.employee_key
>     AND
>     rate.work_type_key = task_instance.work_type_key
> inner join employee on
>     rate.employee_key = employee.employee_key
> inner join work_type on
>     rate.work_type_key = work_type.work_type_key
> inner join project on
>     project.project_key = task_instance.project_key
> ORDER BY 
>     project.proj_no ,
>     employee.id
>     ;
> 
Maybe I have a basic misunderstanding. What I am trying to get is a total
cost for each project. This would be calculated by multiplying rate and
hours for each row, on a per project base, and then summing all of th
products of this multiplication.

I did get the following to be accepted from a syntax basis, but it returns
rows with the product for each row, and something in the sum column which
is the same.


DROP view tasks_view ;

CREATE OR REPLACE view tasks_view as 
select 
    project.proj_no ,
    employee.first_name ,
    employee.last_name ,
    employee.id ,
    task_instance.hours , 
    work_type.type,
    work_type.descrip,
    rate.rate,
    employee.hourly_rate ,
    rate.rate * task_instance.hours as result ,
    SUM (rate.rate * task_instance.hours) 
from 
    task_instance
join rate on 
    rate.employee_key = task_instance.employee_key
    AND
    rate.work_type_key = task_instance.work_type_key
inner join employee on
    rate.employee_key = employee.employee_key
inner join work_type on
    rate.work_type_key = work_type.work_type_key
inner join project on
    project.project_key = task_instance.project_key
GROUP BY 
    project.project_key ,
    employee.first_name ,
    employee.last_name ,
    employee.id ,
    task_instance.hours , 
    work_type.type,
    work_type.descrip,
    rate.rate,
    employee.hourly_rate 
ORDER BY 
    project.proj_no 
    ;

-- 
"They that would give up essential liberty for temporary safety deserve
neither liberty nor safety."
                        -- Benjamin Franklin

Re: A GROUP BY question

From

Alban Hertroys

Date:

13 August 2019, 12:59:22

> On 13 Aug 2019, at 13:10, stan <stanb@panix.com> wrote:
>
> select
>     project.proj_no ,

Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a
meaningfulway. 

>     SUM (rate.rate * task_instance.hours)
> from
>     task_instance
> join rate on
>     rate.employee_key = task_instance.employee_key
>     AND
>     rate.work_type_key = task_instance.work_type_key

(break)

> inner join employee on
>     rate.employee_key = employee.employee_key
> inner join work_type on
>     rate.work_type_key = work_type.work_type_key

These are now probably redundant, you don’t need them unless they filter your results.

> inner join project on
>     project.project_key = task_instance.project_key

And this JOIN could be dropped if project_key and proj_no weren’t different fields. If both are unique in project, you
coulddrop one of them and keep the same functionality with fewer joins. That said, in the “war” between surrogate and
naturalkeys I’m on the natural keys side. Clearly, not everyone agrees on that. 

> GROUP BY
>     project.project_key ,

Same columns removed here too.

> ORDER BY
>     project.proj_no
>     ;

That should give you the total cost for each project.

You could get the same result repeated per employee and per work type as you tried originally, by putting the above
revisedquery as a subquery and joining that back into the full query in the place of your project-related tables (add
theproject_key so you have something to join against). 

The repeated sum risks getting multiplied in the final output though, especially if unaware people will be putting the
resultsin an Excel sheet or something. From experience, that either results in people reporting the wrong financial
results(several orders too high) or blaming your query. 

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.

Re: A GROUP BY question

From

Jan Kohnert

Date:

13 August 2019, 13:07:46

Hi Stan,

Am Dienstag, 13. August 2019, 13:10:18 CEST schrieb stan:

> Maybe I have a basic misunderstanding. What I am trying to get is a total

> cost for each project. This would be calculated by multiplying rate and

> hours for each row, on a per project base, and then summing all of th

> products of this multiplication.

First of all, complex views including many joins, group by, having etc usually tend to show bad performance on large dataset if used for anything else than simple select from ... statement. So at least my personal experience.

> CREATE OR REPLACE view tasks_view as

> select

> project.proj_no ,

> employee.first_name ,

> employee.last_name ,

> employee.id ,

> task_instance.hours ,

> work_type.type,

> work_type.descrip,

> rate.rate,

> employee.hourly_rate ,

> rate.rate * task_instance.hours as result ,

--------------------^

this gives you the product *per line*

> SUM (rate.rate * task_instance.hours)

--------------------^

this gives you *sum of all products over all grouped lines*

> from

> task_instance

> join rate on

> rate.employee_key = task_instance.employee_key

> AND

> rate.work_type_key = task_instance.work_type_key

> inner join employee on

> rate.employee_key = employee.employee_key

> inner join work_type on

> rate.work_type_key = work_type.work_type_key

> inner join project on

> project.project_key = task_instance.project_key

> GROUP BY

> project.project_key ,

> employee.first_name ,

> employee.last_name ,

> employee.id ,

> task_instance.hours ,

> work_type.type,

> work_type.descrip,

> rate.rate,

> employee.hourly_rate

---------------------^

You group by the columns you use in the sum, so you will get no sum at all, but the product *per line* as selected just before the sum

> ORDER BY

> project.proj_no

> ;

You will have to find out if you really need to group by some lines, and take a sum over those lines or need the product (rate.rate * task_instance.hours) per line.

Best regards

Jan

Re: A GROUP BY question

From

"David G. Johnston"

Date:

13 August 2019, 13:19:51

On Tuesday, August 13, 2019, Alban Hertroys <haramrae@gmail.com> wrote:

> On 13 Aug 2019, at 13:10, stan <stanb@panix.com> wrote:
>
> select
> project.proj_no ,

Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a meaningful way.

Sure you can, at least generally, with Window Functions/Expressions (i.e., OVER)

David J.

Re: A GROUP BY question

From

Alban Hertroys

Date:

13 August 2019, 14:35:05

> On 13 Aug 2019, at 15:19, David G. Johnston <david.g.johnston@gmail.com> wrote:
>
> On Tuesday, August 13, 2019, Alban Hertroys <haramrae@gmail.com> wrote:
>
> > On 13 Aug 2019, at 13:10, stan <stanb@panix.com> wrote:
> >
> > select
> >       project.proj_no ,
>
> Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a
meaningfulway. 
>
> Sure you can, at least generally, with Window Functions/Expressions (i.e., OVER)

That’s why I added “in a meaningful way” ;)

Repeating the same SUM-result on every line in a group is not what I’d call a meaningful result; the SUM has no bearing
onthe detailed line and leads to the kind of mistakes I already mentioned. 
(For the record; I do this kind of grouping in a hierarchical database regularly, but there the grouped SUM is at a
differentlevel in the hierarchy and I consider it thus sufficiently separated from the detail rows.) 

Besides, I figured the OP was already struggling with the query syntax, adding window functions into the mix didn’t
seema good idea in the context. Possible?, sure, desirable?, I would say not. 

Alban Hertroys
--
There is always an exception to always.