Thread: A GROUP BY question
I am trying to write, what is for me, a fairly complex query. It uses JOINS, and also GROUP BY. I have this working with the exception of adding the GROUP BY clause. Is there some reason I cannot add a GROUP BY function to a JOIN? Here is what I have: CREATE OR REPLACE view tasks_view as select project.proj_no , employee.first_name , employee.last_name , employee.id , task_instance.hours , work_type.type, work_type.descrip, rate.rate, employee.hourly_rate , rate.rate * task_instance.hours as result , SUM(rate.rate * task_instance.hours) ^^^^^^^^^^^^^^ from task_instance GROUP BY ^^^^^^^^^^^^^^^^^^ project.project_key ^^^^^^^^^^^^^^^^^^^^^^ join rate on rate.employee_key = task_instance.employee_key AND rate.work_type_key = task_instance.work_type_key inner join employee on rate.employee_key = employee.employee_key inner join work_type on rate.work_type_key = work_type.work_type_key inner join project on project.project_key = task_instance.project_key ORDER BY project.proj_no , employee.id ; The underlined lines are what I added, and I get a syntax error pointing to the join. This works fine without the added lines. -- "They that would give up essential liberty for temporary safety deserve neither liberty nor safety." -- Benjamin Franklin
On Tue, Aug 13, 2019 at 05:54:03AM -0400, stan wrote: > I am trying to write, what is for me, a fairly complex query. It uses JOINS, > and also GROUP BY. I have this working with the exception of adding the > GROUP BY clause. > > Is there some reason I cannot add a GROUP BY function to a JOIN? > > Here is what I have: > > > CREATE OR REPLACE view tasks_view as > select > project.proj_no , > employee.first_name , > employee.last_name , > employee.id , > task_instance.hours , > work_type.type, > work_type.descrip, > rate.rate, > employee.hourly_rate , > rate.rate * task_instance.hours as result , > SUM(rate.rate * task_instance.hours) > ^^^^^^^^^^^^^^ > from > task_instance > GROUP BY > ^^^^^^^^^^^^^^^^^^ > project.project_key > ^^^^^^^^^^^^^^^^^^^^^^ > join rate on > rate.employee_key = task_instance.employee_key > AND > rate.work_type_key = task_instance.work_type_key > inner join employee on > rate.employee_key = employee.employee_key > inner join work_type on > rate.work_type_key = work_type.work_type_key > inner join project on > project.project_key = task_instance.project_key > ORDER BY > project.proj_no , > employee.id > ; > Maybe I have a basic misunderstanding. What I am trying to get is a total cost for each project. This would be calculated by multiplying rate and hours for each row, on a per project base, and then summing all of th products of this multiplication. I did get the following to be accepted from a syntax basis, but it returns rows with the product for each row, and something in the sum column which is the same. DROP view tasks_view ; CREATE OR REPLACE view tasks_view as select project.proj_no , employee.first_name , employee.last_name , employee.id , task_instance.hours , work_type.type, work_type.descrip, rate.rate, employee.hourly_rate , rate.rate * task_instance.hours as result , SUM (rate.rate * task_instance.hours) from task_instance join rate on rate.employee_key = task_instance.employee_key AND rate.work_type_key = task_instance.work_type_key inner join employee on rate.employee_key = employee.employee_key inner join work_type on rate.work_type_key = work_type.work_type_key inner join project on project.project_key = task_instance.project_key GROUP BY project.project_key , employee.first_name , employee.last_name , employee.id , task_instance.hours , work_type.type, work_type.descrip, rate.rate, employee.hourly_rate ORDER BY project.proj_no ; -- "They that would give up essential liberty for temporary safety deserve neither liberty nor safety." -- Benjamin Franklin
> On 13 Aug 2019, at 13:10, stan <stanb@panix.com> wrote: > > select > project.proj_no , Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a meaningfulway. > SUM (rate.rate * task_instance.hours) > from > task_instance > join rate on > rate.employee_key = task_instance.employee_key > AND > rate.work_type_key = task_instance.work_type_key (break) > inner join employee on > rate.employee_key = employee.employee_key > inner join work_type on > rate.work_type_key = work_type.work_type_key These are now probably redundant, you don’t need them unless they filter your results. > inner join project on > project.project_key = task_instance.project_key And this JOIN could be dropped if project_key and proj_no weren’t different fields. If both are unique in project, you coulddrop one of them and keep the same functionality with fewer joins. That said, in the “war” between surrogate and naturalkeys I’m on the natural keys side. Clearly, not everyone agrees on that. > GROUP BY > project.project_key , Same columns removed here too. > ORDER BY > project.proj_no > ; That should give you the total cost for each project. You could get the same result repeated per employee and per work type as you tried originally, by putting the above revisedquery as a subquery and joining that back into the full query in the place of your project-related tables (add theproject_key so you have something to join against). The repeated sum risks getting multiplied in the final output though, especially if unaware people will be putting the resultsin an Excel sheet or something. From experience, that either results in people reporting the wrong financial results(several orders too high) or blaming your query. Regards, Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.
Hi Stan,
Am Dienstag, 13. August 2019, 13:10:18 CEST schrieb stan:
> Maybe I have a basic misunderstanding. What I am trying to get is a total
> cost for each project. This would be calculated by multiplying rate and
> hours for each row, on a per project base, and then summing all of th
> products of this multiplication.
First of all, complex views including many joins, group by, having etc usually tend to show bad performance on large dataset if used for anything else than simple select from ... statement. So at least my personal experience.
> CREATE OR REPLACE view tasks_view as
> select
> project.proj_no ,
> employee.first_name ,
> employee.last_name ,
> employee.id ,
> task_instance.hours ,
> work_type.type,
> work_type.descrip,
> rate.rate,
> employee.hourly_rate ,
> rate.rate * task_instance.hours as result ,
--------------------^
this gives you the product *per line*
> SUM (rate.rate * task_instance.hours)
--------------------^
this gives you *sum of all products over all grouped lines*
> from
> task_instance
> join rate on
> rate.employee_key = task_instance.employee_key
> AND
> rate.work_type_key = task_instance.work_type_key
> inner join employee on
> rate.employee_key = employee.employee_key
> inner join work_type on
> rate.work_type_key = work_type.work_type_key
> inner join project on
> project.project_key = task_instance.project_key
> GROUP BY
> project.project_key ,
> employee.first_name ,
> employee.last_name ,
> employee.id ,
> task_instance.hours ,
> work_type.type,
> work_type.descrip,
> rate.rate,
> employee.hourly_rate
---------------------^
You group by the columns you use in the sum, so you will get no sum at all, but the product *per line* as selected just before the sum
> ORDER BY
> project.proj_no
> ;
You will have to find out if you really need to group by some lines, and take a sum over those lines or need the product (rate.rate * task_instance.hours) per line.
--
Best regards
Jan
> On 13 Aug 2019, at 13:10, stan <stanb@panix.com> wrote:
>
> select
> project.proj_no ,
Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a meaningful way.
> On 13 Aug 2019, at 15:19, David G. Johnston <david.g.johnston@gmail.com> wrote: > > On Tuesday, August 13, 2019, Alban Hertroys <haramrae@gmail.com> wrote: > > > On 13 Aug 2019, at 13:10, stan <stanb@panix.com> wrote: > > > > select > > project.proj_no , > > Removed columns that get in the way of your desired result. You can’t have both details and the sum over them in a meaningfulway. > > Sure you can, at least generally, with Window Functions/Expressions (i.e., OVER) That’s why I added “in a meaningful way” ;) Repeating the same SUM-result on every line in a group is not what I’d call a meaningful result; the SUM has no bearing onthe detailed line and leads to the kind of mistakes I already mentioned. (For the record; I do this kind of grouping in a hierarchical database regularly, but there the grouped SUM is at a differentlevel in the hierarchy and I consider it thus sufficiently separated from the detail rows.) Besides, I figured the OP was already struggling with the query syntax, adding window functions into the mix didn’t seema good idea in the context. Possible?, sure, desirable?, I would say not. Alban Hertroys -- There is always an exception to always.