Thread: simple? join
EMPLOYEE table contains: emp, first, last (emp is unique key) TIMESHEET table contains: emp, timestamp, hours I want to report employee name and total hours. So far, I've come up with: SELECT min(emp.first) as first, min(emp.last) as last, sum(ts.hours) FROM timesheet ts, employee empWHERE ts.emp = emp.empGROUPby emp.empORDER BY last, first; It seems silly to specify "min" when I know there is only one value per emp! SELECT first, last, sum(ts.hours) FROM timesheet ts, employee empWHERE ts.emp = emp.emp GROUP by emp.emp, first, lastORDERBY last, first; It seems silly to specify extraneous "group by" fields. Is there a 'better' way to write this SQL? Frank
Frank Bax writes: > EMPLOYEE table contains: emp, first, last (emp is unique key) > TIMESHEET table contains: emp, timestamp, hours > > I want to report employee name and total hours. > > So far, I've come up with: > > SELECT min(emp.first) as first, min(emp.last) as last, sum(ts.hours) > FROM timesheet ts, employee emp > WHERE ts.emp = emp.emp > GROUP by emp.emp > ORDER BY last, first; > > It seems silly to specify "min" when I know there is only one value per emp! Indeed this looks strange. > SELECT first, last, sum(ts.hours) > FROM timesheet ts, employee emp > WHERE ts.emp = emp.emp > GROUP by emp.emp, first, last > ORDER BY last, first; > > It seems silly to specify extraneous "group by" fields. There's nothing "extraneous" there. Both first and last could be duplicated, so you need to group by each one. > Is there a 'better' way to write this SQL? Not really. The latter is how I would write it. -- Peter Eisentraut peter_e@gmx.net
At 12:22 AM 1/6/02 -0500, Peter Eisentraut wrote: >Frank Bax writes: >> EMPLOYEE table contains: emp, first, last (emp is unique key) >> TIMESHEET table contains: emp, timestamp, hours >> I want to report employee name and total hours. >> SELECT first, last, sum(ts.hours) >> FROM timesheet ts, employee emp >> WHERE ts.emp = emp.emp >> GROUP by emp.emp, first, last >> ORDER BY last, first; >> >> It seems silly to specify extraneous "group by" fields. > >There's nothing "extraneous" there. Both first and last could be >duplicated, so you need to group by each one. But first and last can't be duplicated if emp is defined as unique. If I am also selecting a dozen or so other fields from "employee" table, must I also include them all in the GROUP BY clause, even though I know "emp" identifies a unique row in this table? Frank
Frank Bax writes: > At 12:22 AM 1/6/02 -0500, Peter Eisentraut wrote: > >Frank Bax writes: > >> EMPLOYEE table contains: emp, first, last (emp is unique key) > >> TIMESHEET table contains: emp, timestamp, hours > >> I want to report employee name and total hours. > >> SELECT first, last, sum(ts.hours) > >> FROM timesheet ts, employee emp > >> WHERE ts.emp = emp.emp > >> GROUP by emp.emp, first, last > >> ORDER BY last, first; > >> > >> It seems silly to specify extraneous "group by" fields. > > > >There's nothing "extraneous" there. Both first and last could be > >duplicated, so you need to group by each one. > > But first and last can't be duplicated if emp is defined as unique. Strictly speaking, you're right. However, by the time you get to GROUP BY the tables have been joined so the notion of a unique constraint has been lost. Maybe it shouldn't, but as it stands, there's nothing you can do better here. > If I > am also selecting a dozen or so other fields from "employee" table, must I > also include them all in the GROUP BY clause, even though I know "emp" > identifies a unique row in this table? Yes you do. There's the possibility to write it differently like so: SELECT * FROM (SELECT emp, sum(ts.hours) FROM timesheet ts, employee emp WHERE ts.emp = emp.emp GROUP by emp.emp) AS a INNER JOIN (SELECT emp, first, last, more, things, here FROM employee) AS b ON (a.emp = b.emp) ... This could be useful if the second query in the inner join involves more than one table, but on the whole this can get pretty messy. -- Peter Eisentraut peter_e@gmx.net
Frank Bax <fbax@sympatico.ca> writes: > But first and last can't be duplicated if emp is defined as unique. If I > am also selecting a dozen or so other fields from "employee" table, must I > also include them all in the GROUP BY clause, even though I know "emp" > identifies a unique row in this table? SQL92 says you must. SQL99, however, seems to understand the concept you are putting forward. If I understand what they're getting at, the notion of "functional dependency" that is defined at excruciating length in SQL99 is essentially that a unique key functionally determines all the other columns in its table. And the places where SQL92 says something like "shall reference a grouping column of T or shall be specified within a <set function specification>" are changed in SQL99 to read "shall reference a column that is functionally dependent on G or shall be specified within a <set function specification>" (G being the set of grouping columns). Also, SQL99 considers this an optional feature: 3) Without Feature T301, "Functional dependencies", if T is a grouped table, then in each <value expression>,each <column reference> that references a column of T shall reference a grouping column orbe specified in a <set function specification>. So, when and if we get around to implementing this particular SQL99 feature, what you are suggesting will work. Right now it doesn't (and I'll wager that darn few other SQL implementations support this feature as yet, either). BTW, SQL99's GROUP BY clause seems to have a whole bunch of baroque new features besides this one. regards, tom lane
> So, when and if we get around to implementing this particular SQL99 > feature, what you are suggesting will work. Right now it doesn't > (and I'll wager that darn few other SQL implementations support this > feature as yet, either). I think MySQL does. I always found it strange that I had to add columns to the group by clause in postgres that were functionally dependent on the "real" grouped variable. In fact, before I realised that you had to add them to the group by clause, I just wrapped the functionally dependent fields in a MAX function (or min, whatever). ie. SELECT orders.product_id, MAX(product_name), MAX(unit_price) FROM products NATURAL JOIN orders GROUP BY orders.product_id; In fact, if one created a 'do nothing' aggregate function, you could avoid having to add things to the group by clause in a cleaner fashion. Chris
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes: >> So, when and if we get around to implementing this particular SQL99 >> feature, what you are suggesting will work. Right now it doesn't >> (and I'll wager that darn few other SQL implementations support this >> feature as yet, either). > I think MySQL does. Do they bother to check validity of the column reference at all? The cheesy way to do this would be not to check; if you referenced a non-functionally-dependent column, you'd get a randomly chosen element value with no warning... but for correct queries it would look like they had support. regards, tom lane
At 05:52 PM 1/7/02 -0500, Peter Eisentraut wrote: >Frank Bax writes: > >> At 12:22 AM 1/6/02 -0500, Peter Eisentraut wrote: >> >Frank Bax writes: >> >> EMPLOYEE table contains: emp, first, last (emp is unique key) >> >> TIMESHEET table contains: emp, timestamp, hours >> >> I want to report employee name and total hours. >> >> SELECT first, last, sum(ts.hours) >> >> FROM timesheet ts, employee emp >> >> WHERE ts.emp = emp.emp >> >> GROUP by emp.emp, first, last >> >> ORDER BY last, first; >> >> >> >> It seems silly to specify extraneous "group by" fields. >> > >> >There's nothing "extraneous" there. Both first and last could be >> >duplicated, so you need to group by each one. >> >> But first and last can't be duplicated if emp is defined as unique. > >Strictly speaking, you're right. However, by the time you get to GROUP BY >the tables have been joined so the notion of a unique constraint has been >lost. Maybe it shouldn't, but as it stands, there's nothing you can do >better here. > >> If I >> am also selecting a dozen or so other fields from "employee" table, must I >> also include them all in the GROUP BY clause, even though I know "emp" >> identifies a unique row in this table? > >Yes you do. There's the possibility to write it differently like so: > >SELECT * >FROM > (SELECT emp, sum(ts.hours) > FROM timesheet ts, employee emp > WHERE ts.emp = emp.emp > GROUP by emp.emp) AS a > INNER JOIN > (SELECT emp, first, last, more, things, here FROM employee) AS b > ON (a.emp = b.emp) >... > >This could be useful if the second query in the inner join involves more >than one table, but on the whole this can get pretty messy. Thanks for the sample!! A few more changes, and it's quite readable: SELECT * FROM (SELECT emp, sum(hours) FROM timesheet GROUP by emp) AS ts INNER JOIN (SELECT emp, first, last FROM employee)AS emp ON (ts.emp = emp.emp) As an added bonus, this runs almost twice as fast as either of the originals (both original versions used same plan)! Frank