Thread: simple? join

simple? join

From

Frank Bax

Date:

04 January 2002, 12:15:12

EMPLOYEE table contains: emp, first, last (emp is unique key)
TIMESHEET table contains: emp, timestamp, hours

I want to report employee name and total hours.

So far, I've come up with:

SELECT min(emp.first) as first, min(emp.last) as last, sum(ts.hours) FROM timesheet ts, employee empWHERE ts.emp =
emp.empGROUPby emp.empORDER BY last, first;
 

It seems silly to specify "min" when I know there is only one value per emp!

SELECT first, last, sum(ts.hours) FROM timesheet ts, employee empWHERE ts.emp = emp.emp GROUP by emp.emp, first,
lastORDERBY last, first;
 

It seems silly to specify extraneous "group by" fields.

Is there a 'better' way to write this SQL?

Frank

Re: simple? join

From

Peter Eisentraut

Date:

06 January 2002, 00:22:18

Frank Bax writes:

> EMPLOYEE table contains: emp, first, last (emp is unique key)
> TIMESHEET table contains: emp, timestamp, hours
>
> I want to report employee name and total hours.
>
> So far, I've come up with:
>
> SELECT min(emp.first) as first, min(emp.last) as last, sum(ts.hours)
>   FROM timesheet ts, employee emp
>  WHERE ts.emp = emp.emp
>  GROUP by emp.emp
>  ORDER BY last, first;
>
> It seems silly to specify "min" when I know there is only one value per emp!

Indeed this looks strange.

> SELECT first, last, sum(ts.hours)
>   FROM timesheet ts, employee emp
>  WHERE ts.emp = emp.emp
>  GROUP by emp.emp, first, last
>  ORDER BY last, first;
>
> It seems silly to specify extraneous "group by" fields.

There's nothing "extraneous" there.  Both first and last could be
duplicated, so you need to group by each one.

> Is there a 'better' way to write this SQL?

Not really.  The latter is how I would write it.

-- 
Peter Eisentraut   peter_e@gmx.net

Re: simple? join

From

Frank Bax

Date:

07 January 2002, 17:25:14

At 12:22 AM 1/6/02 -0500, Peter Eisentraut wrote:
>Frank Bax writes:
>> EMPLOYEE table contains: emp, first, last (emp is unique key)
>> TIMESHEET table contains: emp, timestamp, hours
>> I want to report employee name and total hours.
>> SELECT first, last, sum(ts.hours)
>>   FROM timesheet ts, employee emp
>>  WHERE ts.emp = emp.emp
>>  GROUP by emp.emp, first, last
>>  ORDER BY last, first;
>>
>> It seems silly to specify extraneous "group by" fields.
>
>There's nothing "extraneous" there.  Both first and last could be
>duplicated, so you need to group by each one.

But first and last can't be duplicated if emp is defined as unique.  If I
am also selecting a dozen or so other fields from "employee" table, must I
also include them all in the GROUP BY clause, even though I know "emp"
identifies a unique row in this table?

Frank

Re: simple? join

From

Peter Eisentraut

Date:

07 January 2002, 17:54:33

Frank Bax writes:

> At 12:22 AM 1/6/02 -0500, Peter Eisentraut wrote:
> >Frank Bax writes:
> >> EMPLOYEE table contains: emp, first, last (emp is unique key)
> >> TIMESHEET table contains: emp, timestamp, hours
> >> I want to report employee name and total hours.
> >> SELECT first, last, sum(ts.hours)
> >>   FROM timesheet ts, employee emp
> >>  WHERE ts.emp = emp.emp
> >>  GROUP by emp.emp, first, last
> >>  ORDER BY last, first;
> >>
> >> It seems silly to specify extraneous "group by" fields.
> >
> >There's nothing "extraneous" there.  Both first and last could be
> >duplicated, so you need to group by each one.
>
> But first and last can't be duplicated if emp is defined as unique.

Strictly speaking, you're right.  However, by the time you get to GROUP BY
the tables have been joined so the notion of a unique constraint has been
lost.  Maybe it shouldn't, but as it stands, there's nothing you can do
better here.

> If I
> am also selecting a dozen or so other fields from "employee" table, must I
> also include them all in the GROUP BY clause, even though I know "emp"
> identifies a unique row in this table?

Yes you do.  There's the possibility to write it differently like so:

SELECT *
FROM   (SELECT emp, sum(ts.hours)    FROM timesheet ts, employee emp    WHERE ts.emp = emp.emp    GROUP by emp.emp) AS
a  INNER JOIN   (SELECT emp, first, last, more, things, here FROM employee) AS b   ON (a.emp = b.emp)
 
...

This could be useful if the second query in the inner join involves more
than one table, but on the whole this can get pretty messy.

-- 
Peter Eisentraut   peter_e@gmx.net

Re: simple? join

From

Tom Lane

Date:

07 January 2002, 18:44:34

Frank Bax <fbax@sympatico.ca> writes:
> But first and last can't be duplicated if emp is defined as unique.  If I
> am also selecting a dozen or so other fields from "employee" table, must I
> also include them all in the GROUP BY clause, even though I know "emp"
> identifies a unique row in this table?

SQL92 says you must.

SQL99, however, seems to understand the concept you are putting forward.
If I understand what they're getting at, the notion of "functional
dependency" that is defined at excruciating length in SQL99 is
essentially that a unique key functionally determines all the other
columns in its table.  And the places where SQL92 says something like
"shall reference a grouping column of T or shall be specified within a
<set function specification>" are changed in SQL99 to read "shall
reference a column that is functionally dependent on G or shall be
specified within a <set function specification>" (G being the set of
grouping columns).

Also, SQL99 considers this an optional feature:
        3) Without Feature T301, "Functional dependencies", if T is           a grouped table, then in each <value
expression>,each           <column reference> that references a column of T shall           reference a grouping column
orbe specified in a <set function           specification>.

So, when and if we get around to implementing this particular SQL99
feature, what you are suggesting will work.  Right now it doesn't
(and I'll wager that darn few other SQL implementations support this
feature as yet, either).

BTW, SQL99's GROUP BY clause seems to have a whole bunch of baroque
new features besides this one.
        regards, tom lane

Re: simple? join

From

"Christopher Kings-Lynne"

Date:

07 January 2002, 20:54:25

> So, when and if we get around to implementing this particular SQL99
> feature, what you are suggesting will work.  Right now it doesn't
> (and I'll wager that darn few other SQL implementations support this
> feature as yet, either).

I think MySQL does.  I always found it strange that I had to add columns to
the group by clause in postgres that were functionally dependent on the
"real" grouped variable.

In fact, before I realised that you had to add them to the group by clause,
I just wrapped the functionally dependent fields in a MAX function (or min,
whatever).

ie. SELECT orders.product_id, MAX(product_name), MAX(unit_price) FROM
products NATURAL JOIN orders GROUP BY orders.product_id;

In fact, if one created a 'do nothing' aggregate function, you could avoid
having to add things to the group by clause in a cleaner fashion.

Chris

Re: simple? join

From

Tom Lane

Date:

07 January 2002, 22:44:21

"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
>> So, when and if we get around to implementing this particular SQL99
>> feature, what you are suggesting will work.  Right now it doesn't
>> (and I'll wager that darn few other SQL implementations support this
>> feature as yet, either).

> I think MySQL does.

Do they bother to check validity of the column reference at all?
The cheesy way to do this would be not to check; if you referenced a
non-functionally-dependent column, you'd get a randomly chosen element
value with no warning... but for correct queries it would look like
they had support.
        regards, tom lane

Re: simple? join

From

Frank Bax

Date:

08 January 2002, 14:24:40

At 05:52 PM 1/7/02 -0500, Peter Eisentraut wrote:
>Frank Bax writes:
>
>> At 12:22 AM 1/6/02 -0500, Peter Eisentraut wrote:
>> >Frank Bax writes:
>> >> EMPLOYEE table contains: emp, first, last (emp is unique key)
>> >> TIMESHEET table contains: emp, timestamp, hours
>> >> I want to report employee name and total hours.
>> >> SELECT first, last, sum(ts.hours)
>> >>   FROM timesheet ts, employee emp
>> >>  WHERE ts.emp = emp.emp
>> >>  GROUP by emp.emp, first, last
>> >>  ORDER BY last, first;
>> >>
>> >> It seems silly to specify extraneous "group by" fields.
>> >
>> >There's nothing "extraneous" there.  Both first and last could be
>> >duplicated, so you need to group by each one.
>>
>> But first and last can't be duplicated if emp is defined as unique.
>
>Strictly speaking, you're right.  However, by the time you get to GROUP BY
>the tables have been joined so the notion of a unique constraint has been
>lost.  Maybe it shouldn't, but as it stands, there's nothing you can do
>better here.
>
>> If I
>> am also selecting a dozen or so other fields from "employee" table, must I
>> also include them all in the GROUP BY clause, even though I know "emp"
>> identifies a unique row in this table?
>
>Yes you do.  There's the possibility to write it differently like so:
>
>SELECT *
>FROM
>    (SELECT emp, sum(ts.hours)
>     FROM timesheet ts, employee emp
>     WHERE ts.emp = emp.emp
>     GROUP by emp.emp) AS a
>    INNER JOIN
>    (SELECT emp, first, last, more, things, here FROM employee) AS b
>    ON (a.emp = b.emp)
>...
>
>This could be useful if the second query in the inner join involves more
>than one table, but on the whole this can get pretty messy.

Thanks for the sample!! A few more changes, and it's quite readable:
SELECT * FROM   (SELECT emp, sum(hours) FROM timesheet GROUP by emp) AS ts INNER JOIN   (SELECT emp, first, last FROM
employee)AS emp ON (ts.emp = emp.emp)
 

As an added bonus, this runs almost twice as fast as either of the
originals (both original versions used same plan)!

Frank