Re: Help with query involving aggregation and joining. - Mailing list pgsql-sql

From Eddie Cheung
Subject Re: Help with query involving aggregation and joining.
Date
Msg-id 20030224051836.25599.qmail@web80406.mail.yahoo.com
Whole thread Raw
In response to Re: Help with query involving aggregation and joining.  (Bruno Wolff III <bruno@wolff.to>)
Responses Re: Help with query involving aggregation and joining.
List pgsql-sql
Hi,

I was very glad to see the replies from you guys this
morning.  The two suggested SQL queries did not return
the expected results, but will help me to explain the
problem I am facing further.

1) Josh suggested the following query. (I have made
minor changes by adding the table name to each field)
  SELECT history.id, history.courseId, course.name,
MAX(history.submission) AS submission  FROM history JOIN course ON history.courseId =
course.Id  GROUP BY history.id, history.courseId, course.name  ORDER BY course.name;

The results returned are:id | courseid |   name    | submission
----+----------+-----------+------------ 2 |      102 | Chemistry | 2002-02-17 4 |      102 | Chemistry | 2002-02-22 3
|     104 | Maths     | 2002-04-30 5 |      104 | Maths     | 2002-03-15 6 |      104 | Maths     | 2002-01-21 1 |
101| Physics   | 2002-01-20
 

There are duplicate courses because the history.id
column has different values. The history.id cannot be
use in the GROUP BY clause. But it is one of the
displaying field, so I could not remove it from the
GROUP BY clause either.

2) Bruno suggested the following query:
 select distinct on (course.courseid) history.id, course.courseid, course.name,
history.submission from course natural join history order by course.courseid, history.submission desc; 
I have not used NATURAL JOIN before, but from what I
know, it joins the columns with the same name. Since
the joining columns of History and Course have
different names, I have replace JOIN clause. Please
let me know if I have made a mistake.

The modified query is:  SELECT DISTINCT ON (course.id) course.id,
history.id, course.name, history.submission   FROM history JOIN course ON history.courseId =
course.id   ORDER BY course.id, history.submission desc;

The results returned are :id  | id |   name    | submission
-----+----+-----------+------------101 |  1 | Physics   | 2002-01-20102 |  4 | Chemistry | 2002-02-22104 |  3 | Maths
 | 2002-04-30
 
The problem here is that the results are not ordered
by the submission date. If I sort by
"history.submission" first, I get  ERROR: SELECT DISTINCT ON expressions must match
initial ORDER BY expressions.
Please note that I cannot select distinct on the
course.name either because it is NOT unique. The
original tables are much larger, and the only unique
column is the id.

I have included the queries to create the tables here.

------------------------

CREATE TABLE course (id integer,name varchar(32),  Constraint course_pkey Primary Key (id)
);

CREATE TABLE history (id integer NOT NULL,courseid integer REFERENCES course(id),submission date,Constraint
history_pkeyPrimary Key (id)
 
);

INSERT INTO course (id,name) VALUES (101,'Physics');
INSERT INTO course (id,name) VALUES (102,'Chemistry');
INSERT INTO course (id,name) VALUES (103,'Biology');
INSERT INTO course (id,name) VALUES (104,'Maths');
INSERT INTO course (id,name) VALUES (105,'English');

INSERT INTO history (id,courseid,submission) VALUES
(1,101,'2002-01-20');
INSERT INTO history (id,courseid,submission) VALUES
(2,102,'2002-02-17');
INSERT INTO history (id,courseid,submission) VALUES
(3,104,'2002-04-30');
INSERT INTO history (id,courseid,submission) VALUES
(4,102,'2002-02-22');
INSERT INTO history (id,courseid,submission) VALUES
(5,104,'2002-03-15');
INSERT INTO history (id,courseid,submission) VALUES
(6,104,'2002-01-21');

--------------------------------


Thanks for all your help.


Regards,
Eddie



--- Bruno Wolff III <bruno@wolff.to> wrote:
> On Sun, Feb 23, 2003 at 11:02:27 -0800,
>   Eddie Cheung <vampyre5@yahoo.com> wrote:
> > 
> > HISTORY
> > =======
> > id | courseId  | submission
> > ---+-----------+-------------
> >  1 |  101      | 2002-01-20 
> >  2 |  102      | 2002-02-17
> >  3 |  104      | 2002-04-30
> >  4 |  102      | 2002-02-22
> >  5 |  104      | 2002-03-15
> >  6 |  104      | 2002-01-21
> >  
> >  
> > COURSE
> > ====== 
> >   id  | name
> > ------+-----------
> >   101 | Physics
> >   102 | Chemistry
> >   103 | Biology
> >   104 | Maths
> >   105 | English
> > 
> > 
> > Basically I would like to display the latest
> > submission for each course in a table as shown
> below,
> > order by name of the courses.
> > 
> > Query Results:
> > ==============
> >  id | courseId |  name    | submission
> > ---------------------------------------
> >  4  |  102     | Chemisty | 2002-02-22
> >  3  |  104     | Maths    | 2002-04-30
> >  1  |  101     | Physics  | 2002-01-20
> 
> I think you want to do something like:
> 
> select distinct on (course.courseid)
>   history.id, course.courseid, course.name,
> history.submission
>   from course natural join history
>   order by course.courseid, history.submission desc;
> 
> ---------------------------(end of
> broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to
majordomo@postgresql.org


__________________________________________________
Do you Yahoo!?
Yahoo! Mail Plus - Powerful. Affordable. Sign up now.
http://mailplus.yahoo.com


pgsql-sql by date:

Previous
From: Eddie Cheung
Date:
Subject: Re: Help with query involving aggregation and joining.
Next
From: Josh Berkus
Date:
Subject: Re: Help with query involving aggregation and joining.