Re: Help with query involving aggregation and joining. - Mailing list pgsql-sql
From | Eddie Cheung |
---|---|
Subject | Re: Help with query involving aggregation and joining. |
Date | |
Msg-id | 20030224051836.25599.qmail@web80406.mail.yahoo.com Whole thread Raw |
In response to | Re: Help with query involving aggregation and joining. (Bruno Wolff III <bruno@wolff.to>) |
Responses |
Re: Help with query involving aggregation and joining.
|
List | pgsql-sql |
Hi, I was very glad to see the replies from you guys this morning. The two suggested SQL queries did not return the expected results, but will help me to explain the problem I am facing further. 1) Josh suggested the following query. (I have made minor changes by adding the table name to each field) SELECT history.id, history.courseId, course.name, MAX(history.submission) AS submission FROM history JOIN course ON history.courseId = course.Id GROUP BY history.id, history.courseId, course.name ORDER BY course.name; The results returned are:id | courseid | name | submission ----+----------+-----------+------------ 2 | 102 | Chemistry | 2002-02-17 4 | 102 | Chemistry | 2002-02-22 3 | 104 | Maths | 2002-04-30 5 | 104 | Maths | 2002-03-15 6 | 104 | Maths | 2002-01-21 1 | 101| Physics | 2002-01-20 There are duplicate courses because the history.id column has different values. The history.id cannot be use in the GROUP BY clause. But it is one of the displaying field, so I could not remove it from the GROUP BY clause either. 2) Bruno suggested the following query: select distinct on (course.courseid) history.id, course.courseid, course.name, history.submission from course natural join history order by course.courseid, history.submission desc; I have not used NATURAL JOIN before, but from what I know, it joins the columns with the same name. Since the joining columns of History and Course have different names, I have replace JOIN clause. Please let me know if I have made a mistake. The modified query is: SELECT DISTINCT ON (course.id) course.id, history.id, course.name, history.submission FROM history JOIN course ON history.courseId = course.id ORDER BY course.id, history.submission desc; The results returned are :id | id | name | submission -----+----+-----------+------------101 | 1 | Physics | 2002-01-20102 | 4 | Chemistry | 2002-02-22104 | 3 | Maths | 2002-04-30 The problem here is that the results are not ordered by the submission date. If I sort by "history.submission" first, I get ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY expressions. Please note that I cannot select distinct on the course.name either because it is NOT unique. The original tables are much larger, and the only unique column is the id. I have included the queries to create the tables here. ------------------------ CREATE TABLE course (id integer,name varchar(32), Constraint course_pkey Primary Key (id) ); CREATE TABLE history (id integer NOT NULL,courseid integer REFERENCES course(id),submission date,Constraint history_pkeyPrimary Key (id) ); INSERT INTO course (id,name) VALUES (101,'Physics'); INSERT INTO course (id,name) VALUES (102,'Chemistry'); INSERT INTO course (id,name) VALUES (103,'Biology'); INSERT INTO course (id,name) VALUES (104,'Maths'); INSERT INTO course (id,name) VALUES (105,'English'); INSERT INTO history (id,courseid,submission) VALUES (1,101,'2002-01-20'); INSERT INTO history (id,courseid,submission) VALUES (2,102,'2002-02-17'); INSERT INTO history (id,courseid,submission) VALUES (3,104,'2002-04-30'); INSERT INTO history (id,courseid,submission) VALUES (4,102,'2002-02-22'); INSERT INTO history (id,courseid,submission) VALUES (5,104,'2002-03-15'); INSERT INTO history (id,courseid,submission) VALUES (6,104,'2002-01-21'); -------------------------------- Thanks for all your help. Regards, Eddie --- Bruno Wolff III <bruno@wolff.to> wrote: > On Sun, Feb 23, 2003 at 11:02:27 -0800, > Eddie Cheung <vampyre5@yahoo.com> wrote: > > > > HISTORY > > ======= > > id | courseId | submission > > ---+-----------+------------- > > 1 | 101 | 2002-01-20 > > 2 | 102 | 2002-02-17 > > 3 | 104 | 2002-04-30 > > 4 | 102 | 2002-02-22 > > 5 | 104 | 2002-03-15 > > 6 | 104 | 2002-01-21 > > > > > > COURSE > > ====== > > id | name > > ------+----------- > > 101 | Physics > > 102 | Chemistry > > 103 | Biology > > 104 | Maths > > 105 | English > > > > > > Basically I would like to display the latest > > submission for each course in a table as shown > below, > > order by name of the courses. > > > > Query Results: > > ============== > > id | courseId | name | submission > > --------------------------------------- > > 4 | 102 | Chemisty | 2002-02-22 > > 3 | 104 | Maths | 2002-04-30 > > 1 | 101 | Physics | 2002-01-20 > > I think you want to do something like: > > select distinct on (course.courseid) > history.id, course.courseid, course.name, > history.submission > from course natural join history > order by course.courseid, history.submission desc; > > ---------------------------(end of > broadcast)--------------------------- > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org __________________________________________________ Do you Yahoo!? Yahoo! Mail Plus - Powerful. Affordable. Sign up now. http://mailplus.yahoo.com