Thread: Tricky join question
Hi all I have a join problem that seems to be too difficult for me to solve: I have: table person id integer, name varchar(32) data: 1,"Jack" 2,"Jill" 3,"Bob" table course id integer, name varchar(32) data: 1,"SQL Beginner" 2,"SQL Advanced" table person_course person_id number, course_id number data: (currently empty) Now, I would like to know for every person the courses they have taken. In mysql, the following statement: SELECT c.id, c.name, pc.person_id FROM person as p left outer join person_course as pc on p.id = pc.person_id right outer join course as c on pc.course_id = c.id where p.id = 2 order by 1; will get me the following result: +---+-----------------------------------------+----------+ | id| name | person_id| +---+-----------------------------------------+----------+ | 1 | SQL Beginner | | | 2 | SQL Advanced | | +---+-----------------------------------------+----------+ Can I get Postgres to give me the same result somehow? The above statement will return nothing at all under postgres. Bye Tim
am Fri, dem 22.12.2006, um 12:12:06 +0100 mailte Tim Tassonis folgendes: > Hi all > > I have a join problem that seems to be too difficult for me to solve: > > I have: > > table person > id integer, > name varchar(32) > > data: > > 1,"Jack" > 2,"Jill" > 3,"Bob" > > > table course > id integer, > name varchar(32) > > data: > > 1,"SQL Beginner" > 2,"SQL Advanced" > > > > table person_course > person_id number, > course_id number > > data: First, you should use referential integrity: test=# create table person(id int primary key, name text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "person_pkey" for table "person" CREATE TABLE test=# create table course(id int primary key, name text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "course_pkey" for table "course" CREATE TABLE test=# create table person_course(person_id int references person, course_id int references course ); CREATE TABLE > > (currently empty) Okay, i insert some data: test=# insert into person_course values (1,1); INSERT 0 1 test=# insert into person_course values (3,1); INSERT 0 1 test=# insert into person_course values (3,2); INSERT 0 1 > > > Now, I would like to know for every person the courses they have taken. Similar to your result: test=# select b.id, b.name, c.id from person_course a left join course b on a.course_id=b.id left join person c on a.person_id=c.id; id | name | id ----+--------------+---- 1 | SQL Beginner | 1 1 | SQL Beginner | 3 2 | SQL Advanced | 3 (3 rows) In my opinion better: test=# select c.id, c.name, b.name from person_course a left join course b on a.course_id=b.id left join person c on a.person_id=c.id; id | name | name ----+------+-------------- 1 | Jack | SQL Beginner 3 | Bob | SQL Beginner 3 | Bob | SQL Advanced (3 rows) Please, read more about referential integrity. Andreas -- Andreas Kretschmer Kontakt: Heynitz: 035242/47215, D1: 0160/7141639 (mehr: -> Header) GnuPG-ID: 0x3FFF606C, privat 0x7F4584DA http://wwwkeys.de.pgp.net
Hi Andreas > First, you should use referential integrity: I do, that is not the point. It was a simplified data model. Of course I have primary keys and stuff, but they don't affect join behaviour at all. > > test=# create table person(id int primary key, name text); > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "person_pkey" for table "person" > CREATE TABLE > test=# create table course(id int primary key, name text); > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "course_pkey" for table "course" > CREATE TABLE > test=# create table person_course(person_id int references person, course_id int references course ); > CREATE TABLE > Well, you shouldn't, thats not my problem. > > > >> >> Now, I would like to know for every person the courses they have taken. > > Similar to your result: > > test=# select b.id, b.name, c.id from person_course a left join course b on a.course_id=b.id left join person c on a.person_id=c.id; > id | name | id > ----+--------------+---- > 1 | SQL Beginner | 1 > 1 | SQL Beginner | 3 > 2 | SQL Advanced | 3 > (3 rows) This is absolutely not what I want. I want a row for every person and every course, regardless whether the person has taken the course or not. If the person has not taken the course, I want a null value in the person id column: SELECT c.id, c.name, pc.person_id FROM person as p left outer join person_course as pc on p.id = pc.person_id right outer join course as c on pc.course_id = c.id where p.id = 2 order by 1; +---+-----------------------------------------+----------+ | id| name | person_id| +---+-----------------------------------------+----------+ | 1 | SQL Beginner | | | 2 | SQL Advanced | | +---+-----------------------------------------+----------+ Note here that I restrict my select to the person with the ID 2. Since this person has not taken any course, the person_id is null. If I leave the restriction on the person, I get person times courses rows, the person_id only filled when a person has actually taken a course. With the rows you added person_course and without restrictin to a specific person, the result of your query should be: +---+-----------------------------------------+------------+ | id| name | person_id | +---+-----------------------------------------+------------+ | 1 | SQL Beginner | 1 | | 1 | SQL Beginner | | | 1 | SQL Beginner | 3 | | 2 | SQL Advanced | | | 2 | SQL Advanced | | | 2 | SQL Advanced | 3 | +---+-----------------------------------------+------------+ In mysql, you get this with the following clause: SELECT c.id, c.name, pc.person_id FROM person as p left outer join person_course as pc on p.id = pc.person_id right outer join course as c on pc.course_id = c.id order by 1; > Please, read more about referential integrity. Thanks, but I already have read a lot about it 14 years ago. Bye Tim
Hi, Without restriction you're getting: On Fri, Dec 22, 2006 at 02:55:56PM +0100, Tim Tassonis wrote: > +---+-----------------------------------------+------------+ > | id| name | person_id | > +---+-----------------------------------------+------------+ > | 1 | SQL Beginner | 1 | > | 1 | SQL Beginner | | > | 1 | SQL Beginner | 3 | > | 2 | SQL Advanced | | > | 2 | SQL Advanced | | > | 2 | SQL Advanced | 3 | > +---+-----------------------------------------+------------+ There are no rows in the table with person_id=2, so PostgreSQL is returning the correct result (no rows). Seems like a bug in MySQL. > In mysql, you get this with the following clause: > > SELECT c.id, c.name, pc.person_id > FROM person as p > left outer join person_course as pc on p.id = pc.person_id > right outer join course as c on pc.course_id = c.id > order by 1; I think what you want is to apply to restriction on person earlier, maybe: SELECT c.id, c.name, pc.person_id FROM person as p left outer join person_course as pc on (p.id = pc.person_id and p.id = 2) right outer join course as c on pc.course_id = c.id order by 1; Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
Attachment
Tim Tassonis wrote: > Hi Andreas > >> First, you should use referential integrity: > > > I do, that is not the point. It was a simplified data model. Of course I > have primary keys and stuff, but they don't affect join behaviour at all. > >> >> test=# create table person(id int primary key, name text); >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "person_pkey" for table "person" >> CREATE TABLE >> test=# create table course(id int primary key, name text); >> NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index >> "course_pkey" for table "course" >> CREATE TABLE >> test=# create table person_course(person_id int references person, >> course_id int references course ); >> CREATE TABLE >> > > Well, you shouldn't, thats not my problem. > >> >> >> >>> >>> Now, I would like to know for every person the courses they have taken. >> >> >> Similar to your result: >> >> test=# select b.id, b.name, c.id from person_course a left join course >> b on a.course_id=b.id left join person c on a.person_id=c.id; >> id | name | id >> ----+--------------+---- >> 1 | SQL Beginner | 1 >> 1 | SQL Beginner | 3 >> 2 | SQL Advanced | 3 >> (3 rows) > > > This is absolutely not what I want. I want a row for every person and > every course, regardless whether the person has taken the course or not. > If the person has not taken the course, I want a null value in the > person id column: > > SELECT c.id, c.name, pc.person_id > FROM person as p > left outer join person_course as pc on p.id = pc.person_id > right outer join course as c on pc.course_id = c.id > where p.id = 2 order by 1; > > +---+-----------------------------------------+----------+ > | id| name | person_id| > +---+-----------------------------------------+----------+ > | 1 | SQL Beginner | | > | 2 | SQL Advanced | | > +---+-----------------------------------------+----------+ > > Note here that I restrict my select to the person with the ID 2. Since > this person has not taken any course, the person_id is null. If I leave > the restriction on the person, I get person times courses rows, the > person_id only filled when a person has actually taken a course. > > With the rows you added person_course and without restrictin to a > specific person, the result of your query should be: > > > +---+-----------------------------------------+------------+ > | id| name | person_id | > +---+-----------------------------------------+------------+ > | 1 | SQL Beginner | 1 | > | 1 | SQL Beginner | | > | 1 | SQL Beginner | 3 | > | 2 | SQL Advanced | | > | 2 | SQL Advanced | | > | 2 | SQL Advanced | 3 | > +---+-----------------------------------------+------------+ > > In mysql, you get this with the following clause: > > SELECT c.id, c.name, pc.person_id > FROM person as p > left outer join person_course as pc on p.id = pc.person_id > right outer join course as c on pc.course_id = c.id > order by 1; > >> Please, read more about referential integrity. > > > Thanks, but I already have read a lot about it 14 years ago. > > Bye > Tim > In all your long years of experience, perhaps you haven't come across this? http://catb.org/~esr/faqs/smart-questions.html If you're going to ask a question here the least you could do is meet us half-way. b
Tim Tassonis <timtas@cubic.ch> writes: > In mysql, the following statement: > SELECT c.id, c.name, pc.person_id > FROM person as p > left outer join person_course as pc on p.id = pc.person_id > right outer join course as c on pc.course_id = c.id > where p.id = 2 order by 1; > will get me the following result: > +---+-----------------------------------------+----------+ > | id| name | person_id| > +---+-----------------------------------------+----------+ > | 1 | SQL Beginner | | > | 2 | SQL Advanced | | > +---+-----------------------------------------+----------+ Really? It would be unbelievably broken if so, but a quick experiment with mysql 5.0.27 says they return an empty set same as us. You *would* get that answer without the WHERE clause, but neither of those rows meet the WHERE. Look at the complete join output: regression=# SELECT * FROM person as p left outer join person_course as pc on p.id = pc.person_id right outer join course as c on pc.course_id = c.id ; id | name | person_id | course_id | id | name ----+------+-----------+-----------+----+-------------- | | | | 1 | SQL Beginner | | | | 2 | SQL Advanced (2 rows) The person-left-join-person_course join produces rows, but none of them can match course during the right join, so they don't get through. I think what you want might be a full join for the second step: regression=# SELECT * FROM person as p left outer join person_course as pc on p.id = pc.person_id full outer join course as c on pc.course_id = c.id ; id | name | person_id | course_id | id | name ----+------+-----------+-----------+----+-------------- 1 | Jack | | | | 2 | Jill | | | | 3 | Bob | | | | | | | | 1 | SQL Beginner | | | | 2 | SQL Advanced (5 rows) regression=# SELECT * FROM person as p left outer join person_course as pc on p.id = pc.person_id full outer join course as c on pc.course_id = c.id where p.id = 2 order by 1; id | name | person_id | course_id | id | name ----+------+-----------+-----------+----+------ 2 | Jill | | | | (1 row) regression=# insert into person_course values(2,2); INSERT 0 1 regression=# SELECT * FROM person as p left outer join person_course as pc on p.id = pc.person_id full outer join course as c on pc.course_id = c.id where p.id = 2 order by 1; id | name | person_id | course_id | id | name ----+------+-----------+-----------+----+-------------- 2 | Jill | 2 | 2 | 2 | SQL Advanced (1 row) BTW, I tried to duplicate this in mysql and was surprised to find that 5.0.27 doesn't seem to support full join at all :-( regards, tom lane
Tim Tassonis <timtas@cubic.ch> schrieb: > This is absolutely not what I want. I want a row for every person and every > course, regardless whether the person has taken the course or not. If the > person has not taken the course, I want a null value in the person id > column: test=# select c.id, c.name, pc.person_id from course c cross join person p left outer join person_course pc on (p.id,c.id)=(pc.person_id,pc.course_id); id | name | person_id ----+--------------+----------- 1 | SQL Beginner | 1 1 | SQL Beginner | 1 | SQL Beginner | 3 2 | SQL Advanced | 2 | SQL Advanced | 2 | SQL Advanced | 3 (6 rows) But i think, this is a little bit stupid, because row 4 and 5 are the same. Perhaps this would be better: test=# select c.id, c.name, p.id as person, pc.person_id from course c cross join person p left outer join person_course pc on (p.id,c.id)=(pc.person_id,pc.course_id); id | name | person | person_id ----+--------------+--------+----------- 1 | SQL Beginner | 1 | 1 1 | SQL Beginner | 2 | 1 | SQL Beginner | 3 | 3 2 | SQL Advanced | 1 | 2 | SQL Advanced | 2 | 2 | SQL Advanced | 3 | 3 (6 rows) Andreas -- Really, I'm not out to destroy Microsoft. That will just be a completely unintentional side effect. (Linus Torvalds) "If I was god, I would recompile penguin with --enable-fly." (unknow) Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
Thanks to you all for your replies. I was able to solve my problem after some more reading in the manual: select c.id, c.name, pc.person_id from person as p cross join course as c left outer join person_course as pc on (p.id = pc.person_id and c.id = pc.course_id) where p.id = 2; A few remarks to the answers: I seem to have been misunderstood in a way that people understood that I implied that mysql is right and postgres is wrong. This was in no way my opinion. I just reported what results I got under mysql. I very much prefer Postgres over mysql and never questioned postgres' correctness. To brian: >>> Please, read more about referential integrity. >> >> >> Thanks, but I already have read a lot about it 14 years ago. >> >> Bye >> Tim >> > > In all your long years of experience, perhaps you haven't come across this? > > http://catb.org/~esr/faqs/smart-questions.html > > If you're going to ask a question here the least you could do is meet us half-way. > I think I asked quite politely, did not blame anybody else and just described my problem. I agree, I felt a bit insulted when being told to read about referential integrity, because that clearly had nothing to do with my question and I do know about it. I'm coming from Oracle and not from mysql. And thanks Andreas, I just saw your latest post which has the same solution as I got in it. Bye Tim