Thread: Tricky join question

Tricky join question

From
Tim Tassonis
Date:
Hi all

I have a join problem that seems to be too difficult for me to solve:

I have:

table person
    id          integer,
    name        varchar(32)

data:

1,"Jack"
2,"Jill"
3,"Bob"


table course
    id         integer,
    name       varchar(32)

data:

1,"SQL Beginner"
2,"SQL Advanced"



table person_course
    person_id  number,
    course_id  number

data:

(currently empty)


Now, I would like to know for every person the courses they have taken.

In mysql, the following statement:

SELECT c.id, c.name, pc.person_id
FROM  person as p
       left outer join person_course as pc on p.id = pc.person_id
       right outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;


will get me the following result:

+---+-----------------------------------------+----------+
| id| name                                    | person_id|
+---+-----------------------------------------+----------+
| 1 | SQL Beginner                            |          |
| 2 | SQL Advanced                            |          |
+---+-----------------------------------------+----------+


Can I get Postgres to give me the same result somehow? The above
statement will return nothing at all under postgres.


Bye
Tim







Re: Tricky join question

From
"A. Kretschmer"
Date:
am  Fri, dem 22.12.2006, um 12:12:06 +0100 mailte Tim Tassonis folgendes:
> Hi all
>
> I have a join problem that seems to be too difficult for me to solve:
>
> I have:
>
> table person
>     id          integer,
>     name        varchar(32)
>
> data:
>
> 1,"Jack"
> 2,"Jill"
> 3,"Bob"
>
>
> table course
>     id         integer,
>     name       varchar(32)
>
> data:
>
> 1,"SQL Beginner"
> 2,"SQL Advanced"
>
>
>
> table person_course
>     person_id  number,
>     course_id  number
>
> data:

First, you should use referential integrity:

test=# create table person(id int primary key, name text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "person_pkey" for table "person"
CREATE TABLE
test=# create table course(id int primary key, name text);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "course_pkey" for table "course"
CREATE TABLE
test=# create table person_course(person_id int references person, course_id int references course );
CREATE TABLE


>
> (currently empty)

Okay, i insert some data:

test=# insert into person_course values (1,1);
INSERT 0 1
test=# insert into person_course values (3,1);
INSERT 0 1
test=# insert into person_course values (3,2);
INSERT 0 1



>
>
> Now, I would like to know for every person the courses they have taken.

Similar to your result:

test=# select b.id, b.name, c.id from person_course a left join course b on a.course_id=b.id left join person c on
a.person_id=c.id;
 id |     name     | id
----+--------------+----
  1 | SQL Beginner |  1
  1 | SQL Beginner |  3
  2 | SQL Advanced |  3
(3 rows)

In my opinion better:

test=# select c.id, c.name, b.name from person_course a left join course b on a.course_id=b.id left join person c on
a.person_id=c.id;
 id | name |     name
----+------+--------------
  1 | Jack | SQL Beginner
  3 | Bob  | SQL Beginner
  3 | Bob  | SQL Advanced
(3 rows)


Please, read more about referential integrity.


Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47215,   D1: 0160/7141639 (mehr: -> Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net

Re: Tricky join question

From
Tim Tassonis
Date:
Hi Andreas

> First, you should use referential integrity:

I do, that is not the point. It was a simplified data model. Of course I
have primary keys and stuff, but they don't affect join behaviour at all.
>
> test=# create table person(id int primary key, name text);
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "person_pkey" for table "person"
> CREATE TABLE
> test=# create table course(id int primary key, name text);
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "course_pkey" for table "course"
> CREATE TABLE
> test=# create table person_course(person_id int references person, course_id int references course );
> CREATE TABLE
>

Well, you shouldn't, thats not my problem.
>
>
>
>>
>> Now, I would like to know for every person the courses they have taken.
>
> Similar to your result:
>
> test=# select b.id, b.name, c.id from person_course a left join course b on a.course_id=b.id left join person c on
a.person_id=c.id;
>  id |     name     | id
> ----+--------------+----
>   1 | SQL Beginner |  1
>   1 | SQL Beginner |  3
>   2 | SQL Advanced |  3
> (3 rows)

This is absolutely not what I want. I want a row for every person and
every course, regardless whether the person has taken the course or not.
   If the person has not taken the course, I want a null value in the
person id column:

SELECT c.id, c.name, pc.person_id
FROM  person as p
       left outer join person_course as pc on p.id = pc.person_id
       right outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;

+---+-----------------------------------------+----------+
| id| name                                    | person_id|
+---+-----------------------------------------+----------+
| 1 | SQL Beginner                            |          |
| 2 | SQL Advanced                            |          |
+---+-----------------------------------------+----------+

Note here that I restrict my select to the person with the ID 2. Since
this person has not taken any course, the person_id is null. If I leave
the restriction on the person, I get person times courses rows, the
person_id only filled when a person has actually taken a course.

With the rows you added person_course and without restrictin to a
specific person, the result of your query should be:


+---+-----------------------------------------+------------+
| id| name                                    | person_id  |
+---+-----------------------------------------+------------+
| 1 | SQL Beginner                            |  1         |
| 1 | SQL Beginner                            |            |
| 1 | SQL Beginner                            |  3         |
| 2 | SQL Advanced                            |            |
| 2 | SQL Advanced                            |            |
| 2 | SQL Advanced                            |  3         |
+---+-----------------------------------------+------------+

In mysql, you get this with the following clause:

SELECT c.id, c.name, pc.person_id
FROM  person as p
       left outer join person_course as pc on p.id = pc.person_id
       right outer join course as c on pc.course_id = c.id
order by 1;

> Please, read more about referential integrity.

Thanks, but I already have read a lot about it 14 years ago.

Bye
Tim



Re: Tricky join question

From
Martijn van Oosterhout
Date:
Hi,

Without restriction you're getting:

On Fri, Dec 22, 2006 at 02:55:56PM +0100, Tim Tassonis wrote:
> +---+-----------------------------------------+------------+
> | id| name                                    | person_id  |
> +---+-----------------------------------------+------------+
> | 1 | SQL Beginner                            |  1         |
> | 1 | SQL Beginner                            |            |
> | 1 | SQL Beginner                            |  3         |
> | 2 | SQL Advanced                            |            |
> | 2 | SQL Advanced                            |            |
> | 2 | SQL Advanced                            |  3         |
> +---+-----------------------------------------+------------+

There are no rows in the table with person_id=2, so PostgreSQL is
returning the correct result (no rows). Seems like a bug in MySQL.

> In mysql, you get this with the following clause:
>
> SELECT c.id, c.name, pc.person_id
> FROM  person as p
>       left outer join person_course as pc on p.id = pc.person_id
>       right outer join course as c on pc.course_id = c.id
> order by 1;

I think what you want is to apply to restriction on person earlier,
maybe:

SELECT c.id, c.name, pc.person_id
FROM  person as p
      left outer join person_course as pc on (p.id = pc.person_id and p.id = 2)
      right outer join course as c on pc.course_id = c.id
order by 1;

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: Tricky join question

From
brian
Date:
Tim Tassonis wrote:
> Hi Andreas
>
>> First, you should use referential integrity:
>
>
> I do, that is not the point. It was a simplified data model. Of course I
> have primary keys and stuff, but they don't affect join behaviour at all.
>
>>
>> test=# create table person(id int primary key, name text);
>> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
>> "person_pkey" for table "person"
>> CREATE TABLE
>> test=# create table course(id int primary key, name text);
>> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
>> "course_pkey" for table "course"
>> CREATE TABLE
>> test=# create table person_course(person_id int references person,
>> course_id int references course );
>> CREATE TABLE
>>
>
> Well, you shouldn't, thats not my problem.
>
>>
>>
>>
>>>
>>> Now, I would like to know for every person the courses they have taken.
>>
>>
>> Similar to your result:
>>
>> test=# select b.id, b.name, c.id from person_course a left join course
>> b on a.course_id=b.id left join person c on a.person_id=c.id;
>>  id |     name     | id
>> ----+--------------+----
>>   1 | SQL Beginner |  1
>>   1 | SQL Beginner |  3
>>   2 | SQL Advanced |  3
>> (3 rows)
>
>
> This is absolutely not what I want. I want a row for every person and
> every course, regardless whether the person has taken the course or not.
>   If the person has not taken the course, I want a null value in the
> person id column:
>
> SELECT c.id, c.name, pc.person_id
> FROM  person as p
>       left outer join person_course as pc on p.id = pc.person_id
>       right outer join course as c on pc.course_id = c.id
> where p.id = 2 order by 1;
>
> +---+-----------------------------------------+----------+
> | id| name                                    | person_id|
> +---+-----------------------------------------+----------+
> | 1 | SQL Beginner                            |          |
> | 2 | SQL Advanced                            |          |
> +---+-----------------------------------------+----------+
>
> Note here that I restrict my select to the person with the ID 2. Since
> this person has not taken any course, the person_id is null. If I leave
> the restriction on the person, I get person times courses rows, the
> person_id only filled when a person has actually taken a course.
>
> With the rows you added person_course and without restrictin to a
> specific person, the result of your query should be:
>
>
> +---+-----------------------------------------+------------+
> | id| name                                    | person_id  |
> +---+-----------------------------------------+------------+
> | 1 | SQL Beginner                            |  1         |
> | 1 | SQL Beginner                            |            |
> | 1 | SQL Beginner                            |  3         |
> | 2 | SQL Advanced                            |            |
> | 2 | SQL Advanced                            |            |
> | 2 | SQL Advanced                            |  3         |
> +---+-----------------------------------------+------------+
>
> In mysql, you get this with the following clause:
>
> SELECT c.id, c.name, pc.person_id
> FROM  person as p
>       left outer join person_course as pc on p.id = pc.person_id
>       right outer join course as c on pc.course_id = c.id
> order by 1;
>
>> Please, read more about referential integrity.
>
>
> Thanks, but I already have read a lot about it 14 years ago.
>
> Bye
> Tim
>

In all your long years of experience, perhaps you haven't come across this?

http://catb.org/~esr/faqs/smart-questions.html

If you're going to ask a question here the least you could do is meet us
half-way.

b


Re: Tricky join question

From
Tom Lane
Date:
Tim Tassonis <timtas@cubic.ch> writes:
> In mysql, the following statement:

> SELECT c.id, c.name, pc.person_id
> FROM  person as p
>        left outer join person_course as pc on p.id = pc.person_id
>        right outer join course as c on pc.course_id = c.id
> where p.id = 2 order by 1;

> will get me the following result:

> +---+-----------------------------------------+----------+
> | id| name                                    | person_id|
> +---+-----------------------------------------+----------+
> | 1 | SQL Beginner                            |          |
> | 2 | SQL Advanced                            |          |
> +---+-----------------------------------------+----------+

Really?  It would be unbelievably broken if so, but a quick experiment
with mysql 5.0.27 says they return an empty set same as us.

You *would* get that answer without the WHERE clause, but neither of
those rows meet the WHERE.  Look at the complete join output:

regression=# SELECT *
FROM  person as p
       left outer join person_course as pc on p.id = pc.person_id
       right outer join course as c on pc.course_id = c.id
;
 id | name | person_id | course_id | id |     name
----+------+-----------+-----------+----+--------------
    |      |           |           |  1 | SQL Beginner
    |      |           |           |  2 | SQL Advanced
(2 rows)

The person-left-join-person_course join produces rows, but none of them
can match course during the right join, so they don't get through.

I think what you want might be a full join for the second step:

regression=# SELECT *
FROM  person as p
       left outer join person_course as pc on p.id = pc.person_id
       full outer join course as c on pc.course_id = c.id
;
 id | name | person_id | course_id | id |     name
----+------+-----------+-----------+----+--------------
  1 | Jack |           |           |    |
  2 | Jill |           |           |    |
  3 | Bob  |           |           |    |
    |      |           |           |  1 | SQL Beginner
    |      |           |           |  2 | SQL Advanced
(5 rows)

regression=# SELECT *
FROM  person as p
       left outer join person_course as pc on p.id = pc.person_id
       full outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;
 id | name | person_id | course_id | id | name
----+------+-----------+-----------+----+------
  2 | Jill |           |           |    |
(1 row)

regression=# insert into person_course values(2,2);
INSERT 0 1
regression=# SELECT *
FROM  person as p
       left outer join person_course as pc on p.id = pc.person_id
       full outer join course as c on pc.course_id = c.id
where p.id = 2 order by 1;
 id | name | person_id | course_id | id |     name
----+------+-----------+-----------+----+--------------
  2 | Jill |         2 |         2 |  2 | SQL Advanced
(1 row)

BTW, I tried to duplicate this in mysql and was surprised to find that
5.0.27 doesn't seem to support full join at all :-(

            regards, tom lane

Re: Tricky join question

From
Andreas Kretschmer
Date:
Tim Tassonis <timtas@cubic.ch> schrieb:
> This is absolutely not what I want. I want a row for every person and every
> course, regardless whether the person has taken the course or not.   If the
> person has not taken the course, I want a null value in the person id
> column:

test=# select c.id, c.name, pc.person_id
from course c
cross join person p
left outer join person_course pc on (p.id,c.id)=(pc.person_id,pc.course_id);
 id |     name     | person_id
----+--------------+-----------
  1 | SQL Beginner |         1
  1 | SQL Beginner |
  1 | SQL Beginner |         3
  2 | SQL Advanced |
  2 | SQL Advanced |
  2 | SQL Advanced |         3
(6 rows)

But i think, this is a little bit stupid, because row 4 and 5 are the
same.

Perhaps this would be better:

test=# select c.id, c.name, p.id as person, pc.person_id
from course c
cross join person p
left outer join person_course pc on (p.id,c.id)=(pc.person_id,pc.course_id);
 id |     name     | person | person_id
----+--------------+--------+-----------
  1 | SQL Beginner |      1 |         1
  1 | SQL Beginner |      2 |
  1 | SQL Beginner |      3 |         3
  2 | SQL Advanced |      1 |
  2 | SQL Advanced |      2 |
  2 | SQL Advanced |      3 |         3
(6 rows)



Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."    (unknow)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°

Re: Tricky join question

From
Tim Tassonis
Date:
Thanks to you all for your replies. I was able to solve my problem after
some more reading in the manual:


select c.id, c.name, pc.person_id
from person as p
      cross join course as c
left outer join person_course as pc
      on  (p.id = pc.person_id and c.id = pc.course_id)
where p.id = 2;


A few remarks to the answers:

I seem to have been misunderstood in a way that people understood that I
implied that mysql is right and postgres is wrong. This was in no way my
opinion. I just reported what results I got under mysql. I very much
prefer Postgres over mysql and never questioned postgres' correctness.

To brian:

>>> Please, read more about referential integrity.
>>
>>
>> Thanks, but I already have read a lot about it 14 years ago.
>>
>> Bye
>> Tim
>>
>
> In all your long years of experience, perhaps you haven't come across this?
>
> http://catb.org/~esr/faqs/smart-questions.html
>
> If you're going to ask a question here the least you could do is meet us half-way.
>

I think I asked quite politely, did not blame anybody else and just
described my problem. I agree, I felt a bit insulted when being told to
read about referential integrity, because that clearly had nothing to do
with my question and I do know about it. I'm coming from Oracle and not
from mysql.

And thanks Andreas, I just saw your latest post which has the same
solution as I got in it.

Bye
Tim