Thread: SQL Help: Multiple LEFT OUTER JOINs

SQL Help: Multiple LEFT OUTER JOINs

From
Bill Moseley
Date:
I need to generate a table of teachers, and the count of classes they
taught in the past and are scheduled to teach in the future.

 id  |  last_name   |   total    future_class_count | past_class_count
-----+--------------+----------+--------------------+-----------------
   3 |   Smith      |       12 |                  3 |               9
   8 |   Jones      |        0 |                  0 |               0


table person
    id
    last_name

table class
    id
    class_time

table role
    id
    role_name  -- for limiting to a type of teacher

-- link tables

table person_role
    person  references person
    role    references role

-- This table ties a person to a class, thus making them an instructor

table instructors
    person references person
    class references class


I can easily get instructors and the total count of their classes:

    SELECT      person.id AS id, last_name, count(instructors.class)

      FROM      person LEFT OUTER JOIN instructors ON (person.id = instructors.person),
                person_role

     WHERE      person_role.person = person.id
                AND person_role.role = 3  -- limit to this type of teacher

  GROUP BY      id, last_name;


Here's where I'm missing something.  Trying to do an outer join on
to bring in the class row with its class_time column:


    SELECT      person.id AS id, last_name,
                count(instructors.class) as total,
                sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
                sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count


      FROM      (person LEFT OUTER JOIN instructors ON (person.id = instructors.person)) t
                     LEFT OUTER JOIN class on ( t.class = class.id ),
                person_role

     WHERE      person_role.person = person.id
                AND person_role.role = 3

  GROUP BY      person.id, last_name;





--
Bill Moseley
moseley@hank.org


Re: SQL Help: Multiple LEFT OUTER JOINs

From
Bruno Wolff III
Date:
On Mon, Nov 21, 2005 at 05:40:10 -0800,
  Bill Moseley <moseley@hank.org> wrote:
>
> Here's where I'm missing something.  Trying to do an outer join on
> to bring in the class row with its class_time column:

You don't say exactly why you are having a problem with this, but I think you
would be better off doing an inner join between instructors and class and
then do an outer join of that result to person.

>
>
>     SELECT      person.id AS id, last_name,
>                 count(instructors.class) as total,
>                 sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
>                 sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count
>
>
>       FROM      (person LEFT OUTER JOIN instructors ON (person.id = instructors.person)) t
>                      LEFT OUTER JOIN class on ( t.class = class.id ),
>                 person_role
>
>      WHERE      person_role.person = person.id
>                 AND person_role.role = 3
>
>   GROUP BY      person.id, last_name;
>
>
>
>
>
> --
> Bill Moseley
> moseley@hank.org
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly

Re: SQL Help: Multiple LEFT OUTER JOINs

From
Bill Moseley
Date:
On Mon, Nov 21, 2005 at 11:45:34AM -0600, Bruno Wolff III wrote:
> On Mon, Nov 21, 2005 at 05:40:10 -0800,
>   Bill Moseley <moseley@hank.org> wrote:
> >
> > Here's where I'm missing something.  Trying to do an outer join on
> > to bring in the class row with its class_time column:
>
> You don't say exactly why you are having a problem with this, but I think you
> would be better off doing an inner join between instructors and class and
> then do an outer join of that result to person.

Sorry, I thought I was so far off it might be obvious.  I suspect I'm
making the query harder than it really is.

This query just eats CPU and doesn't seem to finish, but I didn't let
it run more than a minute (which is forever as far as I'm concerned).
The tables are not that big (10,000 people, 1500 classes)


> >     SELECT      person.id AS id, last_name,
> >                 count(instructors.class) as total,
> >                 sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
> >                 sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count
> >
> >
> >       FROM      (person LEFT OUTER JOIN instructors ON (person.id = instructors.person)) t
> >                      LEFT OUTER JOIN class on ( t.class = class.id ),
> >                 person_role
> >
> >      WHERE      person_role.person = person.id
> >                 AND person_role.role = 3
> >
> >   GROUP BY      person.id, last_name;

Well, I'm stabbing in the dark now.  You mean like:

    SELECT      person.id AS id, first_name, last_name,
                count(instructors.class) as total_classes,
                sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as total_class_count,  -- which is better?
                sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
                sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count


      FROM      (class INNER JOIN instructors ON ( class.id = instructors.class )) t
                    LEFT OUTER JOIN person  ON ( person.id = t.person ),
                person_role

    WHERE       person_role.person = person.id
                AND person_role.role = 3

    GROUP BY    person.id, first_name, last_name;

Still eats CPU.

 GroupAggregate  (cost=1750458.67..1890662.91 rows=10212 width=39)
   ->  Sort  (cost=1750458.67..1767958.67 rows=7000000 width=39)
         Sort Key: person.id, person.first_name, person.last_name
         ->  Nested Loop  (cost=111.27..140276.35 rows=7000000 width=39)
               ->  Nested Loop  (cost=91.27..256.35 rows=7000 width=35)
                     ->  Hash Join  (cost=71.27..96.35 rows=7 width=31)
                           Hash Cond: ("outer".id = "inner"."class")
                           ->  Seq Scan on "class"  (cost=0.00..20.00 rows=1000 width=12)
                           ->  Hash  (cost=71.25..71.25 rows=7 width=27)
                                 ->  Nested Loop  (cost=3.20..71.25 rows=7 width=27)
                                       ->  Hash Join  (cost=3.20..30.77 rows=7 width=12)
                                             Hash Cond: ("outer".person = "inner".person)
                                             ->  Seq Scan on instructors  (cost=0.00..20.00 rows=1000 width=8)
                                             ->  Hash  (cost=3.01..3.01 rows=75 width=4)
                                                   ->  Index Scan using person_role_role_index on person_role
(cost=0.00..3.01rows=75 width=4) 
                                                         Index Cond: (role = 3)
                                       ->  Index Scan using person_pkey on person  (cost=0.00..5.77 rows=1 width=23)
                                             Index Cond: ("outer".person = person.id)
                     ->  Materialize  (cost=20.00..30.00 rows=1000 width=4)
                           ->  Seq Scan on instructors  (cost=0.00..20.00 rows=1000 width=4)
               ->  Materialize  (cost=20.00..30.00 rows=1000 width=4)
                     ->  Seq Scan on "class"  (cost=0.00..20.00 rows=1000 width=4)
(22 rows)



> >
> >
> >
> >
> >
> > --
> > Bill Moseley
> > moseley@hank.org
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> >        subscribe-nomail command to majordomo@postgresql.org so that your
> >        message can get through to the mailing list cleanly
>

--
Bill Moseley
moseley@hank.org


Re: SQL Help: Multiple LEFT OUTER JOINs

From
John McCawley
Date:
It looks to me like your problem is that weird area where you alias your
inner join as "t" and thenn inner join based on this alias.  You're
getting a cartesian product somewhere, as evidenced by the
"rows=7000000" in your explain.
I already deleted the old mail with your table structure, but try
changing that FROM section to:

class INNER JOIN instructors ON class.id = instructors.class
LEFT OUTER JOIN person  ON person.id = (whatevertable).person


Bill Moseley wrote:

>
> Well, I'm stabbing in the dark now.  You mean like:
>
>    SELECT      person.id AS id, first_name, last_name,
> count(instructors.class) as total_classes,
>                sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as
> total_class_count,  -- which is better?
>                sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as
> future_class_count,
>                sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END)
> as past_class_count
>
>
>      FROM      (class INNER JOIN instructors ON ( class.id =
> instructors.class )) t
>                    LEFT OUTER JOIN person  ON ( person.id = t.person ),
>                person_role
>    WHERE       person_role.person = person.id
>                AND person_role.role = 3
>
>    GROUP BY    person.id, first_name, last_name;
>
> Still eats CPU.
>
> GroupAggregate  (cost=1750458.67..1890662.91 rows=10212 width=39)
>   ->  Sort  (cost=1750458.67..1767958.67 rows=7000000 width=39)
>         Sort Key: person.id, person.first_name, person.last_name
>         ->  Nested Loop  (cost=111.27..140276.35 rows=7000000 width=39)
>               ->  Nested Loop  (cost=91.27..256.35 rows=7000 width=35)
>                     ->  Hash Join  (cost=71.27..96.35 rows=7 width=31)
>                           Hash Cond: ("outer".id = "inner"."class")
>                           ->  Seq Scan on "class"  (cost=0.00..20.00
> rows=1000 width=12)
>                           ->  Hash  (cost=71.25..71.25 rows=7 width=27)
>                                 ->  Nested Loop  (cost=3.20..71.25
> rows=7 width=27)
>                                       ->  Hash Join  (cost=3.20..30.77
> rows=7 width=12)
>                                             Hash Cond: ("outer".person
> = "inner".person)
>                                             ->  Seq Scan on
> instructors  (cost=0.00..20.00 rows=1000 width=8)
>                                             ->  Hash  (cost=3.01..3.01
> rows=75 width=4)
>                                                   ->  Index Scan using
> person_role_role_index on person_role  (cost=0.00..3.01 rows=75 width=4)
>                                                         Index Cond:
> (role = 3)
>                                       ->  Index Scan using person_pkey
> on person  (cost=0.00..5.77 rows=1 width=23)
>                                             Index Cond:
> ("outer".person = person.id)
>                     ->  Materialize  (cost=20.00..30.00 rows=1000
> width=4)
>                           ->  Seq Scan on instructors
> (cost=0.00..20.00 rows=1000 width=4)
>               ->  Materialize  (cost=20.00..30.00 rows=1000 width=4)
>                     ->  Seq Scan on "class"  (cost=0.00..20.00
> rows=1000 width=4)
> (22 rows)
>
>
>
>
>


Re: SQL Help: Multiple LEFT OUTER JOINs

From
John McCawley
Date:
I just noticed, also goofy is your ", person_role" in your from with no
criteria.  I would generally put the "person_role.person = person.id" as
an INNER JOIN, and then only have the "person_role.role=3" in the
where.  It doesn't look like that's the specific problem, but I
generally find that kind of mixed syntax muddles a query.


John McCawley wrote:

>> Well, I'm stabbing in the dark now.  You mean like:
>>
>>    SELECT      person.id AS id, first_name, last_name,
>> count(instructors.class) as total_classes,
>>                sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as
>> total_class_count,  -- which is better?
>>                sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END)
>> as future_class_count,
>>                sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END)
>> as past_class_count
>>
>>
>>      FROM      (class INNER JOIN instructors ON ( class.id =
>> instructors.class )) t
>>                    LEFT OUTER JOIN person  ON ( person.id = t.person ),
>>                person_role
>>    WHERE       person_role.person = person.id
>>                AND person_role.role = 3
>>
>>    GROUP BY    person.id, first_name, last_name;
>>
>> Still eats CPU.
>>
>> GroupAggregate  (cost=1750458.67..1890662.91 rows=10212 width=39)
>>   ->  Sort  (cost=1750458.67..1767958.67 rows=7000000 width=39)
>>         Sort Key: person.id, person.first_name, person.last_name
>>         ->  Nested Loop  (cost=111.27..140276.35 rows=7000000 width=39)
>>               ->  Nested Loop  (cost=91.27..256.35 rows=7000 width=35)
>>                     ->  Hash Join  (cost=71.27..96.35 rows=7 width=31)
>>                           Hash Cond: ("outer".id = "inner"."class")
>>                           ->  Seq Scan on "class"  (cost=0.00..20.00
>> rows=1000 width=12)
>>                           ->  Hash  (cost=71.25..71.25 rows=7 width=27)
>>                                 ->  Nested Loop  (cost=3.20..71.25
>> rows=7 width=27)
>>                                       ->  Hash Join
>> (cost=3.20..30.77 rows=7 width=12)
>>                                             Hash Cond:
>> ("outer".person = "inner".person)
>>                                             ->  Seq Scan on
>> instructors  (cost=0.00..20.00 rows=1000 width=8)
>>                                             ->  Hash
>> (cost=3.01..3.01 rows=75 width=4)
>>                                                   ->  Index Scan
>> using person_role_role_index on person_role  (cost=0.00..3.01 rows=75
>> width=4)
>>                                                         Index Cond:
>> (role = 3)
>>                                       ->  Index Scan using
>> person_pkey on person  (cost=0.00..5.77 rows=1 width=23)
>>                                             Index Cond:
>> ("outer".person = person.id)
>>                     ->  Materialize  (cost=20.00..30.00 rows=1000
>> width=4)
>>                           ->  Seq Scan on instructors
>> (cost=0.00..20.00 rows=1000 width=4)
>>               ->  Materialize  (cost=20.00..30.00 rows=1000 width=4)
>>                     ->  Seq Scan on "class"  (cost=0.00..20.00
>> rows=1000 width=4)
>> (22 rows)
>
>

Re: SQL Help: Multiple LEFT OUTER JOINs

From
Bill Moseley
Date:
On Mon, Nov 21, 2005 at 03:25:56PM -0600, John McCawley wrote:
> I just noticed, also goofy is your ", person_role" in your from with no
> criteria.  I would generally put the "person_role.person = person.id" as
> an INNER JOIN, and then only have the "person_role.role=3" in the
> where.  It doesn't look like that's the specific problem, but I
> generally find that kind of mixed syntax muddles a query.

I need to read more about the FROM clause, as I can't seem to get what
you are suggesting.

The now working query (thanks to you!) is:

    SELECT      person.id AS id, first_name, last_name,
                count(instructors.class) as total_classes,
                -- sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as total_class_count,
                sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
                sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count


      FROM     class INNER JOIN instructors ON class.id = instructors.class
                    LEFT OUTER JOIN person  ON person.id = instructors.person,
                person_role

    WHERE       person_role.person = person.id
                AND person_role.role = 2

    GROUP BY    person.id, first_name, last_name

    ORDER BY    future_class_count;


Not sure how to construct that.  Not this, as it returns odd counts

    SELECT      person.id AS id,
                count(instructors.class) as total_classes,
                sum (CASE WHEN class.id IS NULL THEN 0 ELSE 1 END) as total_class_count,  -- which is better?
                sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
                sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count


      FROM     class INNER JOIN instructors     ON class.id = instructors.class
                    LEFT OUTER JOIN person      ON person.id = instructors.person
                        INNER JOIN person_role  ON person_role.person = person.id

    WHERE       person_role.role = 2

    GROUP BY    person.id, first_name, last_name

    ORDER BY    total_classes;



 id  | total_classes | total_class_count | future_class_count | past_class_count
-----+---------------+-------------------+--------------------+------------------
  90 |             1 |                 1 |                  0 |                1
  98 |             1 |                 1 |                  0 |                1
  92 |             1 |                 1 |                  0 |                1
 123 |             1 |                 1 |                  0 |                1
 122 |             1 |                 1 |                  0 |                1
 121 |             2 |                 2 |                  0 |                2
  66 |             2 |                 2 |                  0 |                2
  74 |             2 |                 2 |                  0 |                2
  56 |             2 |                 2 |                  0 |                2
  85 |             2 |                 2 |                  0 |                2
 119 |             2 |                 2 |                  0 |                2
  41 |             2 |                 2 |                  0 |                2
  33 |             2 |                 2 |                  0 |                2
  65 |             2 |                 2 |                  0 |                2
 105 |             3 |                 3 |                  0 |                3
  83 |             3 |                 3 |                  0 |                3
 102 |             3 |                 3 |                  0 |                3
  32 |             4 |                 4 |                  0 |                4
  71 |             4 |                 4 |                  0 |                4
  70 |             4 |                 4 |                  0 |                4
  14 |             4 |                 4 |                  0 |                4
  29 |             4 |                 4 |                  0 |                4
  77 |             4 |                 4 |                  0 |                4
  86 |             4 |                 4 |                  0 |                4
  50 |             4 |                 4 |                  0 |                4
 107 |             4 |                 4 |                  0 |                4
   8 |             4 |                 4 |                  0 |                4
 114 |             4 |                 4 |                  0 |                4
  42 |             4 |                 4 |                  0 |                4
  82 |             4 |                 4 |                  0 |                4
  28 |             4 |                 4 |                  0 |                4
  17 |             4 |                 4 |                  0 |                4
  52 |             4 |                 4 |                  0 |                4
   9 |             4 |                 4 |                  0 |                4
  31 |             4 |                 4 |                  0 |                4

--
Bill Moseley
moseley@hank.org


Re: SQL Help: Multiple LEFT OUTER JOINs

From
Bill Moseley
Date:
On Mon, Nov 21, 2005 at 02:06:35PM -0800, Bill Moseley wrote:
> The now working query (thanks to you!) is:

No that doesn't work.  It's dropping the people that have never
been assigned a class to teach (i.e. don't have a row in the
"instructors" link table).


>       FROM     class INNER JOIN instructors ON class.id = instructors.class
>                     LEFT OUTER JOIN person  ON person.id = instructors.person,
>                 person_role

I really seem to need the multiple left outer join.  This works:

    SELECT      person.id AS id, last_name,
                person_role.role AS role,
                count(instructors.class),
                sum (CASE WHEN class_time > now() THEN 1 ELSE 0 END) as future_class_count,
                sum (CASE WHEN class_time <= now() THEN 1 ELSE 0 END) as past_class_count


      FROM      person LEFT OUTER JOIN instructors ON (person.id = instructors.person)
                    LEFT OUTER JOIN class ON (instructors.class = class.id),
                person_role

     WHERE      person_role.person = person.id
                -- AND person_role.role = 2

  GROUP BY      person.id, last_name, person_role.role;


I'm not clear how to move that "person_role.person = person.id" into
the FROM statement.  Does it matter?


--
Bill Moseley
moseley@hank.org


Re: SQL Help: Multiple LEFT OUTER JOINs

From
John McCawley
Date:
 >  I'm not clear how to move that "person_role.person = person.id" into
the FROM statement. Does it matter?

This should work:

FROM person
INNER JOIN person_role ON person.id = person_role.person
LEFT OUTER JOIN instructors ON (person.id = instructors.person)
LEFT OUTER JOIN class ON (instructors.class = class.id), person_role
GROUP BY person.id, last_name, person_role.role;


The reason *I* think it matters is that I like to keep my "constant"
join clauses in the from, and my variable criteria in the where.  i.e.
in your query, you always want the role that joins to the person...there
are no cases (in this query) where you want other roles, therefore I
wouldn't put it in the where.  I leave my where clause free for criteria
such as "where person.id = $variable" etc.  I will occasionally put
variables higher in my joins, but only if EXPLAIN ANALYZE tells me it's
faster (which rarely happens for me), but I NEVER wait until my where to
clarify the fundamental criteria whereby a table in the FROM is joined.
It's just my preference for the sake of clarity.

In other words I would:

SELECT person.last_name, person_role.role FROM
person_role INNER JOIN person ON person_role.person = person.id
WHERE person.id = $variable

rather than:

SELECT person.last_name, person_role.role
FROM
person_role, person
WHERE
person_role.person = person.id AND
person.id = $variable

>
>