Thread: Trouble with an outer join

Trouble with an outer join

From
Martin Foster
Date:
I want to run an outer join query on two tables so that the tags from
one will always appear even if the other table is lacking those
attributes.     Unfortunately, while the join seems to work there is no
filtering taking place, causing a certain amount of strife.

The two tables are as follows:

            Table "ethereal.tag"
  Column  |         Type          |                  Modifiers
---------+-----------------------+----------------------------------------------
  tagname | character varying(25) | not null
  tagtype | character varying(15) | not null default 'system'
  tagdata | text                  |
Indexes:
     "pktag" PRIMARY KEY, btree (tagname, tagtype)

            Table "ethereal.realmdesign"
     Column     |         Type          | Modifiers
---------------+-----------------------+-----------
  realmname     | character varying(30) | not null
  tagname       | character varying(20) | not null
  designcontent | text                  |
Indexes:
     "pkrealmdesign" PRIMARY KEY, btree (realmname, tagname)


Obviously 'TagName' is the field which matches both.   These tables are
not linked together and the removal of an entry from Tag will cause the
removal from RealmDesign through code.

The query is as follows.  Note that there is a lot of filtering as to
prevent too many rows which are not required from being dealt with:

SELECT
   t.TagName       AS "TagName",
   t.TagType       AS "TagType",
   d.RealmName     AS "RealmName"
FROM ethereal.Tag t
LEFT OUTER JOIN ethereal.RealmDesign d
  ON (t.TagName=d.TagName
AND t.TagType='template'
AND (t.TagName LIKE 'Realm%'
  OR  t.TagName LIKE 'Offline%')
AND d.RealmName='Horror')
ORDER BY t.TagName;

The EXPLAIN ANALYZE seems to indicate that the filtering takes place:

  Merge Left Join  (cost=122.55..127.57 rows=946 width=35) (actual
time=10.633..13.591 rows=946 loops=1)
    Merge Cond: ("outer"."?column3?" = "inner"."?column3?")
    Join Filter: ((("outer".tagtype)::text = 'template'::text) AND
(("outer"."?column3?" ~~ 'Realm%'::text) OR ("outer"."?column3?" ~~
'Offline%'::text)))
    ->  Sort  (cost=100.22..102.58 rows=946 width=25) (actual
time=10.479..11.405 rows=946 loops=1)
          Sort Key: (t.tagname)::text
          ->  Seq Scan on tag t  (cost=0.00..53.46 rows=946 width=25)
(actual time=0.011..1.788 rows=946 loops=1)
    ->  Sort  (cost=22.33..22.36 rows=11 width=26) (actual
time=0.134..0.138 rows=11 loops=1)
          Sort Key: (d.tagname)::text
          ->  Seq Scan on realmdesign d  (cost=0.00..22.14 rows=11
width=26) (actual time=0.026..0.091 rows=11 loops=1)
                Filter: ((realmname)::text = 'Horror'::text)
  Total runtime: 14.418 ms


The problem is however, that the filtering does not take place.  In fact
it outright ignores it and returns every entry from the Tag table no
matter what I do to change the query.  Sample limited output is below:


         TagName        |    TagType    | RealmName
-----------------------+---------------+-----------
  OptPrivacy            | contrib       |
  OptPrivacy            | configuration |
  OptRating             | contrib       |
  OptScope              | configuration |
  OptSort               | contrib       |
  OptSrchField          | gallery       |
  OptSrchSort           | gallery       |
  OptSrchType           | gallery       |
  OptWeekdays           | gallery       |
  RealmBanner           | template      |
  RealmDice             | template      | Horror
  RealmFooter           | template      |
  RealmFrameAutoscroll  | template      | Horror
  RealmFrameNormal      | template      | Horror
  RealmHeader           | template      | Horror
  RealmInfo             | template      |
  RealmJavascript       | template      |
  RealmNotice           | template      | Horror
  RealmPanel            | template      | Horror
  RealmPrivate          | template      | Horror
  RealmRefresh          | template      |
  RealmRemote           | template      | Horror
  RealmSeperator        | template      | Horror
  RealmSupp             | template      |
  RealmWarning          | template      | Horror
  RealmZoom             | template      | Horror
  SetAbuse              | user          |

 From the output its pretty clear that the first 10 should have been
omitted for more then one reason.   However they appear every time and
in order to compensate for this, I have the script skip through unneeded
entries manually.

So what exactly am I doing wrong?

    Martin Foster
    martin@ethereal-realms.org


Re: Trouble with an outer join

From
Stephan Szabo
Date:
On Sun, 25 Sep 2005, Martin Foster wrote:

> I want to run an outer join query on two tables so that the tags from
> one will always appear even if the other table is lacking those
> attributes.     Unfortunately, while the join seems to work there is no
> filtering taking place, causing a certain amount of strife.
>
> The two tables are as follows:
>
>             Table "ethereal.tag"
>   Column  |         Type          |                  Modifiers
> ---------+-----------------------+----------------------------------------------
>   tagname | character varying(25) | not null
>   tagtype | character varying(15) | not null default 'system'
>   tagdata | text                  |
> Indexes:
>      "pktag" PRIMARY KEY, btree (tagname, tagtype)
>
>             Table "ethereal.realmdesign"
>      Column     |         Type          | Modifiers
> ---------------+-----------------------+-----------
>   realmname     | character varying(30) | not null
>   tagname       | character varying(20) | not null
>   designcontent | text                  |
> Indexes:
>      "pkrealmdesign" PRIMARY KEY, btree (realmname, tagname)
>
>
> Obviously 'TagName' is the field which matches both.   These tables are
> not linked together and the removal of an entry from Tag will cause the
> removal from RealmDesign through code.
>
> The query is as follows.  Note that there is a lot of filtering as to
> prevent too many rows which are not required from being dealt with:
>
> SELECT
>    t.TagName       AS "TagName",
>    t.TagType       AS "TagType",
>    d.RealmName     AS "RealmName"
> FROM ethereal.Tag t
> LEFT OUTER JOIN ethereal.RealmDesign d
>   ON (t.TagName=d.TagName
> AND t.TagType='template'
> AND (t.TagName LIKE 'Realm%'
>   OR  t.TagName LIKE 'Offline%')
> AND d.RealmName='Horror')
> ORDER BY t.TagName;

...

>  From the output its pretty clear that the first 10 should have been
> omitted for more then one reason.   However they appear every time and
> in order to compensate for this, I have the script skip through unneeded
> entries manually.
>
> So what exactly am I doing wrong?

AFAIK, conditions like t.TagType='template' in the ON condition of an
outer join are not going to constrain the rows from t that are created but
instead constrain whether or not a row from d in considered as valid (ie,
you're saying to extend with NULLs for TagTypes other than 'template').

I think some of those conditions you want in a WHERE clause, possibly all
the ones that refer only to t.

Re: Trouble with an outer join

From
Martin Foster
Date:
Stephan Szabo wrote:

>> From the output its pretty clear that the first 10 should have been
>>omitted for more then one reason.   However they appear every time and
>>in order to compensate for this, I have the script skip through unneeded
>>entries manually.
>>
>>So what exactly am I doing wrong?
>
>
> AFAIK, conditions like t.TagType='template' in the ON condition of an
> outer join are not going to constrain the rows from t that are created but
> instead constrain whether or not a row from d in considered as valid (ie,
> you're saying to extend with NULLs for TagTypes other than 'template').
>
> I think some of those conditions you want in a WHERE clause, possibly all
> the ones that refer only to t.
>

SELECT
     t.TagName       AS "TagName",
     t.TagType       AS "TagType",
     d.RealmName     AS "RealmName"
  FROM ethereal.Tag t
  LEFT OUTER JOIN ethereal.RealmDesign d
    ON (t.TagName=d.TagName)
  WHERE t.TagType='template'
  AND (t.TagName LIKE 'Realm%'
    OR  t.TagName LIKE 'Offline%')
  AND d.RealmName='Horror'
  ORDER BY t.TagName;

Let's try that change which oddly enough does not do an outer join at
all.   Here is the sample output:

        TagName        | TagType  | RealmName
----------------------+----------+-----------
  RealmDice            | template | Horror
  RealmFrameAutoscroll | template | Horror
  RealmFrameNormal     | template | Horror
  RealmHeader          | template | Horror
  RealmNotice          | template | Horror
  RealmPanel           | template | Horror
  RealmPrivate         | template | Horror
  RealmRemote          | template | Horror
  RealmSeperator       | template | Horror
  RealmWarning         | template | Horror
  RealmZoom            | template | Horror

Now let's try a variation:

SELECT
     t.TagName       AS "TagName",
     t.TagType       AS "TagType",
     d.RealmName     AS "RealmName"
  FROM ethereal.Tag t
  LEFT OUTER JOIN ethereal.RealmDesign d
    ON (t.TagName=d.TagName AND d.RealmName='Horror')
  WHERE t.TagType='template'
  AND (t.TagName LIKE 'Realm%'
    OR  t.TagName LIKE 'Offline%')
  ORDER BY t.TagName;

Which allows us to get what we need.   Which gets rather confusing as to
how to get a join to work exactly like people expect it too.

        TagName        | TagType  | RealmName
----------------------+----------+-----------
  OfflineInfo          | template |
  OfflinePage          | template |
  OfflinePanel         | template |
  OfflineWarning       | template |
  RealmBanner          | template |
  RealmDice            | template | Horror
  RealmFooter          | template |
  RealmFrameAutoscroll | template | Horror
  RealmFrameNormal     | template | Horror
  RealmHeader          | template | Horror
  RealmInfo            | template |
  RealmJavascript      | template |
  RealmNotice          | template | Horror
  RealmPanel           | template | Horror
  RealmPrivate         | template | Horror
  RealmRefresh         | template |
  RealmRemote          | template | Horror
  RealmSeperator       | template | Horror
  RealmSupp            | template |
  RealmWarning         | template | Horror
  RealmZoom            | template | Horror
(21 rows)

Anyone know good documentation on how to determine exactly where to cram
thing as necesssary?

    Martin Foster
    Creator/Designer Ethereal Realms
    martin@ethereal-realms.org

Re: Trouble with an outer join

From
Date:
this should work just fine:

 SELECT
     t.TagName       AS "TagName",
     t.TagType       AS "TagType",
     d.RealmName     AS "RealmName"
  FROM ethereal.Tag t
  LEFT OUTER JOIN ethereal.RealmDesign d
    ON (t.TagName=d.TagName)
  WHERE t.TagType='template'
  AND (t.TagName LIKE 'Realm%'
    OR  t.TagName LIKE 'Offline%')
  AND (d.RealmName='Horror' or d.RealmName IS NULL)
  ORDER BY t.TagName;

cheers,
thomas

----- Original Message -----
From: "Martin Foster" <martin@ethereal-realms.org>
To: "Stephan Szabo" <sszabo@megazone.bigpanda.com>
Cc: "PostgreSQL Novice List" <pgsql-novice@postgresql.org>
Sent: Monday, September 26, 2005 2:30 AM
Subject: Re: [NOVICE] Trouble with an outer join


> Stephan Szabo wrote:
>
>>> From the output its pretty clear that the first 10 should have been
>>>omitted for more then one reason.   However they appear every time and
>>>in order to compensate for this, I have the script skip through unneeded
>>>entries manually.
>>>
>>>So what exactly am I doing wrong?
>>
>>
>> AFAIK, conditions like t.TagType='template' in the ON condition of an
>> outer join are not going to constrain the rows from t that are created
>> but
>> instead constrain whether or not a row from d in considered as valid (ie,
>> you're saying to extend with NULLs for TagTypes other than 'template').
>>
>> I think some of those conditions you want in a WHERE clause, possibly all
>> the ones that refer only to t.
>>
>
> SELECT
>     t.TagName       AS "TagName",
>     t.TagType       AS "TagType",
>     d.RealmName     AS "RealmName"
>  FROM ethereal.Tag t
>  LEFT OUTER JOIN ethereal.RealmDesign d
>    ON (t.TagName=d.TagName)
>  WHERE t.TagType='template'
>  AND (t.TagName LIKE 'Realm%'
>    OR  t.TagName LIKE 'Offline%')
>  AND d.RealmName='Horror'
>  ORDER BY t.TagName;
>
> Let's try that change which oddly enough does not do an outer join at all.
> Here is the sample output:
>
>        TagName        | TagType  | RealmName
> ----------------------+----------+-----------
>  RealmDice            | template | Horror
>  RealmFrameAutoscroll | template | Horror
>  RealmFrameNormal     | template | Horror
>  RealmHeader          | template | Horror
>  RealmNotice          | template | Horror
>  RealmPanel           | template | Horror
>  RealmPrivate         | template | Horror
>  RealmRemote          | template | Horror
>  RealmSeperator       | template | Horror
>  RealmWarning         | template | Horror
>  RealmZoom            | template | Horror
>
> Now let's try a variation:
>
> SELECT
>     t.TagName       AS "TagName",
>     t.TagType       AS "TagType",
>     d.RealmName     AS "RealmName"
>  FROM ethereal.Tag t
>  LEFT OUTER JOIN ethereal.RealmDesign d
>    ON (t.TagName=d.TagName AND d.RealmName='Horror')
>  WHERE t.TagType='template'
>  AND (t.TagName LIKE 'Realm%'
>    OR  t.TagName LIKE 'Offline%')
>  ORDER BY t.TagName;
>
> Which allows us to get what we need.   Which gets rather confusing as to
> how to get a join to work exactly like people expect it too.
>
>        TagName        | TagType  | RealmName
> ----------------------+----------+-----------
>  OfflineInfo          | template |
>  OfflinePage          | template |
>  OfflinePanel         | template |
>  OfflineWarning       | template |
>  RealmBanner          | template |
>  RealmDice            | template | Horror
>  RealmFooter          | template |
>  RealmFrameAutoscroll | template | Horror
>  RealmFrameNormal     | template | Horror
>  RealmHeader          | template | Horror
>  RealmInfo            | template |
>  RealmJavascript      | template |
>  RealmNotice          | template | Horror
>  RealmPanel           | template | Horror
>  RealmPrivate         | template | Horror
>  RealmRefresh         | template |
>  RealmRemote          | template | Horror
>  RealmSeperator       | template | Horror
>  RealmSupp            | template |
>  RealmWarning         | template | Horror
>  RealmZoom            | template | Horror
> (21 rows)
>
> Anyone know good documentation on how to determine exactly where to cram
> thing as necesssary?
>
> Martin Foster
> Creator/Designer Ethereal Realms
> martin@ethereal-realms.org
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>               http://archives.postgresql.org
>



Re: Trouble with an outer join

From
Martin Foster
Date:
me@alternize.com wrote:
> this should work just fine:
>
> SELECT
>     t.TagName       AS "TagName",
>     t.TagType       AS "TagType",
>     d.RealmName     AS "RealmName"
>  FROM ethereal.Tag t
>  LEFT OUTER JOIN ethereal.RealmDesign d
>    ON (t.TagName=d.TagName)
>  WHERE t.TagType='template'
>  AND (t.TagName LIKE 'Realm%'
>    OR  t.TagName LIKE 'Offline%')
>  AND (d.RealmName='Horror' or d.RealmName IS NULL)
>  ORDER BY t.TagName;
>
> cheers,
> thomas
>

What's the difference versus yours above and the one I just corrected?
  Anything unexpected that I should expect from mine?

SELECT
     t.TagName       AS "TagName",
     t.TagType       AS "TagType",
     d.RealmName     AS "RealmName"
  FROM ethereal.Tag t
  LEFT OUTER JOIN ethereal.RealmDesign d
    ON (t.TagName=d.TagName AND d.RealmName='Horror')
  WHERE t.TagType='template'
  AND (t.TagName LIKE 'Realm%'
    OR  t.TagName LIKE 'Offline%')
  ORDER BY t.TagName;

    Martin Foster
    martin@ethereal-realms.org


Re: Trouble with an outer join

From
Date:
honestly, i don't even know why your version results in rows having "null"
as d.RealmName as these would not match the clause (t.TagName=d.TagName AND
d.RealmName='Horror')

when doing left joins, i'll always stick to these rules:

in the ON clause, put the fields that link the tables together (i.e. foreign
keys). this will result in a "virtual" result table where the left fields
are coming from table1 and the right fields from table2, containing the
values if there is an corresponding entry or else containing null.

then in the WHERE clauses, i filter this "virtual" result table as if it is
a real existing table with null-able fields. of course what the query
optimizer does in the background and how the results are really put together
is beyond my knowledge. also it *might* be faster to include some of the
clauses in one place or another...

cheers,
thomas



----- Original Message -----
From: "Martin Foster" <martin@ethereal-realms.org>
To: "Thomas" <me@alternize.com>; "PostgreSQL Novice List"
<pgsql-novice@postgresql.org>
Sent: Monday, September 26, 2005 2:40 AM
Subject: Re: [NOVICE] Trouble with an outer join


> me@alternize.com wrote:
>> this should work just fine:
>>
>> SELECT
>>     t.TagName       AS "TagName",
>>     t.TagType       AS "TagType",
>>     d.RealmName     AS "RealmName"
>>  FROM ethereal.Tag t
>>  LEFT OUTER JOIN ethereal.RealmDesign d
>>    ON (t.TagName=d.TagName)
>>  WHERE t.TagType='template'
>>  AND (t.TagName LIKE 'Realm%'
>>    OR  t.TagName LIKE 'Offline%')
>>  AND (d.RealmName='Horror' or d.RealmName IS NULL)
>>  ORDER BY t.TagName;
>>
>> cheers,
>> thomas
>>
>
> What's the difference versus yours above and the one I just corrected?
> Anything unexpected that I should expect from mine?
>
> SELECT
>     t.TagName       AS "TagName",
>     t.TagType       AS "TagType",
>     d.RealmName     AS "RealmName"
>  FROM ethereal.Tag t
>  LEFT OUTER JOIN ethereal.RealmDesign d
>    ON (t.TagName=d.TagName AND d.RealmName='Horror')
>  WHERE t.TagType='template'
>  AND (t.TagName LIKE 'Realm%'
>    OR  t.TagName LIKE 'Offline%')
>  ORDER BY t.TagName;
>
> Martin Foster
> martin@ethereal-realms.org
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>



Re: Trouble with an outer join

From
Stephan Szabo
Date:
On Sun, 25 Sep 2005, Martin Foster wrote:

> Stephan Szabo wrote:
>
> >> From the output its pretty clear that the first 10 should have been
> >>omitted for more then one reason.   However they appear every time and
> >>in order to compensate for this, I have the script skip through unneeded
> >>entries manually.
> >>
> >>So what exactly am I doing wrong?
> >
> >
> > AFAIK, conditions like t.TagType='template' in the ON condition of an
> > outer join are not going to constrain the rows from t that are created but
> > instead constrain whether or not a row from d in considered as valid (ie,
> > you're saying to extend with NULLs for TagTypes other than 'template').
> >
> > I think some of those conditions you want in a WHERE clause, possibly all
> > the ones that refer only to t.
> >
>
> SELECT
>      t.TagName       AS "TagName",
>      t.TagType       AS "TagType",
>      d.RealmName     AS "RealmName"
>   FROM ethereal.Tag t
>   LEFT OUTER JOIN ethereal.RealmDesign d
>     ON (t.TagName=d.TagName)
>   WHERE t.TagType='template'
>   AND (t.TagName LIKE 'Realm%'
>     OR  t.TagName LIKE 'Offline%')
>   AND d.RealmName='Horror'
>   ORDER BY t.TagName;
>
> Let's try that change which oddly enough does not do an outer join at
> all.   Here is the sample output:

In the case where the LEFT OUTER JOIN has no row in d to match to a row in
t, the t row is extended by NULLs for the d columns. It does an outer
join, it's just that after that the set is constrained down such that
those rows for which a NULL extended row would not be part of the output.

> Now let's try a variation:
>
> SELECT
>      t.TagName       AS "TagName",
>      t.TagType       AS "TagType",
>      d.RealmName     AS "RealmName"
>   FROM ethereal.Tag t
>   LEFT OUTER JOIN ethereal.RealmDesign d
>     ON (t.TagName=d.TagName AND d.RealmName='Horror')
>   WHERE t.TagType='template'
>   AND (t.TagName LIKE 'Realm%'
>     OR  t.TagName LIKE 'Offline%')
>   ORDER BY t.TagName;
>
> Which allows us to get what we need.   Which gets rather confusing as to
> how to get a join to work exactly like people expect it too.

> Anyone know good documentation on how to determine exactly where to cram
> thing as necesssary?

Unfortunately, I don't know of good documentation that's particularly
detailed and understandable (but admittedly I've not looked carefully).
The spec is fairly precise but mostly incomprehensible.

----

Roughly speaking,
t1 Left outer join t2 on (condition) is defined as:
 select * from tn
  union all
 select * from xn1
where
 tn is the multiset of rows of the cartesian product of t1 and t2 for
which condition is true
 xn1 is the set of rows in t1 that have no row in tn extended with NULLs
to the right (ie, rows in t1 for which no joining to a row in t2 on
condition returned true).

Conditions in WHERE would be then applied to the output of the above.

----

 Conditions in the on clause control whether a row from t1 matches to a
row of t2 and is part of tn or is extended by NULLs and is part of xn1.
 Conditions in the where clause then apply and only allow through rows
that meet the criteria.

Re: Trouble with an outer join

From
Martin Foster
Date:
Stephan Szabo wrote:
>
>
> Unfortunately, I don't know of good documentation that's particularly
> detailed and understandable (but admittedly I've not looked carefully).
> The spec is fairly precise but mostly incomprehensible.
>
> ----
>
> Roughly speaking,
> t1 Left outer join t2 on (condition) is defined as:
>  select * from tn
>   union all
>  select * from xn1
> where
>  tn is the multiset of rows of the cartesian product of t1 and t2 for
> which condition is true
>  xn1 is the set of rows in t1 that have no row in tn extended with NULLs
> to the right (ie, rows in t1 for which no joining to a row in t2 on
> condition returned true).
>
> Conditions in WHERE would be then applied to the output of the above.
>
> ----
>
>  Conditions in the on clause control whether a row from t1 matches to a
> row of t2 and is part of tn or is extended by NULLs and is part of xn1.
>  Conditions in the where clause then apply and only allow through rows
> that meet the criteria.


I learned how to do joins on an Oracle 7 server.  So the use of *= or =*
was the way an outer join was done.  Needless to say, the JOIN clause is
a bit different and at times seems a little less then obvious.

However from what you said, this would explain why the addition of AND
d.RealmName='Horror' works as expected.    It limits which rows are
joined from the RealmDesign table.

Thanks!

    Martin Foster
    martin@ethereal-realms.org