Thread: Trouble with an outer join
I want to run an outer join query on two tables so that the tags from one will always appear even if the other table is lacking those attributes. Unfortunately, while the join seems to work there is no filtering taking place, causing a certain amount of strife. The two tables are as follows: Table "ethereal.tag" Column | Type | Modifiers ---------+-----------------------+---------------------------------------------- tagname | character varying(25) | not null tagtype | character varying(15) | not null default 'system' tagdata | text | Indexes: "pktag" PRIMARY KEY, btree (tagname, tagtype) Table "ethereal.realmdesign" Column | Type | Modifiers ---------------+-----------------------+----------- realmname | character varying(30) | not null tagname | character varying(20) | not null designcontent | text | Indexes: "pkrealmdesign" PRIMARY KEY, btree (realmname, tagname) Obviously 'TagName' is the field which matches both. These tables are not linked together and the removal of an entry from Tag will cause the removal from RealmDesign through code. The query is as follows. Note that there is a lot of filtering as to prevent too many rows which are not required from being dealt with: SELECT t.TagName AS "TagName", t.TagType AS "TagType", d.RealmName AS "RealmName" FROM ethereal.Tag t LEFT OUTER JOIN ethereal.RealmDesign d ON (t.TagName=d.TagName AND t.TagType='template' AND (t.TagName LIKE 'Realm%' OR t.TagName LIKE 'Offline%') AND d.RealmName='Horror') ORDER BY t.TagName; The EXPLAIN ANALYZE seems to indicate that the filtering takes place: Merge Left Join (cost=122.55..127.57 rows=946 width=35) (actual time=10.633..13.591 rows=946 loops=1) Merge Cond: ("outer"."?column3?" = "inner"."?column3?") Join Filter: ((("outer".tagtype)::text = 'template'::text) AND (("outer"."?column3?" ~~ 'Realm%'::text) OR ("outer"."?column3?" ~~ 'Offline%'::text))) -> Sort (cost=100.22..102.58 rows=946 width=25) (actual time=10.479..11.405 rows=946 loops=1) Sort Key: (t.tagname)::text -> Seq Scan on tag t (cost=0.00..53.46 rows=946 width=25) (actual time=0.011..1.788 rows=946 loops=1) -> Sort (cost=22.33..22.36 rows=11 width=26) (actual time=0.134..0.138 rows=11 loops=1) Sort Key: (d.tagname)::text -> Seq Scan on realmdesign d (cost=0.00..22.14 rows=11 width=26) (actual time=0.026..0.091 rows=11 loops=1) Filter: ((realmname)::text = 'Horror'::text) Total runtime: 14.418 ms The problem is however, that the filtering does not take place. In fact it outright ignores it and returns every entry from the Tag table no matter what I do to change the query. Sample limited output is below: TagName | TagType | RealmName -----------------------+---------------+----------- OptPrivacy | contrib | OptPrivacy | configuration | OptRating | contrib | OptScope | configuration | OptSort | contrib | OptSrchField | gallery | OptSrchSort | gallery | OptSrchType | gallery | OptWeekdays | gallery | RealmBanner | template | RealmDice | template | Horror RealmFooter | template | RealmFrameAutoscroll | template | Horror RealmFrameNormal | template | Horror RealmHeader | template | Horror RealmInfo | template | RealmJavascript | template | RealmNotice | template | Horror RealmPanel | template | Horror RealmPrivate | template | Horror RealmRefresh | template | RealmRemote | template | Horror RealmSeperator | template | Horror RealmSupp | template | RealmWarning | template | Horror RealmZoom | template | Horror SetAbuse | user | From the output its pretty clear that the first 10 should have been omitted for more then one reason. However they appear every time and in order to compensate for this, I have the script skip through unneeded entries manually. So what exactly am I doing wrong? Martin Foster martin@ethereal-realms.org
On Sun, 25 Sep 2005, Martin Foster wrote: > I want to run an outer join query on two tables so that the tags from > one will always appear even if the other table is lacking those > attributes. Unfortunately, while the join seems to work there is no > filtering taking place, causing a certain amount of strife. > > The two tables are as follows: > > Table "ethereal.tag" > Column | Type | Modifiers > ---------+-----------------------+---------------------------------------------- > tagname | character varying(25) | not null > tagtype | character varying(15) | not null default 'system' > tagdata | text | > Indexes: > "pktag" PRIMARY KEY, btree (tagname, tagtype) > > Table "ethereal.realmdesign" > Column | Type | Modifiers > ---------------+-----------------------+----------- > realmname | character varying(30) | not null > tagname | character varying(20) | not null > designcontent | text | > Indexes: > "pkrealmdesign" PRIMARY KEY, btree (realmname, tagname) > > > Obviously 'TagName' is the field which matches both. These tables are > not linked together and the removal of an entry from Tag will cause the > removal from RealmDesign through code. > > The query is as follows. Note that there is a lot of filtering as to > prevent too many rows which are not required from being dealt with: > > SELECT > t.TagName AS "TagName", > t.TagType AS "TagType", > d.RealmName AS "RealmName" > FROM ethereal.Tag t > LEFT OUTER JOIN ethereal.RealmDesign d > ON (t.TagName=d.TagName > AND t.TagType='template' > AND (t.TagName LIKE 'Realm%' > OR t.TagName LIKE 'Offline%') > AND d.RealmName='Horror') > ORDER BY t.TagName; ... > From the output its pretty clear that the first 10 should have been > omitted for more then one reason. However they appear every time and > in order to compensate for this, I have the script skip through unneeded > entries manually. > > So what exactly am I doing wrong? AFAIK, conditions like t.TagType='template' in the ON condition of an outer join are not going to constrain the rows from t that are created but instead constrain whether or not a row from d in considered as valid (ie, you're saying to extend with NULLs for TagTypes other than 'template'). I think some of those conditions you want in a WHERE clause, possibly all the ones that refer only to t.
Stephan Szabo wrote: >> From the output its pretty clear that the first 10 should have been >>omitted for more then one reason. However they appear every time and >>in order to compensate for this, I have the script skip through unneeded >>entries manually. >> >>So what exactly am I doing wrong? > > > AFAIK, conditions like t.TagType='template' in the ON condition of an > outer join are not going to constrain the rows from t that are created but > instead constrain whether or not a row from d in considered as valid (ie, > you're saying to extend with NULLs for TagTypes other than 'template'). > > I think some of those conditions you want in a WHERE clause, possibly all > the ones that refer only to t. > SELECT t.TagName AS "TagName", t.TagType AS "TagType", d.RealmName AS "RealmName" FROM ethereal.Tag t LEFT OUTER JOIN ethereal.RealmDesign d ON (t.TagName=d.TagName) WHERE t.TagType='template' AND (t.TagName LIKE 'Realm%' OR t.TagName LIKE 'Offline%') AND d.RealmName='Horror' ORDER BY t.TagName; Let's try that change which oddly enough does not do an outer join at all. Here is the sample output: TagName | TagType | RealmName ----------------------+----------+----------- RealmDice | template | Horror RealmFrameAutoscroll | template | Horror RealmFrameNormal | template | Horror RealmHeader | template | Horror RealmNotice | template | Horror RealmPanel | template | Horror RealmPrivate | template | Horror RealmRemote | template | Horror RealmSeperator | template | Horror RealmWarning | template | Horror RealmZoom | template | Horror Now let's try a variation: SELECT t.TagName AS "TagName", t.TagType AS "TagType", d.RealmName AS "RealmName" FROM ethereal.Tag t LEFT OUTER JOIN ethereal.RealmDesign d ON (t.TagName=d.TagName AND d.RealmName='Horror') WHERE t.TagType='template' AND (t.TagName LIKE 'Realm%' OR t.TagName LIKE 'Offline%') ORDER BY t.TagName; Which allows us to get what we need. Which gets rather confusing as to how to get a join to work exactly like people expect it too. TagName | TagType | RealmName ----------------------+----------+----------- OfflineInfo | template | OfflinePage | template | OfflinePanel | template | OfflineWarning | template | RealmBanner | template | RealmDice | template | Horror RealmFooter | template | RealmFrameAutoscroll | template | Horror RealmFrameNormal | template | Horror RealmHeader | template | Horror RealmInfo | template | RealmJavascript | template | RealmNotice | template | Horror RealmPanel | template | Horror RealmPrivate | template | Horror RealmRefresh | template | RealmRemote | template | Horror RealmSeperator | template | Horror RealmSupp | template | RealmWarning | template | Horror RealmZoom | template | Horror (21 rows) Anyone know good documentation on how to determine exactly where to cram thing as necesssary? Martin Foster Creator/Designer Ethereal Realms martin@ethereal-realms.org
this should work just fine: SELECT t.TagName AS "TagName", t.TagType AS "TagType", d.RealmName AS "RealmName" FROM ethereal.Tag t LEFT OUTER JOIN ethereal.RealmDesign d ON (t.TagName=d.TagName) WHERE t.TagType='template' AND (t.TagName LIKE 'Realm%' OR t.TagName LIKE 'Offline%') AND (d.RealmName='Horror' or d.RealmName IS NULL) ORDER BY t.TagName; cheers, thomas ----- Original Message ----- From: "Martin Foster" <martin@ethereal-realms.org> To: "Stephan Szabo" <sszabo@megazone.bigpanda.com> Cc: "PostgreSQL Novice List" <pgsql-novice@postgresql.org> Sent: Monday, September 26, 2005 2:30 AM Subject: Re: [NOVICE] Trouble with an outer join > Stephan Szabo wrote: > >>> From the output its pretty clear that the first 10 should have been >>>omitted for more then one reason. However they appear every time and >>>in order to compensate for this, I have the script skip through unneeded >>>entries manually. >>> >>>So what exactly am I doing wrong? >> >> >> AFAIK, conditions like t.TagType='template' in the ON condition of an >> outer join are not going to constrain the rows from t that are created >> but >> instead constrain whether or not a row from d in considered as valid (ie, >> you're saying to extend with NULLs for TagTypes other than 'template'). >> >> I think some of those conditions you want in a WHERE clause, possibly all >> the ones that refer only to t. >> > > SELECT > t.TagName AS "TagName", > t.TagType AS "TagType", > d.RealmName AS "RealmName" > FROM ethereal.Tag t > LEFT OUTER JOIN ethereal.RealmDesign d > ON (t.TagName=d.TagName) > WHERE t.TagType='template' > AND (t.TagName LIKE 'Realm%' > OR t.TagName LIKE 'Offline%') > AND d.RealmName='Horror' > ORDER BY t.TagName; > > Let's try that change which oddly enough does not do an outer join at all. > Here is the sample output: > > TagName | TagType | RealmName > ----------------------+----------+----------- > RealmDice | template | Horror > RealmFrameAutoscroll | template | Horror > RealmFrameNormal | template | Horror > RealmHeader | template | Horror > RealmNotice | template | Horror > RealmPanel | template | Horror > RealmPrivate | template | Horror > RealmRemote | template | Horror > RealmSeperator | template | Horror > RealmWarning | template | Horror > RealmZoom | template | Horror > > Now let's try a variation: > > SELECT > t.TagName AS "TagName", > t.TagType AS "TagType", > d.RealmName AS "RealmName" > FROM ethereal.Tag t > LEFT OUTER JOIN ethereal.RealmDesign d > ON (t.TagName=d.TagName AND d.RealmName='Horror') > WHERE t.TagType='template' > AND (t.TagName LIKE 'Realm%' > OR t.TagName LIKE 'Offline%') > ORDER BY t.TagName; > > Which allows us to get what we need. Which gets rather confusing as to > how to get a join to work exactly like people expect it too. > > TagName | TagType | RealmName > ----------------------+----------+----------- > OfflineInfo | template | > OfflinePage | template | > OfflinePanel | template | > OfflineWarning | template | > RealmBanner | template | > RealmDice | template | Horror > RealmFooter | template | > RealmFrameAutoscroll | template | Horror > RealmFrameNormal | template | Horror > RealmHeader | template | Horror > RealmInfo | template | > RealmJavascript | template | > RealmNotice | template | Horror > RealmPanel | template | Horror > RealmPrivate | template | Horror > RealmRefresh | template | > RealmRemote | template | Horror > RealmSeperator | template | Horror > RealmSupp | template | > RealmWarning | template | Horror > RealmZoom | template | Horror > (21 rows) > > Anyone know good documentation on how to determine exactly where to cram > thing as necesssary? > > Martin Foster > Creator/Designer Ethereal Realms > martin@ethereal-realms.org > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org >
me@alternize.com wrote: > this should work just fine: > > SELECT > t.TagName AS "TagName", > t.TagType AS "TagType", > d.RealmName AS "RealmName" > FROM ethereal.Tag t > LEFT OUTER JOIN ethereal.RealmDesign d > ON (t.TagName=d.TagName) > WHERE t.TagType='template' > AND (t.TagName LIKE 'Realm%' > OR t.TagName LIKE 'Offline%') > AND (d.RealmName='Horror' or d.RealmName IS NULL) > ORDER BY t.TagName; > > cheers, > thomas > What's the difference versus yours above and the one I just corrected? Anything unexpected that I should expect from mine? SELECT t.TagName AS "TagName", t.TagType AS "TagType", d.RealmName AS "RealmName" FROM ethereal.Tag t LEFT OUTER JOIN ethereal.RealmDesign d ON (t.TagName=d.TagName AND d.RealmName='Horror') WHERE t.TagType='template' AND (t.TagName LIKE 'Realm%' OR t.TagName LIKE 'Offline%') ORDER BY t.TagName; Martin Foster martin@ethereal-realms.org
honestly, i don't even know why your version results in rows having "null" as d.RealmName as these would not match the clause (t.TagName=d.TagName AND d.RealmName='Horror') when doing left joins, i'll always stick to these rules: in the ON clause, put the fields that link the tables together (i.e. foreign keys). this will result in a "virtual" result table where the left fields are coming from table1 and the right fields from table2, containing the values if there is an corresponding entry or else containing null. then in the WHERE clauses, i filter this "virtual" result table as if it is a real existing table with null-able fields. of course what the query optimizer does in the background and how the results are really put together is beyond my knowledge. also it *might* be faster to include some of the clauses in one place or another... cheers, thomas ----- Original Message ----- From: "Martin Foster" <martin@ethereal-realms.org> To: "Thomas" <me@alternize.com>; "PostgreSQL Novice List" <pgsql-novice@postgresql.org> Sent: Monday, September 26, 2005 2:40 AM Subject: Re: [NOVICE] Trouble with an outer join > me@alternize.com wrote: >> this should work just fine: >> >> SELECT >> t.TagName AS "TagName", >> t.TagType AS "TagType", >> d.RealmName AS "RealmName" >> FROM ethereal.Tag t >> LEFT OUTER JOIN ethereal.RealmDesign d >> ON (t.TagName=d.TagName) >> WHERE t.TagType='template' >> AND (t.TagName LIKE 'Realm%' >> OR t.TagName LIKE 'Offline%') >> AND (d.RealmName='Horror' or d.RealmName IS NULL) >> ORDER BY t.TagName; >> >> cheers, >> thomas >> > > What's the difference versus yours above and the one I just corrected? > Anything unexpected that I should expect from mine? > > SELECT > t.TagName AS "TagName", > t.TagType AS "TagType", > d.RealmName AS "RealmName" > FROM ethereal.Tag t > LEFT OUTER JOIN ethereal.RealmDesign d > ON (t.TagName=d.TagName AND d.RealmName='Horror') > WHERE t.TagType='template' > AND (t.TagName LIKE 'Realm%' > OR t.TagName LIKE 'Offline%') > ORDER BY t.TagName; > > Martin Foster > martin@ethereal-realms.org > > > ---------------------------(end of broadcast)--------------------------- > TIP 6: explain analyze is your friend >
On Sun, 25 Sep 2005, Martin Foster wrote: > Stephan Szabo wrote: > > >> From the output its pretty clear that the first 10 should have been > >>omitted for more then one reason. However they appear every time and > >>in order to compensate for this, I have the script skip through unneeded > >>entries manually. > >> > >>So what exactly am I doing wrong? > > > > > > AFAIK, conditions like t.TagType='template' in the ON condition of an > > outer join are not going to constrain the rows from t that are created but > > instead constrain whether or not a row from d in considered as valid (ie, > > you're saying to extend with NULLs for TagTypes other than 'template'). > > > > I think some of those conditions you want in a WHERE clause, possibly all > > the ones that refer only to t. > > > > SELECT > t.TagName AS "TagName", > t.TagType AS "TagType", > d.RealmName AS "RealmName" > FROM ethereal.Tag t > LEFT OUTER JOIN ethereal.RealmDesign d > ON (t.TagName=d.TagName) > WHERE t.TagType='template' > AND (t.TagName LIKE 'Realm%' > OR t.TagName LIKE 'Offline%') > AND d.RealmName='Horror' > ORDER BY t.TagName; > > Let's try that change which oddly enough does not do an outer join at > all. Here is the sample output: In the case where the LEFT OUTER JOIN has no row in d to match to a row in t, the t row is extended by NULLs for the d columns. It does an outer join, it's just that after that the set is constrained down such that those rows for which a NULL extended row would not be part of the output. > Now let's try a variation: > > SELECT > t.TagName AS "TagName", > t.TagType AS "TagType", > d.RealmName AS "RealmName" > FROM ethereal.Tag t > LEFT OUTER JOIN ethereal.RealmDesign d > ON (t.TagName=d.TagName AND d.RealmName='Horror') > WHERE t.TagType='template' > AND (t.TagName LIKE 'Realm%' > OR t.TagName LIKE 'Offline%') > ORDER BY t.TagName; > > Which allows us to get what we need. Which gets rather confusing as to > how to get a join to work exactly like people expect it too. > Anyone know good documentation on how to determine exactly where to cram > thing as necesssary? Unfortunately, I don't know of good documentation that's particularly detailed and understandable (but admittedly I've not looked carefully). The spec is fairly precise but mostly incomprehensible. ---- Roughly speaking, t1 Left outer join t2 on (condition) is defined as: select * from tn union all select * from xn1 where tn is the multiset of rows of the cartesian product of t1 and t2 for which condition is true xn1 is the set of rows in t1 that have no row in tn extended with NULLs to the right (ie, rows in t1 for which no joining to a row in t2 on condition returned true). Conditions in WHERE would be then applied to the output of the above. ---- Conditions in the on clause control whether a row from t1 matches to a row of t2 and is part of tn or is extended by NULLs and is part of xn1. Conditions in the where clause then apply and only allow through rows that meet the criteria.
Stephan Szabo wrote: > > > Unfortunately, I don't know of good documentation that's particularly > detailed and understandable (but admittedly I've not looked carefully). > The spec is fairly precise but mostly incomprehensible. > > ---- > > Roughly speaking, > t1 Left outer join t2 on (condition) is defined as: > select * from tn > union all > select * from xn1 > where > tn is the multiset of rows of the cartesian product of t1 and t2 for > which condition is true > xn1 is the set of rows in t1 that have no row in tn extended with NULLs > to the right (ie, rows in t1 for which no joining to a row in t2 on > condition returned true). > > Conditions in WHERE would be then applied to the output of the above. > > ---- > > Conditions in the on clause control whether a row from t1 matches to a > row of t2 and is part of tn or is extended by NULLs and is part of xn1. > Conditions in the where clause then apply and only allow through rows > that meet the criteria. I learned how to do joins on an Oracle 7 server. So the use of *= or =* was the way an outer join was done. Needless to say, the JOIN clause is a bit different and at times seems a little less then obvious. However from what you said, this would explain why the addition of AND d.RealmName='Horror' works as expected. It limits which rows are joined from the RealmDesign table. Thanks! Martin Foster martin@ethereal-realms.org