Thread: multi-table join, final table is outer join count ...
Okay, not sure best way to try and describe this ... have multiple tables, of a form like: table agid intdata text table bgid intdata text table cgid intdata text table dgid intdata text I want to return: a.gid,a.data,b.data,c.data,count(d.data) where a.gid = b.gid = c.gid = d.gid *but* I want count(d.data) to return zero *if* there are no records in table d ... essentially, gid has to exist in tables a,b,c but not d ... So, ignoring table d, i'd have: SELECT a.gid,a.data,b.data,c.data FROM tablea a, tableb b, tablec cWHERE a.gid = b.gid AND b.gid = c.gid; How do I add 'tabled d' to the mix? Thanks ... Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
Got it after a bit of fiddling ... actually, not bad code ... SELECT distinct s.gid, s.created, count(i.title) AS images FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active), personal_data pd, relationship_wanted rw WHERE s.active AND s.status != 0 AND (s.gid = pd.gid AND pd.gender= 0) AND (s.gid = rw.gid AND rw.gender = 0 ) GROUP BY s.gid,s.created ORDER BY images desc; The part that had confused me was the whole 'ON' part ... once I clued in that that is essentially a WHERE, it actually made sense ... On Sat, 12 May 2001, The Hermit Hacker wrote: > > Okay, not sure best way to try and describe this ... have multiple tables, > of a form like: > > table a > gid int > data text > > table b > gid int > data text > > table c > gid int > data text > > table d > gid int > data text > > I want to return: > > a.gid,a.data,b.data,c.data,count(d.data) > > where > > a.gid = b.gid = c.gid = d.gid > > *but* I want count(d.data) to return zero *if* there are no records in > table d ... > > essentially, gid has to exist in tables a,b,c but not d ... > > So, ignoring table d, i'd have: > > SELECT a.gid,a.data,b.data,c.data > FROM tablea a, tableb b, tablec c > WHERE a.gid = b.gid > AND b.gid = c.gid; > > How do I add 'tabled d' to the mix? > > Thanks ... > > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy > Systems Administrator @ hub.org > primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org > > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes: > SELECT distinct s.gid, s.created, count(i.title) AS images > FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active), > personal_data pd, relationship_wanted rw > WHERE s.active AND s.status != 0 > AND (s.gid = pd.gid AND pd.gender = 0) > AND (s.gid = rw.gid AND rw.gender = 0 ) > GROUP BY s.gid,s.created > ORDER BY images desc; > The part that had confused me was the whole 'ON' part ... once I clued in > that that is essentially a WHERE, it actually made sense ... Right, but there's some fine points here. When you're dealing with INNER JOINs, ON (or its variant USING) is exactly equivalent to WHERE. Write whichever you like. When you're dealing with OUTER JOINs, ON is *not* quite the same as WHERE, because it determines which rows are considered to "match" and thus which rows will be extended with NULLs. Let's take a simplified version of your above example. If you wrote FROM status s LEFT JOIN images i ON (s.gid = i.gid)WHERE i.active AND ...other conditions... then this would produce the regular inner join of status and images where gid matches, *plus* a row for each unmatched status row (extended with NULLs for the images columns). This collection of rows would then pass through your WHERE clauses, and whichever ones pass all the WHERE conditions get into the result. But, when you write FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active)WHERE ...other conditions... then you get the inner join of status and images on gid, minus the rows where i.active is false, plus a null-extended row for each status row that does not have a matching *active* image row. So the set of rows that comes out of the join is different: there could be more null-extended rows in this case than in the other one. In particular, you could see rows having i.active=NULL in the final result, which'd never happen if you had put i.active into the WHERE clause instead of the ON clause. Bottom line: what you put in the ON part should just be the clauses that determine whether you think there's a match between the two tables. The WHERE part is additional restrictions that limit what you want to see, but don't affect the semantics of whether there's a match. In your above example, I'm not sure whether it's right to put i.active in the ON part or in WHERE. It depends on what you want to happen for status rows that match only inactive images, and whether you consider them different from status rows that match no images at all. regards, tom lane