Re: Re: multi-table join, final table is outer join count ... - Mailing list pgsql-sql

From Tom Lane
Subject Re: Re: multi-table join, final table is outer join count ...
Date
Msg-id 21433.989714497@sss.pgh.pa.us
Whole thread Raw
In response to Re: multi-table join, final table is outer join count ...  (The Hermit Hacker <scrappy@hub.org>)
List pgsql-sql
The Hermit Hacker <scrappy@hub.org> writes:
>   SELECT distinct s.gid, s.created, count(i.title) AS images
>     FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active),
>          personal_data pd, relationship_wanted rw
>    WHERE s.active AND s.status != 0
>      AND (s.gid = pd.gid AND pd.gender = 0)
>      AND (s.gid = rw.gid AND rw.gender = 0 )
> GROUP BY s.gid,s.created
> ORDER BY  images desc;

> The part that had confused me was the whole 'ON' part ... once I clued in
> that that is essentially a WHERE, it actually made sense ...

Right, but there's some fine points here.

When you're dealing with INNER JOINs, ON (or its variant USING) is
exactly equivalent to WHERE.  Write whichever you like.

When you're dealing with OUTER JOINs, ON is *not* quite the same as
WHERE, because it determines which rows are considered to "match"
and thus which rows will be extended with NULLs.  Let's take a
simplified version of your above example.  If you wrote
FROM status s LEFT JOIN images i ON (s.gid = i.gid)WHERE i.active AND ...other conditions...

then this would produce the regular inner join of status and images
where gid matches, *plus* a row for each unmatched status row (extended
with NULLs for the images columns).  This collection of rows would then
pass through your WHERE clauses, and whichever ones pass all the WHERE
conditions get into the result.  But, when you write
FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active)WHERE ...other conditions...

then you get the inner join of status and images on gid, minus the rows
where i.active is false, plus a null-extended row for each status row
that does not have a matching *active* image row.  So the set of rows
that comes out of the join is different: there could be more
null-extended rows in this case than in the other one.  In particular,
you could see rows having i.active=NULL in the final result, which'd
never happen if you had put i.active into the WHERE clause instead of
the ON clause.

Bottom line: what you put in the ON part should just be the clauses that
determine whether you think there's a match between the two tables.
The WHERE part is additional restrictions that limit what you want to
see, but don't affect the semantics of whether there's a match.

In your above example, I'm not sure whether it's right to put i.active
in the ON part or in WHERE.  It depends on what you want to happen for
status rows that match only inactive images, and whether you consider
them different from status rows that match no images at all.
        regards, tom lane


pgsql-sql by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: Constraints...
Next
From: Alexey Nalbat
Date:
Subject: can't get rid of unnesesary SORT step in explain plan for hash join