Re: A fine point about OUTER JOIN semantics - Mailing list pgsql-hackers

From Thomas Swan
Subject Re: A fine point about OUTER JOIN semantics
Date
Msg-id 5.0.0.13.2.20000901160825.01d03ed0@tangent.ics.olemiss.edu
Whole thread Raw
In response to A fine point about OUTER JOIN semantics  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom I'd be happy to go back through any specs for second thoughts... do you 
have a URL to go to?

At 9/1/2000 04:47 PM -0400, Tom Lane wrote:
>Am I right in thinking that the WHERE clause of a query must logically
>be applied *after* any joins specified in the FROM clause?
>
>For example, suppose that we have table t1 (x int) containing the
>values 1, 2, 3, 4, and table t2 (y int) containing the values 1, 2, 4.
>It's clear that the result of
>         SELECT * FROM t1 LEFT JOIN t2 ON (x = y);
>should be
>         x       y
>
>         1       1
>         2       2
>         3       NULL
>         4       4
>
>But suppose we make the query
>         SELECT * FROM t1 LEFT JOIN t2 ON (x = y) WHERE y <> 2;
>It seems to me this should yield
>         x       y
>
>         1       1
>         3       NULL
>         4       4
>
>and not
>         x       y
>
>         1       1
>         2       NULL
>         3       NULL
>         4       4
>
>which is what you'd get if the y=2 tuple were filtered out before
>reaching the left-join stage.  Does anyone read the spec differently,
>or get the latter result from another implementation?
>
>The reason this is interesting is that this example breaks a rather
>fundamental assumption in our planner/optimizer, namely that WHERE
>conditions can be pushed down to the lowest level at which all the
>variables they mention are available.  Thus the planner would normally
>apply "y <> 2" during its bottom-level scan of t2, which would cause the
>LEFT JOIN to decide that x = 2 is an unmatched value, and thus produce
>a "2 NULL" output row.
>
>An even more interesting example is
>         SELECT * FROM t1 FULL JOIN t2 ON (x = y AND y <> 2);
>My interpretation is that this should produce
>         x       y
>
>         1       1
>         2       NULL
>         NULL    2
>         3       NULL
>         4       4
>since both t1's x=2 and t2's y=2 tuple will appear "unmatched".
>This is *not* the same output you'd get from
>         SELECT * FROM t1 FULL JOIN t2 ON (x = y) WHERE y <> 2;
>which I think should yield
>         x       y
>
>         1       1
>         3       NULL
>         4       4
>This shows that JOIN/ON conditions for outer joins are not semantically
>interchangeable with WHERE conditions.
>
>This is going to be a bit of work to fix, so I thought I'd better
>confirm that I'm reading the spec correctly before I dive into it.
>
>Comments?
>
>                         regards, tom lane



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: A fine point about OUTER JOIN semantics
Next
From: Jan Wieck
Date:
Subject: Re: Backend-internal SPI operations