Thread: OUTER JOIN with filter
Hello, Can anyone help me with the problem bellow ? I want to have all the row in table "table_main" with the value in the table "table_slave" (value or null if not exist) It is possible to have the same result without sub-select in OUTER JOIN (speed problem on big table) ? Actualy this is what I make... CREATE TABLE "table_main" ( "id" int4, "some_field" varchar(100) ); CREATE TABLE "table_slave" ( "id" int4, "name" varchar(100), "c1" varchar(30) ); SELECT TM.some_field, TS.name FROM table_main TM LEFT OUTER JOIN (SELECT name FROM table_slave WHERE c1 = 'myvalue') TS USING(id) the request bellow is not correct: SELECT TM.some_filed, TS.name FROM table_main TM LEFT OUTER JOIN table_slave TS USING(id) WHERE (TS.c1 = 'myc1' OR TS.c1 IS NULL) Many thanks, Nicolas
On Sat, Mar 01, 2003 at 19:53:27 +0100, Nicolas Fertig <nfertig@swissnow.ch> wrote: > > I want to have all the row in table "table_main" with the value in the table > "table_slave" (value or null if not exist) > > It is possible to have the same result without sub-select in OUTER JOIN > (speed problem on big table) ? Why do you think there will be a speed problem? With proper statistics a good plan for the query below should be possible. As long as both tables have indexes on id a merge join with a filter can be used. If there are only a few rows with c1 = 'myvalue' out of a large number of rows and there is an index on c1, then an alternative plan that does a merge join with the sorted output from the subselect (done with an index scan) might be faster. The planner should be able to choose between these plans as long as has good statistics for the tables. > SELECT TM.some_field, TS.name > FROM table_main TM > LEFT OUTER JOIN (SELECT name FROM table_slave WHERE c1 = 'myvalue') TS > USING(id) Assuming that the above query gives you the results you want, then I expect that it is the most efficient way to write it. You could use a case statement to handle the where c1 = 'myvalue' clause, but doing this is probably going to be slower than doing a join to the subselect.
On Sat, 1 Mar 2003, Nicolas Fertig wrote: > Can anyone help me with the problem bellow ? > > SELECT TM.some_field, TS.name > FROM table_main TM > LEFT OUTER JOIN (SELECT name FROM table_slave WHERE c1 = 'myvalue') TS > USING(id) Maybe, select tm.some_field, ts.name from table_tmain tm left outer join table_slave ts on (tm.id=ts.id and ts.c1='myvalue');
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > On Sat, 1 Mar 2003, Nicolas Fertig wrote: >> Can anyone help me with the problem bellow ? >> >> SELECT TM.some_field, TS.name >> FROM table_main TM >> LEFT OUTER JOIN (SELECT name FROM table_slave WHERE c1 = 'myvalue') TS >> USING(id) > select tm.some_field, ts.name > from table_main tm left outer join table_slave ts > on (tm.id=ts.id and ts.c1='myvalue'); Offhand I believe that these queries should generate identical plans. They do not at the moment --- the second one generates a worse plan (sorry Stephan ;-)) --- because the planner does not realize it could push down the ts.c1='myvalue' JOIN condition into the scan of ts, even though the join is OUTER. But AFAICS it would not change the results to do so; ts rows failing ts.c1='myvalue' will not produce join output anyway, but would allow outer-joined lefthand rows to be produced. (Can anyone see a hole in that logic? It's on my to-do list to change it...) I concur with the other comments that there's nothing wrong with the given query ... other than such trivial details as it doesn't work ... the sub-select doesn't produce an ID column. So I think Nicolas is giving us an over-censored description of his real problem; but without the omitted details, we're not likely to offer much useful help. regards, tom lane
On Sun, 2 Mar 2003, Tom Lane wrote: > Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > > On Sat, 1 Mar 2003, Nicolas Fertig wrote: > >> Can anyone help me with the problem bellow ? > >> > >> SELECT TM.some_field, TS.name > >> FROM table_main TM > >> LEFT OUTER JOIN (SELECT name FROM table_slave WHERE c1 = 'myvalue') TS > >> USING(id) > > > select tm.some_field, ts.name > > from table_main tm left outer join table_slave ts > > on (tm.id=ts.id and ts.c1='myvalue'); > > Offhand I believe that these queries should generate identical plans. > They do not at the moment --- the second one generates a worse plan > (sorry Stephan ;-)) --- because the planner does not realize it could I wasn't really sure if it would or not (wrote it without testing on a running server), actually I didn't realize it wouldn't push down, and figured it'd give pretty much the same plan, but it is less verbose. :) > push down the ts.c1='myvalue' JOIN condition into the scan of ts, even > though the join is OUTER. But AFAICS it would not change the results to > do so; ts rows failing ts.c1='myvalue' will not produce join output > anyway, but would allow outer-joined lefthand rows to be produced. (Can > anyone see a hole in that logic? It's on my to-do list to change it...) I don't see an obvious case where it'd be different either.
Tom, Stephan, > > Offhand I believe that these queries should generate identical plans. > > They do not at the moment --- the second one generates a worse plan > > (sorry Stephan ;-)) --- because the planner does not realize it could > > I wasn't really sure if it would or not (wrote it without testing on a > running server), actually I didn't realize it wouldn't push down, and > figured it'd give pretty much the same plan, but it is less verbose. :) Yeah. I've found that when I'm doing a left outer join to a large table with criteria, it pays to try the query both ways. The way Stephan did it is frequently very speedy with small tables but not so speedy with large ones. For that matter, in a few queries I've found that it pays to force the planner's hand by repeating some of the clauses in the WHERE clause in the JOIN as well, as: SELECT a.x, b.y, c.z FROM a JOIN b ON (a.id = b.a_id AND b.type = 'm')LEFT OUTER JOIN c ON (b.id = c.b_id) WHERE b.type = 'm' This is only useful, of course, when the presense of outer joins forces you into an explicit join order ( And not always then ) -- otherwise the planner will generally do a good job given a free hand. > > push down the ts.c1='myvalue' JOIN condition into the scan of ts, even > > though the join is OUTER. But AFAICS it would not change the results to > > do so; ts rows failing ts.c1='myvalue' will not produce join output > > anyway, but would allow outer-joined lefthand rows to be produced. (Can > > anyone see a hole in that logic? It's on my to-do list to change it...) The logic is fine. The issue comes where the outer joined table is several times larger than the main queried table. The planner should recognize the possibility of filtering the records in the joined table before joining in order to minimize the join operation. If that's what you're asking, please fix it! -- Josh Berkus Aglio Database Solutions San Francisco
Josh Berkus <josh@agliodbs.com> writes: > For that matter, in a few queries I've found that it pays to force the > planner's hand by repeating some of the clauses in the WHERE clause in the > JOIN as well, as: > SELECT a.x, b.y, c.z > FROM a JOIN b ON (a.id = b.a_id AND b.type = 'm') > LEFT OUTER JOIN c ON (b.id = c.b_id) > WHERE b.type = 'm' > The logic is fine. The issue comes where the outer joined table is several > times larger than the main queried table. The planner should recognize the > possibility of filtering the records in the joined table before joining in > order to minimize the join operation. AFAIR, it's always done that; certainly the above query looks quite redundant to me. Note though that pushing down the WHERE clause is only legal because b is on the outer side of the join. A comparable query in the 7.3 regression database is regression=# explain select * from regression-# tenk1 a join tenk1 b on (a.unique1 = b.unique1) regression-# left join tenk1 c on (b.unique2 = c.unique2) regression-# where b.thousand = 999; QUERY PLAN -------------------------------------------------------------------------------------------Nested Loop (cost=0.00..572.49rows=10 width=732) -> Nested Loop (cost=0.00..542.55 rows=10 width=488) -> Seq Scan on tenk1b (cost=0.00..483.00 rows=10 width=244) Filter: (thousand = 999) -> Index Scan using tenk1_unique1on tenk1 a (cost=0.00..6.01 rows=1 width=244) Index Cond: (a.unique1 = "outer".unique1) -> IndexScan using tenk1_unique2 on tenk1 c (cost=0.00..3.01 rows=1 width=244) Index Cond: ("outer".unique2 = c.unique2) (8 rows) You can see that the condition on b.thousand does get pushed down to the bottom scan. (There's no index on b.thousand, so we don't get an indexscan --- but we would if there were an index...) regards, tom lane