Propagating outer join conditions - Mailing list pgsql-performance
From | Aaron Birkland |
---|---|
Subject | Propagating outer join conditions |
Date | |
Msg-id | 19ab0ccd0612030711s70a3e095hceb69509cbd41597@mail.gmail.com Whole thread Raw |
Responses |
Re: Propagating outer join conditions
Re: Propagating outer join conditions |
List | pgsql-performance |
The following left outer join plan puzzles me: EXPLAIN ANALYZE SELECT * from t28 LEFT OUTER JOIN (t1 JOIN t11 ON (t11.o = '<http://example.org>' AND t11.s = t1.o)) ON t28.s = t1.s WHERE t28.o = '"spec"'; t28, t1, and t11 all have indexed columns named 's' and 'o' that contain 'text'; Nested Loop Left Join (cost=794249.26..3289704.61 rows=1 width=301) (actual time=581293.390..581293.492 rows=1 loops=1) Join Filter: (t28.s = t1.s) -> Index Scan using t28_o on t28 (cost=0.00..9.22 rows=1 width=89) (actual time=0.073..0.077 rows=1 loops=1) Index Cond: (o = '"spec"'::text) -> Merge Join (cost=794249.26..3267020.66 rows=1813979 width=212) (actual time=230365.522..577078.266 rows=1894969 loops=1) Merge Cond: (t1.o = t11.s) -> Index Scan using t1_o on t1 (cost=0.00..2390242.10 rows=22225696 width=109) (actual time=0.209..162586.801 rows=22223925 loops=1) -> Sort (cost=794249.26..798784.21 rows=1813979 width=103) (actual time=230365.175..237409.474 rows=1894969 loops=1) Sort Key: t11.s -> Bitmap Heap Scan on t11 (cost=78450.82..605679.55 rows=1813979 width=103) (actual time=3252.103..22782.271 rows=1894969 loops=1) Recheck Cond: (o = '<http://example.org>'::text) -> Bitmap Index Scan on t11_o (cost=0.00..78450.82 rows=1813979 width=0) (actual time=2445.422..2445.422 rows=1894969 loops=1) Index Cond: (o = '<http://example.org>'::text) It seems to me that this plan is not very desirable, since the outer part of the nested loop left join (the merge join node) is very expensive. Is is possible to generate a plan that looks like this: Nested Loop Left Join (cost=???) -> Index Scan using t28_o on t28 (cost=0.00..9.11 rows=1 width=89) Index Cond: (o = '"spec"'::text) -> Nested Loop (cost=???) -> Index Scan using t1_s on t1 (cost=???) Index Cond: (s = t28.s) -> Bitmap Heap Scan on t11 (cost=???) Recheck Cond: (t11.s = t1.o) Filter: (o = '<http://example.org>'::text) -> Bitmap Index Scan on t11_s (cost=??? ) Index Cond: (t11.s = t1.o) I *think* this plan is equivalent to the above if I'm assuming the behaviour of the 'nested loop left join' node correctly. So far, I have been tweaking the statistics, cost estimates, and enabling.disabling certain plans to see if I can get it to propagate the join condition t1.s = t28.s to the outer node of the left join.. but so far, I cannot. So, my questions are: 1) Is my 'desired' query plan logic correct 2) Can the executor execute a plan such as my 'desired' plan 3) If (1) and (2) are 'yes', then how may I get the planner to generate such a plan, or do I just need to look harder into tweaking the statistics and cost estimates -Aaron
pgsql-performance by date: