Propagating outer join conditions - Mailing list pgsql-performance

From Aaron Birkland
Subject Propagating outer join conditions
Date
Msg-id 19ab0ccd0612030711s70a3e095hceb69509cbd41597@mail.gmail.com
Whole thread Raw
Responses Re: Propagating outer join conditions  ("Jonathan Blitz" <jb@anykey.co.il>)
Re: Propagating outer join conditions  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
The following left outer join plan puzzles me:

EXPLAIN ANALYZE SELECT * from t28 LEFT OUTER JOIN (t1 JOIN t11 ON
(t11.o = '<http://example.org>' AND t11.s = t1.o)) ON t28.s = t1.s
WHERE t28.o = '"spec"';

t28, t1, and t11 all have indexed columns named 's' and 'o' that contain 'text';

 Nested Loop Left Join  (cost=794249.26..3289704.61 rows=1 width=301)
(actual time=581293.390..581293.492 rows=1 loops=1)
   Join Filter: (t28.s = t1.s)
   ->  Index Scan using t28_o on t28  (cost=0.00..9.22 rows=1
width=89) (actual time=0.073..0.077 rows=1 loops=1)
         Index Cond: (o = '"spec"'::text)
   ->  Merge Join  (cost=794249.26..3267020.66 rows=1813979 width=212)
(actual time=230365.522..577078.266 rows=1894969 loops=1)
         Merge Cond: (t1.o = t11.s)
         ->  Index Scan using t1_o on t1  (cost=0.00..2390242.10
rows=22225696 width=109) (actual time=0.209..162586.801 rows=22223925
loops=1)
         ->  Sort  (cost=794249.26..798784.21 rows=1813979 width=103)
(actual time=230365.175..237409.474 rows=1894969 loops=1)
               Sort Key: t11.s
               ->  Bitmap Heap Scan on t11  (cost=78450.82..605679.55
rows=1813979 width=103) (actual time=3252.103..22782.271 rows=1894969
loops=1)
                     Recheck Cond: (o = '<http://example.org>'::text)
                     ->  Bitmap Index Scan on t11_o
(cost=0.00..78450.82 rows=1813979 width=0) (actual
time=2445.422..2445.422 rows=1894969 loops=1)
                           Index Cond: (o = '<http://example.org>'::text)


It seems to me that this plan is not very desirable, since the outer
part of the nested loop left join (the merge join node) is very
expensive. Is is possible to generate a plan that looks like this:

 Nested Loop Left Join  (cost=???)
   ->  Index Scan using t28_o on t28  (cost=0.00..9.11 rows=1 width=89)
         Index Cond: (o = '"spec"'::text)
   ->  Nested Loop  (cost=???)
         ->  Index Scan using t1_s on t1  (cost=???)
               Index Cond: (s = t28.s)
         ->  Bitmap Heap Scan on t11  (cost=???)
               Recheck Cond: (t11.s = t1.o)
               Filter: (o = '<http://example.org>'::text)
               ->  Bitmap Index Scan on t11_s  (cost=??? )
                     Index Cond: (t11.s = t1.o)

I *think* this plan is equivalent to the above if I'm assuming the
behaviour of the 'nested loop left join' node correctly.  So far, I
have been tweaking the statistics, cost estimates, and
enabling.disabling certain plans to see if I can get it to propagate
the join condition t1.s = t28.s to the outer node of the left join..
but so far, I cannot.  So, my questions are:

1) Is my 'desired' query plan logic correct
2) Can the executor execute a plan such as my 'desired' plan
3) If (1) and (2) are 'yes', then how may I get the planner to
generate such a plan, or do I just need to look harder into tweaking
the statistics and cost estimates

  -Aaron

pgsql-performance by date:

Previous
From: "Alexandru Coseru"
Date:
Subject: Re: Regex performance issue
Next
From: "Jonathan Blitz"
Date:
Subject: Re: Propagating outer join conditions