Thread: omitting redundant join predicate

omitting redundant join predicate

From

Ehab Galal

Date:

04 November 2007, 09:15:06

<div style="text-align: left; font-family: Tahoma,Helvetica,Sans-Serif;"><p class="MsoPlainText"><font size="3">I tried
thefollowing query </font><p class="MsoPlainText"><font size="3"> </font><p class="MsoPlainText"><font size="3">explain
select* </font><p class="MsoPlainText"><font size="3">from t1, t2, t3 </font><p class="MsoPlainText"><font
size="3">wheret1.f <= t2.f</font><p class="MsoPlainText"><font size="3"><span style="">  </span>and t2.f <=
t3.f</font><pclass="MsoPlainText"><font size="3"><span style="">  </span>and t1.f <= t3.f;</font><p
class="MsoPlainText"><fontsize="3"> </font><p class="MsoPlainText"><font size="3">And that's what I got:</font><p
class="MsoPlainText"><fontsize="3"> </font><p class="MsoPlainText"><font size="3">Nested Loop<span style=""> 
</span>(cost=0.00..3.15rows=1 width=368)</font><p class="MsoPlainText"><font size="3"><span style="">   </span>Join
Filter:(("outer".f <= "inner".f) AND ("inner".f <= "outer".f))</font><p class="MsoPlainText"><font size="3"><span
style="">  </span>-><span style="">  </span>Nested Loop<span style="">  </span>(cost=0.00..2.10 rows=1
width=218)</font><pclass="MsoPlainText"><font size="3"><span style="">         </span>Join Filter: ("outer".f <=
"inner".f)</font><pclass="MsoPlainText"><font size="3"><span style="">         </span>-><span style="">  </span>Seq
Scanon t1<span style="">  </span>(cost=0.00..1.01 rows=1 width=146)</font><p class="MsoPlainText"><font size="3"><span
style="">        </span>-><span style="">  </span>Seq Scan on t3<span style="">  </span>(cost=0.00..1.04 rows=4
width=72)</font><pclass="MsoPlainText"><font size="3"><span style="">   </span>-><span style="">  </span>Seq Scan on
t2<spanstyle="">  </span>(cost=0.00..1.02 rows=2 width=150)</font><p class="MsoPlainText"><font size="3"><span
style="color:black;"> </span></font><p class="MsoPlainText"><font size="3"><span style="color:
black;"> </span></font><pclass="MsoPlainText"><font size="3"><span style="color: black;">I was wondering if there is a
wayto omit the redundant join predicate.</span></font><p class="MsoPlainText"><font size="3"><span style="color:
black;"> </span></font><pclass="MsoPlainText"><font size="3"><span style="color: black;">Thanks,</span></font><p
class="MsoPlainText"><fontsize="3"><span style="color: black;">--h</span></font></div><br /><hr />Windows Live Hotmail
andMicrosoft Office Outlook – together at last. <a
href="http://office.microsoft.com/en-us/outlook/HA102225181033.aspx?pid=CL100626971033"target="_new">Get it now!</a>

Re: omitting redundant join predicate

From

Tom Lane

Date:

04 November 2007, 12:35:42

Ehab Galal <ehabgalal123@hotmail.com> writes:
> explain select * 
> from t1, t2, t3 
> where t1.f <= t2.f
>   and t2.f <= t3.f
>   and t1.f <= t3.f;

> I was wondering if there is a
> way to omit the redundant join predicate.

You're not being very clear here.  Do you mean will you get the same
answer if you omit "t1.f <= t3.f"?  Yes, of course (ignoring possibly
different output ordering).  Do you mean you think the system should
discard it as redundant?  I disagree --- the more join clauses the
better, as a rule.  Do you mean that the EXPLAIN output looks like
the same comparison is being applied twice?  It isn't --- in a more
modern PG release the output looks like this:
                           QUERY PLAN                            
------------------------------------------------------------------Nested Loop  (cost=33.54..81794021.44 rows=362975624
width=12) Join Filter: ((t1.f <= t2.f) AND (t2.f <= t3.f))  ->  Nested Loop  (cost=0.00..124472.40 rows=1526533
width=8)       Join Filter: (t1.f <= t3.f)        ->  Seq Scan on t1  (cost=0.00..31.40 rows=2140 width=4)        ->
SeqScan on t3  (cost=0.00..31.40 rows=2140 width=4)  ->  Materialize  (cost=33.54..54.94 rows=2140 width=4)        ->
SeqScan on t2  (cost=0.00..31.40 rows=2140 width=4)
 
(8 rows)

This is of course the stupidest possible join plan, but it's hard to do
much better --- both hash and merge joins work only on equality
conditions.  You can do a bit better with an index on t2.f:
                             QUERY PLAN                              
----------------------------------------------------------------------Nested Loop  (cost=0.00..13222230.60
rows=362975624width=12)  ->  Nested Loop  (cost=0.00..124472.40 rows=1526533 width=8)        Join Filter: (t1.f <=
t3.f)       ->  Seq Scan on t1  (cost=0.00..31.40 rows=2140 width=4)        ->  Seq Scan on t3  (cost=0.00..31.40
rows=2140width=4)  ->  Index Scan using t2i on t2  (cost=0.00..5.01 rows=238 width=4)        Index Cond: ((t1.f <=
t2.f)AND (t2.f <= t3.f))
 
(7 rows)
        regards, tom lane

Re: omitting redundant join predicate

From

Ehab Galal

Date:

05 November 2007, 11:15:52

<div style="text-align: left;"><div style="text-align: left;">Sorry for not being clear enough. What i meant is how
awarethe optimizer is about the transitivity of operators. <br /><br /> I agree that the more join clauses a query
gets,the more flexibility the optimizer gets to pick an optimal plan.<br /><br /> what i expected is that the optimizer
willuse the redundant predicates to create the plan, but the execution plan itself will not execute a redundant
predicate.<br /><br /> O! I see, it's my mistake. The example i mentioned was not a good example. I tried the equality
andit is working well :)<br /><br /></div> Nested Loop  (cost=0.00..3.14 rows=1 width=368)<br />    Join Filter:
("outer".username= "inner".username)<br />    ->  Nested Loop  (cost=0.00..2.10 rows=1 width=218)<br />         
JoinFilter: ("outer".username = "inner".username)<br />          ->  Seq Scan on t1  (cost=0.00..1.01 rows=1
width=146)<br/>          ->  Seq Scan on t3  (cost=0.00..1.04 rows=4 width=72)<br />    ->  Seq Scan on t2 
(cost=0.00..1.02rows=2 width=150)<br /><br /> I am using postgresql 8.5.1, I am wondering is there is any patch that i
canrun to enable it to put the actual table names instead of inner/outer. Should i post this to the hackers mailing
list?<br/><br /> Thanks a lot.<br /></div><br /><br /><br /><hr id="stopSpelling" />> To:
ehabgalal123@hotmail.com<br/>> CC: pgsql-sql@postgresql.org<br />> Subject: Re: [SQL] omitting redundant join
predicate<br />> Date: Sun, 4 Nov 2007 11:35:36 -0500<br />> From: tgl@sss.pgh.pa.us<br />> <br />> Ehab
Galal<ehabgalal123@hotmail.com> writes:<br />> > explain select * <br />> > from t1, t2, t3 <br
/>>> where t1.f <= t2.f<br />> > and t2.f <= t3.f<br />> > and t1.f <= t3.f;<br />> <br
/>>> I was wondering if there is a<br />> > way to omit the redundant join predicate.<br />> <br />>
You'renot being very clear here. Do you mean will you get the same<br />> answer if you omit "t1.f <= t3.f"? Yes,
ofcourse (ignoring possibly<br />> different output ordering). Do you mean you think the system should<br />>
discardit as redundant? I disagree --- the more join clauses the<br />> better, as a rule. Do you mean that the
EXPLAINoutput looks like<br />> the same comparison is being applied twice? It isn't --- in a more<br />> modern
PGrelease the output looks like this:<br />> <br />> QUERY PLAN <br />>
------------------------------------------------------------------<br/>> Nested Loop (cost=33.54..81794021.44
rows=362975624width=12)<br />> Join Filter: ((t1.f <= t2.f) AND (t2.f <= t3.f))<br />> -> Nested Loop
(cost=0.00..124472.40rows=1526533 width=8)<br />> Join Filter: (t1.f <= t3.f)<br />> -> Seq Scan on t1
(cost=0.00..31.40rows=2140 width=4)<br />> -> Seq Scan on t3 (cost=0.00..31.40 rows=2140 width=4)<br />> ->
Materialize(cost=33.54..54.94 rows=2140 width=4)<br />> -> Seq Scan on t2 (cost=0.00..31.40 rows=2140 width=4)<br
/>>(8 rows)<br />> <br />> This is of course the stupidest possible join plan, but it's hard to do<br />>
muchbetter --- both hash and merge joins work only on equality<br />> conditions. You can do a bit better with an
indexon t2.f:<br />> <br />> QUERY PLAN <br />>
----------------------------------------------------------------------<br/>> Nested Loop (cost=0.00..13222230.60
rows=362975624width=12)<br />> -> Nested Loop (cost=0.00..124472.40 rows=1526533 width=8)<br />> Join Filter:
(t1.f<= t3.f)<br />> -> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=4)<br />> -> Seq Scan on t3
(cost=0.00..31.40rows=2140 width=4)<br />> -> Index Scan using t2i on t2 (cost=0.00..5.01 rows=238 width=4)<br
/>>Index Cond: ((t1.f <= t2.f) AND (t2.f <= t3.f))<br />> (7 rows)<br />> <br />> regards, tom
lane<br/><br /><hr />Help yourself to FREE treats served up daily at the Messenger Café. <a
href="http://www.cafemessenger.com/info/info_sweetstuff2.html?ocid=TXT_TAGLM_OctWLtagline"target="_new">Stop by
today!</a>

Re: omitting redundant join predicate

From

Tom Lane

Date:

05 November 2007, 12:08:19

Ehab Galal <ehabgalal123@hotmail.com> writes:
> what i expected is that the optimizer will use the redundant predicates
> to create the plan, but the execution plan itself will not execute a
> redundant predicate. 

> O! I see, it's my mistake. The example i mentioned was not a good example. I tried the equality and it is working
well:)
 

The planner has a great deal more smarts about equality conditions than
inequality conditions.  There's more you can do with equalities, and
it's more useful for typical queries.

> I am using postgresql 8.5.1, I am wondering is there is any patch that
> i can run to enable it to put the actual table names instead of
> inner/outer.

Update to 8.2.
        regards, tom lane

materialize

From

Ehab Galal

Date:

26 November 2007, 17:17:27

Greetings,<br /><br />I was wondering why do we need the Materialize node in the plan below when i explain a query? <br
/><br/><br />1: QUERY PLAN = "Nested Loop  (cost=10.99..24.34 rows=1 width=846)"    (typeid = 25, len = -1, typmod =
-1,byval = f)<br />----<br />1: QUERY PLAN = "      Join Filter: (("inner".cover)::text = ("outer".cover)::text)"  
 (typeid= 25, len = -1, typmod = -1, byval = f)<br />----<br />1: QUERY PLAN = "      ->  Index Scan using
idx_cover_ftn_on c_cover_ftn  (cost=0.00..9.30 rows=2 width=86)"    (typeid = 25, len = -1, typmod = -1, byval = f)<br
/>----<br/>1: QUERY PLAN = "               Index Cond: (username = 'user1'::name)"    (typeid = 25, len = -1, typmod =
-1,byval = f)<br />----<br />1: QUERY PLAN = "      ->  Materialize  (cost=10.99..11.89 rows=90 width=842)"  
 (typeid= 25, len = -1, typmod = -1, byval = f)<br />----<br />1: QUERY PLAN = "               ->  Seq Scan on
books (cost=0.00..10.90 rows=90 width=842)"    (typeid = 25, len = -1, typmod = -1, byval = f)<br />----<br /><br /><br
/>Thanks,<br/>Ehab<br /><br /><hr />Your smile counts. The more smiles you share, the more we donate. <a
href="www.windowslive.com/smile?ocid=TXT_TAGLM_Wave2_oprsmilewlhmtagline"target="_new">Join in!</a>

Re: materialize

From

Tom Lane

Date:

26 November 2007, 17:53:51

Ehab Galal <ehabgalal123@hotmail.com> writes:
> I was wondering why do we need the Materialize node in the plan below when i explain a query? 

It's cheaper to scan a materialized rowset many times than a raw table
--- we don't need to re-access shared memory nor re-check row visibility.
        regards, tom lane