Thread: omitting redundant join predicate
<div style="text-align: left; font-family: Tahoma,Helvetica,Sans-Serif;"><p class="MsoPlainText"><font size="3">I tried thefollowing query </font><p class="MsoPlainText"><font size="3"> </font><p class="MsoPlainText"><font size="3">explain select* </font><p class="MsoPlainText"><font size="3">from t1, t2, t3 </font><p class="MsoPlainText"><font size="3">wheret1.f <= t2.f</font><p class="MsoPlainText"><font size="3"><span style=""> </span>and t2.f <= t3.f</font><pclass="MsoPlainText"><font size="3"><span style=""> </span>and t1.f <= t3.f;</font><p class="MsoPlainText"><fontsize="3"> </font><p class="MsoPlainText"><font size="3">And that's what I got:</font><p class="MsoPlainText"><fontsize="3"> </font><p class="MsoPlainText"><font size="3">Nested Loop<span style=""> </span>(cost=0.00..3.15rows=1 width=368)</font><p class="MsoPlainText"><font size="3"><span style=""> </span>Join Filter:(("outer".f <= "inner".f) AND ("inner".f <= "outer".f))</font><p class="MsoPlainText"><font size="3"><span style=""> </span>-><span style=""> </span>Nested Loop<span style=""> </span>(cost=0.00..2.10 rows=1 width=218)</font><pclass="MsoPlainText"><font size="3"><span style=""> </span>Join Filter: ("outer".f <= "inner".f)</font><pclass="MsoPlainText"><font size="3"><span style=""> </span>-><span style=""> </span>Seq Scanon t1<span style=""> </span>(cost=0.00..1.01 rows=1 width=146)</font><p class="MsoPlainText"><font size="3"><span style=""> </span>-><span style=""> </span>Seq Scan on t3<span style=""> </span>(cost=0.00..1.04 rows=4 width=72)</font><pclass="MsoPlainText"><font size="3"><span style=""> </span>-><span style=""> </span>Seq Scan on t2<spanstyle=""> </span>(cost=0.00..1.02 rows=2 width=150)</font><p class="MsoPlainText"><font size="3"><span style="color:black;"> </span></font><p class="MsoPlainText"><font size="3"><span style="color: black;"> </span></font><pclass="MsoPlainText"><font size="3"><span style="color: black;">I was wondering if there is a wayto omit the redundant join predicate.</span></font><p class="MsoPlainText"><font size="3"><span style="color: black;"> </span></font><pclass="MsoPlainText"><font size="3"><span style="color: black;">Thanks,</span></font><p class="MsoPlainText"><fontsize="3"><span style="color: black;">--h</span></font></div><br /><hr />Windows Live Hotmail andMicrosoft Office Outlook – together at last. <a href="http://office.microsoft.com/en-us/outlook/HA102225181033.aspx?pid=CL100626971033"target="_new">Get it now!</a>
Ehab Galal <ehabgalal123@hotmail.com> writes: > explain select * > from t1, t2, t3 > where t1.f <= t2.f > and t2.f <= t3.f > and t1.f <= t3.f; > I was wondering if there is a > way to omit the redundant join predicate. You're not being very clear here. Do you mean will you get the same answer if you omit "t1.f <= t3.f"? Yes, of course (ignoring possibly different output ordering). Do you mean you think the system should discard it as redundant? I disagree --- the more join clauses the better, as a rule. Do you mean that the EXPLAIN output looks like the same comparison is being applied twice? It isn't --- in a more modern PG release the output looks like this: QUERY PLAN ------------------------------------------------------------------Nested Loop (cost=33.54..81794021.44 rows=362975624 width=12) Join Filter: ((t1.f <= t2.f) AND (t2.f <= t3.f)) -> Nested Loop (cost=0.00..124472.40 rows=1526533 width=8) Join Filter: (t1.f <= t3.f) -> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=4) -> SeqScan on t3 (cost=0.00..31.40 rows=2140 width=4) -> Materialize (cost=33.54..54.94 rows=2140 width=4) -> SeqScan on t2 (cost=0.00..31.40 rows=2140 width=4) (8 rows) This is of course the stupidest possible join plan, but it's hard to do much better --- both hash and merge joins work only on equality conditions. You can do a bit better with an index on t2.f: QUERY PLAN ----------------------------------------------------------------------Nested Loop (cost=0.00..13222230.60 rows=362975624width=12) -> Nested Loop (cost=0.00..124472.40 rows=1526533 width=8) Join Filter: (t1.f <= t3.f) -> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=4) -> Seq Scan on t3 (cost=0.00..31.40 rows=2140width=4) -> Index Scan using t2i on t2 (cost=0.00..5.01 rows=238 width=4) Index Cond: ((t1.f <= t2.f)AND (t2.f <= t3.f)) (7 rows) regards, tom lane
<div style="text-align: left;"><div style="text-align: left;">Sorry for not being clear enough. What i meant is how awarethe optimizer is about the transitivity of operators. <br /><br /> I agree that the more join clauses a query gets,the more flexibility the optimizer gets to pick an optimal plan.<br /><br /> what i expected is that the optimizer willuse the redundant predicates to create the plan, but the execution plan itself will not execute a redundant predicate.<br /><br /> O! I see, it's my mistake. The example i mentioned was not a good example. I tried the equality andit is working well :)<br /><br /></div> Nested Loop (cost=0.00..3.14 rows=1 width=368)<br /> Join Filter: ("outer".username= "inner".username)<br /> -> Nested Loop (cost=0.00..2.10 rows=1 width=218)<br /> JoinFilter: ("outer".username = "inner".username)<br /> -> Seq Scan on t1 (cost=0.00..1.01 rows=1 width=146)<br/> -> Seq Scan on t3 (cost=0.00..1.04 rows=4 width=72)<br /> -> Seq Scan on t2 (cost=0.00..1.02rows=2 width=150)<br /><br /> I am using postgresql 8.5.1, I am wondering is there is any patch that i canrun to enable it to put the actual table names instead of inner/outer. Should i post this to the hackers mailing list?<br/><br /> Thanks a lot.<br /></div><br /><br /><br /><hr id="stopSpelling" />> To: ehabgalal123@hotmail.com<br/>> CC: pgsql-sql@postgresql.org<br />> Subject: Re: [SQL] omitting redundant join predicate<br />> Date: Sun, 4 Nov 2007 11:35:36 -0500<br />> From: tgl@sss.pgh.pa.us<br />> <br />> Ehab Galal<ehabgalal123@hotmail.com> writes:<br />> > explain select * <br />> > from t1, t2, t3 <br />>> where t1.f <= t2.f<br />> > and t2.f <= t3.f<br />> > and t1.f <= t3.f;<br />> <br />>> I was wondering if there is a<br />> > way to omit the redundant join predicate.<br />> <br />> You'renot being very clear here. Do you mean will you get the same<br />> answer if you omit "t1.f <= t3.f"? Yes, ofcourse (ignoring possibly<br />> different output ordering). Do you mean you think the system should<br />> discardit as redundant? I disagree --- the more join clauses the<br />> better, as a rule. Do you mean that the EXPLAINoutput looks like<br />> the same comparison is being applied twice? It isn't --- in a more<br />> modern PGrelease the output looks like this:<br />> <br />> QUERY PLAN <br />> ------------------------------------------------------------------<br/>> Nested Loop (cost=33.54..81794021.44 rows=362975624width=12)<br />> Join Filter: ((t1.f <= t2.f) AND (t2.f <= t3.f))<br />> -> Nested Loop (cost=0.00..124472.40rows=1526533 width=8)<br />> Join Filter: (t1.f <= t3.f)<br />> -> Seq Scan on t1 (cost=0.00..31.40rows=2140 width=4)<br />> -> Seq Scan on t3 (cost=0.00..31.40 rows=2140 width=4)<br />> -> Materialize(cost=33.54..54.94 rows=2140 width=4)<br />> -> Seq Scan on t2 (cost=0.00..31.40 rows=2140 width=4)<br />>(8 rows)<br />> <br />> This is of course the stupidest possible join plan, but it's hard to do<br />> muchbetter --- both hash and merge joins work only on equality<br />> conditions. You can do a bit better with an indexon t2.f:<br />> <br />> QUERY PLAN <br />> ----------------------------------------------------------------------<br/>> Nested Loop (cost=0.00..13222230.60 rows=362975624width=12)<br />> -> Nested Loop (cost=0.00..124472.40 rows=1526533 width=8)<br />> Join Filter: (t1.f<= t3.f)<br />> -> Seq Scan on t1 (cost=0.00..31.40 rows=2140 width=4)<br />> -> Seq Scan on t3 (cost=0.00..31.40rows=2140 width=4)<br />> -> Index Scan using t2i on t2 (cost=0.00..5.01 rows=238 width=4)<br />>Index Cond: ((t1.f <= t2.f) AND (t2.f <= t3.f))<br />> (7 rows)<br />> <br />> regards, tom lane<br/><br /><hr />Help yourself to FREE treats served up daily at the Messenger Café. <a href="http://www.cafemessenger.com/info/info_sweetstuff2.html?ocid=TXT_TAGLM_OctWLtagline"target="_new">Stop by today!</a>
Ehab Galal <ehabgalal123@hotmail.com> writes: > what i expected is that the optimizer will use the redundant predicates > to create the plan, but the execution plan itself will not execute a > redundant predicate. > O! I see, it's my mistake. The example i mentioned was not a good example. I tried the equality and it is working well:) The planner has a great deal more smarts about equality conditions than inequality conditions. There's more you can do with equalities, and it's more useful for typical queries. > I am using postgresql 8.5.1, I am wondering is there is any patch that > i can run to enable it to put the actual table names instead of > inner/outer. Update to 8.2. regards, tom lane
Greetings,<br /><br />I was wondering why do we need the Materialize node in the plan below when i explain a query? <br /><br/><br />1: QUERY PLAN = "Nested Loop (cost=10.99..24.34 rows=1 width=846)" (typeid = 25, len = -1, typmod = -1,byval = f)<br />----<br />1: QUERY PLAN = " Join Filter: (("inner".cover)::text = ("outer".cover)::text)" (typeid= 25, len = -1, typmod = -1, byval = f)<br />----<br />1: QUERY PLAN = " -> Index Scan using idx_cover_ftn_on c_cover_ftn (cost=0.00..9.30 rows=2 width=86)" (typeid = 25, len = -1, typmod = -1, byval = f)<br />----<br/>1: QUERY PLAN = " Index Cond: (username = 'user1'::name)" (typeid = 25, len = -1, typmod = -1,byval = f)<br />----<br />1: QUERY PLAN = " -> Materialize (cost=10.99..11.89 rows=90 width=842)" (typeid= 25, len = -1, typmod = -1, byval = f)<br />----<br />1: QUERY PLAN = " -> Seq Scan on books (cost=0.00..10.90 rows=90 width=842)" (typeid = 25, len = -1, typmod = -1, byval = f)<br />----<br /><br /><br />Thanks,<br/>Ehab<br /><br /><hr />Your smile counts. The more smiles you share, the more we donate. <a href="www.windowslive.com/smile?ocid=TXT_TAGLM_Wave2_oprsmilewlhmtagline"target="_new">Join in!</a>
Ehab Galal <ehabgalal123@hotmail.com> writes: > I was wondering why do we need the Materialize node in the plan below when i explain a query? It's cheaper to scan a materialized rowset many times than a raw table --- we don't need to re-access shared memory nor re-check row visibility. regards, tom lane