Potential Join Performance Issue - Mailing list pgsql-hackers

From Lawrence, Ramon
Subject Potential Join Performance Issue
Date
Msg-id 6EEA43D22289484890D119821101B1DF2C1563@exchange20.mercury.ad.ubc.ca
Whole thread Raw
Responses Re: Potential Join Performance Issue  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Potential Join Performance Issue  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
<div class="Section1"><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial">PostgreSQL development community:</span></font><p class="MsoNormal"><font face="Arial"
size="2"><spanstyle="font-size:10.0pt; 
font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial">Our research group has been using the PostgreSQL code base to test new join algorithms.  During
testing,we noticed that the planner is not pushing down projections to the outer relation in a hash join.  Although
thismakes sense for in-memory (1 batch) joins, for joins larger than memory (such as for TPC-H DSS), this causes the
systemto perform significantly more disk I/Os when reading/writing batches of the outer relation.</span></font><p
class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; 
font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial">A simple solution is to add a single line of code to src\backend\optimizer\plan\createplan.c after
line1771:</span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt; 
font-family:Arial"> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span style="font-size:10.0pt;
font-family:"Courier New"">disuse_physical_tlist(outer_plan, best_path->jpath.outerjoinpath);</span></font><p
class="MsoNormal"><fontface="Courier New" size="2"><span style="font-size:10.0pt; 
font-family:"Courier New""> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span
style="font-size:10.0pt;
font-family:"Courier New"">This will always force the projection on the outer relation.</span></font><p
class="MsoNormal"><fontface="Courier New" size="2"><span style="font-size:10.0pt; 
font-family:"Courier New""> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span
style="font-size:10.0pt;
font-family:"Courier New"">A more complicated modification alternative is to add a state variable to allow the planner
toknow how many batches the hash join expects and only push down the projection if it is greater than one.  However,
pushingthe projection on the outer relation is almost always the best choice as it eliminates unneeded attributes for
operatorsabove the hash join in the plan and will be robust in the case of poor estimates.</span></font><p
class="MsoNormal"><fontface="Courier New" size="2"><span style="font-size:10.0pt; 
font-family:"Courier New""> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span
style="font-size:10.0pt;
font-family:"Courier New"">We have been testing using TPC-H scale factor 1 GB.  A sample query that demonstrates the
behavioris:</span></font><p class="MsoNormal"><font face="Courier New" size="2"><span style="font-size:10.0pt; 
font-family:"Courier New""> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span
style="font-size:10.0pt;
font-family:"Courier New"">SELECT c_custkey, c_name, o_orderkey, o_orderdate</span></font><p class="MsoNormal"><font
face="CourierNew" size="2"><span style="font-size:10.0pt; 
font-family:"Courier New"">FROM Customer, Orders</span></font><p class="MsoNormal"><font face="Courier New"
size="2"><spanstyle="font-size:10.0pt; 
font-family:"Courier New"">WHERE c_custkey = o_custkey</span></font><p class="MsoNormal"><font face="Courier New"
size="2"><spanstyle="font-size:10.0pt; 
font-family:"Courier New""> </span></font><p class="MsoNormal"><font face="Courier New" size="2"><span
style="font-size:10.0pt;
font-family:"Courier New"">Note that EXPLAIN on this query will indicate that the projection is performed on the outer
relationeven though it is not done.  We found the difference by modifying our code to track tuples and bytes output to
disk,but it also can be detected by watching the size of the temporary files produced during the
join.</span></font><fontface="Arial" size="2"><span style="font-size:10.0pt;font-family:Arial"></span></font><p
class="MsoNormal"><fontface="Arial" size="2"><span style="font-size:10.0pt; 
font-family:Arial"> </span></font><p class="MsoNormal"><font face="Arial" size="2"><span style="font-size:10.0pt;
font-family:Arial">Sincerely,</span></font><p class="MsoAutoSig"><font face="Times New Roman" size="3"><span
style="font-size:
12.0pt"> </span></font><p class="MsoAutoSig"><font face="Times New Roman" size="3"><span style="font-size:
12.0pt">Dr. Ramon Lawrence</span></font><p class="MsoAutoSig"><font face="Times New Roman" size="3"><span
style="font-size:
12.0pt">Assistant Professor, Department of Computer Science, University of British Columbia Okanagan</span></font><p
class="MsoAutoSig"><fontface="Times New Roman" size="3"><span style="font-size: 
12.0pt"><a href="http://people.ok.ubc.ca/rlawrenc/">http://people.ok.ubc.ca/rlawrenc/</a></span></font><p
class="MsoAutoSig"><fontface="Times New Roman" size="3"><span lang="PT-BR" style="font-size:12.0pt">E-mail: </span><a
href="mailto:ramon.lawrence@ubc.ca"><spanlang="PT-BR">ramon.lawrence@ubc.ca</span></a></font><span
lang="PT-BR"></span><pclass="MsoNormal"><font face="Times New Roman" size="3"><span lang="PT-BR"
style="font-size:12.0pt"> </span></font></div>

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: pg_regress inputdir
Next
From: Zeugswetter Andreas OSB sIT
Date:
Subject: Re: Synchronous Log Shipping Replication