Home > mailing lists

Re: Internal operations when the planner makes a hash join. - Mailing list pgsql-performance

From	negora
Subject	Re: Internal operations when the planner makes a hash join.
Date	February 23, 2010 12:40:20
Msg-id	4B8404CC.4060607@negora.com Whole thread Raw
In response to	Re: Internal operations when the planner makes a hash join. ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses	Re: Internal operations when the planner makes a hash join.
List	pgsql-performance

Tree view

<font face="Verdana">First of all, thank you for your fast answer, Kevin :) .<br /><br /> However I still wonder if on
thesearch into the hashed table (stored in the RAM, as you're pointing out), it checks for fathers as many times as
studentsare selected, or if the engine uses some kind of intelligent heuristic to avoid searching for the same father
morethan once.<br /><br /> For example:<br /><br /> students<br /> ----------------------------------------<br />
id_student| name | id_father<br /> ----------------------------------------<br /> 1 | James | 1<br /> 2 | Laura | 2<br
/>3 | Anthony | 1<br /><br /><br /> fathers (hashed table into RAM)<br /></font><font
face="Verdana">----------------------------------------<br/></font><font face="Verdana">id_father</font><font
face="Verdana">| name<br /> ----------------------------------------<br /> 1 | John<br /> 2 | Michael<br /><br /><br />
Accordingto how I understood the process, the engine would get the name from the student with ID 1 and would look for
thename of the father with ID 1 in the hashed table. It'd do exactly the same with the student #2 and father #2. But my
bigdoubt is about the 3rd one (Anthony). Would the engine "know" that it already had retrieved the father's name for
thestudent 1 and would avoid searching for it into the hashed table (using some kind of internal mechanism which allows
to"re-utilize" the name)? Or would it search into the hashed table again?<br /><br /> Thanks a lot for your patience :)
.<br/><br /></font><br /> Kevin Grittner wrote: <blockquote cite="mid:4B83A03E020000250002F4FD@gw.wicourts.gov"
type="cite"><prewrap="">negora <a class="moz-txt-link-rfc2396E"
href="mailto:negora@negora.com"><negora@negora.com></a>wrote: </pre><blockquote type="cite"><pre wrap="">I've a
doubtabout how the PostgreSQL planner makes a hash join.   </pre></blockquote><pre wrap="">  </pre><blockquote
type="cite"><prewrap="">Let's suppose that I've 2 tables, one of students and the other
 
one of parents in a many-to-one relation. I want to do something
like this:
       SELECT s.complete_name, f.complete_name       FROM students AS s       JOIN fathers AS f ON f.id_father =
s.id_father;

Using the ANALYZE command, I've checked that the planner firstly
scans and extracts the required information from "fathers", builds
a temporary hash table from it, then scans "students", and finally
joins the information from this table and the temporary one
employing the relation "f.id_father = s.id_father".   </pre></blockquote><pre wrap=""> 
This sort of plan is sometimes used when the optimizer expects the
hash table to fit into RAM, based on statistics and your work_mem
setting.  If it does fit, that's one sequential scan of the father
table's heap, and a hashed lookup into RAM to find the father to
match each student.  For the sort of query you're showing, that's
typically a very good plan.
-Kevin
 </pre></blockquote>

pgsql-performance by date:

From: "Kevin Grittner"
Date: 23 February 2010, 11:30:53
Subject: Re: Internal operations when the planner makes a hash join.

From: Alvaro Herrera
Date: 23 February 2010, 12:53:55
Subject: Re: Internal operations when the planner makes a hash join.

Re: Internal operations when the planner makes a hash join. - Mailing list pgsql-performance

Previous

Next