Implementation Proposal For Add Free Behind Capability For Large Sequential Scan - Mailing list pgsql-hackers

From Amit Kumar Khare
Subject Implementation Proposal For Add Free Behind Capability For Large Sequential Scan
Date
Msg-id 20020223133036.93175.qmail@web10104.mail.yahoo.com
Whole thread Raw
Responses Re: Implementation Proposal For Add Free Behind  (Neil Padgett <npadgett@redhat.com>)
List pgsql-hackers
<div class="Section1"><p class="MsoNormal"><span style="mso-bidi-font-size:11.5pt;font-family:"Courier New"">Hi
All,</span><pclass="MsoNormal"><span style="mso-bidi-font-size:11.5pt;font-family:"Courier New""> </span><p
class="MsoNormal"><spanstyle="mso-bidi-font-size:11.5pt;font-family:"Courier New"">Here is at your disposal my
implementationIdea of a TODO item titled �Add Free-Behind Capability For Large Sequential Scans�. I cordially invite
yourcomments.</span><p class="MsoNormal"><span style="mso-bidi-font-size:11.5pt;font-family:"Courier New""> </span><p
class="MsoNormal"><b><u><spanstyle="mso-bidi-font-size:11.5pt;font-family:
 
"Courier New"">Part I:</span></u></b><p class="MsoNormal"><b><u><span style="font-size:11.5pt"> </span></u></b><p
class="MsoNormal"><b><u><spanstyle="font-size:11.5pt">Problem:</span></u></b><span style="font-size:11.5pt"> The
presentdefault page replacement policy of PostgreSQL is Least Recently Used (LRU). This policy may be good for any
otherdatabase access patterns but it is not efficient for Large Sequential Scan, since it makes cache useless. For
example:</span><pclass="MsoNormal"><span style="font-size:11.5pt">Let shared buffer size be 1 MB and the size of
relationis 2MB. Lets assume sequential scan is applied on this relation over and over again. The cache becomes useless
<spanstyle="color:black">because by the time the second read of the table happens the first 1mb has already been forced
outof the cache. </span></span><p class="MsoNormal"><span style="font-size:11.5pt;color:black"> </span><p
class="MsoNormal"><spanstyle="font-size:11.5pt;color:black">The problem appeared in the TODO list of PostgreSQL under
sectionCACHE titled:</span><p class="MsoNormal"><span style="font-size:11.5pt;color:black">�<i>Add Free-Behind
CapabilityFor Large Sequential Scans</i> �</span><p class="MsoNormal"><span
style="font-size:11.5pt;color:black"> </span><pclass="MsoNormal"><span style="font-size:11.5pt;color:black">Solution:
Inhis paper �<i>Operating System Support For Database Management</i>�, Professor Michael Stonebraker identifies various
databaseaccess patterns in INGRES (grand father of PostgreSQL) suggest that for sequential access to blocks, which will
becyclically rereferenced, the buffer management strategy should be to Toss Immediately, unless all n pages of relation
canbe kept in the cache. He further suggest that for buffer management to be of some use to application like DBMS there
shouldbe some provision that it accepts �advice� from an application concerning replacement strategy.</span><p
class="MsoNormal"><spanstyle="font-size:11.5pt;color:black"> </span><p class="MsoNormal"><span
style="font-size:11.5pt;color:black">Consideringthese suggestions as our baseline we have decided to implement Most
RecentlyUsed (MRU) cache replacement policy. Our task will be to identify a large sequential scan and then advice the
buffermanager to apply MRU for cache replacement.</span><p class="MsoNormal"><span
style="font-size:11.5pt;color:black"> </span><pclass="MsoNormal"><b><u><span style="font-size:11.5pt">Detailed Plan of
Implementation:</span></u></b><pclass="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"
style="margin-left:.75in;text-indent:-.5in;mso-list:l2level1 lfo2;
 
tab-stops:list .75in"><b><span style="font-size:11.5pt">(I)<span style="font:7.0pt "Times New Roman"">               
</span></span></b><b><u><spanstyle="font-size:11.5pt">Identifying the query execution flow control</span></u></b><p
class="MsoNormal"style="margin-left:.25in"><span style="font-size:11.5pt"> </span><p class="MsoNormal"
style="margin-left:.5in;text-indent:.25in"><b><i><u><spanstyle="font-size:11.5pt">Goal of identifying the query
executionflow:</span></u></i></b><span style="font-size:11.5pt"> (i) This is necessary as we have to figure out a spot
wherewe will identify that the block access is actually a large sequential scan and from this point onward we think to
sendadvice (in form of parameter ) to buffer manager.<span style="mso-spacerun: yes">� </span></span><p
class="MsoNormal"style="margin-left:.5in;text-indent:.5in"><span style="font-size:11.5pt">(ii) Through this query
executiontrace our aim is to reach at one of the files listed below named �freelist.c�. This is the file where all the
funcitionsrelated to replacement policy are implemented. Hence our trace ends after reaching �freelist.c�.</span><p
class="MsoNormal"style="margin-left:.75in"><span style="font-size:11.5pt"> </span><p class="MsoNormal"
style="margin-left:.5in"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal"
style="margin-left:1.0in;text-indent:-.25in;mso-list:l2level2 lfo2;
 
tab-stops:list 1.0in"><b><span style="font-size:11.5pt">(1)<span style="font:7.0pt "Times New Roman"">  
</span></span></b><b><u><spanstyle="font-size:11.5pt">Files Involved (Important files listed):</span></u></b><p
class="MsoNormal"style="margin-left:.75in"><b><u><span style="font-size:11.5pt"> </span></u></b><p class="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(1)<span style="font:7.0pt "Times New Roman"">   
</span></span><spanstyle="font-size:11.5pt">bin/psql/mainloop.c</span><p class="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(2)<span style="font:7.0pt "Times New Roman"">   
</span></span><spanstyle="font-size:11.5pt">bin/psql/common.c</span><p class="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(3)<span style="font:7.0pt "Times New Roman"">   
</span></span><spanstyle="font-size:11.5pt">src/interfaces/libpq/fe-exec.c</span><p class="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(4)<span style="font:7.0pt "Times New Roman"">   
</span></span><spanstyle="font-size:11.5pt">src/backend/tcop/postgres.c</span><p class="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(5)<span style="font:7.0pt "Times New Roman"">   
</span></span><spanstyle="font-size:11.5pt">src/backend/tcop/pquery.c</span><p class="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(6)<span style="font:7.0pt "Times New Roman"">   
</span></span><spanstyle="font-size:11.5pt">src/backend/executor/execMain.c</span><p class="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(7)<span style="font:7.0pt "Times New Roman"">   
</span></span><spanstyle="font-size:11.5pt">src/backend/executor/execProcnode.c</span><p class="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(8)<span style="font:7.0pt "Times New Roman"">   
</span></span><spanstyle="font-size:11.5pt">src/backend/access/heap/heapam.c</span><p class="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(9)<span style="font:7.0pt "Times New Roman"">   
</span></span><spanstyle="font-size:11.5pt">src/backend/utils/cache/relcache.c</span><p class="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(10)</span><span
style="font-size:11.5pt">src/backend/utils/cache/relcache.c</span><pclass="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(11)</span><span
style="font-size:11.5pt">src/backend/utils/hash/dynahash.c</span><pclass="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(12)</span><span
style="font-size:11.5pt">src/backend/storage/smgr/smgr.c</span><pclass="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(13)</span><span
style="font-size:11.5pt">src/include/utils/rel.h</span><pclass="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(14)</span><span
style="font-size:11.5pt">src/backend/access/common/scankey.c</span><pclass="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(15)</span><span
style="font-size:11.5pt">src/backend/storage/buffer/bufmgr.c</span><pclass="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(16)</span><span
style="font-size:11.5pt">src/backend/storage/buffer/buf_table.c</span><pclass="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(17)</span><span
style="font-size:11.5pt">src/backend/storage/buffer/freelist.c</span><pclass="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(18)</span><span
style="font-size:11.5pt">src/backend/executor/nodeSeqscan.c</span><pclass="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(19)</span><span
style="font-size:11.5pt">src/backend/executor/execScan.c</span><pclass="MsoNormal"
style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4;
 
tab-stops:list 1.25in"><span style="font-size:11.5pt">(20)</span><span
style="font-size:11.5pt">src/backend/executor/execTuples.c</span><pclass="MsoNormal"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:1.0in"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:1.0in;text-indent:-.25in;mso-list:l2 level2
lfo2;
tab-stops:list 1.0in"><b><span style="font-size:11.5pt">(2)<span style="font:7.0pt "Times New Roman"">  
</span></span></b><b><u><spanstyle="font-size:11.5pt">Function Call Trace:</span></u></b><p class="MsoNormal"
style="margin-left:.75in"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal" style="margin-left:.75in"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:1.0in"><b><i><u><span style="font-size:
 
11.5pt">Notation Convention</span></u></i></b><i><u><span style="font-size:
11.5pt">:</span></u></i><span style="font-size:11.5pt"> (i) Arrow �</span><span
style="font-size:11.5pt;font-family:Wingdings;mso-ascii-font-family:"TimesNew Roman";
 
mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;mso-symbol-font-family:
Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><span
style="font-size:11.5pt">�will represent call</span><p class="MsoNormal" style="margin-left:1.0in"><span
style="font-size:11.5pt"><spanstyle="mso-tab-count:3">����������������������������������� </span>(ii) Functions are
listedas</span><p class="MsoNormal" style="margin-left:1.0in"><span style="font-size:11.5pt"><span style="mso-spacerun:
yes">�</span><FileName></span><b><spanstyle="font-size:15.5pt">.</span></b><span style="font-size:11.5pt">
<FunctionName>for example common. SendQuery<span style="mso-spacerun:
 
yes">� </span>i.e. SendQuery function is defined in file name common</span><p class="MsoNormal"
style="margin-left:1.0in"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal"
style="margin-left:2.0in;text-indent:-.5in;mso-list:l5level2 lfo4;
 
tab-stops:list 2.0in"><b><span style="font-size:11.5pt">(i)<span style="font:7.0pt "Times New Roman"">                
</span></span></b><b><u><spanstyle="font-size:11.5pt">Query submitted to psql</span></u></b><p class="MsoNormal"
style="margin-left:1.5in"><b><u><spanstyle="font-size:11.5pt"> </span></u></b><p class="MsoNormal"
style="margin-left:1.5in"><spanstyle="font-size:11.5pt;
 
color:blue">mainloop.Mainloop</span><span style="font-size:11.5pt;font-family:
Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span
style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue">
common.SendQuery</span><pclass="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt;
 
color:blue">common.SendQuery </span><span style="font-size:11.5pt;font-family:
Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span
style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue">
fe-exec.PQexec</span><pclass="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt"> </span><p
class="MsoNormal"style="margin-left:1.75in;text-indent:-.25in;mso-list:l7 level1 lfo6;
 
tab-stops:list 1.75in"><span style="font-size:11.5pt">-<span style="font:7.0pt "Times New Roman"">         
</span></span><spanstyle="font-size:11.5pt">SendQuery send the query string to the backen and is a �front door� way to
senda query.</span><p class="MsoNormal" style="margin-left:1.75in;text-indent:-.25in;mso-list:l7 level1 lfo6;
 
tab-stops:list 1.75in"><span style="font-size:11.5pt">-<span style="font:7.0pt "Times New Roman"">         
</span></span><spanstyle="font-size:11.5pt">PQexec sends the query to the backend and package up the results.</span><p
class="MsoNormal"style="margin-left:1.5in"><span style="font-size:11.5pt"> </span><p class="MsoNormal"
style="margin-left:2.0in;text-indent:-.5in;mso-list:l5level2 lfo4;
 
tab-stops:list 2.0in"><b><span style="font-size:11.5pt">(ii)<span style="font:7.0pt "Times New Roman"">              
</span></span></b><b><u><spanstyle="font-size:11.5pt">Query Execution from �postgres� backend process</span></u></b><p
class="MsoNormal"style="margin-left:1.5in"><span style="font-size:11.5pt"> </span><p class="MsoNormal"
style="margin-left:1.5in"><spanstyle="font-size:11.5pt;
 
color:blue">postgres. pg_exec_query_string </span><span style="font-size:11.5pt;
font-family:Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman";color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span
style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue">
pquery.ProcessQuery</span><pclass="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt;
 
color:blue">pquery.ProcessQuery </span><span style="font-size:11.5pt;
font-family:Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman";color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span
style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue">
execMain.ExecutorStart</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt"> </span><p
class="MsoNormal"style="margin-left:1.5in"><span style="font-size:11.5pt">(This routine is called at the beginning of
anyexecution of any</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt">query
plan)</span><pclass="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt;
 
color:blue">pquery.ProcessQuery </span><span style="font-size:11.5pt;
font-family:Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman";color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span
style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue">
execMain.ExecutorRun</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt">( This is the
mainroutine of the executor module. It accepts</span><p class="MsoNormal" style="margin-left:1.5in"><span
style="font-size:11.5pt"><spanstyle="mso-spacerun: yes">�� </span>the query descriptor from the traffic cop and
executesthe</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt"><span
style="mso-spacerun:yes">�� </span>query plan.)</span><span style="font-size:
 
11.0pt;mso-bidi-font-size:11.5pt"></span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt;
color:blue">pquery.ProcessQuery </span><span style="font-size:11.5pt;
font-family:Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:
"Times New Roman";color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span
style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue">
execMain.ExecutorEnd</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt">(This routine
mustbe called at the end of execution of any</span><p class="MsoNormal" style="margin-left:1.5in"><span
style="font-size:11.5pt"><spanstyle="mso-spacerun: yes">�</span>query plan)</span><p class="MsoNormal"
style="margin-left:1.5in"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal"><span style="font-size:11.5pt">we will follow the execution trace
fromexecMain.ExecutorRun</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p
class="MsoBodyText">execMain.ExecutorRun<span style="font-family:Wingdings;
 
mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:
symbol;mso-symbol-font-family:Wingdings">�</span></span> execMain.ExecutePlan <span
style="font-family:Wingdings;mso-ascii-font-family:"TimesNew Roman";
 
mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;mso-symbol-font-family:
Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span>
execProcnode.ExecProcNode<span style="font-family:Wingdings;mso-ascii-font-family:
 
"Times New Roman";mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;
mso-symbol-font-family:Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:
Wingdings">�</span></span> nodeSeqscan.ExecSeqScan <span style="font-family:
Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:
symbol;mso-symbol-font-family:Wingdings">�</span></span> execScan.ExecScan <p class="MsoNormal"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal"><span style="font-size:11.5pt">ExecSeqScan, scans the relation
sequentiallyand returns the next qualifying</span><p class="MsoNormal"><span style="font-size:11.5pt">tuple. It
passes<spanstyle="mso-spacerun: yes">� </span>�<b><i>SeqNex</i></b><i>t</i>� as the access method to ExecScan as one of
theparameters.</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><span
style="font-size:11.5pt">SeqNextis actually a function defined in the file nodeSeqscan this function is called through
afunction pointer (<b><i>slot=(*accessMtd) (node)</i></b>) in an infinite for loop untill the slot returned by it
containsNULL.</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p
class="MsoBodyText">execScan.ExecScan<span style="font-family:Wingdings;
 
mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:
symbol;mso-symbol-font-family:Wingdings">�</span></span> nodeSeqscan.SeqNext <span
style="font-family:Wingdings;mso-ascii-font-family:"TimesNew Roman";
 
mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;mso-symbol-font-family:
Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span> heapam.heap_getnext
<spanstyle="font-family:Wingdings;mso-ascii-font-family:
 
"Times New Roman";mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;
mso-symbol-font-family:Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:
Wingdings">�</span></span> heapam.heapgettup<p class="MsoNormal"><span style="font-size:11.5pt"> </span><p
class="MsoNormal"><spanstyle="font-size:11.5pt">heapgettup returns immediately if the relation is empty after making a
callto bufmgr.ReleaseBuffer. Otherwise bufmgr.ReleaseAndReadBuffer is called.</span><p class="MsoNormal"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal"><span
style="font-size:11.5pt;color:blue">bufmgr.ReleaseAndReadBuffer</span><span
style="font-size:11.5pt;font-family:Wingdings;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman";color:blue;
mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:
symbol;mso-symbol-font-family:Wingdings">�</span></span><span style="font-size:
11.5pt;color:blue"> bufmgr.ReadBufferInternal</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p
class="MsoNormal"><spanstyle="font-size:11.5pt">ReadBufferInternal calls smgr.smgrnblocks to get the accurate number of
POSTGRESblocks in the supplied relation if caller asks for <b><i>P_NEW</i></b>.</span><p class="MsoNormal"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal"><span
style="font-size:11.5pt;color:blue">bufmgr.ReadBufferInternal</span><span
style="font-size:11.5pt;font-family:Wingdings;mso-ascii-font-family:
"Times New Roman";mso-hansi-font-family:"Times New Roman";color:blue;
mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:
symbol;mso-symbol-font-family:Wingdings">�</span></span><span style="font-size:
11.5pt;color:blue"> bufmgr.BufferAlloc</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p
class="MsoNormal"><spanstyle="font-size:11.5pt">BufferAlloc creates a new buffer tag and search
(<b><i>buf_table.BufTableLookup</i></b>)it in the buffer pool whether it is already present in it. If found in the
bufferpool then it pins the buffer and starts the buffer IO (<b><i>bufmgr.StartBufferIO</i></b>)</span><p
class="MsoNormal"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal"><span style="font-size:11.5pt"><span
style="mso-spacerun:
yes">�</span>If it is not found in the buffer pool. It<span style="mso-spacerun: yes">� </span>initializes a new
buffer.Grabs one from the free list by calling freelist.GetFreeBuffer which returns NULL if all the buffers are already
pinned.</span><pclass="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><span
style="font-size:11.5pt">BufferAllocat times calls freelist.UnpinBuffer for example if somebody pins the buffer while
thecurrent transaction was doing I/O. it clears its I/O flags, remove its pin and start all over again. (see comments
andcode in bufmgr.c from line 456 onward).</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p
class="MsoBodyText">bufmgr.BufferAlloc<span style="font-family:Wingdings;
 
mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman";
mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:
symbol;mso-symbol-font-family:Wingdings">�</span></span> freelist.UnpinBuffer <span
style="font-family:Wingdings;mso-ascii-font-family:"TimesNew Roman";
 
mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;mso-symbol-font-family:
Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span>
freelist.AddBufferToFreelist.<pclass="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal"><span style="font-size:11.5pt">( 3) <b><u>Analysis of<span
style="mso-spacerun:yes">� </span>Execution trace:</u></b></span><p class="MsoNormal"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:39.0pt;text-indent:-.5in;mso-list:l0 level1
lfo8;
tab-stops:list 39.0pt"><span style="font-size:11.5pt">(i)<span style="font:7.0pt "Times New Roman"">                 
</span></span><b><i><spanstyle="font-size:11.5pt">freelist.AddBufferToFreelist</span></i></b><span
style="font-size:11.5pt">is the right place to implement any buffer replacement policy. Look at the comments of this
functionit clearly states </span><p class="MsoNormal" style="margin-left:3.0pt;text-indent:33.0pt"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:3.0pt;text-indent:33.0pt"><span
style="font-size:11.5pt">�<i>Intheory, this is the only routine that needs to be changed if the buffer replacement
strategychanges. Just change</i></span><p class="MsoNormal" style="text-indent:.5in"><i><span
style="font-size:11.5pt">themanner in which buffers are added to the freelist queue. Currently, they are added on an
LRUbasis.</span></i><span style="font-size:11.5pt">�</span><p class="MsoNormal"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal"><span style="font-size:11.5pt">(ii)We feel
<b><i>bufmgr.ReadBufferInternal</i></b>is the right place where we can guess whether the buffer access is Large
SequentialScan or not. We say this because </span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p
class="MsoNormal"style="margin-left:27.75pt;text-indent:-18.75pt;mso-list:
 
l6 level1 lfo10;tab-stops:list 27.75pt"><span style="font-size:11.5pt">(a)<span style="font:7.0pt "Times New
Roman"">   </span></span><span style="font-size:11.5pt">We get a pointer (<b><i>reln</i></b>) to the
�<b><i>Relation</i></b>�as one of the parameter through which we can get the size of the relation (<b><i>size = reln
->rd_nblocks</i></b>) </span><p class="MsoNormal" style="margin-left:9.0pt"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:27.75pt;text-indent:-18.75pt;mso-list:
 
l6 level1 lfo10;tab-stops:list 27.75pt"><span style="font-size:11.5pt">(b)<span style="font:7.0pt "Times New
Roman"">   </span></span><span style="font-size:11.5pt">Secondly if<span style="mso-spacerun: yes">� </span>the
�<b><i>blockNum</i></b>�(one of the parameters passesed to it) is equal to <b><i>P_NEW</i></b> (this contains
�<b><i>InvalidBlockNumber</i></b>�aconstant defined in block.h, means grow the file to get a new page),
<b><i>ReadBufferInternal</i></b>makes a call to the function <b><i>smgr.smgrnblocks</i></b> to get accurate file
length.</span><pclass="MsoNormal" style="margin-left:9.0pt"><span style="font-size:11.5pt"> </span><p class="MsoNormal"
style="margin-left:9.0pt"><spanstyle="font-size:11.5pt">Hence at this point we have accurate knowledge of size of the
relationbeing scaned.</span><p class="MsoNormal" style="margin-left:9.0pt"><span style="font-size:11.5pt"> </span><p
class="MsoNormal"><spanstyle="font-size:11.5pt">(iii)<span style="mso-spacerun: yes">� </span>An external varable
�<b><i>NBuffers</i></b>�declared in <b><i>bufmgr.h</i></b> and defined in <b><i>globals.c</i></b>, contains the size of
sharedbuffer. </span><p class="MsoNormal" style="margin-left:9.0pt"><span style="font-size:11.5pt"> </span><p
class="MsoNormal"><spanstyle="font-size:11.5pt">(iv) Hence in this function we can make a guess whether the scan is
largesequential or not by comparing relation size and shared buffer size. If relation�s size is<span
style="mso-spacerun:yes">������ </span>greater than shared buffer�s size clearly it is a large sequential
scan.</span><pclass="MsoNormal" style="margin-left:9.0pt"><span style="font-size:11.5pt"> </span><p
class="MsoNormal"><spanstyle="font-size:11.5pt">(4) <b><u>First Probable Implementation Details:</u></b></span><p
class="MsoNormal"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal"
style="margin-left:1.0in;text-indent:-.5in;mso-list:l4level1 lfo12;
 
tab-stops:list 1.0in"><span style="font-size:11.5pt">(i)<span style="font:7.0pt "Times New Roman"">                 
</span></span><spanstyle="font-size:11.5pt">We are thinking of declaring constant in bufmgr.h as following</span><p
class="MsoNormal"style="margin-left:1.0in"><span style="font-size:11.5pt;
 
color:blue">#define LRU <span style="mso-spacerun: yes">�</span>1</span><p class="MsoNormal"
style="margin-left:1.0in"><spanstyle="font-size:11.5pt;
 
color:blue">#define MRU<span style="mso-spacerun: yes">� </span>2</span><p class="MsoNormal"
style="margin-left:1.0in;text-indent:-.5in;mso-list:l4level1 lfo12;
 
tab-stops:list 1.0in"><span style="font-size:11.5pt">(ii)<span style="font:7.0pt "Times New Roman"">               
</span></span><spanstyle="font-size:11.5pt">These constant can be passes further upto
<b><i>freelist.AddBufferToFreelist</i></b>.In this function after making a check appropriate policy can be
applied.</span><pclass="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l4 level1 lfo12;
 
tab-stops:list 1.0in"><b><span style="font-size:11.5pt">(iii)<span style="font:7.0pt "Times New Roman"">            
</span></span></b><b><u><spanstyle="font-size:11.5pt">Functions affected by the implementation:</span></u></b><p
class="MsoNormal"style="margin-left:1.0in"><span style="font-size:11.5pt">Nearly all the functions in the freelist.c
willbe changed to implement MRU in addition two functions <b><i>bufmgr.ReadBufferInternal</i></b> and
<b><i>bufmgr.BufferAlloc</i></b>are most likely to be changed.</span><p class="MsoNormal"><span
style="font-size:11.5pt"> </span><pclass="MsoNormal"><b><u><span style="font-size:11.5pt">Second Probable
ImplementationDetails:</span></u></b><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p
class="MsoNormal"><spanstyle="font-size:11.5pt">We assume that LRU�s �<b><i>SharedFreeList</i></b>� circular queue
keepsall the least recently used buffers at the head of the queue and most recently used buffers at the tail. If this
isthe case then there is absolutely no need to change the AddBufferToFreelist function in freelist.c, let it add the
buffersto shared free list as it is presently doing. We just change the <b><i>freelist.GetFreeBuffer</i></b> so that
insteadof removing (LRU) buffers from head of the queue for MRU scheme we will remove the (MRU) buffers from its tail.
(<i>we are afraid whether GetFreeBuffer is the only function that gives out the buffer from free list. Just in case if
thereis some other function too, then, yet we have to trace it out </i>)<span style="mso-spacerun: yes">�
</span></span><pclass="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><b><u><span
style="font-size:14.0pt;mso-bidi-font-size:11.5pt;
font-family:"Courier New"">PART II:</span></u></b><p class="MsoNormal"><b><u><span
style="font-size:14.0pt;mso-bidi-font-size:11.5pt;
font-family:"Courier New""> </span></u></b><p class="MsoNormal"><b><u>Testing Objective: </u></b><p
class="MsoNormal"><spanstyle="font-size:14.0pt;mso-bidi-font-size:12.0pt"> </span><p class="MsoNormal"
style="margin-left:1.0in;text-indent:-.5in;mso-list:l1level1 lfo13;
 
tab-stops:list 1.0in"><span style="font-size:14.0pt;
mso-bidi-font-size:12.0pt">(i)<span style="font:7.0pt "Times New Roman"">                </span></span>First, We have
totest performance of new alogrithm MRU that we intend to implement. Then we have to look whether we need specific
sequentialscan code or not.<span style="font-size:14.0pt;mso-bidi-font-size:
 
12.0pt"></span><p class="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l1 level1 lfo13;
tab-stops:list 1.0in"><span style="font-size:14.0pt;
mso-bidi-font-size:12.0pt">(ii)<span style="font:7.0pt "Times New Roman"">              </span></span>Second, We would
liketo compare the performance of MRU with that of LRU-K algorithm and more specifically LRU-2.<span
style="font-size:14.0pt;mso-bidi-font-size:12.0pt"></span><pclass="MsoNormal"> <p class="MsoNormal"><b><u>LRU-K a brief
introduction:</u></b><pclass="MsoNormal"> <p class="MsoNormal">LRU-K is a new approach to database disk buffering
proposedby Elizabeth J. O�Neil, Patrick E. O�Neil and Gerhard Weikum. The basic idea of LRU-K is to keep track of the
timesof the last K references to popular database pages, using this information to statistically estimate the
interarrivaltimes of references on a page to page basis. The algorithm is self-tuning and adaptive to evolving access
patterns.<p class="MsoNormal"> <p class="MsoNormal">A patch for LRU-2 was provided to us by Mr. Bruce Momjian. The
algorithmwas implemented and tested by Mr Tom Lane. Both are from PostgreSQL development group. Mr. Tom Lane had tested
thealgorithm but his test were not that good and the tests did not contain any sequential scan. So it would be an
interestingexercise.<p class="MsoNormal"> <p class="MsoNormal"><b><u>Testing Plan: </u></b><p class="MsoNormal"> <p
class="MsoNormal"style="margin-left:1.0in;text-indent:-.5in;mso-list:l3 level1 lfo14;
 
tab-stops:list 1.0in">(i)<span style="font:7.0pt "Times New Roman"">                  </span>We plan to do a big
sequentialscan and at the same time do a small sequential scan<span style="font-family:Helvetica"> </span>What should
happenis that the small sequntial scan should keep its pages in the buffer while the large scan is happening<span
style="color:black"></span>if it doesn't, we have a problem.<p class="MsoNormal"
style="margin-left:1.0in;text-indent:-.5in;mso-list:l3level1 lfo14;
 
tab-stops:list 1.0in">(ii)<span style="font:7.0pt "Times New Roman"">                </span>We have populated two
tablesnamely simple and simple1 with only one integer field. We have populated these tables with 8192 values. We have
tomake one of the table even more bigger so that one will be used for large sequential scan and another will be used
forsmall sequential scan.<p class="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l3 level1 lfo14;
 
tab-stops:list 1.0in">(iii)<span style="font:7.0pt "Times New Roman"">               </span>Then we plan to put some
simplequeries like given below in a file say �samplequeries.txt�<p class="MsoNormal" style="margin-left:1.0in"> <p
class="MsoNormal"style="margin-left:1.0in"><span style="color:blue">Select * from simple where x = 98765432;</span><p
class="MsoNormal"style="margin-left:1.0in"><span style="color:blue">Select * from simple1 where y = 567890432;</span><p
class="MsoNormal"style="margin-left:1.0in"> <p class="MsoNormal" style="margin-left:1.0in">Where lets assume simple is
largetable and simple1 is smaller table. The large file is should have to be larger than shared buffer size and smaller
fileshould have to be say 10% smaller than shared buffer size. <p class="MsoNormal" style="margin-left:1.0in"> <p
class="MsoNormal"style="margin-left:1.0in;text-indent:-.5in;mso-list:l3 level1 lfo14;
 
tab-stops:list 1.0in">(iv)<span style="font:7.0pt "Times New Roman"">              </span>Then we have to run
<b>psql</b>with <b>�f </b>flag giving the file as parameter. While doing that we will keep track of the time.<p
class="MsoNormal"style="margin-left:.5in"> <p class="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l3
level1lfo14;
 
tab-stops:list 1.0in">(v)<span style="font:7.0pt "Times New Roman"">                </span>The other way is that we
haveset the various statistics flags true like parser statistics, planner statistics, executor statistics etc. All the
statisticsgets collected into a logfile. <p class="MsoNormal" style="margin-left:1.0in">Following is one of the sample
statisticsgenerated during our sample query test run on default LRU algorithm:<p class="MsoNormal"
style="margin-left:1.0in"> <pclass="MsoNormal" style="margin-left:1.0in"><span style="color:blue">2002-02-22 18:40:58
DEBUG:<spanstyle="mso-spacerun: yes">� </span>EXECUTOR STATISTICS</span><p class="MsoNormal"
style="margin-left:1.0in"><spanstyle="color:blue">! system usage stats:</span><p class="MsoNormal"
style="margin-left:1.0in"><spanstyle="color:blue">!<span style="mso-spacerun: yes">������ </span>0.050108 elapsed
0.000000user 0.000000 system sec</span><p class="MsoNormal" style="margin-left:1.0in"><span style="color:blue">!<span
style="mso-spacerun:yes">������ </span>[0.050000 user 0.130000 sys total]</span><p class="MsoNormal"
style="margin-left:1.0in"><spanstyle="color:blue">!<span style="mso-spacerun: yes">������ </span>4/0 [50/0] filesystem
blocksin/out</span><p class="MsoNormal" style="margin-left:1.0in"><span style="color:blue">!<span style="mso-spacerun:
yes">������</span>4/0 [50/0] page faults/reclaims, 0 [0] swaps</span><p class="MsoNormal"
style="margin-left:1.0in"><spanstyle="color:blue">!<span style="mso-spacerun: yes">������ </span>0 [0] signals rcvd,
0/0[2/5] messages rcvd/sent</span><p class="MsoNormal" style="margin-left:1.0in"><span style="color:blue">!<span
style="mso-spacerun:yes">������ </span>4/0 [149/17] voluntary/involuntary context switches</span><p class="MsoNormal"
style="margin-left:1.0in"><spanstyle="color:blue">! postgres usage stats:</span><p class="MsoNormal"
style="margin-left:1.0in"><spanstyle="color:blue">!<span style="mso-spacerun: yes">������ </span>Shared blocks:<span
style="mso-spacerun:yes">��������� </span>3 read,<span style="mso-spacerun:
 
yes">��������� </span>0 written, buffer hit rate = 25.00%</span><p class="MsoNormal" style="margin-left:1.0in"><span
style="color:blue">!<spanstyle="mso-spacerun: yes">������ </span>Local<span style="mso-spacerun: yes">�
</span>blocks:<spanstyle="mso-spacerun: yes">��������� </span>0 read,<span style="mso-spacerun: yes">��������� </span>0
written,buffer hit rate = 0.00%</span><p class="MsoNormal" style="margin-left:1.0in"><span style="color:blue">!<span
style="mso-spacerun:yes">������ </span>Direct blocks:<span style="mso-spacerun: yes">��������� </span>0 read,<span
style="mso-spacerun:
yes">��������� </span>0 written</span><p class="MsoNormal" style="margin-left:1.0in"> <p class="MsoNormal"
style="margin-left:1.0in;text-indent:-.5in;mso-list:l3level1 lfo14;
 
tab-stops:list 1.0in">(vi)<span style="font:7.0pt "Times New Roman"">              </span>Another way is to run the
querywith <b>EXPLAIN</b> command in <b>VERBOSE</b> and <b>ANALYZE</b> mode. This prints the total execution time of the
queryalong with other matrics like cost of sequential scan startup and total cost etc.<p class="MsoNormal"
style="margin-left:.5in"> <pclass="MsoNormal"> <p class="MsoNormal"><b><u>Yardstick of Measurement:</u></b><p
class="MsoNormal"style="margin-left:1.0in"> <p class="MsoNormal" style="margin-left:1.5in;text-indent:-.5in;mso-list:l3
level2lfo14;
 
tab-stops:list 1.5in">(i)<span style="font:7.0pt "Times New Roman"">                  </span>We will run the test
querieson MRU and LRU-2 implementation. If the execution time for MRU is less than that of LRU-2 then We have to think
ofhaving a specific code for sequential scan.<p class="MsoNormal" style="margin-left:1.0in"> <p class="MsoNormal"
style="margin-left:1.5in;text-indent:-.5in;mso-list:l3level2 lfo14;
 
tab-stops:list 1.5in">(ii)<span style="font:7.0pt "Times New Roman"">                </span>Otherwise if the execution
timeof LRU-2 is less than or close to the MRU execution time than we would suggest replacing the default LRU code of
PostgreSQLwith LRU-2 algorithm. Since than there will be only one alogrithm for all the access patterns and no need of
�advicing�the buffer manager for replacement policy and no need of having different algorithms for large sequential
access.<pclass="MsoNormal"><span style="font-size:14.0pt;mso-bidi-font-size:12.0pt"><span
style="mso-tab-count:1">���������</span></span><p class="MsoNormal"> <p class="MsoNormal"><b><u>Conclusion:</u></b><p
class="MsoNormal"><b><u> </u></b><pclass="MsoNormal"><span style="mso-tab-count:1">����������� </span>To conclude,
fetchingdata from disk requires at least a factor of 1000 more time than fetching data from a RAM buffer. For this
reason,good use of the buffer can significantly improve the throughput and response time fo any data-intensive system.
Equallyimportant is choosing an efficient buffer replacement algorithm. Thus we aim at improving the performance of
PostgreSQLby studying which algorithm among MRU and LRU-2 will prove to be the best bet.<p class="MsoNormal"> <p
class="MsoNormal">Wethank Professor Kevin Chang and TAs for introducing us to such a life time opportunity and for
encouragingus to enhance PostgreSQL.<p class="MsoNormal"> <p class="MsoNormal">We thank Mr. Bruce Momjian for all his
timelyhelp and advices that<span style="mso-spacerun: yes">� </span>he had rendered to us whole heartedly. It is his
encouragementthat our efforts are directed twords really contributing to PostgreSQL development.<p> Amit Kumar Khare <p
class="MsoNormal"style="margin-left:.5in"><span style="font-size:14.0pt;
 
mso-bidi-font-size:12.0pt"> </span><p class="MsoNormal"><span style="font-size:14.0pt;mso-bidi-font-size:11.5pt;
font-family:"Courier New""> </span><p class="MsoNormal" style="margin-left:.5in"><span
style="font-size:11.5pt"> </span></div><p><br/><hr size="1" /><b>Do You Yahoo!?</b><br /><a
href="http://sports.yahoo.com/oly">Yahoo!Sports</a> - Coverage of the 2002 Olympic Games 

pgsql-hackers by date:

Previous
From: "Dave Cramer"
Date:
Subject: Open magazine article on open source rdbms
Next
From: Thomas Lockhart
Date:
Subject: Re: elog() proposal