Thread: Implementation Proposal For Add Free Behind Capability For Large Sequential Scan
Implementation Proposal For Add Free Behind Capability For Large Sequential Scan
From
Amit Kumar Khare
Date:
<div class="Section1"><p class="MsoNormal"><span style="mso-bidi-font-size:11.5pt;font-family:"Courier New"">Hi All,</span><pclass="MsoNormal"><span style="mso-bidi-font-size:11.5pt;font-family:"Courier New""> </span><p class="MsoNormal"><spanstyle="mso-bidi-font-size:11.5pt;font-family:"Courier New"">Here is at your disposal my implementationIdea of a TODO item titled �Add Free-Behind Capability For Large Sequential Scans�. I cordially invite yourcomments.</span><p class="MsoNormal"><span style="mso-bidi-font-size:11.5pt;font-family:"Courier New""> </span><p class="MsoNormal"><b><u><spanstyle="mso-bidi-font-size:11.5pt;font-family: "Courier New"">Part I:</span></u></b><p class="MsoNormal"><b><u><span style="font-size:11.5pt"> </span></u></b><p class="MsoNormal"><b><u><spanstyle="font-size:11.5pt">Problem:</span></u></b><span style="font-size:11.5pt"> The presentdefault page replacement policy of PostgreSQL is Least Recently Used (LRU). This policy may be good for any otherdatabase access patterns but it is not efficient for Large Sequential Scan, since it makes cache useless. For example:</span><pclass="MsoNormal"><span style="font-size:11.5pt">Let shared buffer size be 1 MB and the size of relationis 2MB. Lets assume sequential scan is applied on this relation over and over again. The cache becomes useless <spanstyle="color:black">because by the time the second read of the table happens the first 1mb has already been forced outof the cache. </span></span><p class="MsoNormal"><span style="font-size:11.5pt;color:black"> </span><p class="MsoNormal"><spanstyle="font-size:11.5pt;color:black">The problem appeared in the TODO list of PostgreSQL under sectionCACHE titled:</span><p class="MsoNormal"><span style="font-size:11.5pt;color:black">�<i>Add Free-Behind CapabilityFor Large Sequential Scans</i> �</span><p class="MsoNormal"><span style="font-size:11.5pt;color:black"> </span><pclass="MsoNormal"><span style="font-size:11.5pt;color:black">Solution: Inhis paper �<i>Operating System Support For Database Management</i>�, Professor Michael Stonebraker identifies various databaseaccess patterns in INGRES (grand father of PostgreSQL) suggest that for sequential access to blocks, which will becyclically rereferenced, the buffer management strategy should be to Toss Immediately, unless all n pages of relation canbe kept in the cache. He further suggest that for buffer management to be of some use to application like DBMS there shouldbe some provision that it accepts �advice� from an application concerning replacement strategy.</span><p class="MsoNormal"><spanstyle="font-size:11.5pt;color:black"> </span><p class="MsoNormal"><span style="font-size:11.5pt;color:black">Consideringthese suggestions as our baseline we have decided to implement Most RecentlyUsed (MRU) cache replacement policy. Our task will be to identify a large sequential scan and then advice the buffermanager to apply MRU for cache replacement.</span><p class="MsoNormal"><span style="font-size:11.5pt;color:black"> </span><pclass="MsoNormal"><b><u><span style="font-size:11.5pt">Detailed Plan of Implementation:</span></u></b><pclass="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal" style="margin-left:.75in;text-indent:-.5in;mso-list:l2level1 lfo2; tab-stops:list .75in"><b><span style="font-size:11.5pt">(I)<span style="font:7.0pt "Times New Roman""> </span></span></b><b><u><spanstyle="font-size:11.5pt">Identifying the query execution flow control</span></u></b><p class="MsoNormal"style="margin-left:.25in"><span style="font-size:11.5pt"> </span><p class="MsoNormal" style="margin-left:.5in;text-indent:.25in"><b><i><u><spanstyle="font-size:11.5pt">Goal of identifying the query executionflow:</span></u></i></b><span style="font-size:11.5pt"> (i) This is necessary as we have to figure out a spot wherewe will identify that the block access is actually a large sequential scan and from this point onward we think to sendadvice (in form of parameter ) to buffer manager.<span style="mso-spacerun: yes">� </span></span><p class="MsoNormal"style="margin-left:.5in;text-indent:.5in"><span style="font-size:11.5pt">(ii) Through this query executiontrace our aim is to reach at one of the files listed below named �freelist.c�. This is the file where all the funcitionsrelated to replacement policy are implemented. Hence our trace ends after reaching �freelist.c�.</span><p class="MsoNormal"style="margin-left:.75in"><span style="font-size:11.5pt"> </span><p class="MsoNormal" style="margin-left:.5in"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal" style="margin-left:1.0in;text-indent:-.25in;mso-list:l2level2 lfo2; tab-stops:list 1.0in"><b><span style="font-size:11.5pt">(1)<span style="font:7.0pt "Times New Roman""> </span></span></b><b><u><spanstyle="font-size:11.5pt">Files Involved (Important files listed):</span></u></b><p class="MsoNormal"style="margin-left:.75in"><b><u><span style="font-size:11.5pt"> </span></u></b><p class="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(1)<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">bin/psql/mainloop.c</span><p class="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(2)<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">bin/psql/common.c</span><p class="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(3)<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">src/interfaces/libpq/fe-exec.c</span><p class="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(4)<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">src/backend/tcop/postgres.c</span><p class="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(5)<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">src/backend/tcop/pquery.c</span><p class="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(6)<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">src/backend/executor/execMain.c</span><p class="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(7)<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">src/backend/executor/execProcnode.c</span><p class="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(8)<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">src/backend/access/heap/heapam.c</span><p class="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(9)<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">src/backend/utils/cache/relcache.c</span><p class="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(10)</span><span style="font-size:11.5pt">src/backend/utils/cache/relcache.c</span><pclass="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(11)</span><span style="font-size:11.5pt">src/backend/utils/hash/dynahash.c</span><pclass="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(12)</span><span style="font-size:11.5pt">src/backend/storage/smgr/smgr.c</span><pclass="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(13)</span><span style="font-size:11.5pt">src/include/utils/rel.h</span><pclass="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(14)</span><span style="font-size:11.5pt">src/backend/access/common/scankey.c</span><pclass="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(15)</span><span style="font-size:11.5pt">src/backend/storage/buffer/bufmgr.c</span><pclass="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(16)</span><span style="font-size:11.5pt">src/backend/storage/buffer/buf_table.c</span><pclass="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(17)</span><span style="font-size:11.5pt">src/backend/storage/buffer/freelist.c</span><pclass="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(18)</span><span style="font-size:11.5pt">src/backend/executor/nodeSeqscan.c</span><pclass="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(19)</span><span style="font-size:11.5pt">src/backend/executor/execScan.c</span><pclass="MsoNormal" style="margin-left:1.25in;text-indent:-.25in;mso-list:l5level1 lfo4; tab-stops:list 1.25in"><span style="font-size:11.5pt">(20)</span><span style="font-size:11.5pt">src/backend/executor/execTuples.c</span><pclass="MsoNormal"><span style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:1.0in"><span style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:1.0in;text-indent:-.25in;mso-list:l2 level2 lfo2; tab-stops:list 1.0in"><b><span style="font-size:11.5pt">(2)<span style="font:7.0pt "Times New Roman""> </span></span></b><b><u><spanstyle="font-size:11.5pt">Function Call Trace:</span></u></b><p class="MsoNormal" style="margin-left:.75in"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal" style="margin-left:.75in"><span style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:1.0in"><b><i><u><span style="font-size: 11.5pt">Notation Convention</span></u></i></b><i><u><span style="font-size: 11.5pt">:</span></u></i><span style="font-size:11.5pt"> (i) Arrow �</span><span style="font-size:11.5pt;font-family:Wingdings;mso-ascii-font-family:"TimesNew Roman"; mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;mso-symbol-font-family: Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><span style="font-size:11.5pt">�will represent call</span><p class="MsoNormal" style="margin-left:1.0in"><span style="font-size:11.5pt"><spanstyle="mso-tab-count:3">����������������������������������� </span>(ii) Functions are listedas</span><p class="MsoNormal" style="margin-left:1.0in"><span style="font-size:11.5pt"><span style="mso-spacerun: yes">�</span><FileName></span><b><spanstyle="font-size:15.5pt">.</span></b><span style="font-size:11.5pt"> <FunctionName>for example common. SendQuery<span style="mso-spacerun: yes">� </span>i.e. SendQuery function is defined in file name common</span><p class="MsoNormal" style="margin-left:1.0in"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal" style="margin-left:2.0in;text-indent:-.5in;mso-list:l5level2 lfo4; tab-stops:list 2.0in"><b><span style="font-size:11.5pt">(i)<span style="font:7.0pt "Times New Roman""> </span></span></b><b><u><spanstyle="font-size:11.5pt">Query submitted to psql</span></u></b><p class="MsoNormal" style="margin-left:1.5in"><b><u><spanstyle="font-size:11.5pt"> </span></u></b><p class="MsoNormal" style="margin-left:1.5in"><spanstyle="font-size:11.5pt; color:blue">mainloop.Mainloop</span><span style="font-size:11.5pt;font-family: Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"; color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue"> common.SendQuery</span><pclass="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt; color:blue">common.SendQuery </span><span style="font-size:11.5pt;font-family: Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"; color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue"> fe-exec.PQexec</span><pclass="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt"> </span><p class="MsoNormal"style="margin-left:1.75in;text-indent:-.25in;mso-list:l7 level1 lfo6; tab-stops:list 1.75in"><span style="font-size:11.5pt">-<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">SendQuery send the query string to the backen and is a �front door� way to senda query.</span><p class="MsoNormal" style="margin-left:1.75in;text-indent:-.25in;mso-list:l7 level1 lfo6; tab-stops:list 1.75in"><span style="font-size:11.5pt">-<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">PQexec sends the query to the backend and package up the results.</span><p class="MsoNormal"style="margin-left:1.5in"><span style="font-size:11.5pt"> </span><p class="MsoNormal" style="margin-left:2.0in;text-indent:-.5in;mso-list:l5level2 lfo4; tab-stops:list 2.0in"><b><span style="font-size:11.5pt">(ii)<span style="font:7.0pt "Times New Roman""> </span></span></b><b><u><spanstyle="font-size:11.5pt">Query Execution from �postgres� backend process</span></u></b><p class="MsoNormal"style="margin-left:1.5in"><span style="font-size:11.5pt"> </span><p class="MsoNormal" style="margin-left:1.5in"><spanstyle="font-size:11.5pt; color:blue">postgres. pg_exec_query_string </span><span style="font-size:11.5pt; font-family:Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family: "Times New Roman";color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue"> pquery.ProcessQuery</span><pclass="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt; color:blue">pquery.ProcessQuery </span><span style="font-size:11.5pt; font-family:Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family: "Times New Roman";color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue"> execMain.ExecutorStart</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt"> </span><p class="MsoNormal"style="margin-left:1.5in"><span style="font-size:11.5pt">(This routine is called at the beginning of anyexecution of any</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt">query plan)</span><pclass="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt; color:blue">pquery.ProcessQuery </span><span style="font-size:11.5pt; font-family:Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family: "Times New Roman";color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue"> execMain.ExecutorRun</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt">( This is the mainroutine of the executor module. It accepts</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt"><spanstyle="mso-spacerun: yes">�� </span>the query descriptor from the traffic cop and executesthe</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt"><span style="mso-spacerun:yes">�� </span>query plan.)</span><span style="font-size: 11.0pt;mso-bidi-font-size:11.5pt"></span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt; color:blue">pquery.ProcessQuery </span><span style="font-size:11.5pt; font-family:Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family: "Times New Roman";color:blue;mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span><spanstyle="font-size:11.5pt;color:blue"> execMain.ExecutorEnd</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt">(This routine mustbe called at the end of execution of any</span><p class="MsoNormal" style="margin-left:1.5in"><span style="font-size:11.5pt"><spanstyle="mso-spacerun: yes">�</span>query plan)</span><p class="MsoNormal" style="margin-left:1.5in"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><pclass="MsoNormal"><span style="font-size:11.5pt">we will follow the execution trace fromexecMain.ExecutorRun</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoBodyText">execMain.ExecutorRun<span style="font-family:Wingdings; mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"; mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type: symbol;mso-symbol-font-family:Wingdings">�</span></span> execMain.ExecutePlan <span style="font-family:Wingdings;mso-ascii-font-family:"TimesNew Roman"; mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;mso-symbol-font-family: Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span> execProcnode.ExecProcNode<span style="font-family:Wingdings;mso-ascii-font-family: "Times New Roman";mso-hansi-font-family:"Times New Roman";mso-char-type:symbol; mso-symbol-font-family:Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family: Wingdings">�</span></span> nodeSeqscan.ExecSeqScan <span style="font-family: Wingdings;mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"; mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type: symbol;mso-symbol-font-family:Wingdings">�</span></span> execScan.ExecScan <p class="MsoNormal"><span style="font-size:11.5pt"> </span><pclass="MsoNormal"><span style="font-size:11.5pt">ExecSeqScan, scans the relation sequentiallyand returns the next qualifying</span><p class="MsoNormal"><span style="font-size:11.5pt">tuple. It passes<spanstyle="mso-spacerun: yes">� </span>�<b><i>SeqNex</i></b><i>t</i>� as the access method to ExecScan as one of theparameters.</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><span style="font-size:11.5pt">SeqNextis actually a function defined in the file nodeSeqscan this function is called through afunction pointer (<b><i>slot=(*accessMtd) (node)</i></b>) in an infinite for loop untill the slot returned by it containsNULL.</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoBodyText">execScan.ExecScan<span style="font-family:Wingdings; mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"; mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type: symbol;mso-symbol-font-family:Wingdings">�</span></span> nodeSeqscan.SeqNext <span style="font-family:Wingdings;mso-ascii-font-family:"TimesNew Roman"; mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;mso-symbol-font-family: Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span> heapam.heap_getnext <spanstyle="font-family:Wingdings;mso-ascii-font-family: "Times New Roman";mso-hansi-font-family:"Times New Roman";mso-char-type:symbol; mso-symbol-font-family:Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family: Wingdings">�</span></span> heapam.heapgettup<p class="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><spanstyle="font-size:11.5pt">heapgettup returns immediately if the relation is empty after making a callto bufmgr.ReleaseBuffer. Otherwise bufmgr.ReleaseAndReadBuffer is called.</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><pclass="MsoNormal"><span style="font-size:11.5pt;color:blue">bufmgr.ReleaseAndReadBuffer</span><span style="font-size:11.5pt;font-family:Wingdings;mso-ascii-font-family: "Times New Roman";mso-hansi-font-family:"Times New Roman";color:blue; mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type: symbol;mso-symbol-font-family:Wingdings">�</span></span><span style="font-size: 11.5pt;color:blue"> bufmgr.ReadBufferInternal</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><spanstyle="font-size:11.5pt">ReadBufferInternal calls smgr.smgrnblocks to get the accurate number of POSTGRESblocks in the supplied relation if caller asks for <b><i>P_NEW</i></b>.</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><pclass="MsoNormal"><span style="font-size:11.5pt;color:blue">bufmgr.ReadBufferInternal</span><span style="font-size:11.5pt;font-family:Wingdings;mso-ascii-font-family: "Times New Roman";mso-hansi-font-family:"Times New Roman";color:blue; mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type: symbol;mso-symbol-font-family:Wingdings">�</span></span><span style="font-size: 11.5pt;color:blue"> bufmgr.BufferAlloc</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><spanstyle="font-size:11.5pt">BufferAlloc creates a new buffer tag and search (<b><i>buf_table.BufTableLookup</i></b>)it in the buffer pool whether it is already present in it. If found in the bufferpool then it pins the buffer and starts the buffer IO (<b><i>bufmgr.StartBufferIO</i></b>)</span><p class="MsoNormal"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal"><span style="font-size:11.5pt"><span style="mso-spacerun: yes">�</span>If it is not found in the buffer pool. It<span style="mso-spacerun: yes">� </span>initializes a new buffer.Grabs one from the free list by calling freelist.GetFreeBuffer which returns NULL if all the buffers are already pinned.</span><pclass="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><span style="font-size:11.5pt">BufferAllocat times calls freelist.UnpinBuffer for example if somebody pins the buffer while thecurrent transaction was doing I/O. it clears its I/O flags, remove its pin and start all over again. (see comments andcode in bufmgr.c from line 456 onward).</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoBodyText">bufmgr.BufferAlloc<span style="font-family:Wingdings; mso-ascii-font-family:"Times New Roman";mso-hansi-font-family:"Times New Roman"; mso-char-type:symbol;mso-symbol-font-family:Wingdings"><span style="mso-char-type: symbol;mso-symbol-font-family:Wingdings">�</span></span> freelist.UnpinBuffer <span style="font-family:Wingdings;mso-ascii-font-family:"TimesNew Roman"; mso-hansi-font-family:"Times New Roman";mso-char-type:symbol;mso-symbol-font-family: Wingdings"><span style="mso-char-type:symbol;mso-symbol-font-family:Wingdings">�</span></span> freelist.AddBufferToFreelist.<pclass="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><pclass="MsoNormal"><span style="font-size:11.5pt">( 3) <b><u>Analysis of<span style="mso-spacerun:yes">� </span>Execution trace:</u></b></span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:39.0pt;text-indent:-.5in;mso-list:l0 level1 lfo8; tab-stops:list 39.0pt"><span style="font-size:11.5pt">(i)<span style="font:7.0pt "Times New Roman""> </span></span><b><i><spanstyle="font-size:11.5pt">freelist.AddBufferToFreelist</span></i></b><span style="font-size:11.5pt">is the right place to implement any buffer replacement policy. Look at the comments of this functionit clearly states </span><p class="MsoNormal" style="margin-left:3.0pt;text-indent:33.0pt"><span style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:3.0pt;text-indent:33.0pt"><span style="font-size:11.5pt">�<i>Intheory, this is the only routine that needs to be changed if the buffer replacement strategychanges. Just change</i></span><p class="MsoNormal" style="text-indent:.5in"><i><span style="font-size:11.5pt">themanner in which buffers are added to the freelist queue. Currently, they are added on an LRUbasis.</span></i><span style="font-size:11.5pt">�</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><pclass="MsoNormal"><span style="font-size:11.5pt">(ii)We feel <b><i>bufmgr.ReadBufferInternal</i></b>is the right place where we can guess whether the buffer access is Large SequentialScan or not. We say this because </span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"style="margin-left:27.75pt;text-indent:-18.75pt;mso-list: l6 level1 lfo10;tab-stops:list 27.75pt"><span style="font-size:11.5pt">(a)<span style="font:7.0pt "Times New Roman""> </span></span><span style="font-size:11.5pt">We get a pointer (<b><i>reln</i></b>) to the �<b><i>Relation</i></b>�as one of the parameter through which we can get the size of the relation (<b><i>size = reln ->rd_nblocks</i></b>) </span><p class="MsoNormal" style="margin-left:9.0pt"><span style="font-size:11.5pt"> </span><pclass="MsoNormal" style="margin-left:27.75pt;text-indent:-18.75pt;mso-list: l6 level1 lfo10;tab-stops:list 27.75pt"><span style="font-size:11.5pt">(b)<span style="font:7.0pt "Times New Roman""> </span></span><span style="font-size:11.5pt">Secondly if<span style="mso-spacerun: yes">� </span>the �<b><i>blockNum</i></b>�(one of the parameters passesed to it) is equal to <b><i>P_NEW</i></b> (this contains �<b><i>InvalidBlockNumber</i></b>�aconstant defined in block.h, means grow the file to get a new page), <b><i>ReadBufferInternal</i></b>makes a call to the function <b><i>smgr.smgrnblocks</i></b> to get accurate file length.</span><pclass="MsoNormal" style="margin-left:9.0pt"><span style="font-size:11.5pt"> </span><p class="MsoNormal" style="margin-left:9.0pt"><spanstyle="font-size:11.5pt">Hence at this point we have accurate knowledge of size of the relationbeing scaned.</span><p class="MsoNormal" style="margin-left:9.0pt"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><spanstyle="font-size:11.5pt">(iii)<span style="mso-spacerun: yes">� </span>An external varable �<b><i>NBuffers</i></b>�declared in <b><i>bufmgr.h</i></b> and defined in <b><i>globals.c</i></b>, contains the size of sharedbuffer. </span><p class="MsoNormal" style="margin-left:9.0pt"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><spanstyle="font-size:11.5pt">(iv) Hence in this function we can make a guess whether the scan is largesequential or not by comparing relation size and shared buffer size. If relation�s size is<span style="mso-spacerun:yes">������ </span>greater than shared buffer�s size clearly it is a large sequential scan.</span><pclass="MsoNormal" style="margin-left:9.0pt"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><spanstyle="font-size:11.5pt">(4) <b><u>First Probable Implementation Details:</u></b></span><p class="MsoNormal"><spanstyle="font-size:11.5pt"> </span><p class="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l4level1 lfo12; tab-stops:list 1.0in"><span style="font-size:11.5pt">(i)<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">We are thinking of declaring constant in bufmgr.h as following</span><p class="MsoNormal"style="margin-left:1.0in"><span style="font-size:11.5pt; color:blue">#define LRU <span style="mso-spacerun: yes">�</span>1</span><p class="MsoNormal" style="margin-left:1.0in"><spanstyle="font-size:11.5pt; color:blue">#define MRU<span style="mso-spacerun: yes">� </span>2</span><p class="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l4level1 lfo12; tab-stops:list 1.0in"><span style="font-size:11.5pt">(ii)<span style="font:7.0pt "Times New Roman""> </span></span><spanstyle="font-size:11.5pt">These constant can be passes further upto <b><i>freelist.AddBufferToFreelist</i></b>.In this function after making a check appropriate policy can be applied.</span><pclass="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l4 level1 lfo12; tab-stops:list 1.0in"><b><span style="font-size:11.5pt">(iii)<span style="font:7.0pt "Times New Roman""> </span></span></b><b><u><spanstyle="font-size:11.5pt">Functions affected by the implementation:</span></u></b><p class="MsoNormal"style="margin-left:1.0in"><span style="font-size:11.5pt">Nearly all the functions in the freelist.c willbe changed to implement MRU in addition two functions <b><i>bufmgr.ReadBufferInternal</i></b> and <b><i>bufmgr.BufferAlloc</i></b>are most likely to be changed.</span><p class="MsoNormal"><span style="font-size:11.5pt"> </span><pclass="MsoNormal"><b><u><span style="font-size:11.5pt">Second Probable ImplementationDetails:</span></u></b><p class="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><spanstyle="font-size:11.5pt">We assume that LRU�s �<b><i>SharedFreeList</i></b>� circular queue keepsall the least recently used buffers at the head of the queue and most recently used buffers at the tail. If this isthe case then there is absolutely no need to change the AddBufferToFreelist function in freelist.c, let it add the buffersto shared free list as it is presently doing. We just change the <b><i>freelist.GetFreeBuffer</i></b> so that insteadof removing (LRU) buffers from head of the queue for MRU scheme we will remove the (MRU) buffers from its tail. (<i>we are afraid whether GetFreeBuffer is the only function that gives out the buffer from free list. Just in case if thereis some other function too, then, yet we have to trace it out </i>)<span style="mso-spacerun: yes">� </span></span><pclass="MsoNormal"><span style="font-size:11.5pt"> </span><p class="MsoNormal"><b><u><span style="font-size:14.0pt;mso-bidi-font-size:11.5pt; font-family:"Courier New"">PART II:</span></u></b><p class="MsoNormal"><b><u><span style="font-size:14.0pt;mso-bidi-font-size:11.5pt; font-family:"Courier New""> </span></u></b><p class="MsoNormal"><b><u>Testing Objective: </u></b><p class="MsoNormal"><spanstyle="font-size:14.0pt;mso-bidi-font-size:12.0pt"> </span><p class="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l1level1 lfo13; tab-stops:list 1.0in"><span style="font-size:14.0pt; mso-bidi-font-size:12.0pt">(i)<span style="font:7.0pt "Times New Roman""> </span></span>First, We have totest performance of new alogrithm MRU that we intend to implement. Then we have to look whether we need specific sequentialscan code or not.<span style="font-size:14.0pt;mso-bidi-font-size: 12.0pt"></span><p class="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l1 level1 lfo13; tab-stops:list 1.0in"><span style="font-size:14.0pt; mso-bidi-font-size:12.0pt">(ii)<span style="font:7.0pt "Times New Roman""> </span></span>Second, We would liketo compare the performance of MRU with that of LRU-K algorithm and more specifically LRU-2.<span style="font-size:14.0pt;mso-bidi-font-size:12.0pt"></span><pclass="MsoNormal"> <p class="MsoNormal"><b><u>LRU-K a brief introduction:</u></b><pclass="MsoNormal"> <p class="MsoNormal">LRU-K is a new approach to database disk buffering proposedby Elizabeth J. O�Neil, Patrick E. O�Neil and Gerhard Weikum. The basic idea of LRU-K is to keep track of the timesof the last K references to popular database pages, using this information to statistically estimate the interarrivaltimes of references on a page to page basis. The algorithm is self-tuning and adaptive to evolving access patterns.<p class="MsoNormal"> <p class="MsoNormal">A patch for LRU-2 was provided to us by Mr. Bruce Momjian. The algorithmwas implemented and tested by Mr Tom Lane. Both are from PostgreSQL development group. Mr. Tom Lane had tested thealgorithm but his test were not that good and the tests did not contain any sequential scan. So it would be an interestingexercise.<p class="MsoNormal"> <p class="MsoNormal"><b><u>Testing Plan: </u></b><p class="MsoNormal"> <p class="MsoNormal"style="margin-left:1.0in;text-indent:-.5in;mso-list:l3 level1 lfo14; tab-stops:list 1.0in">(i)<span style="font:7.0pt "Times New Roman""> </span>We plan to do a big sequentialscan and at the same time do a small sequential scan<span style="font-family:Helvetica"> </span>What should happenis that the small sequntial scan should keep its pages in the buffer while the large scan is happening<span style="color:black"></span>if it doesn't, we have a problem.<p class="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l3level1 lfo14; tab-stops:list 1.0in">(ii)<span style="font:7.0pt "Times New Roman""> </span>We have populated two tablesnamely simple and simple1 with only one integer field. We have populated these tables with 8192 values. We have tomake one of the table even more bigger so that one will be used for large sequential scan and another will be used forsmall sequential scan.<p class="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l3 level1 lfo14; tab-stops:list 1.0in">(iii)<span style="font:7.0pt "Times New Roman""> </span>Then we plan to put some simplequeries like given below in a file say �samplequeries.txt�<p class="MsoNormal" style="margin-left:1.0in"> <p class="MsoNormal"style="margin-left:1.0in"><span style="color:blue">Select * from simple where x = 98765432;</span><p class="MsoNormal"style="margin-left:1.0in"><span style="color:blue">Select * from simple1 where y = 567890432;</span><p class="MsoNormal"style="margin-left:1.0in"> <p class="MsoNormal" style="margin-left:1.0in">Where lets assume simple is largetable and simple1 is smaller table. The large file is should have to be larger than shared buffer size and smaller fileshould have to be say 10% smaller than shared buffer size. <p class="MsoNormal" style="margin-left:1.0in"> <p class="MsoNormal"style="margin-left:1.0in;text-indent:-.5in;mso-list:l3 level1 lfo14; tab-stops:list 1.0in">(iv)<span style="font:7.0pt "Times New Roman""> </span>Then we have to run <b>psql</b>with <b>�f </b>flag giving the file as parameter. While doing that we will keep track of the time.<p class="MsoNormal"style="margin-left:.5in"> <p class="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l3 level1lfo14; tab-stops:list 1.0in">(v)<span style="font:7.0pt "Times New Roman""> </span>The other way is that we haveset the various statistics flags true like parser statistics, planner statistics, executor statistics etc. All the statisticsgets collected into a logfile. <p class="MsoNormal" style="margin-left:1.0in">Following is one of the sample statisticsgenerated during our sample query test run on default LRU algorithm:<p class="MsoNormal" style="margin-left:1.0in"> <pclass="MsoNormal" style="margin-left:1.0in"><span style="color:blue">2002-02-22 18:40:58 DEBUG:<spanstyle="mso-spacerun: yes">� </span>EXECUTOR STATISTICS</span><p class="MsoNormal" style="margin-left:1.0in"><spanstyle="color:blue">! system usage stats:</span><p class="MsoNormal" style="margin-left:1.0in"><spanstyle="color:blue">!<span style="mso-spacerun: yes">������ </span>0.050108 elapsed 0.000000user 0.000000 system sec</span><p class="MsoNormal" style="margin-left:1.0in"><span style="color:blue">!<span style="mso-spacerun:yes">������ </span>[0.050000 user 0.130000 sys total]</span><p class="MsoNormal" style="margin-left:1.0in"><spanstyle="color:blue">!<span style="mso-spacerun: yes">������ </span>4/0 [50/0] filesystem blocksin/out</span><p class="MsoNormal" style="margin-left:1.0in"><span style="color:blue">!<span style="mso-spacerun: yes">������</span>4/0 [50/0] page faults/reclaims, 0 [0] swaps</span><p class="MsoNormal" style="margin-left:1.0in"><spanstyle="color:blue">!<span style="mso-spacerun: yes">������ </span>0 [0] signals rcvd, 0/0[2/5] messages rcvd/sent</span><p class="MsoNormal" style="margin-left:1.0in"><span style="color:blue">!<span style="mso-spacerun:yes">������ </span>4/0 [149/17] voluntary/involuntary context switches</span><p class="MsoNormal" style="margin-left:1.0in"><spanstyle="color:blue">! postgres usage stats:</span><p class="MsoNormal" style="margin-left:1.0in"><spanstyle="color:blue">!<span style="mso-spacerun: yes">������ </span>Shared blocks:<span style="mso-spacerun:yes">��������� </span>3 read,<span style="mso-spacerun: yes">��������� </span>0 written, buffer hit rate = 25.00%</span><p class="MsoNormal" style="margin-left:1.0in"><span style="color:blue">!<spanstyle="mso-spacerun: yes">������ </span>Local<span style="mso-spacerun: yes">� </span>blocks:<spanstyle="mso-spacerun: yes">��������� </span>0 read,<span style="mso-spacerun: yes">��������� </span>0 written,buffer hit rate = 0.00%</span><p class="MsoNormal" style="margin-left:1.0in"><span style="color:blue">!<span style="mso-spacerun:yes">������ </span>Direct blocks:<span style="mso-spacerun: yes">��������� </span>0 read,<span style="mso-spacerun: yes">��������� </span>0 written</span><p class="MsoNormal" style="margin-left:1.0in"> <p class="MsoNormal" style="margin-left:1.0in;text-indent:-.5in;mso-list:l3level1 lfo14; tab-stops:list 1.0in">(vi)<span style="font:7.0pt "Times New Roman""> </span>Another way is to run the querywith <b>EXPLAIN</b> command in <b>VERBOSE</b> and <b>ANALYZE</b> mode. This prints the total execution time of the queryalong with other matrics like cost of sequential scan startup and total cost etc.<p class="MsoNormal" style="margin-left:.5in"> <pclass="MsoNormal"> <p class="MsoNormal"><b><u>Yardstick of Measurement:</u></b><p class="MsoNormal"style="margin-left:1.0in"> <p class="MsoNormal" style="margin-left:1.5in;text-indent:-.5in;mso-list:l3 level2lfo14; tab-stops:list 1.5in">(i)<span style="font:7.0pt "Times New Roman""> </span>We will run the test querieson MRU and LRU-2 implementation. If the execution time for MRU is less than that of LRU-2 then We have to think ofhaving a specific code for sequential scan.<p class="MsoNormal" style="margin-left:1.0in"> <p class="MsoNormal" style="margin-left:1.5in;text-indent:-.5in;mso-list:l3level2 lfo14; tab-stops:list 1.5in">(ii)<span style="font:7.0pt "Times New Roman""> </span>Otherwise if the execution timeof LRU-2 is less than or close to the MRU execution time than we would suggest replacing the default LRU code of PostgreSQLwith LRU-2 algorithm. Since than there will be only one alogrithm for all the access patterns and no need of �advicing�the buffer manager for replacement policy and no need of having different algorithms for large sequential access.<pclass="MsoNormal"><span style="font-size:14.0pt;mso-bidi-font-size:12.0pt"><span style="mso-tab-count:1">���������</span></span><p class="MsoNormal"> <p class="MsoNormal"><b><u>Conclusion:</u></b><p class="MsoNormal"><b><u> </u></b><pclass="MsoNormal"><span style="mso-tab-count:1">����������� </span>To conclude, fetchingdata from disk requires at least a factor of 1000 more time than fetching data from a RAM buffer. For this reason,good use of the buffer can significantly improve the throughput and response time fo any data-intensive system. Equallyimportant is choosing an efficient buffer replacement algorithm. Thus we aim at improving the performance of PostgreSQLby studying which algorithm among MRU and LRU-2 will prove to be the best bet.<p class="MsoNormal"> <p class="MsoNormal">Wethank Professor Kevin Chang and TAs for introducing us to such a life time opportunity and for encouragingus to enhance PostgreSQL.<p class="MsoNormal"> <p class="MsoNormal">We thank Mr. Bruce Momjian for all his timelyhelp and advices that<span style="mso-spacerun: yes">� </span>he had rendered to us whole heartedly. It is his encouragementthat our efforts are directed twords really contributing to PostgreSQL development.<p> Amit Kumar Khare <p class="MsoNormal"style="margin-left:.5in"><span style="font-size:14.0pt; mso-bidi-font-size:12.0pt"> </span><p class="MsoNormal"><span style="font-size:14.0pt;mso-bidi-font-size:11.5pt; font-family:"Courier New""> </span><p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.5pt"> </span></div><p><br/><hr size="1" /><b>Do You Yahoo!?</b><br /><a href="http://sports.yahoo.com/oly">Yahoo!Sports</a> - Coverage of the 2002 Olympic Games
On Sat, 2002-02-23 at 08:30, Amit Kumar Khare wrote: > Considering thesesuggestions as our baseline we have decided to implement Most Recently Used(MRU) cache replacement policy. Our task will be to identify a large sequentialscan and then advice the buffer manager to apply MRU for cache replacement. This could lead to poor buffer performance in the case of multiple backends simultaneously executing queries. How will you handle this? Neil -- Neil Padgett Red Hat Canada Ltd. E-Mail: npadgett@redhat.com 2323 Yonge Street, Suite #300, Toronto, ON M4P 2C9
Re: Implementation Proposal For Add Free Behind Capability For Large Sequential Scan
From
Amit Kumar Khare
Date:
--- Neil Padgett <npadgett@redhat.com> wrote: > On Sat, 2002-02-23 at 08:30, Amit Kumar Khare wrote: > > > Considering thesesuggestions as our baseline we > have decided to > implement Most Recently Used(MRU) cache replacement > policy. Our task > will be to identify a large sequentialscan and then > advice the buffer > manager to apply MRU for cache replacement. Neil wrote: > This could lead to poor buffer performance in the > case of multiple > backends simultaneously executing queries. How will > you handle this? Certainly it's cause of concern. I am looking for literature regarding it. I kindly request you all to inform me if any body finds some thing related to it. Thanks and Regards Amit Khare > > > -- > Neil Padgett > Red Hat Canada Ltd. E-Mail: > npadgett@redhat.com > 2323 Yonge Street, Suite #300, > Toronto, ON M4P 2C9 > __________________________________________________ Do You Yahoo!? Yahoo! Sports - Coverage of the 2002 Olympic Games http://sports.yahoo.com
Amit Kumar Khare wrote: > --- Neil Padgett <npadgett@redhat.com> wrote: > > On Sat, 2002-02-23 at 08:30, Amit Kumar Khare wrote: > > > > > Considering thesesuggestions as our baseline we > > have decided to > > implement Most Recently Used(MRU) cache replacement > > policy. Our task > > will be to identify a large sequentialscan and then > > advice the buffer > > manager to apply MRU for cache replacement. > > Neil wrote: > > This could lead to poor buffer performance in the > > case of multiple > > backends simultaneously executing queries. How will > > you handle this? > > Certainly it's cause of concern. I am looking for > literature regarding it. I kindly request you all to > inform me if any body finds some thing related to it. My guess is that we should test MRU for large sequenatial scan vs. LRU-K and see how the compare in various tests. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026