Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date
Msg-id CAA4eK1J4bJKz_Gr4y3E4N10y=G3xe0XWUoAh6fUxkT7yK67dSg@mail.gmail.com
Whole thread Raw
In response to Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Dilip kumar <dilip.kumar@huawei.com>)
Responses Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
List pgsql-hackers
<div dir="ltr">On Mon, Aug 4, 2014 at 11:41 AM, Dilip kumar <<a
href="mailto:dilip.kumar@huawei.com">dilip.kumar@huawei.com</a>>wrote:<br />><br />> On 31 July 2014 10:59,
Amitkapila Wrote,<br />><br />>  <br />><br /> > Thanks for the review and valuable comments.<br />> I
havefixed all the comments and attached the revised patch.<br /><br />I have again looked into your revised patch and
wouldlike<br />to share my findings with you.<br /><br />1. <br />+        Number of parallel connections to perform
theoperation. This option will enable the vacuum<br />+        operation to run on parallel connections, at a time one
tablewill be operated on one connection.<br /><br />a. How about describing w.r.t asynchronous connections<br />instead
ofparallel connections?<br />b. It is better to have line length as lesser than 80.<br />c. As you are using multiple
connectionsto achieve parallelism,<br /> I suggest you add a line in your description indicating user should<br
/>verifymax_connections parameters.  Something similar to pg_dump:<br /><br />"pg_dump will open njobs + 1 connections
tothe database, so make<br />  sure your max_connections setting is high enough to accommodate<br />   all
connections."<br/> <br /><br />2. <br />+        So at one time as many tables will be vacuumed parallely as number of
jobs.<br/><br />can you briefly mention about the case when number of jobs<br />is more than number of tables?<br /><br
/>3.<br/>+ /* When user is giving the table list, and list is smaller then<br />+ * number of tables<br />+ */<br />+
if(tbl_count && (parallel > tbl_count))<br />+ parallel = tbl_count;<br />+ <br /><br />Again here multiline
commentsare wrong.<br /><br />Some other instances are as below:<br />a.<br />/* This will give the free connection
slot,if no slot is free it will<br />* wait for atleast one slot to get free.<br />*/<br />b.<br />/* if table list is
notprovided then we need to do vaccum for whole DB<br /> * get the list of all tables and prpare the list<br />*/<br
/>c.<br/>/* Some of the slot are free, Process the results for slots whichever<br />*  are free<br />*/<br /><br /><br
/>4.<br/>src/bin/scripts/vacuumdb.c:51: indent with spaces.<br /> +                     bool analyze_only, bool freeze,
PQExpBuffersql);<br />src/bin/scripts/vacuumdb.c:116: indent with spaces.<br />+    int parallel = 0;<br
/>src/bin/scripts/vacuumdb.c:198:indent with spaces.<br />+    optind++;<br /> src/bin/scripts/vacuumdb.c:299: space
beforetab in indent.<br />+                       vacuum_one_database(dbname, full, verbose, and_analyze,<br /><br
/>Thereare lot of redundant whitespaces, check with<br />git diff --check and fix them.<br /><br /><br />5.<br />res =
executeQuery(conn,<br/>            "select relname, nspname from pg_class c, pg_namespace ns"<br />            " where
(relkind= \'r\' or relkind = \'m\')"<br />            " and c.relnamespace = ns.oid order by relpages desc",<br />    
       progname, echo);<br /><br />a. Here you need to use SQL keywords in capital letters, refer one<br />of the other
callerof executeQuery() in vacuumdb.c<br />b. Why do you need this condition c.relnamespace = ns.oid in above<br />
query?<br/>I think to get the list of required objects from pg_class, you don't<br />need to have a join with
pg_namespace.<br/><br />6.<br />vacuum_parallel()<br />{<br />..<br />if (!tables || !tables->head)<br />{<br
/>..<br/>tbl_count++;<br /> }<br />..<br />}<br /><br />a. Here why you need a separate variable (tbl_count) to verify
number<br/>asynchronous/parallel connections you want, why can't we use ntuple?<br />b. there is a warning in code (I
havecompiled it on windows) as well<br /> related to this variable.<br />vacuumdb.c(695): warning C4700: uninitialized
localvariable 'tbl_count' used<br /><br /><br />7.<br />Fix for one of my previous comment is as below:<br
/>GetQueryResult()<br/>{<br />..<br />if (!r && !completedb)<br /> ..<br />}<br /><br />Here I think some
genericerrors like connection broken or others<br />will also get ignored.  Is it possible that we can ignore
particular<br/>error which we want to ignore without complicating the code?<br /><br /> Also in anycase add comments to
explainwhy you are ignoring<br />error for particular case.<br /><br /><br />8.<br />+ fprintf(stderr, _("%s: Number of
parallel\"jobs\" should be at least 1\n"),<br />+ progname);<br /> formatting of 2nd line progname is not as per
standard(you can refer other fprintf in the same file).<br /><br />9. +     int parallel = 0;<br />I think it is better
toname it as numAsyncCons or something similar.<br /><br />10. It is better if you can add function header for newly
added<br/> functions. <br />  <br />><br />> Test2: (as per your scenario, where actual vacuum time is very
less)<br/>><br />>             Vacuum done for complete DB<br />><br />>             8 tables all with
10000records and few dead tuples<br /><br />I think this test is missing in attached file.  Few means?<br />Can you try
with0.1%, 1% of dead tuples in table and try to<br />take time in milliseconds if it is taking less time to complete<br
/>thetest.<br /><br /><br />PS - <br /> It is better if you mention against each review comment/suggestion<br />what
youhave done, because in some cases it will help reviewer to<br />understand your fix easily and as author you will
alsobe sure that<br />all got fixed.<br /><br />With Regards,<br />Amit Kapila.<br />EnterpriseDB: <a
href="http://www.enterprisedb.com">http://www.enterprisedb.com</a></div>

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Support for N synchronous standby servers
Next
From: Michael Paquier
Date:
Subject: Re: Support for N synchronous standby servers