Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] |
Date | |
Msg-id | CAA4eK1J4bJKz_Gr4y3E4N10y=G3xe0XWUoAh6fUxkT7yK67dSg@mail.gmail.com Whole thread Raw |
In response to | Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] (Dilip kumar <dilip.kumar@huawei.com>) |
Responses |
Re: TODO : Allow parallel cores to be used by vacuumdb [
WIP ]
Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] |
List | pgsql-hackers |
<div dir="ltr">On Mon, Aug 4, 2014 at 11:41 AM, Dilip kumar <<a href="mailto:dilip.kumar@huawei.com">dilip.kumar@huawei.com</a>>wrote:<br />><br />> On 31 July 2014 10:59, Amitkapila Wrote,<br />><br />> <br />><br /> > Thanks for the review and valuable comments.<br />> I havefixed all the comments and attached the revised patch.<br /><br />I have again looked into your revised patch and wouldlike<br />to share my findings with you.<br /><br />1. <br />+ Number of parallel connections to perform theoperation. This option will enable the vacuum<br />+ operation to run on parallel connections, at a time one tablewill be operated on one connection.<br /><br />a. How about describing w.r.t asynchronous connections<br />instead ofparallel connections?<br />b. It is better to have line length as lesser than 80.<br />c. As you are using multiple connectionsto achieve parallelism,<br /> I suggest you add a line in your description indicating user should<br />verifymax_connections parameters. Something similar to pg_dump:<br /><br />"pg_dump will open njobs + 1 connections tothe database, so make<br /> sure your max_connections setting is high enough to accommodate<br /> all connections."<br/> <br /><br />2. <br />+ So at one time as many tables will be vacuumed parallely as number of jobs.<br/><br />can you briefly mention about the case when number of jobs<br />is more than number of tables?<br /><br />3.<br/>+ /* When user is giving the table list, and list is smaller then<br />+ * number of tables<br />+ */<br />+ if(tbl_count && (parallel > tbl_count))<br />+ parallel = tbl_count;<br />+ <br /><br />Again here multiline commentsare wrong.<br /><br />Some other instances are as below:<br />a.<br />/* This will give the free connection slot,if no slot is free it will<br />* wait for atleast one slot to get free.<br />*/<br />b.<br />/* if table list is notprovided then we need to do vaccum for whole DB<br /> * get the list of all tables and prpare the list<br />*/<br />c.<br/>/* Some of the slot are free, Process the results for slots whichever<br />* are free<br />*/<br /><br /><br />4.<br/>src/bin/scripts/vacuumdb.c:51: indent with spaces.<br /> + bool analyze_only, bool freeze, PQExpBuffersql);<br />src/bin/scripts/vacuumdb.c:116: indent with spaces.<br />+ int parallel = 0;<br />src/bin/scripts/vacuumdb.c:198:indent with spaces.<br />+ optind++;<br /> src/bin/scripts/vacuumdb.c:299: space beforetab in indent.<br />+ vacuum_one_database(dbname, full, verbose, and_analyze,<br /><br />Thereare lot of redundant whitespaces, check with<br />git diff --check and fix them.<br /><br /><br />5.<br />res = executeQuery(conn,<br/> "select relname, nspname from pg_class c, pg_namespace ns"<br /> " where (relkind= \'r\' or relkind = \'m\')"<br /> " and c.relnamespace = ns.oid order by relpages desc",<br /> progname, echo);<br /><br />a. Here you need to use SQL keywords in capital letters, refer one<br />of the other callerof executeQuery() in vacuumdb.c<br />b. Why do you need this condition c.relnamespace = ns.oid in above<br /> query?<br/>I think to get the list of required objects from pg_class, you don't<br />need to have a join with pg_namespace.<br/><br />6.<br />vacuum_parallel()<br />{<br />..<br />if (!tables || !tables->head)<br />{<br />..<br/>tbl_count++;<br /> }<br />..<br />}<br /><br />a. Here why you need a separate variable (tbl_count) to verify number<br/>asynchronous/parallel connections you want, why can't we use ntuple?<br />b. there is a warning in code (I havecompiled it on windows) as well<br /> related to this variable.<br />vacuumdb.c(695): warning C4700: uninitialized localvariable 'tbl_count' used<br /><br /><br />7.<br />Fix for one of my previous comment is as below:<br />GetQueryResult()<br/>{<br />..<br />if (!r && !completedb)<br /> ..<br />}<br /><br />Here I think some genericerrors like connection broken or others<br />will also get ignored. Is it possible that we can ignore particular<br/>error which we want to ignore without complicating the code?<br /><br /> Also in anycase add comments to explainwhy you are ignoring<br />error for particular case.<br /><br /><br />8.<br />+ fprintf(stderr, _("%s: Number of parallel\"jobs\" should be at least 1\n"),<br />+ progname);<br /> formatting of 2nd line progname is not as per standard(you can refer other fprintf in the same file).<br /><br />9. + int parallel = 0;<br />I think it is better toname it as numAsyncCons or something similar.<br /><br />10. It is better if you can add function header for newly added<br/> functions. <br /> <br />><br />> Test2: (as per your scenario, where actual vacuum time is very less)<br/>><br />> Vacuum done for complete DB<br />><br />> 8 tables all with 10000records and few dead tuples<br /><br />I think this test is missing in attached file. Few means?<br />Can you try with0.1%, 1% of dead tuples in table and try to<br />take time in milliseconds if it is taking less time to complete<br />thetest.<br /><br /><br />PS - <br /> It is better if you mention against each review comment/suggestion<br />what youhave done, because in some cases it will help reviewer to<br />understand your fix easily and as author you will alsobe sure that<br />all got fixed.<br /><br />With Regards,<br />Amit Kapila.<br />EnterpriseDB: <a href="http://www.enterprisedb.com">http://www.enterprisedb.com</a></div>
pgsql-hackers by date: