Thread: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
This patch implements the following TODO item:
Allow parallel cores to be used by vacuumdb
http://www.postgresql.org/message-id/4F10A728.7090403@agliodbs.com
Like parallel pg_dump, vacuumdb is provided with an option to run the vacuum of multiple tables in parallel. [ vacuumdb -j ]
1. A new option is provided with vacuumdb to give the number of workers.
2. All workers are started at the beginning, and all wait for vacuum instructions from the master.
3. If a table list is provided to the vacuumdb command using -t, it sends the vacuum of one table to one IDLE worker, the next table to the next IDLE worker, and so on.
4. If vacuum is given for one DB, it executes a select on pg_class to get the table list, fetches the table names one by one, and assigns the vacuum responsibility to IDLE workers.
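For illustration only -- this is not the patch's code -- the per-database case in point 4 amounts to a catalog query along these lines, issued over libpq, after which each returned table is handed to the next idle worker:

    /*
     * Illustrative sketch, not taken from the patch: fetch the ordinary user
     * tables so the master can hand them out one by one to idle workers.
     * Error handling is minimal on purpose.
     */
    #include <stdio.h>
    #include "libpq-fe.h"

    static void
    fetch_table_list(PGconn *conn)
    {
        PGresult   *res = PQexec(conn,
            "SELECT ns.nspname, c.relname "
            "FROM pg_class c JOIN pg_namespace ns ON ns.oid = c.relnamespace "
            "WHERE c.relkind = 'r' "
            "  AND ns.nspname NOT IN ('pg_catalog', 'information_schema')");

        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
            PQclear(res);
            return;
        }

        for (int i = 0; i < PQntuples(res); i++)
            printf("%s.%s\n", PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));

        PQclear(res);
    }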
Performance Data by parallel vacuumdb:
Machine Configuration:
Core : 8
RAM: 24GB
Test Scenario:
16 tables, all with 4M records. [many records are deleted and inserted using some pattern (file is attached in the mail)]
Test Result
{Base Code}
Time(s)   %CPU Usage   Avg Read(kB/s)   Avg Write(kB/s)
521       3%           12000            20000

{With Parallel Vacuum Patch}
Workers   Time(s)   %CPU Usage   Avg Read(kB/s)   Avg Write(kB/s)
1         518       3%           12000            20000   --> this takes the same path as the base code
2         390       5%           14000            30000
8         235       7%           18000            40000
16        197       8%           20000            50000
Conclusion:
Running vacuumdb in parallel increases CPU and I/O throughput and can give a >50% performance improvement.
Work to be Done:
1. Documentation for the new option.
2. Parallel support for vacuuming all databases.
Is it required to move the common code for parallel operation of pg_dump and vacuumdb to one place and reuse it ?
Prototype patch is attached in the mail, please provide your feedback/Suggestions…
Thanks & Regards,
Dilip Kumar
Attachment
On 07-11-2013 09:42, Dilip kumar wrote:

Dilip, this is on my TODO for 9.4. I've already had a half-baked patch for it. Let's see what I can come up with.

> Is it required to move the common code for parallel operation of pg_dump and vacuumdb to one place and reuse it ?
>
I'm not sure about that because the pg_dump parallel code is tied to TOC entries. Also, dependency matters for pg_dump, while in the scripts case a chosen order will be used. However, vacuumdb can share the parallel code with clusterdb and reindexdb (my patch does it). Of course, a refactor to unify the parallel code (pg_dump and scripts) can be done in a separate patch.

> Prototype patch is attached in the mail, please provide your feedback/Suggestions...
>
I'll try to merge your patch with the one I have here until the next CF.

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On 07.11.2013 12:42, Dilip kumar wrote:
This patch implementing the following TODO item
Allow parallel cores to be used by vacuumdb
http://www.postgresql.org/message-id/4F10A728.7090403@agliodbs.com
Like Parallel pg_dump, vacuumdb is provided with the option to run the vacuum of multiple tables in parallel. [ vacuumdb -j ]
1. One new option is provided with vacuumdb to give the number of workers.
2. All worker will be started in beginning and all will be waiting for the vacuum instruction from the master.
3. Now, if table list is provided in vacuumdb command using -t then, it will send the vacuum of one table to one of the IDLE worker, next table to next IDLE worker and so on.
4. If vacuum is given for one DB then, it will execute select on pg_class to get the table list and fetch the table name one by one and also assign the vacuum responsibility to IDLE workers.
[...]
For this use case, would it make sense to queue work (tables) in order of their size, starting on the largest one?
For the case where you have tables of varying size this would lead to a reduced overall processing time, as it prevents large (read: long processing time) tables from being processed in the last step. Processing large tables first and filling up "processing slots/jobs", as they become free, with smaller tables one after the other would save overall execution time.
Regards
Jan
--
professional: http://www.oscar-consult.de
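For illustration only (the modified patch's actual query may differ), Jan's size-ordering suggestion comes down to adding an ORDER BY to whatever catalog query builds the task list, for example:

    /* Illustrative only: list user tables, largest first, so the longest-running
     * vacuums are started early rather than left for the final slot. */
    const char *table_list_query =
        "SELECT ns.nspname, c.relname "
        "FROM pg_class c JOIN pg_namespace ns ON ns.oid = c.relnamespace "
        "WHERE c.relkind = 'r' "
        "  AND ns.nspname NOT IN ('pg_catalog', 'information_schema') "
        "ORDER BY pg_total_relation_size(c.oid) DESC";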
On 08 November 2013 03:22, Euler Taveira Wrote
> On 07-11-2013 09:42, Dilip kumar wrote:
>
> Dilip, this is on my TODO for 9.4. I've already had a half-baked patch for it. Let's see what I can come up with.

Ok, let me know if I can contribute to this..

> > Is it required to move the common code for parallel operation of pg_dump and vacuumdb to one place and reuse it ?
> >
> I'm not sure about that because the pg_dump parallel code is tied to TOC entries. Also, dependency matters for pg_dump, while in the scripts case a chosen order will be used. However, vacuumdb can share the parallel code with clusterdb and reindexdb (my patch does it).

+1

> Of course, a refactor to unify the parallel code (pg_dump and scripts) can be done in a separate patch.
>
> > Prototype patch is attached in the mail, please provide your feedback/Suggestions...
> >
> I'll try to merge your patch with the one I have here until the next CF.

Regards,
Dilip
On 08 November 2013 13:38, Jan Lentfer
> For this use case, would it make sense to queue work (tables) in order of their size, starting on the largest one?
> For the case where you have tables of varying size this would lead to a reduced overall processing time as it prevents large (read: long processing time) tables to be processed in the last step. While processing large tables at first and filling up "processing slots/jobs" when they get free with smaller tables one after the other would save overall execution time.
Good point, I have made the change and attached the modified patch.
Regards,
Dilip
Attachment
On 08-11-2013 05:07, Jan Lentfer wrote:
> For the case where you have tables of varying size this would lead to a reduced overall processing time as it prevents large (read: long processing time) tables to be processed in the last step. While processing large tables at first and filling up "processing slots/jobs" when they get free with smaller tables one after the other would save overall execution time.
>
That is certainly a good strategy (not the optimal [1] -- that is hard to achieve). Also, the strategy must: (i) consider the relation age before size (for vacuum); (ii) consider that you can't pick indexes for the same relation (for reindex).

[1] http://www.postgresql.org/message-id/CA+TgmobwxqsagXKtyQ1S8+gMpqxF_MLXv=4350tFZVqAwKEqgQ@mail.gmail.com

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
On Thu, Nov 7, 2013 at 8:42 PM, Dilip kumar <dilip.kumar@huawei.com> wrote: > This patch implementing the following TODO item > > Allow parallel cores to be used by vacuumdb > http://www.postgresql.org/message-id/4F10A728.7090403@agliodbs.com Cool. Could you add this patch to the next commit fest for 9.4? It begins officially in a couple of days. Here is the URL to it: https://commitfest.postgresql.org/action/commitfest_view?id=20 Regards, -- Michael
On 08-11-2013 06:20, Dilip kumar wrote:
> On 08 November 2013 13:38, Jan Lentfer
>
> >> For this use case, would it make sense to queue work (tables) in order of their size, starting on the largest one?
>
> >> For the case where you have tables of varying size this would lead to a reduced overall processing time as it prevents large (read: long processing time) tables to be processed in the last step. While processing large tables at first and filling up "processing slots/jobs" when they get free with smaller tables one after the other would save overall execution time.
> Good point, I have made the change and attached the modified patch.
>
Don't you submit it for a CF, do you? Is it too late for this CF?

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Euler Taveira wrote:
> On 08-11-2013 06:20, Dilip kumar wrote:
> > On 08 November 2013 13:38, Jan Lentfer
> >
> > >> For this use case, would it make sense to queue work (tables) in order of their size, starting on the largest one?
> >
> > >> For the case where you have tables of varying size this would lead to a reduced overall processing time as it prevents large (read: long processing time) tables to be processed in the last step. While processing large tables at first and filling up "processing slots/jobs" when they get free with smaller tables one after the other would save overall execution time.
> > Good point, I have made the change and attached the modified patch.
> >
> Don't you submit it for a CF, do you? Is it too late for this CF?

Not too late.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 16 January 2014 19:53, Euler Taveira Wrote,
>
> >> For the case where you have tables of varying size this would lead to a reduced overall processing time as it prevents large (read: long processing time) tables to be processed in the last step. While processing large tables at first and filling up "processing slots/jobs" when they get free with smaller tables one after the other would save overall execution time.
> > Good point, I have made the change and attached the modified patch.
> >
> Don't you submit it for a CF, do you? Is it too late for this CF?

Attached the latest updated patch:
1. Rebased the patch to current GIT head.
2. Doc is updated.
3. Supported parallel execution for all db option also.

Same I will add to the current open commitfest..

Regards,
Dilip
Attachment
On Fri, Mar 21, 2014 at 12:48 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
> On 16 January 2014 19:53, Euler Taveira Wrote,
>>
>> >> For the case where you have tables of varying size this would lead to a reduced overall processing time as it prevents large (read: long processing time) tables to be processed in the last step. While processing large tables at first and filling up "processing slots/jobs" when they get free with smaller tables one after the other would save overall execution time.
>> > Good point, I have made the change and attached the modified patch.
>> >
>> Don't you submit it for a CF, do you? Is it too late for this CF?
>
> Attached the latest updated patch
> 1. Rebased the patch to current GIT head.
> 2. Doc is updated.
> 3. Supported parallel execution for all db option also.

This patch needs to be rebased after the analyze-in-stages patch, c92c3d50d7fbe7391b5fc864b44434.

Although that patch still needs some work itself, despite being committed, as it still loops over the stages for each db, rather than the dbs for each stage.

So I don't know if this patch is really reviewable at this point, as it is not clear how those things are going to interact with each other.

Cheers,
Jeff
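To illustrate Jeff's point about the loop order (run_analyze_stage() and the surrounding names are made up for this sketch; neither function is taken from either patch):

    #define NUM_STAGES 3

    /* hypothetical helper: run one ANALYZE stage against one database */
    extern void run_analyze_stage(const char *dbname, int stage);

    /*
     * Committed behaviour: databases outer, stages inner -- the first database
     * gets all three stages before the second database sees any statistics.
     */
    static void
    analyze_per_database(const char **dbnames, int ndbs)
    {
        for (int d = 0; d < ndbs; d++)
            for (int s = 0; s < NUM_STAGES; s++)
                run_analyze_stage(dbnames[d], s);
    }

    /*
     * Intended behaviour: stages outer, databases inner -- every database gets
     * rough statistics quickly, then progressively better ones.
     */
    static void
    analyze_per_stage(const char **dbnames, int ndbs)
    {
        for (int s = 0; s < NUM_STAGES; s++)
            for (int d = 0; d < ndbs; d++)
                run_analyze_stage(dbnames[d], s);
    }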
On 24 June 2014 03:31, Jeff Wrote,
> > Attached the latest updated patch
> > 1. Rebased the patch to current GIT head.
> > 2. Doc is updated.
> > 3. Supported parallel execution for all db option also.
>
> This patch needs to be rebased after the analyze-in-stages patch, c92c3d50d7fbe7391b5fc864b44434.

Thank you for giving your attention to this, I will rebase this..

> Although that patch still needs some work itself, despite being committed, as it still loops over the stages for each db, rather than the dbs for each stage.

If I understood your comment properly, you mean that in vacuum_all_databases, instead of running all DBs in parallel, we are running db by db in parallel? I think we can fix this..

> So I don't know if this patch is really reviewable at this point, as it is not clear how those things are going to interact with each other.

Exactly which points do you want to mention here?

Regards,
Dilip Kumar
On 24 June 2014 11:02, Jeff Wrote,
> I mean that the other commit, the one conflicting with your patch, is still not finished. It probably would not have been committed if we realized the problem at the time. That other patch runs analyze in stages at different settings of default_statistics_target, but it has the loops in the wrong order, so it analyzes one database in all three stages, then moves to the next database. I think that these two changes are going to interact with each other. But I can't predict right now what that interaction will look like. So it is hard for me to evaluate your patch, until the other one is resolved.
>
> Normally I would evaluate your patch in isolation, but since the conflicting patch is already committed (and is in the 9.4 branch) that would probably not be very useful in this case.

Ok, got your point, I will also try to think how these two patches can interact together..

Regards,
Dilip
I got following FAILED when I patched v3 to HEAD.

$ patch -d. -p1 < ../patch/vacuumdb_parallel_v3.patch
patching file doc/src/sgml/ref/vacuumdb.sgml
Hunk #1 succeeded at 224 (offset 20 lines).
patching file src/bin/scripts/Makefile
Hunk #2 succeeded at 65 with fuzz 2 (offset -1 lines).
patching file src/bin/scripts/vac_parallel.c
patching file src/bin/scripts/vac_parallel.h
patching file src/bin/scripts/vacuumdb.c
Hunk #3 succeeded at 61 with fuzz 2.
Hunk #4 succeeded at 87 (offset 2 lines).
Hunk #5 succeeded at 143 (offset 2 lines).
Hunk #6 succeeded at 158 (offset 5 lines).
Hunk #7 succeeded at 214 with fuzz 2 (offset 5 lines).
Hunk #8 FAILED at 223.
Hunk #9 succeeded at 374 with fuzz 1 (offset 35 lines).
Hunk #10 FAILED at 360.
Hunk #11 FAILED at 387.
3 out of 11 hunks FAILED -- saving rejects to file src/bin/scripts/vacuumdb.c.rej
On Friday, March 21, 2014, Dilip kumar <dilip.kumar@huawei.com> wrote:
On 16 January 2014 19:53, Euler Taveira Wrote,
> >
> >> For the case where you have tables of varying size this would lead
> to a reduced overall processing time as it prevents large (read: long
> processing time) tables to be processed in the last step. While
> processing large tables at first and filling up "processing slots/jobs"
> when they get free with smaller tables one after the other would safe
> overall execution time.
> > Good point, I have made the change and attached the modified patch.
> >
> Don't you submit it for a CF, do you? Is it too late for this CF?
Attached the latest updated patch
1. Rebased the patch to current GIT head.
2. Doc is updated.
3. Supported parallel execution for all db option also.
Same I will add to current open commitfest..
Regards,
Dilip
--
Regards,
-------
Sawada Masahiko
On 25 June 2014 23:37 Sawada Masahiko Wrote
>I got following FAILED when I patched v3 to HEAD.
>$ patch -d. -p1 < ../patch/vacuumdb_parallel_v3.patch
>patching file doc/src/sgml/ref/vacuumdb.sgml
>Hunk #1 succeeded at 224 (offset 20 lines).
>patching file src/bin/scripts/Makefile
>Hunk #2 succeeded at 65 with fuzz 2 (offset -1 lines).
>patching file src/bin/scripts/vac_parallel.c
>patching file src/bin/scripts/vac_parallel.h
>patching file src/bin/scripts/vacuumdb.c
>Hunk #3 succeeded at 61 with fuzz 2.
>Hunk #4 succeeded at 87 (offset 2 lines).
>Hunk #5 succeeded at 143 (offset 2 lines).
>Hunk #6 succeeded at 158 (offset 5 lines).
>Hunk #7 succeeded at 214 with fuzz 2 (offset 5 lines).
>Hunk #8 FAILED at 223.
>Hunk #9 succeeded at 374 with fuzz 1 (offset 35 lines).
>Hunk #10 FAILED at 360.
>Hunk #11 FAILED at 387.
>3 out of 11 hunks FAILED -- saving rejects to file src/bin/scripts/vacuumdb.c.rej
Thank you for giving your time, Please review the updated patch attached in the mail.
1. Rebased the patch
2. Implemented parallel execution for new option --analyze-in-stages
Regards,
Dilip Kumar
Attachment
On Thu, Jun 26, 2014 at 2:35 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
> > Thank you for giving your time, Please review the updated patch attached in the mail.
> > 1. Rebased the patch
> > 2. Implemented parallel execution for new option --analyze-in-stages

Hi Dilip,

Thanks for rebasing. I haven't done an architectural or code review on it, I just applied it and used it a little on Linux.

Based on that, I find most importantly that it doesn't seem to correctly vacuum tables which have upper case letters in the name, because it does not quote the table names when they need quotes.

Of course that needs to be fixed, but taking it as it is, the resulting error message to the console is just:

: Execute failed

Which is not very informative. I get the same error if I do a "pg_ctl shutdown -mi" while running the parallel vacuumdb. Without the -j option it produces a more descriptive error message "FATAL: terminating connection due to administrator command", so something about the new feature suppresses the informative error messages.

I get some compiler warnings with the new patch:

vac_parallel.c: In function 'parallel_msg_master':
vac_parallel.c:147: warning: function might be possible candidate for 'gnu_printf' format attribute
vac_parallel.c:147: warning: function might be possible candidate for 'gnu_printf' format attribute
vac_parallel.c: In function 'exit_horribly':
vac_parallel.c:1071: warning: 'noreturn' function does return

In the usage message, the string has a tab embedded within it (immediately before "use") that should be converted to literal spaces, otherwise the output of --help gets misaligned:

printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));

Thanks for the work on this.

Cheers,
Jeff
On 27 June 2014 02:57, Jeff Wrote,
> Based on that, I find most importantly that it doesn't seem to correctly vacuum tables which have upper case letters in the name, because it does not quote the table names when they need quotes.

Thanks for your comments....

There are two problems.
First -> When doing the vacuum of the complete database, if any table has an upper case letter it was giving an error.
--FIXED by adding quotes for the table name
Second -> When the user passes the table using the -t option, and it has an uppercase letter.
--This is an existing problem (without the parallel implementation). One solution to this is to always add quotes to the relation name passed by the user, but this can break existing applications for some users..

> Of course that needs to be fixed, but taking it as it is, the resulting error message to the console is just:
FIXED

> Which is not very informative. I get the same error if I do a "pg_ctl shutdown -mi" while running the parallel vacuumdb. Without the -j option it produces a more descriptive error message "FATAL: terminating connection due to administrator command", so something about the new feature suppresses the informative error messages.
>
> I get some compiler warnings with the new patch:
>
> vac_parallel.c: In function 'parallel_msg_master':
> vac_parallel.c:147: warning: function might be possible candidate for 'gnu_printf' format attribute
> vac_parallel.c:147: warning: function might be possible candidate for 'gnu_printf' format attribute
> vac_parallel.c: In function 'exit_horribly':
> vac_parallel.c:1071: warning: 'noreturn' function does return
FIXED

> In the usage message, the string has a tab embedded within it (immediately before "use") that should be converted to literal spaces, otherwise the output of --help gets misaligned:
>
> printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));
FIXED

Updated patch is attached in the mail..

Thanks & Regards,
Dilip Kumar
Attachment
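A hedged sketch of the kind of client-side quoting fix being described here; the function name is made up, but libpq's PQescapeIdentifier (available since 9.0) is one way to quote a mixed-case relation name, and the schema-qualified form that comes up later in the thread works the same way:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include "libpq-fe.h"

    /* Illustrative only: build a properly quoted  VACUUM "Nsp"."MixedCase";  */
    static char *
    build_vacuum_command(PGconn *conn, const char *nspname, const char *relname)
    {
        char   *qnsp = PQescapeIdentifier(conn, nspname, strlen(nspname));
        char   *qrel = PQescapeIdentifier(conn, relname, strlen(relname));
        char   *cmd = NULL;

        if (qnsp != NULL && qrel != NULL)
        {
            cmd = malloc(strlen(qnsp) + strlen(qrel) + 16);
            if (cmd != NULL)
                sprintf(cmd, "VACUUM %s.%s;", qnsp, qrel);
        }

        PQfreemem(qnsp);
        PQfreemem(qrel);
        return cmd;             /* caller frees with free(); NULL on error */
    }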
On Fri, Jun 27, 2014 at 4:10 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
...
> Updated patch is attached in the mail..

Thanks Dilip.

I get a compiler warning when building on Windows. When I started looking into that, I see that two files have too much code duplication between them:

src/bin/scripts/vac_parallel.c (new file)
src/bin/pg_dump/parallel.c (existing file)

In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--"exit_horribly" for example) needs to be refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.)

Also, there are several places in the patch which use spaces for indentation where tabs are called for by the coding style. It looks like you may have copied the code from one terminal window and copied it into another one, converting tabs to spaces in the process. This makes it hard to evaluate the amount of code duplication.

In some places the code spins in a tight loop while waiting for a worker process to become free. If I strace the process, I got a long list of selects with 0 time outs:

select(13, [6 8 10 12], NULL, NULL, {0, 0}) = 0 (Timeout)

I have not tried to track down the code that causes it. I did notice that vacuumdb spends an awful lot of time at the top of the Linux "top" output, and this is probably why.

Cheers,
Jeff
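As a side note on the busy-wait Jeff observed: with a NULL timeout, select(2) blocks until a worker socket becomes readable, instead of returning immediately the way the {0, 0} timeout in the strace output does. A minimal sketch (illustrative names, not the patch's code):

    #include <sys/select.h>

    /* Illustrative only: block until at least one worker socket is readable. */
    static int
    wait_for_any_worker(const int *sockets, int nworkers)
    {
        fd_set      readfds;
        int         maxfd = -1;

        FD_ZERO(&readfds);
        for (int i = 0; i < nworkers; i++)
        {
            FD_SET(sockets[i], &readfds);
            if (sockets[i] > maxfd)
                maxfd = sockets[i];
        }

        /* NULL timeout => sleep here rather than spinning */
        return select(maxfd + 1, &readfds, NULL, NULL, NULL);
    }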
Jeff Janes wrote: > In particular, pgpipe is almost an exact duplicate between them, > except the copy in vac_parallel.c has fallen behind changes made to > parallel.c. (Those changes would have fixed the Windows warnings). I > think that this function (and perhaps other parts as > well--"exit_horribly" for example) need to refactored into a common > file that both files can include. I don't know where the best place > for that would be, though. (I haven't done this type of refactoring > myself.) I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos. Maybe we should move pgpipe back to src/port and have pg_dump and this new thing use that. I'm not sure about the rest of duplication in vac_parallel.c; there might be a lot in common with what pg_dump/parallel.c does too. Having two copies of code is frowned upon for good reasons. This patch introduces 1200 lines of new code in vac_parallel.c, ugh. If we really require 1200 lines to get parallel vacuum working for vacuumdb, I would question the wisdom of this effort. To me, it seems better spent improving autovacuum to cover whatever it is that this patch is supposed to be good for --- or maybe just enable having a shell script that launches multiple vacuumdb instances in parallel ... -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 01 July 2014 03:31, Jeff Janes Wrote,
>
> I get a compiler warning when building on Windows. When I started looking into that, I see that two files have too much code duplication between them:

Thanks for reviewing,

> src/bin/scripts/vac_parallel.c (new file)
> src/bin/pg_dump/parallel.c (existing file)
>
> In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--"exit_horribly" for example) needs to be refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.)

When I started doing this patch, I thought of sharing the common code b/w vacuumdb and pg_dump. But if we notice, the pg_dump code is tightly coupled with ArchiveHandle: almost all functions take this parameter as input or operate on it, and other functions use structures like ParallelState or ParallelSlot which have an ArchiveHandle member. I think making this code common may need a change to the complete code of parallel pg_dump.

However, there are some functions which are independent of ArchiveHandle and can directly move to common code; as you mention, pgpipe, piperead, readMessageFromPipe, select_loop. For moving them to a common place we need to decide where the common file is to be placed.

Thoughts ?

> Also, there are several places in the patch which use spaces for indentation where tabs are called for by the coding style. It looks like you may have copied the code from one terminal window and copied it into another one, converting tabs to spaces in the process. This makes it hard to evaluate the amount of code duplication.
>
> In some places the code spins in a tight loop while waiting for a worker process to become free. If I strace the process, I got a long list of selects with 0 time outs:
>
> select(13, [6 8 10 12], NULL, NULL, {0, 0}) = 0 (Timeout)
>
> I have not tried to track down the code that causes it. I did notice that vacuumdb spends an awful lot of time at the top of the Linux "top" output, and this is probably why.

I will look into these and fix..

Thanks & Regards,
Dilip Kumar
On 01 July 2014 03:48, Alvaro Wrote,
> > In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--"exit_horribly" for example) needs to be refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.)
>
> I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos. Maybe we should move pgpipe back to src/port and have pg_dump and this new thing use that. I'm not sure about the rest of duplication in vac_parallel.c; there might be a lot in common with what pg_dump/parallel.c does too. Having two copies of code is frowned upon for good reasons. This patch introduces 1200 lines of new code in vac_parallel.c, ugh.
>
> If we really require 1200 lines to get parallel vacuum working for vacuumdb, I would question the wisdom of this effort. To me, it seems better spent improving autovacuum to cover whatever it is that this patch is supposed to be good for --- or maybe just enable having a shell script that launches multiple vacuumdb instances in parallel ...

Thanks for looking into the patch,

I think if we use a shell script for launching parallel vacuumdb, we cannot get complete control of dividing the task. If we directly divide tables b/w multiple processes, it may happen that some process gets very big tables; then it will be as good as one process doing the operation.

In this patch, at a time we assign only one table to each process, and whichever process finishes fast, we assign it a new table; this way all processes get an equal share of the task.

Thanks & Regards,
Dilip Kumar
On Tue, Jul 1, 2014 at 1:25 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
> On 01 July 2014 03:48, Alvaro Wrote,
>
>> > In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--"exit_horribly" for example) needs to be refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.)
>>
>> I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos. Maybe we should move pgpipe back to src/port and have pg_dump and this new thing use that. I'm not sure about the rest of duplication in vac_parallel.c; there might be a lot in common with what pg_dump/parallel.c does too. Having two copies of code is frowned upon for good reasons. This patch introduces 1200 lines of new code in vac_parallel.c, ugh.
>>
>> If we really require 1200 lines to get parallel vacuum working for vacuumdb, I would question the wisdom of this effort. To me, it seems better spent improving autovacuum to cover whatever it is that this patch is supposed to be good for --- or maybe just enable having a shell script that launches multiple vacuumdb instances in parallel ...
>
> Thanks for looking into the patch,
>
> I think if we use a shell script for launching parallel vacuumdb, we cannot get complete control of dividing the task. If we directly divide tables b/w multiple processes, it may happen that some process gets very big tables; then it will be as good as one process doing the operation.
>
> In this patch, at a time we assign only one table to each process, and whichever process finishes fast, we assign it a new table; this way all processes get an equal share of the task.
>
> Thanks & Regards,
> Dilip Kumar

I have executed the latest patch.
One question: is this the correct way to use the --jobs option?

$ vacuumdb -d postgres --jobs=30

I got the following error.

vacuumdb: unrecognized option '--jobs=30'
Try "vacuumdb --help" for more information.

Regards,
-------
Sawada Masahiko
On 01 July 2014 22:17, Sawada Masahiko Wrote,
> I have executed the latest patch.
> One question: is this the correct way to use the --jobs option?
> $ vacuumdb -d postgres --jobs=30
>
> I got the following error.
> vacuumdb: unrecognized option '--jobs=30'
> Try "vacuumdb --help" for more information.

Thanks for the comments. Your usage is correct, but there were some problems in the code, and I have fixed them in the attached patch.

Apart from this fix, I am currently working on Jeff's comments about making the code common between pg_dump/parallel.c and scripts/vac_parallel.c. I found that almost 300 lines of code can be moved to a common place; the only problem is where to keep the common code.

I am thinking of:
1. Keeping a common folder in the bin folder --> src/bin/common, and moving the common code which is specific to parallel operation into src/bin/common/parallel_common.c
2. Both vacuumdb and pg_dump will compile this file in while generating their executables (in future, other executables like reindex can also use the same for parallel functionality).

Thoughts ?

Thanks & Regards,
Dilip Kumar
Attachment
On Mon, Jun 30, 2014 at 3:17 PM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Jeff Janes wrote: > >> In particular, pgpipe is almost an exact duplicate between them, >> except the copy in vac_parallel.c has fallen behind changes made to >> parallel.c. (Those changes would have fixed the Windows warnings). I >> think that this function (and perhaps other parts as >> well--"exit_horribly" for example) need to refactored into a common >> file that both files can include. I don't know where the best place >> for that would be, though. (I haven't done this type of refactoring >> myself.) > > I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos. > Maybe we should move pgpipe back to src/port and have pg_dump and this > new thing use that. I'm not sure about the rest of duplication in > vac_parallel.c; there might be a lot in common with what > pg_dump/parallel.c does too. Having two copies of code is frowned upon > for good reasons. This patch introduces 1200 lines of new code in > vac_parallel.c, ugh. > > If we really require 1200 lines to get parallel vacuum working for > vacuumdb, I would question the wisdom of this effort. To me, it seems > better spent improving autovacuum to cover whatever it is that this > patch is supposed to be good for --- or maybe just enable having a shell > script that launches multiple vacuumdb instances in parallel ... I would only envision using the parallel feature for vacuumdb after a pg_upgrade or some other major maintenance window (that is the only time I ever envision using vacuumdb at all). I don't think autovacuum can be expected to handle such situations well, as it is designed to be a smooth background process. I guess the ideal solution would be for manual VACUUM to have a PARALLEL option, then vacuumdb could just invoke that one table at a time. That way you would get within-table parallelism which would be important if one table dominates the entire database cluster. But I don't foresee that happening any time soon. I don't know how to calibrate the number of lines that is worthwhile. If you write in C and need to have cross-platform compatibility and robust error handling, it seems to take hundreds of lines to do much of anything. The code duplication is a problem, but I don't think just raw line count is, especially since it has already been written. The trend in this project seems to be for shell scripts to eventually get converted into C programs. In fact, src/bin/scripts now has no scripts at all. Also it is important to vacuum/analyze tables in the same database at the same time, otherwise you will not get much speed-up in the ordinary case where there is only one meaningful database. Doing that in a shell script would be fairly hard. It should be pretty easy in Perl (at least for me--I'm sure others disagree), but that also doesn't seem to be the way we do things for programs intended for end users. Cheers, Jeff
Jeff Janes wrote: > I would only envision using the parallel feature for vacuumdb after a > pg_upgrade or some other major maintenance window (that is the only > time I ever envision using vacuumdb at all). I don't think autovacuum > can be expected to handle such situations well, as it is designed to > be a smooth background process. That's a fair point. One thing that would be pretty neat but I don't think I would get anyone to implement it, is having the user control the autovacuum launcher in some way. For instance "please vacuum this set of tables as quickly as possible", and it would launch as many workers are configured. It would take months to get a UI settled for this, however. > I guess the ideal solution would be for manual VACUUM to have a > PARALLEL option, then vacuumdb could just invoke that one table at a > time. That way you would get within-table parallelism which would be > important if one table dominates the entire database cluster. But I > don't foresee that happening any time soon. I see this as a completely different feature, which might also be pretty neat, at least if you're open to spending more I/O bandwidth processing a single table: have several processes scanning the heap simultaneously. Since I think vacuum is mostly I/O bound at the moment, I'm not sure there is much point in this currently. > I don't know how to calibrate the number of lines that is worthwhile. > If you write in C and need to have cross-platform compatibility and > robust error handling, it seems to take hundreds of lines to do much > of anything. The code duplication is a problem, but I don't think > just raw line count is, especially since it has already been written. Well, there are (at least) two types of duplicate code: first you have these common routines such as pgpipe that are duplicates for no good reason. Just move them to src/port or something and it's all good. But the OP said there is code that cannot be shared even though it's very similar in both incarnations. That means we cannot (or it's difficult to) just have one copy, which means as they fix bugs in one copy we need to update the other. This is bad -- witness the situation with ecpg's copy of date/time code, where there are bugs fixed in the backend version but the ecpg version does not have the fix. It's difficult to keep track of these things. > The trend in this project seems to be for shell scripts to eventually > get converted into C programs. In fact, src/bin/scripts now has no > scripts at all. Also it is important to vacuum/analyze tables in the > same database at the same time, otherwise you will not get much > speed-up in the ordinary case where there is only one meaningful > database. Doing that in a shell script would be fairly hard. It > should be pretty easy in Perl (at least for me--I'm sure others > disagree), but that also doesn't seem to be the way we do things for > programs intended for end users. Yeah, shipping shell scripts doesn't work very well for us. I'm thinking perhaps we can have sample scripts in which we show how to use parallel(1) to run multiple vacuumdb's in parallel in Unix and some similar mechanism in Windows, and that's it. So we wouldn't provide the complete toolset, but the platform surely has ways to make it happen. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Jul 2, 2014 at 2:27 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
> On 01 July 2014 22:17, Sawada Masahiko Wrote,
>
>> I have executed the latest patch.
>> One question: is this the correct way to use the --jobs option?
>> $ vacuumdb -d postgres --jobs=30
>>
>> I got the following error.
>> vacuumdb: unrecognized option '--jobs=30'
>> Try "vacuumdb --help" for more information.
>
> Thanks for the comments. Your usage is correct, but there were some problems in the code, and I have fixed them in the attached patch.

Does this patch allow setting -j to 0?
When I set -j to 0, I think the behavior is the same as when I set it to 1.

Regards,
-------
Sawada Masahiko
On 03 July 2014 00:01, Sawada Masahiko Wrote,
>
> Does this patch allow setting -j to 0?
> When I set -j to 0, I think the behavior is the same as when I set it to 1.

I have changed the patch; now it will return an error if -j is set to 0 or less.

"vacuumdb: Number of parallel "jobs" should be at least 1"

Thanks & Regards,
Dilip
Attachment
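For illustration, the check described above boils down to something like the following (the function name and surrounding details are made up for this sketch, not lifted from the patch):

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative only: parse and validate the -j/--jobs argument */
    static int
    parse_jobs_option(const char *progname, const char *arg)
    {
        int     njobs = atoi(arg);

        if (njobs <= 0)
        {
            fprintf(stderr, "%s: number of parallel \"jobs\" should be at least 1\n",
                    progname);
            exit(1);
        }
        return njobs;
    }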
>
> Jeff Janes wrote:
>
> > I would only envision using the parallel feature for vacuumdb after a
> > pg_upgrade or some other major maintenance window (that is the only
> > time I ever envision using vacuumdb at all). I don't think autovacuum
> > can be expected to handle such situations well, as it is designed to
> > be a smooth background process.
>
> That's a fair point. One thing that would be pretty neat but I don't
> think I would get anyone to implement it, is having the user control the
> autovacuum launcher in some way. For instance "please vacuum this set
> of tables as quickly as possible", and it would launch as many workers
> are configured. It would take months to get a UI settled for this,
> however.
This sounds like a better way to have multiple workers working
> > I don't know how to calibrate the number of lines that is worthwhile.
> > If you write in C and need to have cross-platform compatibility and
> > robust error handling, it seems to take hundreds of lines to do much
> > of anything. The code duplication is a problem, but I don't think
> > just raw line count is, especially since it has already been written.
>
> Well, there are (at least) two types of duplicate code: first you have
> these common routines such as pgpipe that are duplicates for no good
> reason. Just move them to src/port or something and it's all good. But
> the OP said there is code that cannot be shared even though it's very
> similar in both incarnations. That means we cannot (or it's difficult
> to) just have one copy, which means as they fix bugs in one copy we need
> to update the other.
I checked briefly the duplicate code among both versions and I think,
On 02 July 2014 23:45, Alvaro Herrera Wrote,
>
> Well, there are (at least) two types of duplicate code: first you have these common routines such as pgpipe that are duplicates for no good reason. Just move them to src/port or something and it's all good. But the OP said there is code that cannot be shared even though it's very similar in both incarnations. That means we cannot (or it's difficult to) just have one copy, which means as they fix bugs in one copy we need to update the other. This is bad -- witness the situation with ecpg's copy of date/time code, where there are bugs fixed in the backend version but the ecpg version does not have the fix. It's difficult to keep track of these things.

In the attached patch, I have moved the pgpipe and piperead functions to src/port/pipe.c.

There are some more common functions, which Jeff and Amit also mentioned, to move to a common place; currently I am not sure where we can move the other functions to. Can we move the other parallel functions to src/port, maybe in one new file parallel.c under src/port ?

Thanks & Regards,
Dilip Kumar
Attachment
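For context, a hedged sketch of what the Unix side of a pgpipe()-style helper amounts to; the Windows build needs a loopback-socket emulation (as pg_dump's parallel.c carries), which is exactly what makes sharing a single copy in src/port attractive. The names mirror the existing helpers but this is not the moved code itself:

    #include <unistd.h>

    /* Illustrative Unix-only sketch: pgpipe() can simply wrap pipe(2) here;
     * the real portability work is the Windows socket-pair emulation. */
    static int
    pgpipe_sketch(int handles[2])
    {
        return pipe(handles);       /* 0 on success, -1 on failure */
    }

    static int
    piperead_sketch(int sock, char *buf, int len)
    {
        return read(sock, buf, len);
    }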
On Fri, Jul 4, 2014 at 1:15 AM, Dilip kumar <dilip.kumar@huawei.com> wrote: > In attached patch, I have moved pgpipe, piperead functions to src/port/pipe.c If we want to consider proceeding with this approach, you should probably separate this into a refactoring patch that doesn't do anything but move code around and a feature patch that applies on top of it. (As to whether this is the right approach, I'm not sure.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 07 July 2014 17:55, Robert Haas Wrote,
>
> On Fri, Jul 4, 2014 at 1:15 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
> > In attached patch, I have moved pgpipe, piperead functions to src/port/pipe.c
>
> If we want to consider proceeding with this approach, you should probably separate this into a refactoring patch that doesn't do anything but move code around and a feature patch that applies on top of it.
>
> (As to whether this is the right approach, I'm not sure.)

I have done the refactoring of the code. Two patches are attached:
1. vacuumdb_parallel_refactor.patch --> Moved pg_dump parallel code to port/parallel_utils.c (almost 800 lines are moved to the common code).
2. vacuumdb_parallel_v9 --> Feature changes for vacuumdb parallel (created on top of the first patch).

I think with these changes we are able to address all the concerns we were having related to duplicate code.

Thanks & Regards,
Dilip Kumar
Attachment
On Fri, Jun 27, 2014 at 4:10 AM, Dilip kumar <dilip.kumar@huawei.com> wrote: > On 27 June 2014 02:57, Jeff Wrote, > >> Based on that, I find most importantly that it doesn't seem to >> correctly vacuum tables which have upper case letters in the name, >> because it does not quote the table names when they need quotes. > > Thanks for your comments.... > > There are two problem > First -> When doing the vacuum of complete database that time if any table with upper case letter, it was giving error > --FIXED by adding quotes for table name > > Second -> When user pass the table using -t option, and if it has uppercase letter > --This is the existing problem (without parallel implementation), Just for the record, I don't think the second one is actually a bug. If someone uses -t option from the command line, they are required to provide the quotes if quotes are needed, just like they would need to in psql. That can be annoying to do from a shell, as you then need to protect the quotes themselves from the shell, but that is the way it is. vacuumdb -t '"CrAzY QuOtE"' or vacuumdb -t \"CrAzY\ QuOtE\" Cheers, Jeff
On Tue, Jul 1, 2014 at 6:25 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
> On 01 July 2014 03:48, Alvaro Wrote,
>
>> > In particular, pgpipe is almost an exact duplicate between them, except the copy in vac_parallel.c has fallen behind changes made to parallel.c. (Those changes would have fixed the Windows warnings). I think that this function (and perhaps other parts as well--"exit_horribly" for example) needs to be refactored into a common file that both files can include. I don't know where the best place for that would be, though. (I haven't done this type of refactoring myself.)
>>
>> I think commit d2c1740dc275543a46721ed254ba3623f63d2204 is apropos. Maybe we should move pgpipe back to src/port and have pg_dump and this new thing use that. I'm not sure about the rest of duplication in vac_parallel.c; there might be a lot in common with what pg_dump/parallel.c does too. Having two copies of code is frowned upon for good reasons. This patch introduces 1200 lines of new code in vac_parallel.c, ugh.
>>
>> If we really require 1200 lines to get parallel vacuum working for vacuumdb, I would question the wisdom of this effort. To me, it seems better spent improving autovacuum to cover whatever it is that this patch is supposed to be good for --- or maybe just enable having a shell script that launches multiple vacuumdb instances in parallel ...
>
> Thanks for looking into the patch,
>
> I think if we use a shell script for launching parallel vacuumdb, we cannot get complete control of dividing the task. If we directly divide tables b/w multiple processes, it may happen that some process gets very big tables; then it will be as good as one process doing the operation.
>
> In this patch, at a time we assign only one table to each process, and whichever process finishes fast, we assign it a new table; this way all processes get an equal share of the task.

I am late to this game, but the first thing to my mind was - do we really need the whole forking/threading thing on the client at all? We need it for things like pg_dump/pg_restore because they can themselves benefit from parallelism at the client level, but for something like this, might the code become a lot simpler if we just use multiple database connections and async queries? That would also bring the benefit of less platform dependent code, less cleanup needs etc.

(Oh, and for some reason at my quick review I also noticed - you added quoting of the table name, but forgot to do it for the schema name. You should probably also look at using something like quote_identifier(), that'll make things easier).

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
On 15 July 2014 19:01, Magnus Hagander Wrote,
> I am late to this game, but the first thing to my mind was - do we really need the whole forking/threading thing on the client at all? We need it for things like pg_dump/pg_restore because they can themselves benefit from parallelism at the client level, but for something like this, might the code become a lot simpler if we just use multiple database connections and async queries? That would also bring the benefit of less platform dependent code, less cleanup needs etc.

Thanks for the review. I understand your point, but I think if we do this directly with independent connections, it's difficult to divide the jobs equally b/w multiple independent connections.

As per this implementation we are able to share the load b/w the processes quite well:
1. If one process finishes its work faster, it can take on more of the load.
2. Especially while vacuuming the whole database, it's very difficult to divide the load without centralized control.

By the above points, I think that we can have this patch..

> (Oh, and for some reason at my quick review I also noticed - you added quoting of the table name, but forgot to do it for the schema name. You should probably also look at using something like quote_identifier(), that'll make things easier).

Thanks for the comments, I have attached the updated patch.
vacuumdb_parallel_refactor --> No change
vacuumdb_parallel_v9 --> Quotes added for namespace

Thanks & Regards,
Dilip
Attachment
Dilip kumar <dilip.kumar@huawei.com> writes: > On 15 July 2014 19:01, Magnus Hagander Wrote, >> I am late to this game, but the first thing to my mind was - do we >> really need the whole forking/threading thing on the client at all? > Thanks for the review, I understand you point, but I think if we have do this directly by independent connection, > It's difficult to equally divide the jobs b/w multiple independent connections. That argument seems like complete nonsense. You're confusing work allocation strategy with the implementation technology for the multiple working threads. I see no reason why a good allocation strategy couldn't work with either approach; indeed, I think it would likely be easier to do some things *without* client-side physical parallelism, because that makes it much simpler to handle feedback between the results of different operational threads. regards, tom lane
Tom Lane wrote: > Dilip kumar <dilip.kumar@huawei.com> writes: > > On 15 July 2014 19:01, Magnus Hagander Wrote, > >> I am late to this game, but the first thing to my mind was - do we > >> really need the whole forking/threading thing on the client at all? > > > Thanks for the review, I understand you point, but I think if we have do this directly by independent connection, > > It's difficult to equally divide the jobs b/w multiple independent connections. > > That argument seems like complete nonsense. You're confusing work > allocation strategy with the implementation technology for the multiple > working threads. I see no reason why a good allocation strategy couldn't > work with either approach; indeed, I think it would likely be easier to > do some things *without* client-side physical parallelism, because that > makes it much simpler to handle feedback between the results of different > operational threads. So you would have one initial connection, which generates a task list; then open N libpq connections. Launch one vacuum on each, and then sleep on select() on the three sockets. Whenever one returns read-ready, the vacuuming is done and we send another item from the task list. Repeat until tasklist is empty. No need to fork anything. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Jul 16, 2014 7:05 AM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
>
> Tom Lane wrote:
> > Dilip kumar <dilip.kumar@huawei.com> writes:
> > > On 15 July 2014 19:01, Magnus Hagander Wrote,
> > >> I am late to this game, but the first thing to my mind was - do we
> > >> really need the whole forking/threading thing on the client at all?
> >
> > > Thanks for the review, I understand you point, but I think if we have do this directly by independent connection,
> > > It's difficult to equally divide the jobs b/w multiple independent connections.
> >
> > That argument seems like complete nonsense. You're confusing work
> > allocation strategy with the implementation technology for the multiple
> > working threads. I see no reason why a good allocation strategy couldn't
> > work with either approach; indeed, I think it would likely be easier to
> > do some things *without* client-side physical parallelism, because that
> > makes it much simpler to handle feedback between the results of different
> > operational threads.
>
> So you would have one initial connection, which generates a task list;
> then open N libpq connections. Launch one vacuum on each, and then
> sleep on select() on the three sockets. Whenever one returns
> read-ready, the vacuuming is done and we send another item from the task
> list. Repeat until tasklist is empty. No need to fork anything.
>

Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable...

(and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to do sorting of tasks...)

/Magnus
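A hedged sketch of the design Alvaro and Magnus describe: N plain libpq connections, one asynchronous VACUUM in flight on each, and a select() loop that hands the next table to whichever connection finishes first. get_next_command(), MAX_CONNS, and the omitted error handling are illustrative assumptions, not the eventual patch's code:

    #include <stdbool.h>
    #include <sys/select.h>
    #include "libpq-fe.h"

    #define MAX_CONNS 64            /* illustrative cap for this sketch */

    /* hypothetical helper: next "VACUUM ..." command, or NULL when the list is empty */
    extern const char *get_next_command(void);

    static void
    dispatch_loop(PGconn **conns, int nconns)
    {
        bool        busy[MAX_CONNS] = {false};
        int         running = 0;

        /* prime every connection with one command */
        for (int i = 0; i < nconns; i++)
        {
            const char *cmd = get_next_command();

            if (cmd != NULL && PQsendQuery(conns[i], cmd))
            {
                busy[i] = true;
                running++;
            }
        }

        while (running > 0)
        {
            fd_set      readfds;
            int         maxfd = -1;

            FD_ZERO(&readfds);
            for (int i = 0; i < nconns; i++)
            {
                if (!busy[i])
                    continue;
                FD_SET(PQsocket(conns[i]), &readfds);
                if (PQsocket(conns[i]) > maxfd)
                    maxfd = PQsocket(conns[i]);
            }

            /* block (no busy-wait) until some backend has finished */
            if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
                break;

            for (int i = 0; i < nconns; i++)
            {
                PGresult   *res;
                const char *cmd;

                if (!busy[i] || !FD_ISSET(PQsocket(conns[i]), &readfds))
                    continue;

                PQconsumeInput(conns[i]);
                if (PQisBusy(conns[i]))
                    continue;       /* result not fully received yet */

                /* drain the result(s) of the finished VACUUM */
                while ((res = PQgetResult(conns[i])) != NULL)
                    PQclear(res);

                /* hand this connection the next table, if any remains */
                cmd = get_next_command();
                if (cmd == NULL || !PQsendQuery(conns[i], cmd))
                {
                    busy[i] = false;
                    running--;
                }
            }
        }
    }

As Magnus notes, the connection used to fetch the task list up front can simply be reused as conns[0].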
On 16 July 2014 12:13 Magnus Hagander Wrote,
>>Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable...
>>(and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to do sorting of tasks...)
Oh, I got your point, I will update my patch and send,
Now we can completely remove the vac_parallel.h file, and no refactoring is needed either :)
Thanks & Regards,
Dilip Kumar
From: Magnus Hagander [mailto:magnus@hagander.net]
Sent: 16 July 2014 12:13
To: Alvaro Herrera
Cc: Dilip kumar; Jan Lentfer; Tom Lane; PostgreSQL-development; Sawada Masahiko; Euler Taveira
Subject: Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Jul 16, 2014 7:05 AM, "Alvaro Herrera" <alvherre@2ndquadrant.com> wrote:
>
> Tom Lane wrote:
> > Dilip kumar <dilip.kumar@huawei.com> writes:
> > > On 15 July 2014 19:01, Magnus Hagander Wrote,
> > >> I am late to this game, but the first thing to my mind was - do we
> > >> really need the whole forking/threading thing on the client at all?
> >
> > > Thanks for the review, I understand you point, but I think if we have do this directly by independent connection,
> > > It's difficult to equally divide the jobs b/w multiple independent connections.
> >
> > That argument seems like complete nonsense. You're confusing work
> > allocation strategy with the implementation technology for the multiple
> > working threads. I see no reason why a good allocation strategy couldn't
> > work with either approach; indeed, I think it would likely be easier to
> > do some things *without* client-side physical parallelism, because that
> > makes it much simpler to handle feedback between the results of different
> > operational threads.
>
> So you would have one initial connection, which generates a task list;
> then open N libpq connections. Launch one vacuum on each, and then
> sleep on select() on the three sockets. Whenever one returns
> read-ready, the vacuuming is done and we send another item from the task
> list. Repeat until tasklist is empty. No need to fork anything.
>
Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable...
(and as an optimization on Alvaro's suggestion, you can of course reuse the initial connection as one of the workers, as long as you got the full list of tasks from it up front, which I think you do anyway in order to do sorting of tasks...)
/Magnus
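To make the proposed flow concrete, here is a minimal sketch of the select()-based dispatch loop described above, written against documented libpq calls (PQsendQuery, PQsocket, PQconsumeInput, PQisBusy, PQgetResult). The ParallelSlot struct, the helper names, and the hard-coded task list are assumptions of this sketch, not code taken from the patch; a real client would also set the connections non-blocking and handle signals and errors more carefully.

/*
 * Sketch only: dispatch VACUUMs over N asynchronous libpq connections and
 * sleep on select() until one of them finishes, then hand it the next table.
 * Compile with something like: cc sketch.c -lpq
 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <libpq-fe.h>

typedef struct
{
    PGconn *conn;
    int     sock;
    int     busy;   /* a command is currently running on this connection */
} ParallelSlot;

/* Send "VACUUM <table>" without waiting for the result.
 * (The table name is assumed to be already quoted/qualified.) */
static void
dispatch_vacuum(ParallelSlot *slot, const char *table)
{
    char    sql[1024];

    snprintf(sql, sizeof(sql), "VACUUM %s", table);
    if (!PQsendQuery(slot->conn, sql))
    {
        fprintf(stderr, "failed to send command: %s", PQerrorMessage(slot->conn));
        exit(1);
    }
    slot->busy = 1;
}

/* Absorb input; if the command has finished, drain its results and mark idle. */
static void
reap_slot(ParallelSlot *slot)
{
    PGresult   *res;

    if (!PQconsumeInput(slot->conn))
    {
        fprintf(stderr, "%s", PQerrorMessage(slot->conn));
        exit(1);
    }
    if (PQisBusy(slot->conn))
        return;                             /* still working */
    while ((res = PQgetResult(slot->conn)) != NULL)
        PQclear(res);                       /* a real client would check errors here */
    slot->busy = 0;
}

int
main(void)
{
    /* Hypothetical task list; vacuumdb would build this from pg_class. */
    const char *tasks[] = {"t1", "t2", "t3", "t4", "t5"};
    int         ntasks = 5, next = 0, pending = 0, i;
    int         nconn = 2;
    ParallelSlot slot[2];

    for (i = 0; i < nconn; i++)
    {
        slot[i].conn = PQconnectdb("dbname=postgres");
        if (PQstatus(slot[i].conn) != CONNECTION_OK)
            exit(1);
        slot[i].sock = PQsocket(slot[i].conn);
        slot[i].busy = 0;
    }

    while (next < ntasks || pending > 0)
    {
        fd_set  rfds;
        int     maxfd = -1;

        /* Hand the next table to every idle connection. */
        for (i = 0; i < nconn; i++)
            if (!slot[i].busy && next < ntasks)
            {
                dispatch_vacuum(&slot[i], tasks[next++]);
                pending++;
            }

        /* Sleep until at least one busy connection has data for us. */
        FD_ZERO(&rfds);
        for (i = 0; i < nconn; i++)
            if (slot[i].busy)
            {
                FD_SET(slot[i].sock, &rfds);
                if (slot[i].sock > maxfd)
                    maxfd = slot[i].sock;
            }
        if (select(maxfd + 1, &rfds, NULL, NULL, NULL) < 0)
            exit(1);                        /* a real client would retry on EINTR */

        for (i = 0; i < nconn; i++)
            if (slot[i].busy && FD_ISSET(slot[i].sock, &rfds))
            {
                reap_slot(&slot[i]);
                if (!slot[i].busy)
                    pending--;
            }
    }

    for (i = 0; i < nconn; i++)
        PQfinish(slot[i].conn);
    return 0;
}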
On 16 July 2014 12:13, Magnus Hagander Wrote,
>Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable...
>(and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to sorting of tasks...)
I have modified the patch as per the suggestion.
Now all connections are created at the beginning, and the first connection is used to get the table list; after that, all connections take part in the vacuum task.
Please have a look and provide your opinion…
Thanks & Regards,
Dilip Kumar
Attachment
Jeff Janes wrote:

> Should we push the refactoring through anyway?  I have a hard time
> believing that pg_dump is going to be the only client program we ever have
> that will need process-level parallelism, even if this feature itself does
> not need it.  Why make the next person who comes along re-invent that
> re-factoring of this wheel?

I gave the refactoring patch a look some days ago, and my conclusion was
that it is reasonably sound but it needed quite some cleanup in order for
it to be committable.  Without any immediate use case, it's hard to
justify going through all that effort.  Maybe we can add a TODO item and
have it point to the posted patch, so that if in the future we see a need
for another parallel client program we can easily rebase the current patch.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> On 16 July 2014 12:13, Magnus Hagander Wrote,
> >Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable...
>
> >(and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to sorting of tasks...)
>
> I have modified the patch as per the suggestion,
>
> Now in beginning we create all connections, and first connection we use for getting table list in beginning, After that all connections will be involved in vacuum task.
>
> Please have a look and provide your opinion…
On 31 July 2014 10:59, Amit kapila Wrote,
Thanks for the review and valuable comments.
I have fixed all the comments and attached the revised patch.
As per your suggestion, I have also collected the performance numbers…
Test1:
Machine Configuration:
Core : 8 (Intel(R) Xeon(R) CPU E5520 @ 2.27GHz)
RAM: 48GB
Test Scenario:
8 tables, all with 1M+ records. [Many records are deleted and inserted using some pattern; the file is attached in the mail.]
Test Result
Base Code: 43.126s
Parallel Vacuum Code
2 Threads : 29.687s
8 Threads : 14.647s
Test2: (as per your scenario, where actual vacuum time is very less)
Vacuum done for complete DB
8 tables all with 10000 records and few dead tuples
Test Result
Base Code: 0.59s
Parallel Vacuum Code
2 Threads : 0.50s
4 Threads : 0.29s
8 Threads : 0.18s
Regards,
Dilip Kumar
From: Amit Kapila [mailto:amit.kapila16@gmail.com]
Sent: 31 July 2014 10:59
To: Dilip kumar
Cc: Magnus Hagander; Alvaro Herrera; Jan Lentfer; Tom Lane; PostgreSQL-development; Sawada Masahiko; Euler Taveira
Subject: Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Fri, Jul 18, 2014 at 10:22 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
> On 16 July 2014 12:13, Magnus Hagander Wrote,
> >Yeah, those are exactly my points. I think it would be significantly simpler to do it that way, rather than forking and threading. And also easier to make portable...
>
> >(and as a optimization on Alvaros suggestion, you can of course reuse the initial connection as one of the workers as long as you got the full list of tasks from it up front, which I think you do anyway in order to sorting of tasks...)
>
> I have modified the patch as per the suggestion,
>
> Now in beginning we create all connections, and first connection we use for getting table list in beginning, After that all connections will be involved in vacuum task.
>
> Please have a look and provide your opinion…
1.
+ connSlot = (ParallelSlot*)pg_malloc(parallel * sizeof(ParallelSlot));
+ for (i = 0; i < parallel; i++)
+ {
+ connSlot[i].connection = connectDatabase(dbname, host, port, username,
+ prompt_password, progname, false);
+
+ PQsetnonblocking(connSlot[i].connection, 1);
+ connSlot[i].isFree = true;
+ connSlot[i].sock = PQsocket(connSlot[i].connection);
+ }
Here it seems to me that you are opening connections before
getting or checking tables list, so in case you have lesser
number of tables, won't the extra connections be always idle.
A simple case to verify the same is with the below example
vacuumdb -t t1 -d postgres -j 4
2.
+ res = executeQuery(conn,
+ "select relname, nspname from pg_class c, pg_namespace ns"
+ " where relkind= \'r\' and c.relnamespace = ns.oid"
+ " order by relpages desc",
+ progname, echo);
Here it is just trying to get the list of relations, however
Vacuum command processes materialized views as well, so
I think here the list should include materialized views as well
unless you have any specific reason for not including those.
3. In function vacuum_parallel(), if user has not provided list of tables,
then it is retrieving all the tables in database and then in run_parallel_vacuum(),
it tries to do Vacuum for each of table using Async mechanism, now
consider a case when after getting list if any table is dropped by user
from some other session, then patch will error out. However without patch
or Vacuum command will internally ignore such a case and complete
the Vacuum for other tables. Don't you think the patch should maintain
the existing behaviour?
4.
+ <term><option>-j <replaceable class="parameter">jobs</replaceable></></term>
+ Number of parallel process to perform the operation.
Change this description as per new implementation. Also I think
there is a need of some explanation for this new option.
5.
It seems there is no change in the below function declaration:
static void vacuum_one_database(const char *dbname, bool full, bool verbose,
! bool and_analyze, bool analyze_only, bool analyze_in_stages,
! bool freeze, const char *table, const char *host,
! const char *port, const char *username,
! enum trivalue prompt_password,
const char *progname, bool echo);
6.
+ printf(_(" -j, --jobs=NUM use this many parallel jobs to vacuum\n"));
Change the description as per new implementation.
7.
/* This will give the free connection slot, if no slot is free it will
wait for atleast one slot to get free.*/
Multiline comments should be written like (refer other places)
/*
* This will give the free connection slot, if no slot is free it will
* wait for atleast one slot to get free.
*/
Kindly correct at other places if similar instance exist in patch.
8.
Isn't it a good idea to check performance of this new patch
especially for some worst cases like when there is not much
to vacuum in the tables inside a database. The reason I wanted
to check is that because with new algorithm (for a vacuum of database,
now it will get the list of tables and perform vacuum on individual
tables) we have to repeat certain actions in server side like
allocation/deallocation of context, sending stats, which would have
been otherwise done once.
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment
<div dir="ltr">On Mon, Aug 4, 2014 at 11:41 AM, Dilip kumar <<a href="mailto:dilip.kumar@huawei.com">dilip.kumar@huawei.com</a>>wrote:<br />><br />> On 31 July 2014 10:59, Amitkapila Wrote,<br />><br />> <br />><br /> > Thanks for the review and valuable comments.<br />> I havefixed all the comments and attached the revised patch.<br /><br />I have again looked into your revised patch and wouldlike<br />to share my findings with you.<br /><br />1. <br />+ Number of parallel connections to perform theoperation. This option will enable the vacuum<br />+ operation to run on parallel connections, at a time one tablewill be operated on one connection.<br /><br />a. How about describing w.r.t asynchronous connections<br />instead ofparallel connections?<br />b. It is better to have line length as lesser than 80.<br />c. As you are using multiple connectionsto achieve parallelism,<br /> I suggest you add a line in your description indicating user should<br />verifymax_connections parameters. Something similar to pg_dump:<br /><br />"pg_dump will open njobs + 1 connections tothe database, so make<br /> sure your max_connections setting is high enough to accommodate<br /> all connections."<br/> <br /><br />2. <br />+ So at one time as many tables will be vacuumed parallely as number of jobs.<br/><br />can you briefly mention about the case when number of jobs<br />is more than number of tables?<br /><br />3.<br/>+ /* When user is giving the table list, and list is smaller then<br />+ * number of tables<br />+ */<br />+ if(tbl_count && (parallel > tbl_count))<br />+ parallel = tbl_count;<br />+ <br /><br />Again here multiline commentsare wrong.<br /><br />Some other instances are as below:<br />a.<br />/* This will give the free connection slot,if no slot is free it will<br />* wait for atleast one slot to get free.<br />*/<br />b.<br />/* if table list is notprovided then we need to do vaccum for whole DB<br /> * get the list of all tables and prpare the list<br />*/<br />c.<br/>/* Some of the slot are free, Process the results for slots whichever<br />* are free<br />*/<br /><br /><br />4.<br/>src/bin/scripts/vacuumdb.c:51: indent with spaces.<br /> + bool analyze_only, bool freeze, PQExpBuffersql);<br />src/bin/scripts/vacuumdb.c:116: indent with spaces.<br />+ int parallel = 0;<br />src/bin/scripts/vacuumdb.c:198:indent with spaces.<br />+ optind++;<br /> src/bin/scripts/vacuumdb.c:299: space beforetab in indent.<br />+ vacuum_one_database(dbname, full, verbose, and_analyze,<br /><br />Thereare lot of redundant whitespaces, check with<br />git diff --check and fix them.<br /><br /><br />5.<br />res = executeQuery(conn,<br/> "select relname, nspname from pg_class c, pg_namespace ns"<br /> " where (relkind= \'r\' or relkind = \'m\')"<br /> " and c.relnamespace = ns.oid order by relpages desc",<br /> progname, echo);<br /><br />a. Here you need to use SQL keywords in capital letters, refer one<br />of the other callerof executeQuery() in vacuumdb.c<br />b. Why do you need this condition c.relnamespace = ns.oid in above<br /> query?<br/>I think to get the list of required objects from pg_class, you don't<br />need to have a join with pg_namespace.<br/><br />6.<br />vacuum_parallel()<br />{<br />..<br />if (!tables || !tables->head)<br />{<br />..<br/>tbl_count++;<br /> }<br />..<br />}<br /><br />a. Here why you need a separate variable (tbl_count) to verify number<br/>asynchronous/parallel connections you want, why can't we use ntuple?<br />b. 
there is a warning in code (I havecompiled it on windows) as well<br /> related to this variable.<br />vacuumdb.c(695): warning C4700: uninitialized localvariable 'tbl_count' used<br /><br /><br />7.<br />Fix for one of my previous comment is as below:<br />GetQueryResult()<br/>{<br />..<br />if (!r && !completedb)<br /> ..<br />}<br /><br />Here I think some genericerrors like connection broken or others<br />will also get ignored. Is it possible that we can ignore particular<br/>error which we want to ignore without complicating the code?<br /><br /> Also in anycase add comments to explainwhy you are ignoring<br />error for particular case.<br /><br /><br />8.<br />+ fprintf(stderr, _("%s: Number of parallel\"jobs\" should be at least 1\n"),<br />+ progname);<br /> formatting of 2nd line progname is not as per standard(you can refer other fprintf in the same file).<br /><br />9. + int parallel = 0;<br />I think it is better toname it as numAsyncCons or something similar.<br /><br />10. It is better if you can add function header for newly added<br/> functions. <br /> <br />><br />> Test2: (as per your scenario, where actual vacuum time is very less)<br/>><br />> Vacuum done for complete DB<br />><br />> 8 tables all with 10000records and few dead tuples<br /><br />I think this test is missing in attached file. Few means?<br />Can you try with0.1%, 1% of dead tuples in table and try to<br />take time in milliseconds if it is taking less time to complete<br />thetest.<br /><br /><br />PS - <br /> It is better if you mention against each review comment/suggestion<br />what youhave done, because in some cases it will help reviewer to<br />understand your fix easily and as author you will alsobe sure that<br />all got fixed.<br /><br />With Regards,<br />Amit Kapila.<br />EnterpriseDB: <a href="http://www.enterprisedb.com">http://www.enterprisedb.com</a></div>
On 11 August 2014 10:29, Amit kapila wrote,
1. I have fixed all the review comments except few, and modified patch is attached.
2. For not fixed comments, find inline reply in the mail..
>1.
>+ Number of parallel connections to perform the operation. This option will enable the vacuum
>+ operation to run on parallel connections, at a time one table will be operated on one connection.
>
>a. How about describing w.r.t asynchronous connections
>instead of parallel connections?
>b. It is better to have line length as lesser than 80.
>c. As you are using multiple connections to achieve parallelism,
>I suggest you add a line in your description indicating user should
>verify max_connections parameters. Something similar to pg_dump:
>
>2.
>+ So at one time as many tables will be vacuumed parallely as number of jobs.
>
>can you briefly mention about the case when number of jobs
>is more than number of tables?
1 and 2 ARE FIXED in DOC.
>3.
>+ /* When user is giving the table list, and list is smaller then
>+ * number of tables
>+ */
>+ if (tbl_count && (parallel > tbl_count))
>+ parallel = tbl_count;
>+
>
>Again here multiline comments are wrong.
>
>Some other instances are as below:
>a.
>/* This will give the free connection slot, if no slot is free it will
>* wait for atleast one slot to get free.
>*/
>b.
>/* if table list is not provided then we need to do vaccum for whole DB
>* get the list of all tables and prpare the list
>*/
>c.
>/* Some of the slot are free, Process the results for slots whichever
>* are free
>*/
COMMENTS are FIXED
>4.
>src/bin/scripts/vacuumdb.c:51: indent with spaces.
>+ bool analyze_only, bool freeze, PQExpBuffer sql);
>src/bin/scripts/vacuumdb.c:116: indent with spaces.
>+ int parallel = 0;
>src/bin/scripts/vacuumdb.c:198: indent with spaces.
>+ optind++;
>src/bin/scripts/vacuumdb.c:299: space before tab in indent.
>+ vacuum_one_database(dbname, full, verbose, and_analyze,
>
>There are lot of redundant whitespaces, check with
>git diff --check and fix them.
ALL are FIXED
>
>5.
>res = executeQuery(conn,
> "select relname, nspname from pg_class c, pg_namespace ns"
> " where (relkind = \'r\' or relkind = \'m\')"
> " and c.relnamespace = ns.oid order by relpages desc",
> progname, echo);
>
>a. Here you need to use SQL keywords in capital letters, refer one
>of the other caller of executeQuery() in vacuumdb.c
>b. Why do you need this condition c.relnamespace = ns.oid in above
>query?
IT IS POSSIBLE THAT TWO NAMESPACES HAVE A TABLE WITH THE SAME NAME, SO WHEN WE SEND THE COMMAND FROM THE CLIENT WE NEED TO QUALIFY IT WITH THE NAMESPACE, BECAUSE WE NEED TO VACUUM ALL THE
TABLES. (OTHERWISE TWO TABLES WITH THE SAME NAME IN DIFFERENT NAMESPACES WOULD BE TREATED AS THE SAME TABLE.)
>I think to get the list of required objects from pg_class, you don't
>need to have a join with pg_namespace.
DONE
>6.
>vacuum_parallel()
>{
>..
>if (!tables || !tables->head)
>{
>..
>tbl_count++;
>}
>..
>}
>
>a. Here why you need a separate variable (tbl_count) to verify number
>asynchronous/parallel connections you want, why can't we use ntuple?
>b. there is a warning in code (I have compiled it on windows) as well
>related to this variable.
>vacuumdb.c(695): warning C4700: uninitialized local variable 'tbl_count' used
Variable REMOVED
>
>7.
>Fix for one of my previous comment is as below:
>GetQueryResult()
>{
>..
>if (!r && !completedb)
>..
>}
>
>Here I think some generic errors like connection broken or others
>will also get ignored. Is it possible that we can ignore particular
>error which we want to ignore without complicating the code?
>
>Also in anycase add comments to explain why you are ignoring
>error for particular case.
Here we only get the message string; I think if we need to find the error code, we have to parse the string and then compare it against the error codes.
Is there any other way to do this?
Comments are added
>8.
>+ fprintf(stderr, _("%s: Number of parallel \"jobs\" should be at least 1\n"),
>+ progname);
>formatting of 2nd line progname is not as per standard (you can refer other fprintf in the same file).
DONE
>9. + int parallel = 0;
>I think it is better to name it as numAsyncCons or something similar.
CHANGED as PER SUGGESTION
>10. It is better if you can add function header for newly added
>functions.
ADDED
>>
>> Test2: (as per your scenario, where actual vacuum time is very less)
>>
>> Vacuum done for complete DB
>>
>> 8 tables all with 10000 records and few dead tuples
>
>I think this test is missing in attached file. Few means?
>Can you try with 0.1%, 1% of dead tuples in table and try to
>take time in milliseconds if it is taking less time to complete
>the test.
TESTED with 1%, 0.1% and 0.01 % and results are as follows
1. With 1% (file test1%.sql)
Base Code : 22.26 s
2 Threads : 12.82 s
4 Threads : 9.19s
8 Threads : 8.89s
2. With 0.1%
Base Code : 3.83 s
2 Threads : 2.01 s
4 Threads : 2.02s
8 Threads : 2.25s
3. With 0.01%
Base Code : 0.60 s
2 Threads : 0.32 s
4 Threads : 0.26s
8 Threads : 0.31s
Thanks & Regards,
Dilip
Attachment
On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> 1.
> + Number of parallel connections to perform the operation. This
> option will enable the vacuum
> + operation to run on parallel connections, at a time one table will
> be operated on one connection.
>
> a. How about describing w.r.t asynchronous connections
> instead of parallel connections?

I don't think "asynchronous" is a good choice of word.  Maybe "simultaneous"?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
> On 11 August 2014 10:29, Amit kapila wrote,
> >5.
>
> >res = executeQuery(conn,
>
> > "select relname, nspname from pg_class c, pg_namespace ns"
>
> > " where (relkind = \'r\' or relkind = \'m\')"
>
> > " and c.relnamespace = ns.oid order by relpages desc",
>
> > progname, echo);
>
> >
>
> >a. Here you need to use SQL keywords in capital letters, refer one
> >of the other caller of executeQuery() in vacuumdb.c
>
> >b. Why do you need this condition c.relnamespace = ns.oid in above
>
> >query?
>
>
>
> IT IS POSSIBLE THAT, TWO NAMESPACE HAVE THE SAME TABLE NAME, SO WHEN WE ARE SENDING COMMAND FROM CLIENT WE NEED TO GIVE NAMESPACE WISE BECAUSE WE NEED TO VACUUM ALL THE
>
> TABLES.. (OTHERWISE TWO TABLE WITH SAME NAME FROM DIFFERENT NAMESPACE WILL BE TREATED SAME.)
> >7.
>
>
> Here we are getting message string, I think if we need to find the error code then we need to parse the string, and after that we need to compare with error codes.
>
> Is there any other way to do this ?
>
> On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > 1.
> > + Number of parallel connections to perform the operation. This
> > option will enable the vacuum
> > + operation to run on parallel connections, at a time one table will
> > be operated on one connection.
> >
> > a. How about describing w.r.t asynchronous connections
> > instead of parallel connections?
>
> I don't think "asynchronous" is a good choice of word.
Not sure. How about *concurrent* or *multiple*?
On Tue, Aug 19, 2014 at 7:08 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila <amit.kapila16@gmail.com>
>> wrote:
>> > 1.
>> > + Number of parallel connections to perform the operation. This
>> > option will enable the vacuum
>> > + operation to run on parallel connections, at a time one table
>> > will
>> > be operated on one connection.
>> >
>> > a. How about describing w.r.t asynchronous connections
>> > instead of parallel connections?
>>
>> I don't think "asynchronous" is a good choice of word.
>
> Agreed.
>
>> Maybe "simultaneous"?
>
> Not sure. How about *concurrent* or *multiple*?

multiple isn't right, but we could say concurrent.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>
> On Tue, Aug 19, 2014 at 7:08 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Fri, Aug 15, 2014 at 12:55 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> >>
> >> On Mon, Aug 11, 2014 at 12:59 AM, Amit Kapila <amit.kapila16@gmail.com>
> >> wrote:
> >> >
> >> > a. How about describing w.r.t asynchronous connections
> >> > instead of parallel connections?
> >>
> >> I don't think "asynchronous" is a good choice of word.
> >
> > Agreed.
> >
> >>Maybe "simultaneous"?
> >
> > Not sure. How about *concurrent* or *multiple*?
>
> multiple isn't right, but we could say concurrent.
<div class="WordSection1"><p class="MsoNormal">On<span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">21August 2014 08:31, Amit Kapila Wrote, </span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"></span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><pclass="MsoNormal" style="margin-bottom:12.0pt">>>><br /> > > >Not sure. How about *concurrent* or *multiple*?<br /> >><br /> > >multiple isn't right, but we could say concurrent.<p class="MsoNormal">>I also find concurrentmore appropriate.<p class="MsoNormal">>Dilip, could you please change it to concurrent in doc updates,<p class="MsoNormal">>variables,functions unless you see any objection for the same.<p class="MsoNormal"> <p class="MsoNormal">Ok,I will take care along with other comments fix..<p class="MsoNormal"> <p class="MsoNormal">Regards,<pclass="MsoNormal">Dilip Kumar</div>
>
> Few more comments:
>
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On 24 August 2014 11:33, Amit Kapila Wrote
Thanks for your comments; I have worked on both of the review comment lists, sent on 19 August and 24 August.
Latest patch is attached with the mail..
on 19 August:
------------
>You can compare against SQLSTATE by using below API.
>val = PQresultErrorField(res, PG_DIAG_SQLSTATE);
>You need to handle *42P01* SQLSTATE, also please refer below
>usage where we are checking SQLSTATE.
>fe-connect.c
>PQresultErrorField(conn->result, PG_DIAG_SQLSTATE),
> ERRCODE_INVALID_PASSWORD) == 0)
DONE
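For reference, a minimal sketch of the SQLSTATE check being discussed, using the documented PQresultErrorField()/PG_DIAG_SQLSTATE interface; the function name and the completedb flag are illustrative assumptions, not the patch's exact code.

/*
 * Sketch only: decide whether a per-table VACUUM error can be ignored when
 * vacuuming the whole database.  "42P01" is SQLSTATE undefined_table; the
 * table may have been dropped after the task list was built, a case which a
 * plain server-side "VACUUM;" would silently skip.
 */
#include <string.h>
#include <libpq-fe.h>

#define SQLSTATE_UNDEFINED_TABLE "42P01"

static int
error_can_be_ignored(const PGresult *res, int completedb)
{
    const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

    return completedb &&
           sqlstate != NULL &&
           strcmp(sqlstate, SQLSTATE_UNDEFINED_TABLE) == 0;
}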
>Few more comments:
>
>1.
>* If user has not given the vacuum of complete db, then if
>
>I think here you have said reverse of what code is doing.
>You don't need *not* in above sentence.
DONE
>2.
>+ appendPQExpBuffer(&sql, "\"%s\".\"%s\"", nspace, relName);
>I think here you need to use function fmtQualifiedId() or fmtId()
>or something similar to handle quotes appropriately.
DONE
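The comment above refers to the frontend fmtId()/fmtQualifiedId() helpers; as an illustration of the same idea using only libpq, a schema-qualified command could be built with PQescapeIdentifier(). The function name and buffer handling below are assumptions of this sketch, and error handling is omitted for brevity.

/*
 * Sketch: build a schema-qualified VACUUM command, quoting both the
 * namespace and the relation name so that odd identifiers (mixed case,
 * spaces, embedded quotes) survive the round trip.  PQescapeIdentifier()
 * returns a malloc'd, double-quoted string that must be freed with
 * PQfreemem(); a real client would also check it for NULL.
 */
#include <stdio.h>
#include <string.h>
#include <libpq-fe.h>

static void
build_vacuum_command(PGconn *conn, const char *nspname, const char *relname,
                     char *buf, size_t buflen)
{
    char   *qnsp = PQescapeIdentifier(conn, nspname, strlen(nspname));
    char   *qrel = PQescapeIdentifier(conn, relname, strlen(relname));

    /* e.g.  VACUUM "public"."My Table"  */
    snprintf(buf, buflen, "VACUUM %s.%s", qnsp, qrel);

    PQfreemem(qnsp);
    PQfreemem(qrel);
}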
>3.
>
>+ */
>+ if (!r && !completedb)
>Here the usage of completedb variable is reversed which means
>that it goes into error path when actually whole database is
>getting vacuumed and the reason is that you are setting it
>to false in below code:
>+ /* Vaccuming full database*/
>+ vacuum_tables = false;
FIXED
>4.
>Functions prepare_command() and vacuum_one_database() contain
>duplicate code, is there any problem in using prepare_command()
>function in vacuum_one_database(). Another point in this context
>is that I think it is better to name function prepare_command()
>as append_vacuum_options() or something on that lines, also it will
>be better if you can write function header for this function as well.
DONE
>5.
>+ if (error)
>+ {
>+ for (i = 0; i < max_slot; i++)
>+ {
>+ DisconnectDatabase(&connSlot[i]);
>+ }
>
>Here why do we need DisconnectDatabase() type of function?
>Why can't we simply call PQfinish() as in base code?
Because the base code just needs to handle the main connection, and when we send PQfinish it means the work has either completed or errored out;
in both cases control is with the client. But in the multi-connection case, if one connection fails then we need to send
a cancel on the other connections, and that is what DisconnectDatabase does.
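A rough sketch of what such a DisconnectDatabase-style cleanup can look like with documented libpq calls (PQtransactionStatus, PQgetCancel, PQcancel, PQfinish); the slot type and function name are assumptions of this sketch rather than the patch's exact code.

/*
 * Sketch: shut down all worker connections.  If a connection still has a
 * command in flight, first send a cancel request so the server stops
 * working on it, then close the connection.
 */
#include <libpq-fe.h>

typedef struct
{
    PGconn *conn;
} ParallelSlot;

static void
disconnect_all(ParallelSlot *slots, int nslots)
{
    int     i;
    char    errbuf[256];

    for (i = 0; i < nslots; i++)
    {
        if (slots[i].conn == NULL)
            continue;

        /* A command is still running: ask the server to cancel it first. */
        if (PQtransactionStatus(slots[i].conn) == PQTRANS_ACTIVE)
        {
            PGcancel   *cancel = PQgetCancel(slots[i].conn);

            if (cancel != NULL)
            {
                (void) PQcancel(cancel, errbuf, sizeof(errbuf));
                PQfreeCancel(cancel);
            }
        }

        PQfinish(slots[i].conn);
        slots[i].conn = NULL;
    }
}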
>
>6.
>+ /*
>+ * if table list is not provided then we need to do vaccum for whole DB
>+ * get the list of all tables and prpare the list
>+ */
>spelling of prepare is wrong. I have noticed spell mistake
>in comments at some other place as well, please check all
>comments once
FIXED
>
>7. I think in new mechanism cancel handler will not work.
>In single connection vacuum it was always set/reset
>in function executeMaintenanceCommand(). You might need
>to set/reset it in function run_parallel_vacuum().
Good catch. Now I call SetCancelConn(pSlot[0].connection) on the first connection; this enables the cancel
handler to cancel the query on the first connection so that the select loop will come out.
24 August
---------
>
>1. I could see one shortcoming in the way the patch has currently parallelize the
> work for --analyze-in-stages. Basically patch is performing the work for each stage
> for multiple tables in concurrent connections that seems okay for the cases when
> number of parallel connections is less than equal to number of tables, but for
> the case when user has asked for more number of connections than number of tables,
> then I think this strategy will not be able to use the extra connections.
I think --analyze-in-stages should maintain the order.
>2. Similarly for the case of multiple databases, currently it will not be able
> to use connections more than number of tables in each database because the
> parallelizing strategy is to just use the conncurrent connections for
> tables inside single database.
I think for multiple databases, doing the parallel execution means we need to maintain multiple connections to multiple databases,
and we need to maintain a combined table list for all the databases in order to run them concurrently. I think this may impact the startup cost,
as we need to create multiple connections and disconnect them just to prepare the list, and I think it will increase the complexity as well.
>I am not completely sure whether current strategy is good enough or
>we should try to address the above problems. What do you think?
>3.
>+ do
>+ {
>+ i = select_loop(maxFd, &slotset);
>+ Assert(i != 0);
>
>Could you explain the reason of using this loop, I think you
>want to wait for data on socket descriptor, but why for maxFd?
>Also it is better if you explain this logic in comments.
We send a vacuum job to a connection, and when none of the connection slots is free we wait on all the sockets
until one of them gets freed.
>4.
>+ for (i = 0; i < max_slot; i++)
>+ {
>+ if (!FD_ISSET(pSlot[i].sock, &slotset))
>+ continue;
>+
>+ PQconsumeInput(pSlot[i].connection);
>+ if (PQisBusy(pSlot[i].connection))
>+ continue;
>
>I think it is better to call PQconsumeInput() only if you find
>connection is busy.
I think here the logic is a bit different. In other places in the code, PQconsumeInput is called together with PQisBusy in a loop to consume all the data;
but in our case we have already sent the query, so we consume whatever is on the network and then check PQisBusy:
if it is not busy, it means this connection has become free.
PQconsumeInput only consumes input when input is available, so I think we do not need to check PQisBusy before calling it externally.
However, in the case where the caller has to fetch the complete data, they need to call PQisBusy and then PQconsumeInput in a loop, so that PQconsumeInput
is not called in a tight loop and is called only while the connection is busy.
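A small sketch of the pattern just described: for each slot whose socket select() reported readable, absorb the pending input once, then ask PQisBusy(); only when the connection is no longer busy are its results drained and the slot marked free. The slot fields and the function name are illustrative, not the patch's exact code.

/*
 * Sketch only.  "readable" is the fd_set returned by select() over the
 * worker sockets.
 */
#include <stdio.h>
#include <sys/select.h>
#include <libpq-fe.h>

typedef struct
{
    PGconn *connection;
    int     sock;
    int     isFree;
} ParallelSlot;

static void
mark_finished_slots(ParallelSlot *slots, int nslots, fd_set *readable)
{
    int         i;
    PGresult   *res;

    for (i = 0; i < nslots; i++)
    {
        if (slots[i].isFree || !FD_ISSET(slots[i].sock, readable))
            continue;

        if (!PQconsumeInput(slots[i].connection))   /* read what arrived */
            fprintf(stderr, "%s", PQerrorMessage(slots[i].connection));

        if (PQisBusy(slots[i].connection))          /* result not complete yet */
            continue;

        while ((res = PQgetResult(slots[i].connection)) != NULL)
            PQclear(res);                           /* drain finished results */

        slots[i].isFree = 1;
    }
}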
Regards,
Dilip
Attachment
On Wed, Sep 24, 2014 at 2:48 AM, Dilip kumar <dilip.kumar@huawei.com> wrote:
> On 24 August 2014 11:33, Amit Kapila Wrote
> Thanks for you comments, i have worked on both the review comment lists, sent on 19 August, and 24 August.
> Latest patch is attached with the mail..

Hi Dilip,

I think you have an off-by-one error in the index into the array of file handles.
> On 24 August 2014 11:33, Amit Kapila Wrote
>
> >7. I think in new mechanism cancel handler will not work.
>
> >In single connection vacuum it was always set/reset
>
> >in function executeMaintenanceCommand(). You might need
>
> >to set/reset it in function run_parallel_vacuum().
>
>
>
> Good catch, Now i have called SetCancelConn(pSlot[0].connection), on first connection. this will enable cancle
>
> handler to cancle the query on first connection so that select loop will come out.
>
>
>
>
> >1. I could see one shortcoming in the way the patch has currently parallelize the
>
> > work for --analyze-in-stages. Basically patch is performing the work for each stage
>
> > for multiple tables in concurrent connections that seems okay for the cases when
>
> > number of parallel connections is less than equal to number of tables, but for
>
> > the case when user has asked for more number of connections than number of tables,
>
> > then I think this strategy will not be able to use the extra connections.
>
>
>
> I think --analyze-in-stages should maintain the order.
Yes, you are right. So let's keep the code as it is for this case.
>
>
> >2. Similarly for the case of multiple databases, currently it will not be able
>
> > to use connections more than number of tables in each database because the
>
> > parallelizing strategy is to just use the conncurrent connections for
>
> > tables inside single database.
>
>
>
> I think for multiple database doing the parallel execution we need to maintain the multiple connection with multiple databases.
> And we need to maintain a table list for all the databases together to run them concurrently. I think this may impact the startup cost,
> as we need to create a multiple connection and disconnect for preparing the list
Amit Kapila wrote:

> Today while again thinking about the startegy used in patch to
> parallelize the operation (vacuum database), I think we can
> improve the same for cases when number of connections are
> lesser than number of tables in database (which I presume
> will normally be the case).  Currently we are sending command
> to vacuum one table per connection, how about sending multiple
> commands (example Vacuum t1; Vacuum t2) on one connection.
> It seems to me there is extra roundtrip for cases when there
> are many small tables in database and few large tables.  Do
> you think we should optimize for any such cases?

I don't think this is a good idea; at least not in a first cut of this
patch.  It's easy to imagine that a table you initially think is small
enough turns out to have grown much larger since last analyze.  In that
case, putting one worker to process that one together with some other
table could end up being bad for parallelism, if later it turns out that
some other worker has no table to process.  (Table t2 in your example
could grown between the time the command is sent and t1 is vacuumed.)

It's simpler to have workers do one thing at a time only.

I don't think it's a very good idea to call pg_relation_size() on every
table in the database from vacuumdb.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
>
> Amit Kapila wrote:
>
> > Today while again thinking about the startegy used in patch to
> > parallelize the operation (vacuum database), I think we can
> > improve the same for cases when number of connections are
> > lesser than number of tables in database (which I presume
> > will normally be the case). Currently we are sending command
> > to vacuum one table per connection, how about sending multiple
> > commands (example Vacuum t1; Vacuum t2) on one connection.
> > It seems to me there is extra roundtrip for cases when there
> > are many small tables in database and few large tables. Do
> > you think we should optimize for any such cases?
>
> I don't think this is a good idea; at least not in a first cut of this
> patch. It's easy to imagine that a table you initially think is small
> enough turns out to have grown much larger since last analyze.
> case, putting one worker to process that one together with some other
> table could end up being bad for parallelism, if later it turns out that
> some other worker has no table to process. (Table t2 in your example
> could grown between the time the command is sent and t1 is vacuumed.)
>
> It's simpler to have workers do one thing at a time only.
Yeah, probably that is best, at least for the initial patch.
> I don't think it's a very good idea to call pg_relation_size() on every
> table in the database from vacuumdb.
On 27/09/14 01:36, Alvaro Herrera wrote:
> Amit Kapila wrote:
>
>> Today while again thinking about the startegy used in patch to
>> parallelize the operation (vacuum database), I think we can
>> improve the same for cases when number of connections are
>> lesser than number of tables in database (which I presume
>> will normally be the case). Currently we are sending command
>> to vacuum one table per connection, how about sending multiple
>> commands (example Vacuum t1; Vacuum t2) on one connection.
>> It seems to me there is extra roundtrip for cases when there
>> are many small tables in database and few large tables. Do
>> you think we should optimize for any such cases?
> I don't think this is a good idea; at least not in a first cut of this
> patch.  It's easy to imagine that a table you initially think is small
> enough turns out to have grown much larger since last analyze.  In that
> case, putting one worker to process that one together with some other
> table could end up being bad for parallelism, if later it turns out that
> some other worker has no table to process.  (Table t2 in your example
> could grown between the time the command is sent and t1 is vacuumed.)
>
> It's simpler to have workers do one thing at a time only.
>
> I don't think it's a very good idea to call pg_relation_size() on every
> table in the database from vacuumdb.
>
Curious: would it be both feasible and useful to have multiple workers
process a 'large' table, without complicating things too much? The
could each start at a different position in the file.

Cheers,
Gavin
Gavin Flower wrote:

> Curious: would it be both feasible and useful to have multiple
> workers process a 'large' table, without complicating things too
> much? The could each start at a different position in the file.

Feasible: no.  Useful: maybe, we don't really know.  (You could just as
well have a worker at double the speed, i.e. double vacuum_cost_limit).

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 9/26/14, 2:38 PM, Gavin Flower wrote:
> Curious: would it be both feasible and useful to have multiple workers
> process a 'large' table, without complicating things too much? The
> could each start at a different position in the file.

Not really feasible without a major overhaul.  It might be mildly useful
in one rare case.  Occasionally I'll find very hot single tables that
vacuum is constantly processing, despite mostly living in RAM because the
server has a lot of memory.  You can set vacuum_cost_page_hit=0 in order
to get vacuum to chug through such a table as fast as possible.

However, the speed at which that happens will often then be limited by
how fast a single core can read from memory, for things in
shared_buffers.  That is limited by the speed of memory transfers from a
single NUMA memory bank.  Which bank you get will vary depending on the
core that owns that part of shared_buffers' memory, but it's only one at
a time.

On large servers, that can be only a small fraction of the total memory
bandwidth the server is able to reach.  I've attached a graph showing how
this works on a system with many NUMA banks of RAM, and this is only a
medium sized system.  This server can hit 40GB/s of memory transfers in
total; no one process will ever see more than 8GB/s.

If we had more vacuum processes running against the same table, there
would then be more situations where they were doing work against
different NUMA memory banks at the same time, therefore making faster
progress through the hits in shared_buffers possible.  In the real world,
this situation is rare enough compared to disk-bound vacuum work that I
doubt it's worth getting excited over.  Systems with lots of RAM where
performance is regularly dominated by one big ugly table are common
though, so I wouldn't just rule the idea out as not useful either.

--
Greg Smith greg.smith@crunchydatasolutions.com
Chief PostgreSQL Evangelist - http://crunchydatasolutions.com/
Attachment
On 27/09/14 11:36, Gregory Smith wrote:
> On 9/26/14, 2:38 PM, Gavin Flower wrote:
>> Curious: would it be both feasible and useful to have multiple
>> workers process a 'large' table, without complicating things too
>> much? The could each start at a different position in the file.
>
> Not really feasible without a major overhaul.  It might be mildly
> useful in one rare case.  Occasionally I'll find very hot single
> tables that vacuum is constantly processing, despite mostly living in
> RAM because the server has a lot of memory.  You can set
> vacuum_cost_page_hit=0 in order to get vacuum to chug through such a
> table as fast as possible.
>
> However, the speed at which that happens will often then be limited by
> how fast a single core can read from memory, for things in
> shared_buffers.  That is limited by the speed of memory transfers from
> a single NUMA memory bank.  Which bank you get will vary depending on
> the core that owns that part of shared_buffers' memory, but it's only
> one at a time.
>
> On large servers, that can be only a small fraction of the total
> memory bandwidth the server is able to reach.  I've attached a graph
> showing how this works on a system with many NUMA banks of RAM, and
> this is only a medium sized system.  This server can hit 40GB/s of
> memory transfers in total; no one process will ever see more than 8GB/s.
>
> If we had more vacuum processes running against the same table, there
> would then be more situations where they were doing work against
> different NUMA memory banks at the same time, therefore making faster
> progress through the hits in shared_buffers possible.  In the real
> world, this situation is rare enough compared to disk-bound vacuum
> work that I doubt it's worth getting excited over.  Systems with lots
> of RAM where performance is regularly dominated by one big ugly table
> are common though, so I wouldn't just rule the idea out as not useful
> either.
>
Thanks for the very detailed reply of yours, and the comments from others.

Cheers,
Gavin
On 26 September 2014 01:24, Jeff Janes Wrote,
>I think you have an off-by-one error in the index into the array of file handles.
>Actually the problem is that the socket for the master connection was not getting initialized, see my one line addition here.
> connSlot = (ParallelSlot*)pg_malloc(concurrentCons * sizeof(ParallelSlot));
> connSlot[0].connection = conn;
>+ connSlot[0].sock = PQsocket(conn);
Thanks for the review, I have fixed this.
>However, I don't think it is good to just ignore errors from the select call (like the EBADF) and go into a busy loop instead, so there are more changes needed than this.
Actually, I implemented this select_loop function the same way other client applications handle it, i.e. pg_dump in parallel.c; however, parallel.c also handles the case where the process is aborting (if Ctrl+C is received),
and we need to handle the same, so I have fixed this in the attached patch.
>Also, cancelling the run (by hitting ctrl-C in the shell that invoked it) does not seem to work on linux. I get a message that says "Cancel request sent", but then it continues to finish the job anyway.
Apart from the above-mentioned reason, GetQueryResult was also not setting SetCancelConn, as Amit pointed out; now this is also fixed.
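As an illustration of the kind of select_loop() handling being discussed (retry when interrupted by a signal, stop when a cancel has been requested rather than spinning on a persistent error such as EBADF), here is a hedged sketch; the cancel_requested flag is a stand-in for whatever abort flag the real code consults, and the function shape is an assumption of this sketch.

/*
 * Sketch: wait for any of the worker sockets to become readable.  Restores
 * the fd_set each iteration because select() modifies it in place.
 */
#include <errno.h>
#include <stdbool.h>
#include <sys/select.h>

/* Stand-in for the real "user pressed Ctrl+C" flag (assumption). */
static volatile bool cancel_requested = false;

static int
select_loop(int maxFd, fd_set *workerset)
{
    int     i;
    fd_set  saveSet = *workerset;

    for (;;)
    {
        *workerset = saveSet;
        i = select(maxFd + 1, workerset, NULL, NULL, NULL);

        if (cancel_requested)
            return -1;              /* caller should clean up and exit */

        if (i < 0 && errno == EINTR)
            continue;               /* interrupted by a signal: retry */

        return i;                   /* ready count, or a hard error such as EBADF */
    }
}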
Regards,
Dilip Kumar
Attachment
<div class="WordSection1"><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">On</span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">26September 2014 12:24, Amit Kapila Wrote,</span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"></span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>Idon't think this can handle cancel requests properly because</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>you are just settingit in GetIdleSlot() what if the cancel</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>requestcame during GetQueryResult() after sending sql for</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>all connections (probablythats the reason why Jeff is not able</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>tocancel the vacuumdb when using parallel option).</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">You are right, I have fixed, it in latestpatch, please check latest patch @ (</span><span style="font-size:8.5pt;font-family:"Verdana","sans-serif";color:black"><a href="http://www.postgresql.org/message-id/4205E661176A124FAF891E0A6BA9135266363710@szxeml509-mbs.china.huawei.com">4205E661176A124FAF891E0A6BA9135266363710@szxeml509-mbs.china.huawei.com</a></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">)</span><pclass="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span></i><pclass="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">dilip@linux-ltr9:/home/dilip/9.4/install/bin>./vacuumdb -z-a -j 8 -p 9005</span></i><p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">vacuumdb:vacuuming database "db1"</span></i><p class="MsoNormal"><i><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">vacuumdb: vacuuming database "postgres"</span></i><pclass="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Cancel requestsent</span></i><p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:red">vacuumdb:vacuuming of database "postgres" failed: ERROR: canceling statement due to user request</span></i><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:red"></span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>Fewother points</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>1.</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+vacuum_parallel(const char *dbname, bool full, bool verbose,</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>{</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>..</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ connSlot = (ParallelSlot*)pg_malloc(concurrentCons* 
sizeof(ParallelSlot));</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ connSlot[0].connection = conn;</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Fixed</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>a.</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>Doesabove memory gets freed anywhere, if not isn't it</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>good idea to do the same</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>b. For slot 0, you arenot seeting it as PQsetnonblocking,</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>whereas I think it can be used to run commands like any other</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>connection.</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Yes,this was missing in the code, I have fixed it..</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>2.</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ /*</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ * If user has given the vacuum of complete db, thenif</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ * any ofthe object vacuum </span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>failedit can be ignored and vacuuming</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ * of other object can be continued,this is the same </span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>behavioras</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ * vacuuming of complete db is handled without --jobsoption</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ */</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">> </span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>s/object/object's</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">FIXED</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>3.</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ if(!completedb ||</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ (sqlState&& strcmp(sqlState, </span><p class="MsoNormal"><span 
style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>ERRCODE_UNDEFINED_TABLE)!= 0))</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ {</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ </span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ fprintf(stderr,_("%s: vacuuming of </span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>database\"%s\" failed: %s"),</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ progname, dbname, PQerrorMessage</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>(conn));</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">> </span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>Indentation on both places is wrong. Check other palces for</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>similarissues.</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">FIXED</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>4.</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>+ boolanalyze_only, bool freeze, int numAsyncCons,</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>Incode still there is reference to AsyncCons, as decided lets</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">>change it to concurrent_connections| conc_cons</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">FIXED</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Regards,</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Dilip</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span></div>
On 27 September 2014 03:55, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Fri, Sep 26, 2014 at 11:47 AM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
>>
>> Gavin Flower wrote:
>>
>> > Curious: would it be both feasible and useful to have multiple
>> > workers process a 'large' table, without complicating things too
>> > much? The could each start at a different position in the file.
>>
>> Feasible: no.  Useful: maybe, we don't really know.  (You could just as
>> well have a worker at double the speed, i.e. double vacuum_cost_limit).
>
> Vacuum_cost_delay is already 0 by default.  So unless you changed that,
> vacuum_cost_limit does not take effect under vacuumdb.
>
> It is pretty easy for vacuum to be CPU limited, and even easier for analyze
> to be CPU limited (It does a lot of sorting).  I think analyzing is the main
> use case for this patch, to shorten the pg_upgrade window.  At least, that
> is how I anticipate using it.

I've been trying to review this thread with the thought "what does
this give me?". I am keen to encourage contributions and also keen to
extend our feature set, but I do not wish to complicate our code base.
Dilip's developments do seem to be good quality; what I question is
whether we want this feature.

This patch seems to allow me to run multiple VACUUMs at once. But I
can already do this, with autovacuum.

Is there anything this patch can do that cannot be already done with autovacuum?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
>
>
> I've been trying to review this thread with the thought "what does
> this give me?". I am keen to encourage contributions and also keen to
> extend our feature set, but I do not wish to complicate our code base.
> Dilip's developments do seem to be good quality; what I question is
> whether we want this feature.
>
> This patch seems to allow me to run multiple VACUUMs at once. But I
> can already do this, with autovacuum.
>
> Is there anything this patch can do that cannot be already done with autovacuum?
The difference lies in the fact that vacuumdb (or VACUUM) gives the user the option to control the vacuum activity for cases when autovacuum doesn't suffice the need; one example is to perform vacuum via vacuumdb after pg_upgrade or some other maintenance activity (as mentioned by Jeff upthread). So I think in all such cases having a parallel option can give a benefit in terms of performance, which has already been shown by Dilip upthread by running some tests (with and without the patch).
On 16 October 2014 06:05, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Oct 16, 2014 at 8:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>
>> I've been trying to review this thread with the thought "what does
>> this give me?". I am keen to encourage contributions and also keen to
>> extend our feature set, but I do not wish to complicate our code base.
>> Dilip's developments do seem to be good quality; what I question is
>> whether we want this feature.
>>
>> This patch seems to allow me to run multiple VACUUMs at once. But I
>> can already do this, with autovacuum.
>>
>> Is there anything this patch can do that cannot be already done with
>> autovacuum?
>
> The difference lies in the fact that vacuumdb (or VACUUM) gives
> the option to user to control the vacuum activity for cases when
> autovacuum doesn't suffice the need, one of the example is to perform
> vacuum via vacuumdb after pg_upgrade or some other maintenance
> activity (as mentioned by Jeff upthread). So I think in all such cases
> having parallel option can give benefit in terms of performance which
> is already shown by Dilip upthread by running some tests (with and
> without patch).

Why do we need 2 ways to do the same thing?

Why not ask autovacuum to do this for you?

Just send a message to autovacuum to request an immediate action. Let
it manage the children and the tasks.

SELECT pg_autovacuum_immediate(nworkers = N, list_of_tables);

Request would allocate an additional N workers and immediately begin
vacuuming the stated tables.

vacuumdb can still issue the request, but the guts of this are done by
the server, not a heavily modified client.

If we are going to heavily modify a client then it needs to be able to
run more than just one thing. Parallel psql would be nice. pg_batch?

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
> On 16 October 2014 06:05, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Thu, Oct 16, 2014 at 8:08 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >>
> >> This patch seems to allow me to run multiple VACUUMs at once. But I
> >> can already do this, with autovacuum.
> >>
> >> Is there anything this patch can do that cannot be already done with
> >> autovacuum?
> >
> > The difference lies in the fact that vacuumdb (or VACUUM) gives
> > the option to user to control the vacuum activity for cases when
> > autovacuum doesn't suffice the need, one of the example is to perform
> > vacuum via vacuumdb after pg_upgrade or some other maintenance
> > activity (as mentioned by Jeff upthread). So I think in all such cases
> > having parallel option can give benefit in terms of performance which
> > is already shown by Dilip upthread by running some tests (with and
> > without patch).
>
> Why do we need 2 ways to do the same thing?
> Why not ask autovacuum to do this for you?
>
> Just send a message to autovacuum to request an immediate action. Let
> it manage the children and the tasks.
>
> SELECT pg_autovacuum_immediate(nworkers = N, list_of_tables);
>
> Request would allocate an additional N workers and immediately begin
> vacuuming the stated tables.
I think doing anything on the server side can have higher complexity like:
> vacuumdb can still issue the request, but the guts of this are done by
> the server, not a heavily modified client.
>
> If we are going to heavily modify a client then it needs to be able to
> run more than just one thing. Parallel psql would be nice. pg_batch?
Could you be more specific in this point, I am not able to see how
On 16 October 2014 15:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> Just send a message to autovacuum to request an immediate action. Let
>> it manage the children and the tasks.
>>
>> SELECT pg_autovacuum_immediate(nworkers = N, list_of_tables);
>>
>> Request would allocate an additional N workers and immediately begin
>> vacuuming the stated tables.
>
> I think doing anything on the server side can have higher complexity like:
> a. Does this function return immediately after sending request to
>    autovacuum, if yes then the behaviour of this new functionality
>    will be different as compare to vacuumdb which user might not
>    expect?
> b. We need to have some way to handle errors that can occur in
>    autovacuum (may be need to find a way to pass back to user),
>    vacuumdb or Vacuum can report error to user.
> c. How does nworkers input relates to autovacuum_max_workers
>    which is needed at start for shared memory initialization and in calc
>    of maxbackends.
> d. How to handle database wide vacuum which is possible via vacuumdb
> e. What is the best UI (a command like above or via config parameters)?

c) seems like the only issue that needs any thought. I don't think its
going to be that hard.

I don't see any problems with the other points. You can make a
function wait, if you wish.

> I think we can find a way for above and may be if any other similar things
> needs to be taken care, but still I think it is better that we have this
> feature
> via vacuumdb rather than adding complexity in server code. Also the
> current algorithm used in patch is discussed and agreed upon in this
> thread and if now we want to go via some other method (auto vacuum),
> it might take much more time to build consensus on all the points.

Well, I read Alvaro's point from earlier in the thread and agreed with
it. All we really need is an instruction to autovacuum to say "be
aggressive".

Just because somebody added something to the TODO list doesn't make it
a good idea. I apologise to Dilip for saying this, it is not anything
against him, just the idea.

Perhaps we just accept the patch and change AV in the future.

>> vacuumdb can still issue the request, but the guts of this are done by
>> the server, not a heavily modified client.
>>
>> If we are going to heavily modify a client then it needs to be able to
>> run more than just one thing. Parallel psql would be nice. pg_batch?
>
> Could you be more specific in this point, I am not able to see how
> vacuumdb utility has anything to do with parallel psql?

That's my point. All this code in vacuumdb just for this one isolated
use case? Twice the maintenance burden. A more generic utility to run
commands in parallel would be useful.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
>
> On 16 October 2014 15:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > I think doing anything on the server side can have higher complexity like:
> > a. Does this function return immediately after sending request to
> > autovacuum, if yes then the behaviour of this new functionality
> > will be different as compare to vacuumdb which user might not
> > expect?
> > b. We need to have some way to handle errors that can occur in
> > autovacuum (may be need to find a way to pass back to user),
> > vacuumdb or Vacuum can report error to user.
> > c. How does nworkers input relates to autovacuum_max_workers
> > which is needed at start for shared memory initialization and in calc
> > of maxbackends.
> > d. How to handle database wide vacuum which is possible via vacuumdb
> > e. What is the best UI (a command like above or via config parameters)?
>
>
> c) seems like the only issue that needs any thought. I don't think its
> going to be that hard.
>
> I don't see any problems with the other points. You can make a
> function wait, if you wish.
It is quite possible, but still I think to accomplish such a function,
> > I think we can find a way for above and may be if any other similar things
> > needs to be taken care, but still I think it is better that we have this
> > feature
> > via vacuumdb rather than adding complexity in server code. Also the
> > current algorithm used in patch is discussed and agreed upon in this
> > thread and if now we want to go via some other method (auto vacuum),
> > it might take much more time to build consensus on all the points.
>
> Well, I read Alvaro's point from earlier in the thread and agreed with
> it. All we really need is an instruction to autovacuum to say "be
> aggressive".
>
> Just because somebody added something to the TODO list doesn't make it
> a good idea. I apologise to Dilip for saying this, it is not anything
> against him, just the idea.
>
> Perhaps we just accept the patch and change AV in the future.
On 17 October 2014 12:52, Amit Kapila <amit.kapila16@gmail.com> wrote:
> It is quite possible, but still I think to accomplish such a function,
> we need to have some mechanism where it can inform auto vacuum
> and then some changes in auto vacuum to receive/read that information
> and reply back. I don't think any such mechanism exists.

That's exactly how the CHECKPOINT command works, in conjunction with
the checkpointer process.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Amit Kapila wrote:
> On Fri, Oct 17, 2014 at 1:31 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >
> > On 16 October 2014 15:09, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > c) seems like the only issue that needs any thought. I don't think its
> > going to be that hard.
> >
> > I don't see any problems with the other points. You can make a
> > function wait, if you wish.
>
> It is quite possible, but still I think to accomplish such a function,
> we need to have some mechanism where it can inform auto vacuum
> and then some changes in auto vacuum to receive/read that information
> and reply back. I don't think any such mechanism exists.

You're right, it doesn't. I think we have plenty more infrastructure
for that than we had when autovacuum was initially developed. It
shouldn't be that hard.

Of course, this is a task that requires much more thinking, design, and
discussion than just adding multi-process capability to vacuumdb ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 17 October 2014 14:05, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Of course, this is a task that requires much more thinking, design, and
> discussion than just adding multi-process capability to vacuumdb ...

Yes, please proceed with this patch as originally envisaged. No more
comments from me.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
>
> On 26 September 2014 12:24, Amit Kapila Wrote,
>
> >I don't think this can handle cancel requests properly because
>
> >you are just setting it in GetIdleSlot() what if the cancel
>
> >request came during GetQueryResult() after sending sql for
>
> >all connections (probably thats the reason why Jeff is not able
>
> >to cancel the vacuumdb when using parallel option).
>
>
>
> You are right, I have fixed it in the latest patch; please check the latest patch @ (4205E661176A124FAF891E0A6BA9135266363710@szxeml509-mbs.china.huawei.com)
>
*** 358,363 **** handle_sigint(SIGNAL_ARGS)
--- 358,364 ----
/* Send QueryCancel if we are processing a database query */
if (cancelConn != NULL)
{
+ inAbort = true;
if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
fprintf(stderr, _("Cancel request sent\n"));
else
>
>
> ***************
> *** 358,363 **** handle_sigint(SIGNAL_ARGS)
> --- 358,364 ----
>
> /* Send QueryCancel if we are processing a database query */
> if (cancelConn != NULL)
> {
> + inAbort = true;
> if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
> fprintf(stderr, _("Cancel request sent\n"));
> else
>
> Do we need to set inAbort flag incase PQcancel is successful?
> Basically if PQCancel fails due to any reason, I think behaviour
> can be undefined as the executing thread can assume that cancel is
> done.
>
> *** 391,396 **** consoleHandler(DWORD dwCtrlType)
> --- 392,399 ----
> EnterCriticalSection
> (&cancelConnLock);
> if (cancelConn != NULL)
> {
> + inAbort =
> true;
> +
>
> You have set this flag in case of windows handler, however the same
> is never used incase of windows, are you expecting any use of this
> flag for windows?
RAM = 64GB
max_connections = 128
checkpoint_segments=256
checkpoint_timeout =15min
Attachment
<div class="WordSection1"><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">On</span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">25October 2014 17:52, Amit Kapila Wrote,</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>***************</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>*** 358,363 **** handle_sigint(SIGNAL_ARGS)</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>---358,364 ----</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> </span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> /* Send QueryCancelif we are processing a database query */</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> if (cancelConn != NULL)</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> {</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>+ inAbort = true;</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> if(PQcancel(cancelConn, errbuf, sizeof(errbuf)))</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> fprintf(stderr, _("Cancel request sent\n"));</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> else</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> </span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>Dowe need to set inAbort flag incase PQcancelis successful?</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>Basicallyif PQCancel fails due to any reason,I think behaviour</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>canbe undefined as the executing thread canassume that cancel is</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>done.</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> </span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>***391,396 **** consoleHandler(DWORD dwCtrlType)</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>---392,399 ----</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> EnterCriticalSection</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>(&cancelConnLock);</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> if (cancelConn != NULL)</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">> {</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>+ inAbort = </span><p 
class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>true;</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">>+</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> </span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">In“<b>handle_sigint</b>” function if we are goingto cancel the query that time I am setting the flag <b>inAbort </b>(even when it is success), so that in “<b>select_loop</b>”function </span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">If<b>select(maxFd + 1, workerset, NULL, NULL, &tv);</b>come out, we can know whether it came out because of cancel query and handle it accordingly. </span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> i = select(maxFd+ 1, workerset, NULL, NULL, NULL);</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> </span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> if (in_abort()) <b>//loopbreak because of cancel query, so return fail…</b></span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> {</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> return-1;</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> }</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> </span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> if (i < 0 &&errno == EINTR)</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> continue;</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Regards,</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">DilipKumar</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span></div>
>
> On 25 October 2014 17:52, Amit Kapila Wrote,
>
> >***************
>
> >*** 358,363 **** handle_sigint(SIGNAL_ARGS)
>
> >--- 358,364 ----
>
> >
>
> > /* Send QueryCancel if we are processing a database query */
>
> > if (cancelConn != NULL)
>
> > {
>
> >+ inAbort = true;
>
> > if (PQcancel(cancelConn, errbuf, sizeof(errbuf)))
>
> > fprintf(stderr, _("Cancel request sent\n"));
>
> > else
>
> >
>
> >Do we need to set inAbort flag incase PQcancel is successful?
>
> >Basically if PQCancel fails due to any reason, I think behaviour
>
> >can be undefined as the executing thread can assume that cancel is
>
> >done.
>
> >
>
> >*** 391,396 **** consoleHandler(DWORD dwCtrlType)
>
> >--- 392,399 ----
>
> > EnterCriticalSection
>
> >(&cancelConnLock);
>
> > if (cancelConn != NULL)
>
> > {
>
> >+ inAbort =
>
> >true;
>
> >+
>
>
>
> In “handle_sigint” function if we are going to cancel the query that time I am setting the flag inAbort (even when it is success), so that in “select_loop” function
>
> If select(maxFd + 1, workerset, NULL, NULL, &tv); come out, we can know whether it came out because of cancel query and handle it accordingly.
>
Yeah, it is fine for the case when PQCancel() is successful, what
<div class="WordSection1"><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">On</span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">28October 2014 09:18, Amit Kapila Wrote,</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">>Iam worried about the case if after setting theinAbort flag,</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">>PQCancel()fails (returns error).</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">> </span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">>> If select(maxFd +1, workerset, NULL, NULL, &tv); come out, we can know whether it came out because of cancel query and handle it accordingly.</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">>> </span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">> </span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">>Yeah,it is fine for the case when PQCancel()is successful, what</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">>ifit fails?</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">>I think even if selectcomes out due to any other reason, it will behave</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">>asif it came out due to Cancel, even though actuallyCancel is failed,</span><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">>howare planning to handle that case?</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> </span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">Ithink If PQcancel fails then also there is no problem,because we are setting inAbort flag in handle_sigint handler, it means user have tried to terminate.</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> </span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">So in this case as well wewill find that <b>inAbort </b>is set so we return error, and in error case we finally call <b>DisconnectDatabase</b>, andin this function we will send the <b>PQcancel</b> for all active connection and then only we disconnect.</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black"> </span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">Regards,</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:black">DIlip</span></div>
> On 28 October 2014 09:18, Amit Kapila Wrote,
>
> >I am worried about the case if after setting the inAbort flag,
>
> >PQCancel() fails (returns error).
>
> >
>
> >> If select(maxFd + 1, workerset, NULL, NULL, &tv); come out, we can know whether it came out because of cancel query and handle it accordingly.
>
> >>
>
> >
>
> >Yeah, it is fine for the case when PQCancel() is successful, what
>
> >if it fails?
>
> >I think even if select comes out due to any other reason, it will behave
>
> >as if it came out due to Cancel, even though actually Cancel is failed,
>
> >how are planning to handle that case?
>
>
>
> I think If PQcancel fails then also there is no problem, because we are setting inAbort flag in handle_sigint handler, it means user have tried to terminate.
>
message: "Could not send cancel request" in such a case and still
{
..
for (cell = tables->head; cell; cell = cell->next)
{
/*
* This will give the free connection slot, if no slot is free it will
* wait for atleast one slot to get free.
*/
free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
completedb);
if (free_slot == NO_SLOT)
{
error = true;
goto fail;
}
prepare_command(connSlot[free_slot].connection, full, verbose,
and_analyze, analyze_only, freeze, &sql);
appendPQExpBuffer(&sql, " %s", cell->val);
connSlot[free_slot].isFree = false;
slotconn = connSlot[free_slot].connection;
PQsendQuery(slotconn, sql.data);
resetPQExpBuffer(&sql);
}
..
}
>
> Going further with verification of this patch, I found below issue:
> Run the testcase.sql file at below link:
> http://www.postgresql.org/message-id/4205E661176A124FAF891E0A6BA9135266347F25@szxeml509-mbs.china.huawei.com
> ./vacuumdb --analyze-in-stages -j 8 -d postgres
> Generating minimal optimizer statistics (1 target)
> Segmentation fault
>
> Server Log:
> ERROR: syntax error at or near "minimal" at character 12
> STATEMENT: ANALYZE ng minimal optimizer statistics (1 target)
> LOG: could not receive data from client: Connection reset by peer
>
As mentioned by you offlist that you are not able reproduce this
<div class="WordSection1"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">On</span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">13November 2014 15:35 Amit Kapila Wrote, </span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><p class="MsoNormal">>As mentionedby you offlist that you are not able reproduce this<p class="MsoNormal">>issue, I have tried again and what Iobserve is that I am able to<p class="MsoNormal">>reproduce it only on *release* build and some cases work without<pclass="MsoNormal">>this issue as well,<p class="MsoNormal">>example:<p class="MsoNormal">>./vacuumdb --analyze-in-stages-t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 8 -d postgres<p class="MsoNormal">>Generating minimaloptimizer statistics (1 target)<p class="MsoNormal">>Generating medium optimizer statistics (10 targets)<p class="MsoNormal">>Generatingdefault (full) optimizer statistics<p class="MsoNormal"><br clear="all" /><p class="MsoNormal">>Soto me, it looks like this is a timing issue and please notice<p class="MsoNormal">>why in errorthe statement looks like<p class="MsoNormal">>"ANALYZE ng minimal optimizer statistics (1 target)". I think this<pclass="MsoNormal" style="margin-bottom:12.0pt">>is not a valid statement.<p class="MsoNormal">>Let me know ifyou still could not reproduce it.<p class="MsoNormal"> <p class="MsoNormal">Thank you for looking into it once again..<pclass="MsoNormal"> <p class="MsoNormal">I have tried with the release mode, but could not reproduce the same. Bylooking at server LOG sent by you “"ANALYZE ng minimal optimizer statistics (1 target)". ”, seems like some corruption.<pclass="MsoNormal"> <p class="MsoNormal">So actually looks like two issues here.<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l0level1 lfo1"><span style="mso-list:Ignore">1.<span style="font:7.0pt "Times New Roman""> </span></span>Query string sent to server is getting corrupted.<p class="MsoListParagraph" style="text-indent:-18.0pt;mso-list:l0level1 lfo1"><span style="mso-list:Ignore">2.<span style="font:7.0pt "Times New Roman""> </span></span>Client is getting crashed.<p class="MsoNormal"> <p class="MsoNormal">I will review the code andtry to find the same, meanwhile if you can find some time to debug this, it will be really helpful.<p class="MsoNormal"> <pclass="MsoNormal"> <p class="MsoNormal">Regards,<p class="MsoNormal">Dilip</div>
>
> On 13 November 2014 15:35 Amit Kapila Wrote,
> >As mentioned by you offlist that you are not able reproduce this
>
> >issue, I have tried again and what I observe is that I am able to
>
> >reproduce it only on *release* build and some cases work without
>
> >this issue as well,
>
> >example:
>
> >./vacuumdb --analyze-in-stages -t t1 -t t2 -t t3 -t t4 -t t5 -t t6 -t t7 -t t8 -j 8 -d postgres
>
> >Generating minimal optimizer statistics (1 target)
>
> >Generating medium optimizer statistics (10 targets)
>
> >Generating default (full) optimizer statistics
>
>
> >So to me, it looks like this is a timing issue and please notice
>
> >why in error the statement looks like
>
> >"ANALYZE ng minimal optimizer statistics (1 target)". I think this
>
> >is not a valid statement.
>
> >Let me know if you still could not reproduce it.
>
>
> I will review the code and try to find the same, meanwhile if you can find some time to debug this, it will be really helpful.
>
I think I have found the problem and fix for the same.
{
..
if (!tables || !tables->head)
{
SimpleStringList dbtables = {NULL, NULL};
...
..
tables = &dbtables;
}
..
}
and_analyze, analyze_only, freeze, &sql);
appendPQExpBuffer(&sql, " %s", cell->val);
..
Attachment
On 23 November 2014 14:45, Amit Kapila Wrote
Thanks a lot for debugging and fixing the issue..
>The stacktrace of crash is as below:
>#0 0x00000080108cf3a4 in .strlen () from /lib64/libc.so.6
>#1 0x00000080108925bc in ._IO_vfprintf () from /lib64/libc.so.6
>#2 0x00000080108bc1e0 in .__GL__IO_vsnprintf_vsnprintf () from /lib64/libc.so.6
>#3 0x00000fff7e581590 in .appendPQExpBufferVA () from
>/data/akapila/workspace/master/installation/lib/libpq.so.5
>#4 0x00000fff7e581774 in .appendPQExpBuffer () from
>/data/akapila/workspace/master/installation/lib/libpq.so.5
>#5 0x0000000010003748 in .run_parallel_vacuum ()
>#6 0x0000000010003f60 in .vacuum_parallel ()
>#7 0x0000000010002ae4 in .main ()
>(gdb) f 5
>#5 0x0000000010003748 in .run_parallel_vacuum ()
>So now the real reason here is that the list of tables passed to
>function is corrupted. The below code seems to be the real
>culprit:
>
>vacuum_parallel()
>{
>..
>if (!tables || !tables->head)
>{
>SimpleStringList dbtables = {NULL, NULL};
>...
>..
> tables = &dbtables;
>}
>..
>}
>In above code dbtables is local to if loop and code
>is using the address of same to assign it to tables which
>is used out of if block scope, moving declaration to the
>outer scope fixes the problem in my environment. Find the
>updated patch that fixes this problem attached with this
>mail. Let me know your opinion about the same.
Yes, that is the reason for the corruption; it must be causing both issues, the corrupted query being sent to the server as well as the crash on the client side.
>While looking at this problem, I have noticed couple of other
>improvements:
>a. In prepare_command() function, patch is doing init of sql
>buffer (initPQExpBuffer(sql);) which I think is not required
>as both places from where this function is called, it is done by
>caller. I think this will lead to memory leak.
Fixed..
>b. In prepare_command() function, for fixed strings you can
>use appendPQExpBufferStr() which is what used in original code
>as well.
Changed as per comment..
>c.
>run_parallel_vacuum()
>{
>..
>prepare_command(connSlot[free_slot].connection, full, verbose,
>and_analyze, analyze_only, freeze, &sql);
>
>appendPQExpBuffer(&sql, " %s", cell->val);
>..
>}
>I think it is better to end command with ';' by using
>appendPQExpBufferStr(&sql, ";"); in above code.
Done
Latest patch is attached, please have a look.
Regards,
Dilip Kumar
Attachment
>
> On 23 November 2014 14:45, Amit Kapila Wrote
>
>
>
> Thanks a lot for debugging and fixing the issue..
>
>
>
> Latest patch is attached, please have a look.
>
I think still some of the comments given upthread are not handled:
On 24 November 2014 11:29, Amit Kapila Wrote,
>I think still some of the comments given upthread are not handled:
>
>a. About cancel handling
Your Actual comment was -->
>One other related point is that I think still cancel handling mechanism
>is not completely right, code is doing that when there are not enough
>number of freeslots, but otherwise it won't handle the cancel request,
>basically I am referring to below part of code:
run_parallel_vacuum()
{
..
for (cell = tables->head; cell; cell = cell->next)
{
..
free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname,
completedb);
…
PQsendQuery(slotconn, sql.data);
resetPQExpBuffer(&sql);
}
1. I think the connection blocks only when it is waiting for a slot or while in GetQueryResult, so we have to handle SetCancelRequest in both places.
2. Now for the case (as you mentioned) when there are enough slots and the above for loop is running: if the user presses Ctrl+C, this loop will not break. I have handled this by checking the inAbort
mode inside the for loop before sending the new command. I think we cannot do this just by setting the cancel connection, because a query only realizes it was cancelled and fails once it receives some data; until the connection receives data it will not see the failure. So I have checked inAbort directly.
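For illustration, a sketch in the shape of the dispatch loop quoted earlier in this message, with the extra check described in point 2; in_abort() and the surrounding variables (connSlot, tables, sql, NO_SLOT, prepare_command) are assumed from the patch fragments in this thread, so treat this as a sketch rather than the patch itself:

    /* Sketch: re-check the Ctrl+C flag before handing out the next table. */
    for (cell = tables->head; cell; cell = cell->next)
    {
        int free_slot;

        if (in_abort())             /* SIGINT already seen: stop dispatching */
        {
            error = true;
            goto fail;
        }

        free_slot = GetIdleSlot(connSlot, max_slot, dbname, progname, completedb);
        if (free_slot == NO_SLOT)   /* cancelled or failed while waiting */
        {
            error = true;
            goto fail;
        }

        prepare_command(connSlot[free_slot].connection, full, verbose,
                        and_analyze, analyze_only, freeze, &sql);
        appendPQExpBuffer(&sql, " %s", cell->val);

        connSlot[free_slot].isFree = false;
        PQsendQuery(connSlot[free_slot].connection, sql.data);
        resetPQExpBuffer(&sql);
    }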
>b. Setting of inAbort flag for case where PQCancel is successful
Your Actual comment was -->
>Yeah, user has tried to terminate, however utility will emit the
>message: "Could not send cancel request" in such a case and still
>silently tries to cancel and disconnect all connections.
You are right, I have fixed the code; now in case of failure we need not set the inAbort flag..
>c. Performance data of --analyze-in-stages switch
Performance Data
------------------------------
CPU 8 cores
RAM = 16GB
checkpoint_segments=256
Before each test, run the test.sql (attached)
Un-patched -
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb -p 9005 --analyze-in-stages -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
real 0m0.843s
user 0m0.000s
sys 0m0.000s
Patched
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb -p 9005 --analyze-in-stages -j 2 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
real 0m0.593s
user 0m0.004s
sys 0m0.004s
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb -p 9005 --analyze-in-stages -j 4 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
real 0m0.421s
user 0m0.004s
sys 0m0.004s
I think with 2 connections we can get a 30% improvement.
>d. Have one pass over the comments in patch. I could still some
>wrong multiline comments. Refer below:
>+ /* Otherwise, we got a stage from vacuum_all_databases(), so run
>+ * only that one. */
Checked all, and fixed..
While testing, I found one more behavior difference compared to the base code:
Base Code:
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb -p 9005 -t "t1" -t "t2" -t "t3" -t "t4" --analyze-in-stages -d Postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
real 0m0.605s
user 0m0.004s
sys 0m0.000s
I think it should be like this (a small sketch of this ordering follows below):
SET default_statistics_target=1; do this for all the tables
SET default_statistics_target=10; do this for all the tables, and so on..
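To make that ordering concrete, here is a small sketch of running the stage setup commands as the outer loop and the listed tables as the inner loop. The stage strings mirror the targets that --analyze-in-stages prints (1, 10, full); the function name, the fixed buffer size, and the omitted error checking are illustrative assumptions, not the patch code:

    #include <stdio.h>
    #include "libpq-fe.h"

    /* Stage setup commands, one per --analyze-in-stages stage. */
    static const char *const stage_commands[] = {
        "SET default_statistics_target = 1; SET vacuum_cost_delay = 0;",
        "SET default_statistics_target = 10; RESET vacuum_cost_delay;",
        "RESET default_statistics_target;"
    };

    /* Hypothetical helper: analyze every table at stage 1 before any table
     * moves on to stage 2, and so on.  Error checking omitted for brevity. */
    static void
    analyze_stage_by_stage(PGconn *conn, const char *const *tables, int ntables)
    {
        for (int stage = 0; stage < 3; stage++)
        {
            PQclear(PQexec(conn, stage_commands[stage]));

            for (int i = 0; i < ntables; i++)
            {
                char sql[1024];

                snprintf(sql, sizeof(sql), "ANALYZE %s;", tables[i]);
                PQclear(PQexec(conn, sql));
            }
        }
    }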
With Patched
dilip@linux-ltr9:/home/dilip/9.4/install/bin> time ./vacuumdb -p 9005 -t "t1" -t "t2" -t "t3" -t "t4" --analyze-in-stages -j 2 -d postgres
Generating minimal optimizer statistics (1 target)
Generating medium optimizer statistics (10 targets)
Generating default (full) optimizer statistics
real 0m0.395s
user 0m0.000s
sys 0m0.004s
here we are setting each target once and doing it for all the tables..
Please provide your opinion.
Regards,
Dilip Kumar
Attachment
On Mon, Dec 1, 2014 at 12:18 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
>
> On 24 November 2014 11:29, Amit Kapila Wrote,
>
> here we are setting each target once and doing for all the tables..
>
>
> Please provide you opinion.
On Sat, Dec 6, 2014 at 9:01 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> If you agree, then we should try to avoid this change in new behaviour.

Still seeing many concerns about this patch, so marking it as returned
with feedback. If possible, switching it to the next CF would be fine I
guess as this work is still being continued.

-- 
Michael
On 06 December 2014 20:01 Amit Kapila Wrote
>I wanted to understand what exactly the above loop is doing.
>a.
>first of all the comment on top of it says "Some of the slot
>are free, ...", if some slot is free, then why do you want
>to process the results? (Do you mean to say that *None* of
>the slot is free....?)
This comment is wrong, I will remove this.
>b.
>IIUC, you have called function select_loop(maxFd, &slotset)
>to check if socket descriptor is readable, if yes then why
>in do..while loop the same maxFd is checked always, don't
>you want to check different socket descriptors? I am not sure
>if I am missing something here
select_loop(maxFd, &slotset)
maxFd is the highest descriptor among all the sets, and slotset contains all the descriptors, so if any descriptor gets some message, select_loop will return; once select_loop returns,
we need to check how many descriptors have received a message from the server, so we loop over them and process the results.
So it is not only for maxFd, it is for all the descriptors. And it is inside a do..while loop because it is possible that select_loop returns due to some intermediate message on any of the sockets while the query is still not complete,
and if none of the sockets has become free (which we check in the for loop below), we go back to select_loop again.
>c.
>After checking the socket descriptor for maxFd why you want
>to run run the below for loop for all slots?
>for (i = 0; i < max_slot; i++)
After the select loop returns, it is possible that we have got results on multiple connections, so we consume input on each and check whether it is still busy; if it is, there is nothing to do, but if it has finished we process the result and mark the connection free.
And if any of the connections is free, we break out of the do..while loop.
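A self-contained sketch of the loop described above, assuming a simplified slot struct; the struct and function names are illustrative, and only the libpq calls (select(), PQconsumeInput(), PQisBusy(), PQgetResult()) are the real interfaces:

    #include <stdbool.h>
    #include <sys/select.h>
    #include "libpq-fe.h"

    typedef struct
    {
        PGconn *connection;
        int     sock;           /* PQsocket(connection) */
        bool    isFree;
    } slot_t;

    /* Return the index of a free slot, waiting for a running query to finish
     * if necessary; returns -1 on a select() failure. */
    static int
    wait_for_idle_slot(slot_t *slots, int nslots)
    {
        int firstFree = -1;

        /* If a slot is already free we can hand it out immediately. */
        for (int i = 0; i < nslots; i++)
            if (slots[i].isFree)
                return i;

        do
        {
            fd_set  slotset;
            int     maxFd = -1;

            FD_ZERO(&slotset);
            for (int i = 0; i < nslots; i++)
            {
                FD_SET(slots[i].sock, &slotset);
                if (slots[i].sock > maxFd)
                    maxFd = slots[i].sock;
            }

            /* Block until any of the descriptors becomes readable. */
            if (select(maxFd + 1, &slotset, NULL, NULL, NULL) < 0)
                return -1;

            /* Check every connection, not just one: several may have data. */
            for (int i = 0; i < nslots; i++)
            {
                PGresult *res;

                if (!FD_ISSET(slots[i].sock, &slotset))
                    continue;               /* nothing arrived here */

                PQconsumeInput(slots[i].connection);
                if (PQisBusy(slots[i].connection))
                    continue;               /* partial result; keep waiting */

                /* Query done: drain the results and mark the slot free. */
                while ((res = PQgetResult(slots[i].connection)) != NULL)
                    PQclear(res);
                slots[i].isFree = true;
                if (firstFree < 0)
                    firstFree = i;
            }
        } while (firstFree < 0);

        return firstFree;
    }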
From: Amit Kapila [mailto:amit.kapila16@gmail.com]
Sent: 06 December 2014 20:01
To: Dilip kumar
Cc: Magnus Hagander; Alvaro Herrera; Jan Lentfer; Tom Lane; PostgreSQL-development; Sawada Masahiko; Euler Taveira
Subject: Re: [HACKERS] TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
On Mon, Dec 1, 2014 at 12:18 PM, Dilip kumar <dilip.kumar@huawei.com> wrote:
>
> On 24 November 2014 11:29, Amit Kapila Wrote,
>
I have verified that all previous comments are addressed and
the new version is much better than previous version.
>
> here we are setting each target once and doing for all the tables..
>
Hmm, theoretically I think new behaviour could lead to more I/O in
certain cases as compared to the existing behaviour. The reason for more I/O
is that in the new behaviour, while doing Analyze for a particular table at
different targets, in-between it has Analyze of different table as well,
so the pages in shared buffers or OS cache for a particular table needs to
be reloaded again for a new target whereas currently it will do all stages
of Analyze for a particular table in one-go which means that each stage
of Analyze could get benefit from the pages of a table loaded by previous
stage. If you agree, then we should try to avoid this change in new
behaviour.
>
> Please provide you opinion.
I have a few questions regarding the function GetIdleSlot()
+ static int
+ GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
+             const char *progname, bool completedb)
{
..
+     /*
+      * Some of the slot are free, Process the results for slots whichever
+      * are free
+      */
+     do
+     {
+         SetCancelConn(pSlot[0].connection);
+
+         i = select_loop(maxFd, &slotset);
+
+         ResetCancelConn();
+
+         if (i < 0)
+         {
+             /*
+              * This can only happen if user has sent the cancel request using
+              * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+              */
+             GetQueryResult(pSlot[0].connection, dbname, progname,
+                            completedb);
+             return NO_SLOT;
+         }
+
+         Assert(i != 0);
+
+         for (i = 0; i < max_slot; i++)
+         {
+             if (!FD_ISSET(pSlot[i].sock, &slotset))
+                 continue;
+
+             PQconsumeInput(pSlot[i].connection);
+             if (PQisBusy(pSlot[i].connection))
+                 continue;
+
+             pSlot[i].isFree = true;
+
+             if (!GetQueryResult(pSlot[i].connection, dbname, progname,
+                                 completedb))
+                 return NO_SLOT;
+
+             if (firstFree < 0)
+                 firstFree = i;
+         }
+     } while (firstFree < 0);
}
I wanted to understand what exactly the above loop is doing.
a.
first of all the comment on top of it says "Some of the slot
are free, ...", if some slot is free, then why do you want
to process the results? (Do you mean to say that *None* of
the slot is free....?)
b.
IIUC, you have called function select_loop(maxFd, &slotset)
to check if socket descriptor is readable, if yes then why
in do..while loop the same maxFd is checked always, don't
you want to check different socket descriptors? I am not sure
if I am missing something here
c.
After checking the socket descriptor for maxFd, why do you want
to run the below for loop for all slots?
for (i = 0; i < max_slot; i++)
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
> On 06 December 2014 20:01 Amit Kapila Wrote
>
> >I wanted to understand what exactly the above loop is doing.
>
>
>
> >a.
>
> >first of all the comment on top of it says "Some of the slot
>
> >are free, ...", if some slot is free, then why do you want
>
> >to process the results? (Do you mean to say that *None* of
>
> >the slot is free....?)
>
>
>
> This comment is wrong, I will remove this.
>
> >b.
>
> >IIUC, you have called function select_loop(maxFd, &slotset)
>
> >to check if socket descriptor is readable, if yes then why
>
> >in do..while loop the same maxFd is checked always, don't
>
> >you want to check different socket descriptors? I am not sure
>
> >if I am missing something here
>
>
>
> select_loop(maxFd, &slotset)
>
>
> So it’s not only for a maxFd, it’s for all the descriptors. And it’s in do..while loop, because it possible that select_loop come out because of some intermediate message on any of the socket but still query is not complete,
>
+ {
+ /*
+ * This can only happen if user has sent the cancel request using
+ * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+ */
+
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
On December 2014 17:31 Amit Kapila Wrote,
>I suggest rather than removing, edit the comment to indicate
>the idea behind code at that place.
Done
>Okay, I think this part of code is somewhat similar to what
>is done in pg_dump/parallel.c with some differences related
>to handling of inAbort. One thing I have noticed here which
>could lead to a problem is that caller of select_loop() function
>assumes that return value is less than zero only if there is a
>cancel request which I think is wrong, because select system
>call could also return -1 in case of error. I am referring below
>code in above context:
+ if (i < 0)
+ {
+ /*
+ * This can only happen if user has sent the cancel request using
+ * Ctrl+C, Cancel is handled by 0th slot, so fetch the error result.
+ */
+
+ GetQueryResult(pSlot[0].connection, dbname, progname,
+ completedb);
Now for the abort case I am using a special error code, and for any other case we will assert; this behavior is the same as in pg_dump.
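A minimal sketch of that convention, assuming a hypothetical in_abort() accessor set by the SIGINT handler; the dedicated return code and the assert are illustrative, not the exact patch code:

    #include <assert.h>
    #include <errno.h>
    #include <sys/select.h>

    #define SELECT_CANCELLED (-2)      /* hypothetical "user hit Ctrl+C" code */

    extern int in_abort(void);         /* assumed: set by the SIGINT handler */

    static int
    select_loop(int maxFd, fd_set *workerset)
    {
        fd_set saveSet = *workerset;

        for (;;)
        {
            int i;

            *workerset = saveSet;
            i = select(maxFd + 1, workerset, NULL, NULL, NULL);

            if (in_abort())
                return SELECT_CANCELLED;   /* cancellation, reported explicitly */
            if (i < 0 && errno == EINTR)
                continue;                  /* interrupted by a signal: retry */
            assert(i > 0);                 /* any other failure is unexpected */
            return i;
        }
    }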
>Hmm, theoretically I think new behaviour could lead to more I/O in
>certain cases as compare to existing behaviour. The reason for more I/O
>is that in the new behaviour, while doing Analyze for a particular table at
>different targets, in-between it has Analyze of different table as well,
>so the pages in shared buffers or OS cache for a particular table needs to
>be reloded again for a new target whereas currently it will do all stages
>of Analyze for a particular table in one-go which means that each stage
>of Analyze could get benefit from the pages of a table loaded by previous
>stage. If you agree, then we should try to avoid this change in new
>behaviour.
I will work on this comment and post the updated patch..
I will move this patch to the latest commitfest.
Regards,
Dilip
Attachment
>
> On December 2014 17:31 Amit Kapila Wrote,
>
>
> >Hmm, theoretically I think new behaviour could lead to more I/O in
>
> >certain cases as compare to existing behaviour. The reason for more I/O
>
> >is that in the new behaviour, while doing Analyze for a particular table at
>
> >different targets, in-between it has Analyze of different table as well,
>
> >so the pages in shared buffers or OS cache for a particular table needs to
>
> >be reloded again for a new target whereas currently it will do all stages
>
> >of Analyze for a particular table in one-go which means that each stage
>
> >of Analyze could get benefit from the pages of a table loaded by previous
>
> >stage. If you agree, then we should try to avoid this change in new
>
> >behaviour.
>
>
>
> I will work on this comment and post the updated patch..
>
One idea is to send all the stages and corresponding Analyze commands
>
> I will move this patch to the latest commitfest.
>
By the way, I think this patch should be in Waiting On Author stage.
<div class="WordSection1"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">On</span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">19December 2014 16:41, Amit Kapila Wrote,</span><p class="MsoNormal"><spanstyle="font-size:10.0pt;font-family:"Tahoma","sans-serif""> </span><p class="MsoNormal">>One ideais to send all the stages and corresponding Analyze commands<p class="MsoNormal">>to server in one go which meanssomething like<p class="MsoNormal">>"BEGIN; SET default_statistics_target=1; SET vacuum_cost_delay=0;<p class="MsoNormal">> Analyze t1; COMMIT;" <p class="MsoNormal">>"BEGIN; SET default_statistics_target=10; RESET vacuum_cost_delay;<pclass="MsoNormal">> Analyze t1; COMMIT;"<p class="MsoNormal">>"BEGIN; RESET default_statistics_target;<pclass="MsoNormal">> Analyze t1; COMMIT;"<p class="MsoNormal"> <p class="MsoNormal">Case1:<u>InCase for CompleteDB:</u><p class="MsoNormal"> <p class="MsoNormal">In base code first it willprocess all the tables in stage 1 then in stage2 and so on, so that at some time all the tables are analyzed at leastup to certain stage.<p class="MsoNormal"> <p class="MsoNormal">But If we process all the stages for one table first,and then take the other table for processing the stage 1, then it may happen that for some table all the stages areprocessed, <p class="MsoNormal">but others are waiting for even first stage to be processed, this will affect the functionalityfor analyze-in-stages.<p class="MsoNormal"> <p class="MsoNormal"><u>Case2: In case for independent tables like–t “t1” –t “t2”</u><p class="MsoNormal"> <p class="MsoNormal">In base code also currently we are processing all the stagesfor first table and processing same for next table and so on.<p class="MsoNormal"> <p class="MsoNormal">I think, ifuser is giving multiple tables together then his purpose might be to analyze those tables together stage by stage, <p class="MsoNormal">butin our code we analyze table1 in all stages and then only considering the next table.<p class="MsoNormal"> <pclass="MsoNormal">So for tables also it should be like<p class="MsoNormal">Stage1:<p class="MsoNormal"> T1<p class="MsoNormal"> T2<p class="MsoNormal"> ..<p class="MsoNormal">Stage2:<pclass="MsoNormal"> T1<p class="MsoNormal"> T2<p class="MsoNormal">…<p class="MsoNormal"> <pclass="MsoNormal">Thoughts?<p class="MsoNormal"> <p class="MsoNormal">>Now, still parallel operationsin other backends could lead to<p class="MsoNormal">>page misses, but I think the impact will be minimized.<pclass="MsoNormal"> <p class="MsoNormal">Regards,<p class="MsoNormal">Dilip<p class="MsoNormal"> <p class="MsoNormal"> <pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span></div>
>
> Case1:In Case for CompleteDB:
>
> In base code first it will process all the tables in stage 1 then in stage2 and so on, so that at some time all the tables are analyzed at least up to certain stage.
>
> But If we process all the stages for one table first, and then take the other table for processing the stage 1, then it may happen that for some table all the stages are processed,
>
> but others are waiting for even first stage to be processed, this will affect the functionality for analyze-in-stages.
>
> Case2: In case for independent tables like –t “t1” –t “t2”
>
> In base code also currently we are processing all the stages for first table and processing same for next table and so on.
>
> I think, if user is giving multiple tables together then his purpose might be to analyze those tables together stage by stage,
> but in our code we analyze table1 in all stages and then only considering the next table.
>
So basically you want to say that currently the processing for
<div class="WordSection1"><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">On</span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">29December 2014 10:22 Amit Kapila Wrote,</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><p class="MsoNormal">>>Case1:In Case for CompleteDB:<br /> >><br /> >> In base code first it will processall the tables in stage 1 then in stage2 and so on, so that at some time all the tables are analyzed at least up tocertain stage.<br /> >><br /> >> But If we process all the stages for one table first, and then take the othertable for processing the stage 1, then it may happen that for some table all the stages are processed,<br /> >><br/> >> but others are waiting for even first stage to be processed, this will affect the functionality foranalyze-in-stages.<br /> >><br /> >> Case2: In case for independent tables like –t “t1” –t “t2”<br /> >><br/> > In base code also currently we are processing all the stages for first table and processing same for nexttable and so on.<br /> >><br /> >> I think, if user is giving multiple tables together then his purpose mightbe to analyze those tables together stage by stage,<br /> >> but in our code we analyze table1 in all stages andthen only considering the next table.<br /> >><br /> >So basically you want to say that currently the processingfor<p class="MsoNormal">>tables with --analyze-in-stages switch is different when the user<p class="MsoNormal">>executesvacuumdb for whole database versus when it does for<p class="MsoNormal">>individual tables(multiple tables together). In the proposed patch<p class="MsoNormal">>the processing for tables will be same foreither cases (whole<p class="MsoNormal">>database or independent tables). I think your point has merit, so<p class="MsoNormal">>letsproceed with this as it is in your patch.<p class="MsoNormal"> <p class="MsoNormal">>Do youhave anything more to handle in patch or shall I take one<p class="MsoNormal">>another look and pass it to committerif it is ready for the same.<p class="MsoNormal"> <p class="MsoNormal">I think nothing more to be handled from myside, you can go ahead with review..<p class="MsoNormal"> <p class="MsoNormal">Regards,<p class="MsoNormal">Dilip<p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span></div>
>
> On 29 December 2014 10:22 Amit Kapila Wrote,
>
>
> I think nothing more to be handled from my side, you can go ahead with review..
>
Attachment
<div class="WordSection1"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">On</span><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"></span><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">31December 2014 18:36, Amit Kapila Wrote,</span><p class="MsoNormal"><spanstyle="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><p class="MsoNormal">>Thepatch looks good to me. I have done couple of<p class="MsoNormal">>cosmetic changes (spellingmistakes, improve some comments,<p class="MsoNormal">>etc.), check the same once and if you are okay, we canmove<p class="MsoNormal">>ahead. <p class="MsoNormal"> <p class="MsoNormal">Thanks for review and changes, changeslooks fine to me..<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Regards,</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">Dilip</span><pclass="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span></div>
>
> On 31 December 2014 18:36, Amit Kapila Wrote,
>
> >The patch looks good to me. I have done couple of
>
> >cosmetic changes (spelling mistakes, improve some comments,
>
> >etc.), check the same once and if you are okay, we can move
>
> >ahead.
>
>
>
> Thanks for review and changes, changes looks fine to me..
>
Okay, I have marked this patch as "Ready For Committer"
Amit Kapila <amit.kapila16@gmail.com> wrote:

> Notes for Committer -
> There is one behavioural difference in the handling of --analyze-in-stages
> switch, when individual tables (by using -t option) are analyzed by
> using this switch, patch will process (in case of concurrent jobs) all the
> given tables for stage-1 and then for stage-2 and so on whereas in the
> unpatched code it will process all the three stages table by table
> (table-1 all three stages, table-2 all three stages and so on). I think
> the new behaviour is okay as the same is done when this utility does
> vacuum for whole database.

IMV, the change is for the better. The whole point of
--analyze-in-stages is to get minimal statistics so that "good
enough" plans will be built for most queries to allow a production
database to be brought back on-line quickly, followed by generating
increasing granularity (which takes longer but should help ensure
"best plan") while the database is in use with the initial
statistics. Doing all three levels for one table before generating
the rough statistics for the others doesn't help with that, so I
see this change as fixing a bug. Is it feasible to break that part
out as a separate patch?

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>
> Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> > Notes for Committer -
> > There is one behavioural difference in the handling of --analyze-in-stages
> > switch, when individual tables (by using -t option) are analyzed by
> > using this switch, patch will process (in case of concurrent jobs) all the
> > given tables for stage-1 and then for stage-2 and so on whereas in the
> > unpatched code it will process all the three stages table by table
> > (table-1 all three stages, table-2 all three stages and so on). I think
> > the new behaviour is okay as the same is done when this utility does
> > vacuum for whole database.
>
> IMV, the change is for the better. The whole point of
> --analyze-in-stages is to get minimal statistics so that "good
> enough" plans will be built for most queries to allow a production
> database to be brought back on-line quickly, followed by generating
> increasing granularity (which takes longer but should help ensure
> "best plan") while the database is in use with the initial
> statistics. Doing all three levels for one table before generating
> the rough statistics for the others doesn't help with that, so I
> see this change as fixing a bug. Is it feasible to break that part
> out as a separate patch?
>
Currently as the patch stands the fix (or new behaviour) is only
On 2014-12-31 18:35:38 +0530, Amit Kapila wrote:
> +      <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
> +      <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
> +      <listitem>
> +       <para>
> +        Number of concurrent connections to perform the operation.
> +        This option will enable the vacuum operation to run on asynchronous
> +        connections, at a time one table will be operated on one connection.
> +        So at one time as many tables will be vacuumed parallely as number of
> +        jobs. If number of jobs given are more than number of tables then
> +        number of jobs will be set to number of tables.

"asynchronous connections" isn't a very well defined term. Also, the
second part of that sentence doesn't seem to be gramattically correct.

> +       </para>
> +       <para>
> +        <application>vacuumdb</application> will open
> +        <replaceable class="parameter"> njobs</replaceable> connections to the
> +        database, so make sure your <xref linkend="guc-max-connections">
> +        setting is high enough to accommodate all connections.
> +       </para>

Isn't it njobs+1?

> @@ -141,6 +199,7 @@ main(int argc, char *argv[])
>   }
>   }
>
> + optind++;

Hm, where's that coming from?

> + PQsetnonblocking(connSlot[0].connection, 1);
> +
> + for (i = 1; i < concurrentCons; i++)
> + {
> +     connSlot[i].connection = connectDatabase(dbname, host, port, username,
> +                                              prompt_password, progname, false);
> +
> +     PQsetnonblocking(connSlot[i].connection, 1);
> +     connSlot[i].isFree = true;
> +     connSlot[i].sock = PQsocket(connSlot[i].connection);
> + }

Are you sure about this global PQsetnonblocking()? This means that you
might not be able to send queries... And you don't seem to be waiting
for sockets waiting for writes in the select loop - which means you
might end up being stuck waiting for reads when you haven't submitted
the query.

I think you might need a more complex select() loop. On nonfree
connections also wait for writes if PQflush() returns != 0.

> +/*
> + * GetIdleSlot
> + *     Process the slot list, if any free slot is available then return
> + *     the slotid else perform the select on all the socket's and wait
> + *     until atleast one slot becomes available.
> + */
> +static int
> +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
> +            const char *progname, bool completedb)
> +{
> +     int     i;
> +     fd_set  slotset;

Hm, you probably need to limit -j to FD_SETSIZE - 1 or so.

> +     int     firstFree = -1;
> +     pgsocket maxFd;
> +
> +     for (i = 0; i < max_slot; i++)
> +         if (pSlot[i].isFree)
> +             return i;

> +     FD_ZERO(&slotset);
> +
> +     maxFd = pSlot[0].sock;
> +
> +     for (i = 0; i < max_slot; i++)
> +     {
> +         FD_SET(pSlot[i].sock, &slotset);
> +         if (pSlot[i].sock > maxFd)
> +             maxFd = pSlot[i].sock;
> +     }

So we're waiting for idle connections?

I think you'll have to use two fdsets here, and set the write
set based on PQflush() != 0.

> +/*
> + * A select loop that repeats calling select until a descriptor in the read
> + * set becomes readable. On Windows we have to check for the termination event
> + * from time to time, on Unix we can just block forever.
> + */

Should a) mention why we have to check regularly on windows b) that on
linux we don't have to because we send a cancel event from the signal
handler.

> +static int
> +select_loop(int maxFd, fd_set *workerset)
> +{
> +     int     i;
> +     fd_set  saveSet = *workerset;

> +#ifdef WIN32
> +     /* should always be the master */

Hm?

I have to say, this is a fairly large patch for such a minor feature...
Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
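
For concreteness, a minimal sketch of the two-fd-set select() loop suggested in the review might look like the following. ParallelSlot, isFree and max_slot follow the patch under review (the struct is simplified here), while the function name wait_on_slots and everything else is illustrative only, not code from the patch:

#include <stdbool.h>
#include <sys/select.h>

#include "libpq-fe.h"

typedef struct ParallelSlot        /* simplified version of the patch's struct */
{
    PGconn     *connection;
    bool        isFree;
} ParallelSlot;

/*
 * Wait until at least one busy connection has something to read, also
 * watching for writability on connections whose outgoing buffer is not
 * fully flushed yet (PQflush() != 0), as suggested in the review.
 */
static void
wait_on_slots(ParallelSlot *slots, int max_slot)
{
    fd_set      readset;
    fd_set      writeset;
    int         maxFd = -1;
    int         i;

    FD_ZERO(&readset);
    FD_ZERO(&writeset);

    for (i = 0; i < max_slot; i++)
    {
        int         sock = PQsocket(slots[i].connection);

        if (slots[i].isFree || sock < 0)
            continue;

        FD_SET(sock, &readset);             /* always interested in results */
        if (PQflush(slots[i].connection) != 0)
            FD_SET(sock, &writeset);        /* query not fully sent yet */
        if (sock > maxFd)
            maxFd = sock;
    }

    if (maxFd >= 0)
        (void) select(maxFd + 1, &readset, &writeset, NULL, NULL);

    /*
     * After select() returns, the caller would PQconsumeInput() on readable
     * sockets and call PQflush() again on the writable ones.
     */
}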
On Sun, Jan 4, 2015 at 10:57 AM, Andres Freund <andres@2ndquadrant.com> wrote: > On 2014-12-31 18:35:38 +0530, Amit Kapila wrote: >> + <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term> >> + <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term> >> + <listitem> >> + <para> >> + Number of concurrent connections to perform the operation. >> + This option will enable the vacuum operation to run on asynchronous >> + connections, at a time one table will be operated on one connection. >> + So at one time as many tables will be vacuumed parallely as number of >> + jobs. If number of jobs given are more than number of tables then >> + number of jobs will be set to number of tables. > > "asynchronous connections" isn't a very well defined term. Also, the > second part of that sentence doesn't seem to be gramattically correct. > >> + </para> >> + <para> >> + <application>vacuumdb</application> will open >> + <replaceable class="parameter"> njobs</replaceable> connections to the >> + database, so make sure your <xref linkend="guc-max-connections"> >> + setting is high enough to accommodate all connections. >> + </para> > > Isn't it njobs+1? > >> @@ -141,6 +199,7 @@ main(int argc, char *argv[]) >> } >> } >> >> + optind++; > > Hm, where's that coming from? > >> + PQsetnonblocking(connSlot[0].connection, 1); >> + >> + for (i = 1; i < concurrentCons; i++) >> + { >> + connSlot[i].connection = connectDatabase(dbname, host, port, username, >> + prompt_password, progname, false); >> + >> + PQsetnonblocking(connSlot[i].connection, 1); >> + connSlot[i].isFree = true; >> + connSlot[i].sock = PQsocket(connSlot[i].connection); >> + } > > Are you sure about this global PQsetnonblocking()? This means that you > might not be able to send queries... And you don't seem to be waiting > for sockets waiting for writes in the select loop - which means you > might end up being stuck waiting for reads when you haven't submitted > the query. > > I think you might need a more complex select() loop. On nonfree > connections also wait for writes if PQflush() returns != 0. > > >> +/* >> + * GetIdleSlot >> + * Process the slot list, if any free slot is available then return >> + * the slotid else perform the select on all the socket's and wait >> + * until atleast one slot becomes available. >> + */ >> +static int >> +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname, >> + const char *progname, bool completedb) >> +{ >> + int i; >> + fd_set slotset; > > > Hm, you probably need to limit -j to FD_SETSIZE - 1 or so. > >> + int firstFree = -1; >> + pgsocket maxFd; >> + >> + for (i = 0; i < max_slot; i++) >> + if (pSlot[i].isFree) >> + return i; > >> + FD_ZERO(&slotset); >> + >> + maxFd = pSlot[0].sock; >> + >> + for (i = 0; i < max_slot; i++) >> + { >> + FD_SET(pSlot[i].sock, &slotset); >> + if (pSlot[i].sock > maxFd) >> + maxFd = pSlot[i].sock; >> + } > > So we're waiting for idle connections? > > I think you'll have to have to use two fdsets here, and set the write > set based on PQflush() != 0. > >> +/* >> + * A select loop that repeats calling select until a descriptor in the read >> + * set becomes readable. On Windows we have to check for the termination event >> + * from time to time, on Unix we can just block forever. >> + */ > > Should a) mention why we have to check regularly on windows b) that on > linux we don't have to because we send a cancel event from the signal > handler. 
> >> +static int >> +select_loop(int maxFd, fd_set *workerset) >> +{ >> + int i; >> + fd_set saveSet = *workerset; >> >> +#ifdef WIN32 >> + /* should always be the master */ > > Hm? > > > I have to say, this is a fairly large patch for such a minor feature... Andres, this patch needs more effort from the author, right? So marking it as returned with feedback. -- Michael
Michael Paquier wrote: > Andres, this patch needs more effort from the author, right? So > marking it as returned with feedback. I will give this patch a look in the current commitfest, if you can please set as 'needs review' instead with me as reviewer, so that I don't forget, I would appreciate it. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 16, 2015 at 12:53 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Michael Paquier wrote: > >> Andres, this patch needs more effort from the author, right? So >> marking it as returned with feedback. > > I will give this patch a look in the current commitfest, if you can > please set as 'needs review' instead with me as reviewer, so that I > don't forget, I would appreciate it. Fine for me, done this way. -- Michael
On 04 January 2015 07:27, Andres Freund Wrote,

> On 2014-12-31 18:35:38 +0530, Amit Kapila wrote:
> > + <term><option>-j <replaceable class="parameter">jobs</replaceable></option></term>
> > + <term><option>--jobs=<replaceable class="parameter">njobs</replaceable></option></term>
> > + <listitem>
> > + <para>
> > + Number of concurrent connections to perform the operation.
> > + This option will enable the vacuum operation to run on asynchronous
> > + connections, at a time one table will be operated on one connection.
> > + So at one time as many tables will be vacuumed parallely as number of
> > + jobs. If number of jobs given are more than number of tables then
> > + number of jobs will be set to number of tables.
>
> "asynchronous connections" isn't a very well defined term. Also, the
> second part of that sentence doesn't seem to be gramattically correct.

I have changed this to concurrent connections, is this ok?

> > + </para>
> > + <para>
> > + <application>vacuumdb</application> will open
> > + <replaceable class="parameter"> njobs</replaceable> connections to the
> > + database, so make sure your <xref linkend="guc-max-connections">
> > + setting is high enough to accommodate all connections.
> > + </para>
>
> Isn't it njobs+1?

The main connection that we use for getting the table information is also
used as the first slot connection, so the total number of connections is
still njobs.

> > @@ -141,6 +199,7 @@ main(int argc, char *argv[])
> > 		}
> > 	}
> >
> > + optind++;
>
> Hm, where's that coming from?

This is wrong, I have removed it.

> > + PQsetnonblocking(connSlot[0].connection, 1);
> > +
> > + for (i = 1; i < concurrentCons; i++)
> > + {
> > + 	connSlot[i].connection = connectDatabase(dbname, host, port, username,
> > + 						prompt_password, progname, false);
> > +
> > + 	PQsetnonblocking(connSlot[i].connection, 1);
> > + 	connSlot[i].isFree = true;
> > + 	connSlot[i].sock = PQsocket(connSlot[i].connection);
> > + }
>
> Are you sure about this global PQsetnonblocking()? This means that you
> might not be able to send queries... And you don't seem to be waiting
> for sockets waiting for writes in the select loop - which means you
> might end up being stuck waiting for reads when you haven't submitted
> the query.
>
> I think you might need a more complex select() loop. On nonfree
> connections also wait for writes if PQflush() returns != 0.

1. In GetIdleSlot we make sure that we only wait if a connection is busy,
i.e. if we have already sent a query on that connection.
2. When all the connections are busy, we select() on all FDs and wait for a
response on any connection. When select() returns, we consume the input and
check whether the connection is now idle or whether it was just an
intermediate response; if it is no longer busy, we process all the results
and mark the slot as free.

> > +/*
> > + * GetIdleSlot
> > + *	Process the slot list, if any free slot is available then return
> > + *	the slotid else perform the select on all the socket's and wait
> > + *	until atleast one slot becomes available.
> > + */
> > +static int
> > +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
> > +			const char *progname, bool completedb)
> > +{
> > +	int			i;
> > +	fd_set		slotset;
>
> Hm, you probably need to limit -j to FD_SETSIZE - 1 or so.

I will change this in next patch..

> > +	int			firstFree = -1;
> > +	pgsocket	maxFd;
> > +
> > +	for (i = 0; i < max_slot; i++)
> > +		if (pSlot[i].isFree)
> > +			return i;
> > +	FD_ZERO(&slotset);
> > +
> > +	maxFd = pSlot[0].sock;
> > +
> > +	for (i = 0; i < max_slot; i++)
> > +	{
> > +		FD_SET(pSlot[i].sock, &slotset);
> > +		if (pSlot[i].sock > maxFd)
> > +			maxFd = pSlot[i].sock;
> > +	}
>
> So we're waiting for idle connections?
>
> I think you'll have to have to use two fdsets here, and set the write
> set based on PQflush() != 0.

I did not get this? The logic here is: we wait for any connection to
respond, using select() on all fds. When select() returns, we check all the
sockets to see which ones are no longer busy and mark all the finished
connections as idle at once. If none of the connections is free, we go back
to select(); otherwise we return the first idle connection.

> > +/*
> > + * A select loop that repeats calling select until a descriptor in
> > + * the read set becomes readable. On Windows we have to check for the
> > + * termination event from time to time, on Unix we can just block forever.
> > + */
>
> Should a) mention why we have to check regularly on windows b) that on
> linux we don't have to because we send a cancel event from the signal
> handler.

I have added the comments..

> > +static int
> > +select_loop(int maxFd, fd_set *workerset)
> > +{
> > +	int			i;
> > +	fd_set		saveSet = *workerset;
> >
> > +#ifdef WIN32
> > +	/* should always be the master */
>
> Hm?
>
>
> I have to say, this is a fairly large patch for such a minor feature...
>
> Greetings,
>
> Andres Freund
>
> --
> Andres Freund                     http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Training & Services
Attachment
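
A minimal sketch of the consume-and-check step described in the reply above. pSlot, max_slot and isFree follow the patch (the struct is simplified here); the function name reap_finished_slots and the error handling are illustrative only, not code from the patch:

#include <stdbool.h>
#include "libpq-fe.h"

typedef struct ParallelSlot        /* simplified; the patch's struct has more fields */
{
    PGconn     *connection;
    bool        isFree;
} ParallelSlot;

/*
 * After select() wakes up, consume input on every busy connection and mark
 * a slot free once its connection has no more pending results -- the check
 * described in the mail above.
 */
static void
reap_finished_slots(ParallelSlot *pSlot, int max_slot)
{
    int         i;

    for (i = 0; i < max_slot; i++)
    {
        PGconn     *conn = pSlot[i].connection;

        if (pSlot[i].isFree)
            continue;

        if (!PQconsumeInput(conn))
            continue;           /* connection trouble; real code would report it */

        if (!PQisBusy(conn))
        {
            PGresult   *res;

            /* all results are available: drain them, then reuse the slot */
            while ((res = PQgetResult(conn)) != NULL)
                PQclear(res);

            pSlot[i].isFree = true;
        }
    }
}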
I didn't understand the coding in GetQueryResult(); why do we check the
result status of the last returned result only?  It seems simpler to me to
check it inside the loop, but maybe there's a reason you didn't do it like
that?

Also, what is the reason we were ignoring those errors only in "completedb"
mode?  It doesn't seem like it would cause any harm if we did it in all
cases.  That means we can just not have the "completeDb" flag at all.

Finally, I think it's better to report the "missing relation" error, even if
we're going to return true to continue processing other tables.  That makes
the situation clearer to the user.  So the function would end up looking
like this:

/*
 * GetQueryResult
 *
 * Process the query result.  Returns true if there's no error, false
 * otherwise -- but errors about trying to vacuum a missing relation are
 * ignored.
 */
static bool
GetQueryResult(PGconn *conn, errorOptions *erropts)
{
    PGresult   *result = NULL;

    SetCancelConn(conn);
    while ((result = PQgetResult(conn)) != NULL)
    {
        /*
         * If errors are found, report them.  Errors about a missing table
         * are harmless so we continue processing, but die for other errors.
         */
        if (PQresultStatus(result) != PGRES_COMMAND_OK)
        {
            char       *sqlState = PQresultErrorField(result, PG_DIAG_SQLSTATE);

            fprintf(stderr, _("%s: vacuuming of database \"%s\" failed: %s"),
                    erropts->progname, erropts->dbname, PQerrorMessage(conn));

            if (sqlState && strcmp(sqlState, ERRCODE_UNDEFINED_TABLE) != 0)
            {
                PQclear(result);
                return false;
            }
        }

        PQclear(result);
    }
    ResetCancelConn();

    return true;
}

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
> I didn't understand the coding in GetQueryResult(); why do we check the
> result status of the last returned result only? It seems simpler to me
> to check it inside the loop, but maybe there's a reason you didn't do it
> like that?
>
> Also, what is the reason we were ignoring those errors only in
> "completedb" mode? It doesn't seem like it would cause any harm if we
> did it in all cases. That means we can just not have the "completeDb"
> flag at all.
>
Amit Kapila wrote:
> On Wed, Jan 21, 2015 at 8:51 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
> wrote:
> >
> > I didn't understand the coding in GetQueryResult(); why do we check the
> > result status of the last returned result only? It seems simpler to me
> > to check it inside the loop, but maybe there's a reason you didn't do it
> > like that?
> >
> > Also, what is the reason we were ignoring those errors only in
> > "completedb" mode? It doesn't seem like it would cause any harm if we
> > did it in all cases. That means we can just not have the "completeDb"
> > flag at all.
>
> IIRC it is done to match the existing behaviour where such errors are
> ignored we use this utility to vacuum database.

I think that's fine, but we should do it always, not just in whole-database
mode.

I've been hacking this patch today BTW; hope to have something to show
tomorrow.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
> Amit Kapila wrote:
> > On Wed, Jan 21, 2015 at 8:51 PM, Alvaro Herrera <alvherre@2ndquadrant.com>
> > wrote:
> > >
> > > I didn't understand the coding in GetQueryResult(); why do we check the
> > > result status of the last returned result only? It seems simpler to me
> > > to check it inside the loop, but maybe there's a reason you didn't do it
> > > like that?
> > >
> > > Also, what is the reason we were ignoring those errors only in
> > > "completedb" mode? It doesn't seem like it would cause any harm if we
> > > did it in all cases. That means we can just not have the "completeDb"
> > > flag at all.
> >
> > IIRC it is done to match the existing behaviour where such errors are
> > ignored we use this utility to vacuum database.
>
> I think that's fine, but we should do it always, not just in
> whole-database mode.
>
> I've been hacking this patch today BTW; hope to have something to show
> tomorrow.
>
Great!
Here's v23.  I reworked a number of things.

First, I changed trivial stuff like grouping all the vacuuming options in a
struct, to avoid passing an excessive number of arguments to functions.
full, freeze, analyze_only, and_analyze and verbose are all in a single
struct now.  Also, the stage_commands and stage_messages was duplicated by
your patch; I moved them to a file-level static struct.

I made prepare_command reset the string buffer and receive an optional table
name, so that it can append it to the generated command, and append the
semicolon as well.  Forcing the callers to reset the string before calling,
and having them add the table name and semicolon afterwards was awkward and
unnecessarily verbose.

You had a new in_abort() function in common.c which seems an unnecessary
layer; in its place I just exported the inAbort boolean flag it was
returning, and renamed to CancelRequested.

I was then troubled by the fact that vacuum_one_database() was being called
in a loop by main() when multiple tables are vacuumed, but vacuum_parallel()
was doing the loop internally.  I found this discrepancy confusing, so I
renamed that new function to vacuum_one_database_parallel and modified the
original vacuum_one_database to do the loop internally as well.  Now they
are, in essence, a mirror of each other, one doing the parallel stuff and
one doing it serially.  This seems to make more sense to me -- but see
below.

I also modified some underlying stuff like GetIdleSlot returning a
ParallelSlot pointer instead of an array index.  Since its caller always has
to dereference the array with the given index, it makes more sense to return
the right element pointer instead, so I made it do that.  Also, that way,
instead of returning NO_SLOT in case of error it can just return NULL; no
need for extra cognitive burden.

I also changed select_loop.  In your patch it had two implementations, one
WIN32 and another one for the rest.  It looks nicer to me to have only one
with small exceptions in the places that need it.  (I haven't tested the
WIN32 path.)  Also, instead of returning ERROR_IN_ABORT I made it set a
boolean flag in case of error, which seems cleaner.

I changed GetQueryResult as I described in a previous message.

There are two things that continue to bother me and I would like you, dear
patch author, to change them before committing this patch:

1. I don't like having vacuum_one_database() and a separate
vacuum_one_database_parallel().  I think we should merge them into one
function, which does either thing according to parameters.  There's plenty
in there that's duplicated.

2. in particular, the above means that run_parallel_vacuum can no longer
exist as it is.  Right now vacuum_one_database_parallel relies on
run_parallel_vacuum to do the actual job parallellization.  I would like to
have that looping in the improved vacuum_one_database() function instead.

Looking forward to v24,

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
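
A rough sketch of the GetIdleSlot() interface change described above (returning a slot pointer, or NULL on error, instead of an array index / NO_SLOT). The wait_for_any_result() helper is hypothetical and the struct is simplified; this is not the v23 code:

#include <stdbool.h>
#include <stddef.h>
#include "libpq-fe.h"

typedef struct ParallelSlot        /* simplified for illustration */
{
    PGconn     *connection;
    bool        isFree;
} ParallelSlot;

/* hypothetical helper: blocks until at least one connection finishes;
 * returns false on a fatal error */
extern bool wait_for_any_result(ParallelSlot *slots, int numslots);

/*
 * Return a pointer to a free slot, or NULL on error, instead of an array
 * index / NO_SLOT sentinel -- the interface change described above.
 */
static ParallelSlot *
GetIdleSlot(ParallelSlot *slots, int numslots)
{
    int         i;

    for (;;)
    {
        /* hand back any slot that is already free */
        for (i = 0; i < numslots; i++)
        {
            if (slots[i].isFree)
                return slots + i;
        }

        /* otherwise block until some connection finishes its command */
        if (!wait_for_any_result(slots, numslots))
            return NULL;
    }
}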
On 22 January 2015 23:16, Alvaro Herrera Wrote,

> Here's v23.
>
> There are two things that continue to bother me and I would like you,
> dear patch author, to change them before committing this patch:
>
> 1. I don't like having vacuum_one_database() and a separate
> vacuum_one_database_parallel().  I think we should merge them into one
> function, which does either thing according to parameters.  There's
> plenty in there that's duplicated.
>
> 2. in particular, the above means that run_parallel_vacuum can no
> longer exist as it is.  Right now vacuum_one_database_parallel relies
> on run_parallel_vacuum to do the actual job parallellization.  I would
> like to have that looping in the improved vacuum_one_database()
> function instead.
>
> Looking forward to v24,

Thank you for your effort. I have tried to change the patch as per your
instructions and come up with v24.

Changes:

1. In the current patch, vacuum_one_database (for a table list) has the
table loop outside and the analyze_stage loop inside, so it will finish all
three stages for one table first and then pick the next table.  But
vacuum_one_database_parallel does the stage loop outside and calls
run_parallel_vacuum, which has the table loop, so for one stage all the
tables are vacuumed first before going to the next stage.  For merging the
two functions, both functions' behaviour should be identical.  I think that
if the user has given a list of tables with analyze-in-stages, then doing
all the tables for at least one stage before picking the next stage is the
better solution, so I have done it that way.

2. In select_loop, for WIN32 I replaced the TranslateSocketError function
with

   if (WSAGetLastError() == WSAEINTR)
       errno = EINTR;

   otherwise I would have to expose the TranslateSocketError function from
   socket and include an extra header.

I have tested on Windows also, it's working fine.

Regards,
Dilip
Attachment
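
A sketch of the select_loop() retry-on-EINTR behaviour described in point 2, with WSAEINTR mapped to EINTR on Windows. It is illustrative only and omits the periodic cancel check that the real function needs on Windows:

#include <errno.h>

#ifdef WIN32
#include <winsock2.h>
#else
#include <sys/select.h>
#endif

/*
 * Call select() repeatedly until it returns something other than an
 * interrupted system call.  On Windows, WSAEINTR is mapped to EINTR so the
 * retry logic can stay the same on both platforms.
 */
static int
select_loop(int maxFd, fd_set *workerset)
{
    int         i;
    fd_set      saveSet = *workerset;

    for (;;)
    {
        *workerset = saveSet;
        i = select(maxFd + 1, workerset, NULL, NULL, NULL);

#ifdef WIN32
        if (i == SOCKET_ERROR)
        {
            i = -1;
            if (WSAGetLastError() == WSAEINTR)
                errno = EINTR;
        }
#endif

        if (i < 0 && errno == EINTR)
            continue;           /* interrupted by a signal: just retry */

        return i;
    }
}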
Dilip kumar wrote:

> Changes:
> 1. In the current patch, vacuum_one_database (for a table list) has the
> table loop outside and the analyze_stage loop inside, so it will finish
> all three stages for one table first and then pick the next table.  But
> vacuum_one_database_parallel does the stage loop outside and calls
> run_parallel_vacuum, which has the table loop, so for one stage all the
> tables are vacuumed first before going to the next stage.  For merging
> the two functions, both functions' behaviour should be identical.  I
> think that if the user has given a list of tables with
> analyze-in-stages, then doing all the tables for at least one stage
> before picking the next stage is the better solution, so I have done it
> that way.

Yeah, I think the stages loop should be outermost, as discussed upthread
somewhere -- it's better to have initial stats for all tables as soon as
possible, and improve them later, than have some tables/dbs with no stats
for a longer period while full stats are computed for some specific
tables/database.

I'm tweaking your v24 a bit more now, thanks -- main change is to make
vacuum_one_database be called only to run one analyze stage, so it never
iterates for each stage; callers must iterate calling it multiple times in
those cases.  (There's only one callsite that needs changing anyway.)

> 2. In select_loop, for WIN32 I replaced the TranslateSocketError
> function with
>     if (WSAGetLastError() == WSAEINTR)
>         errno = EINTR;
>
> otherwise I would have to expose the TranslateSocketError function from
> socket and include an extra header.

Grumble.  Don't like this bit, but moving TranslateSocketError to src/common
is outside the scope of this patch, so okay.  (pg_dump's parallel stuff has
the same issue anyway.)

In case you're up for doing some more work later on, there are two ideas
here: move the backend's TranslateSocketError to src/common, and try to
merge pg_dump's select_loop function with the one in this new code.  But
that's for another patch anyway (and this new function needs a little
battle-testing, of course.)

> I have tested on Windows also, it's working fine.

Great, thanks.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
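
A sketch of the "stages loop outermost" ordering discussed above, using hypothetical names (analyze_in_stages, analyze_table_stage) purely for illustration:

#define ANALYZE_NUM_STAGES 3

/* hypothetical per-table worker, purely for illustration */
extern void analyze_table_stage(const char *table, int stage);

/*
 * Run the analyze stages with the stage loop outermost: every table gets
 * its cheap stage-1 statistics before any table moves on to the more
 * expensive stages, so minimal statistics are available for the whole set
 * as early as possible.
 */
static void
analyze_in_stages(const char **tables, int ntables)
{
    int         stage;
    int         t;

    for (stage = 0; stage < ANALYZE_NUM_STAGES; stage++)
    {
        for (t = 0; t < ntables; t++)
            analyze_table_stage(tables[t], stage);
    }
}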
Andres Freund wrote:
> On 2014-12-31 18:35:38 +0530, Amit Kapila wrote:
> > + PQsetnonblocking(connSlot[0].connection, 1);
> > +
> > + for (i = 1; i < concurrentCons; i++)
> > + {
> > + 	connSlot[i].connection = connectDatabase(dbname, host, port, username,
> > + 						prompt_password, progname, false);
> > +
> > + 	PQsetnonblocking(connSlot[i].connection, 1);
> > + 	connSlot[i].isFree = true;
> > + 	connSlot[i].sock = PQsocket(connSlot[i].connection);
> > + }
>
> Are you sure about this global PQsetnonblocking()? This means that you
> might not be able to send queries... And you don't seem to be waiting
> for sockets waiting for writes in the select loop - which means you
> might end up being stuck waiting for reads when you haven't submitted
> the query.
>
> I think you might need a more complex select() loop. On nonfree
> connections also wait for writes if PQflush() returns != 0.

I removed the PQsetnonblocking() calls.  They were a misunderstanding, I
think.

> > +/*
> > + * GetIdleSlot
> > + *	Process the slot list, if any free slot is available then return
> > + *	the slotid else perform the select on all the socket's and wait
> > + *	until atleast one slot becomes available.
> > + */
> > +static int
> > +GetIdleSlot(ParallelSlot *pSlot, int max_slot, const char *dbname,
> > +			const char *progname, bool completedb)
> > +{
> > +	int			i;
> > +	fd_set		slotset;
>
> Hm, you probably need to limit -j to FD_SETSIZE - 1 or so.

I tried without the check to use 1500 connections, and select() didn't even
blink -- everything worked fine vacuuming 1500 tables in parallel on a set
of 2000 tables.  Not sure what's the actual limit but my FD_SETSIZE says
1024, so I'm clearly over the limit.  (I tried to run it with -j2000 but the
server didn't start with that many connections.  I didn't try any
intermediate numbers.)  Anyway I added the check.

I fixed some more minor issues and pushed.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
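
For illustration, the kind of guard described above (capping the number of jobs below FD_SETSIZE) could look roughly like this; the function name and the message wording are not from the committed code:

#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>

/*
 * Refuse to run with more parallel jobs than select() can track; one
 * descriptor is kept in reserve.  concurrentCons and progname follow the
 * patch, the rest is illustrative.
 */
static void
check_job_limit(int concurrentCons, const char *progname)
{
    if (concurrentCons > FD_SETSIZE - 1)
    {
        fprintf(stderr, "%s: too many parallel jobs requested (maximum: %d)\n",
                progname, FD_SETSIZE - 1);
        exit(1);
    }
}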
Alvaro Herrera wrote:

> I'm tweaking your v24 a bit more now, thanks -- main change is to make
> vacuum_one_database be called only to run one analyze stage, so it never
> iterates for each stage; callers must iterate calling it multiple times
> in those cases.  (There's only one callsite that needs changing anyway.)

I made some more changes, particularly so that the TAP test pass (we were
missing the semicolon when a table name was not specified to
prepare_vacuum_command).  I reordered the code in a more sensible manner,
remove the vacuum_database_stage layer (which was pretty useless), and
changed the analyze-in-stages mode: if we pass a valid stage number, run
that stage, if not, then we're not in analyze-in-stage mode.  So I got rid
of the boolean flag altogether.  I also moved the per-stage commands and
messages back into a struct inside a function, since there's no need to have
them be file-level variables anymore.

-j1 is now the same as not specifying anything, and vacuum_one_database uses
more common code in the parallel and not-parallel cases: the not-parallel
case is just the parallel case with a single connection, so the setup and
shutdown is mostly the same in both cases.

I pushed the result.  Please test, particularly on Windows.  If you can use
configure --enable-tap-tests and run them ("make check" in the
src/bin/scripts subdir) that would be good too .. not sure whether that's
expected to work on Windows.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
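
A sketch of the prepare_vacuum_command() behaviour described above: the buffer is reset inside the function and the optional table name plus the trailing semicolon are appended there too. Option handling is trimmed to a single flag; this is illustrative, not the committed function:

#include <stdbool.h>
#include "pqexpbuffer.h"

/*
 * Build a VACUUM command for one table (or for the whole database when
 * table is NULL).  The buffer is reset here and the trailing semicolon is
 * appended here too, so callers no longer have to do either.
 */
static void
prepare_vacuum_command(PQExpBuffer sql, bool full, const char *table)
{
    resetPQExpBuffer(sql);

    appendPQExpBufferStr(sql, "VACUUM");
    if (full)
        appendPQExpBufferStr(sql, " FULL");
    if (table)
        appendPQExpBuffer(sql, " %s", table);
    appendPQExpBufferStr(sql, ";");
}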
On 23 January 2015 21:10, Alvaro Herrera Wrote,

> In case you're up for doing some more work later on, there are two ideas
> here: move the backend's TranslateSocketError to src/common, and try to
> merge pg_dump's select_loop function with the one in this new code.
> But that's for another patch anyway (and this new function needs a
> little battle-testing, of course.)

Thanks for your effort, I will take it up for next commitfest..
On 23 January 2015 23:55, Alvaro Herrera,

> -j1 is now the same as not specifying anything, and vacuum_one_database
> uses more common code in the parallel and not-parallel cases: the not-
> parallel case is just the parallel case with a single connection, so
> the setup and shutdown is mostly the same in both cases.
>
> I pushed the result.  Please test, particularly on Windows.  If you can
> use configure --enable-tap-tests and run them ("make check" in the
> src/bin/scripts subdir) that would be good too .. not sure whether
> that's expected to work on Windows.

I have tested on Windows, it's working fine. I am not sure how to enable the
TAP tests on Windows; I will check and run them if possible.

Thanks,
Dilip
[pavel@localhost bin]$ /usr/local/pgsql/bin/vacuumdb test2 -fz -j 4
vacuumdb: vacuuming database "test2"
vacuumdb: vacuuming of database "test2" failed: ERROR: deadlock detected
DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690.
Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689.
HINT: See server log for query details.
ERROR: deadlock detected
DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690.
Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689.
Process 24689: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute;
Process 24690: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend;
HINT: See server log for query details.
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_class;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc;
LOG: could not send data to client: Broken pipe
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc;
FATAL: connection to client lost
LOG: could not send data to client: Broken pipe
ERROR: canceling statement due to user request
FATAL: connection to client lost
Schema | Name | Type | Owner | Size | Description
------------+-------------------------+-------+----------+------------+-------------
pg_catalog | pg_attribute | table | postgres | 439 MB |
pg_catalog | pg_rewrite | table | postgres | 314 MB |
pg_catalog | pg_proc | table | postgres | 136 MB |
pg_catalog | pg_depend | table | postgres | 133 MB |
pg_catalog | pg_class | table | postgres | 69 MB |
pg_catalog | pg_attrdef | table | postgres | 55 MB |
pg_catalog | pg_trigger | table | postgres | 47 MB |
pg_catalog | pg_type | table | postgres | 31 MB |
pg_catalog | pg_description | table | postgres | 23 MB |
pg_catalog | pg_index | table | postgres | 20 MB |
pg_catalog | pg_constraint | table | postgres | 17 MB |
pg_catalog | pg_shdepend | table | postgres | 17 MB |
pg_catalog | pg_statistic | table | postgres | 928 kB |
pg_catalog | pg_operator | table | postgres | 552 kB |
pg_catalog | pg_collation | table | postgres | 232 kB |
Pavel Stehule
On Thursday, 29 January 2015, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hi, I am testing this feature on a relatively complex schema (38619 tables in the db) and I got a deadlock:
[pavel@localhost bin]$ /usr/local/pgsql/bin/vacuumdb test2 -fz -j 4
vacuumdb: vacuuming database "test2"
vacuumdb: vacuuming of database "test2" failed: ERROR: deadlock detected
DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690.
Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689.
HINT: See server log for query details.
ERROR: deadlock detected
DETAIL: Process 24689 waits for RowExclusiveLock on relation 2608 of database 194769; blocked by process 24690.
Process 24690 waits for AccessShareLock on relation 1249 of database 194769; blocked by process 24689.
Process 24689: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute;
Process 24690: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend;
HINT: See server log for query details.
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_attribute;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_depend;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_class;
ERROR: canceling statement due to user request
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc;
LOG: could not send data to client: Broken pipe
STATEMENT: VACUUM (FULL, ANALYZE) pg_catalog.pg_proc;
FATAL: connection to client lost
LOG: could not send data to client: Broken pipe
ERROR: canceling statement due to user request
FATAL: connection to client lost
Schema | Name | Type | Owner | Size | Description
------------+-------------------------+-------+----------+------------+-------------
pg_catalog | pg_attribute | table | postgres | 439 MB |
pg_catalog | pg_rewrite | table | postgres | 314 MB |
pg_catalog | pg_proc | table | postgres | 136 MB |
pg_catalog | pg_depend | table | postgres | 133 MB |
pg_catalog | pg_class | table | postgres | 69 MB |
pg_catalog | pg_attrdef | table | postgres | 55 MB |
pg_catalog | pg_trigger | table | postgres | 47 MB |
pg_catalog | pg_type | table | postgres | 31 MB |
pg_catalog | pg_description | table | postgres | 23 MB |
pg_catalog | pg_index | table | postgres | 20 MB |
pg_catalog | pg_constraint | table | postgres | 17 MB |
pg_catalog | pg_shdepend | table | postgres | 17 MB |
pg_catalog | pg_statistic | table | postgres | 928 kB |
pg_catalog | pg_operator | table | postgres | 552 kB |
pg_catalog | pg_collation | table | postgres | 232 kB |
--
Consultoria/Coaching PostgreSQL
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
Pavel Stehule wrote:
> should not be used a pessimist controlled locking instead?

Patches welcome.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Okay, I have marked this patch as "Ready For Committer".

Notes for Committer -
There is one behavioural difference in the handling of the
--analyze-in-stages switch: when individual tables (by using the -t option)
are analyzed by using this switch, the patch will process (in case of
concurrent jobs) all the given tables for stage-1 and then for stage-2 and
so on, whereas in the unpatched code it will process all the three stages
table by table (table-1 all three stages, table-2 all three stages and so
on). I think the new behaviour is okay as the same is done when this utility
does vacuum for the whole database. As there was no input from any committer
on this point, I thought it is better to get the same rather than waiting
more just for one point.
--
Laurent Laborde wrote:

> Friendly greetings !
>
> What's the status of parallel clusterdb please ?
> I'm having fun (and troubles) applying the vacuumdb patch to clusterdb.
>
> This thread also talk about unifying code for parallelizing clusterdb and
> reindex.
> Was anything done about it ? Because i can't see it and my work currently
> involve a lot of copy/pasting from vacuumdb to clusterdb :)

Honestly, I have to wonder whether there are really valid use cases for
clusterdb.  Are you actually using it and want to see it improved, or is
this just an academical exercise?

> And no, (i'm pretty sure) i don't have the required postgresql knowledge to
> do this unification if it isn't done already.

You may or may not lack it *now*, but that doesn't mean you will continue to
lack it forever.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
<div dir="ltr"><p dir="ltr">Le 23 juil. 2015 19:27, "Alvaro Herrera" <<a href="mailto:alvherre@2ndquadrant.com" target="_blank">alvherre@2ndquadrant.com</a>>a écrit :<br /> ><br /> > Laurent Laborde wrote:<br /> ><br /> >> Friendly greetings !<br /> > ><br /> > > What's the status of parallel clusterdb please ?<br /> >> I'm having fun (and troubles) applying the vacuumdb patch to clusterdb.<br /> > ><br /> > > This threadalso talk about unifying code for parallelizing clusterdb and<br /> > > reindex.<br /> > > Was anythingdone about it ? Because i can't see it and my work currently<br /> > > involve a lot of copy/pasting from vacuumdbto clusterdb :)<br /> ><br /> > Honestly, I have to wonder whether there are really valid use cases for<br/> > clusterdb. Are you actually using it and want to see it improved, or is<br /> > this just an academicalexercise?<p dir="ltr">Purely academical. I don't use it.<br /><p dir="ltr">> > And no, (i'm pretty sure)i don't have the required postgresql knowledge to<br /> > > do this unification if it isn't done already.<br />><br /> > You may or may not lack it *now*, but that doesn't mean you will<br /> > continue to lack it forever.<pdir="ltr">That's why i'm working on it :)<br /><br />--<br />Laurent "ker2x" Laborde<br />DBA \o/ <a href="http://gandi.net">gandi.net</a></div>