TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers

From Dilip kumar
Subject TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date
Msg-id 4205E661176A124FAF891E0A6BA9135265924388@SZXEML507-MBS.china.huawei.com
Whole thread Raw
Responses Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Euler Taveira <euler@timbira.com.br>)
Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Jan Lentfer <Jan.Lentfer@web.de>)
Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Michael Paquier <michael.paquier@gmail.com>)
Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ]  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers

This patch implementing the following TODO item

 

Allow parallel cores to be used by vacuumdb

http://www.postgresql.org/message-id/4F10A728.7090403@agliodbs.com

 

Like Parallel pg_dump, vacuumdb is provided with the option to run the vacuum of multiple tables in parallel. [ vacuumdb –j ]

 

1.       One new option is provided with vacuumdb to give the number of workers.

2.       All worker will be started in beginning and all will be waiting for the vacuum instruction from the master.

3.       Now, if table list is provided in vacuumdb command using –t then, it will send the vacuum of one table to one of the IDLE worker, next table to next IDLE worker and so on.

4.       If vacuum is given for one DB then, it will execute select on pg_class to get the table list and fetch the table name one by one and also assign the vacuum responsibility to IDLE workers.

 

Performance Data by parallel vacuumdb:

Machine Configuration:

                                Core : 8

                                RAM: 24GB        

Test Scenario:

                                16 tables all with 4M records. [many records are deleted and inserted using some pattern, (files is attached in the mail)]

 

Test Result

 

{Base Code}    Time(s)    %CPU Usage      Avg Read(kB/s)    Avg Write(kB/s)

                                521       3%                         12000                       20000                                                   

 

 

{With Parallel Vacuum Patch}

   worker          Time(s)    %CPU Usage    Avg Read(kB/s)          Avg Write(kB/s)

      1                     518                     3%                 12000                           20000   --> this will take the same path as base code

      2                     390                     5%                 14000                           30000            

      8                     235                     7%                 18000                           40000

      16                   197                     8%                 20000                           50000

 

Conclusion:

                By running the vacuumdb in parallel, CPU and I/O throughput is increasing and it can give >50% performance improvement.

 

Work to be Done:

1.       Documentations of the new command.

2.       Parallel support for vacuum all db.

 

Is it required to move the common code for parallel operation of pg_dump and vacuumdb to one place and reuse it ?

 

Prototype patch is attached in the mail, please provide your feedback/Suggestions…

 

                Thanks & Regards,

                Dilip Kumar

 

 

Attachment

pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: Changing pg_dump default file format
Next
From: Magnus Hagander
Date:
Subject: Re: Changing pg_dump default file format