
TODO : Allow parallel cores to be used by vacuumdb [ WIP ] - Mailing list pgsql-hackers

From	Dilip kumar
Subject	TODO : Allow parallel cores to be used by vacuumdb [ WIP ]
Date	November 7, 2013 21:14:11
Msg-id	4205E661176A124FAF891E0A6BA9135265924388@SZXEML507-MBS.china.huawei.com
Responses	Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] (Euler Taveira <euler@timbira.com.br>) Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] (Jan Lentfer <Jan.Lentfer@web.de>) Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] (Michael Paquier <michael.paquier@gmail.com>) Re: TODO : Allow parallel cores to be used by vacuumdb [ WIP ] (Michael Paquier <michael.paquier@gmail.com>)
List	pgsql-hackers


This patch implements the following TODO item:

Allow parallel cores to be used by vacuumdb

http://www.postgresql.org/message-id/4F10A728.7090403@agliodbs.com

Like parallel pg_dump, vacuumdb gets an option to vacuum multiple tables in parallel. [ vacuumdb -j ]

1. A new vacuumdb option specifies the number of workers.

2. All workers are started at the beginning and wait for vacuum instructions from the master.

3. If a table list is given on the vacuumdb command line using -t, the master sends the vacuum of one table to an IDLE worker, the next table to the next IDLE worker, and so on.

4. If vacuum is requested for a whole database, the master runs a SELECT on pg_class to get the table list, fetches the table names one by one, and assigns each vacuum to an IDLE worker.
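The dispatch model in the steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the patch's C code: threads stand in for the worker processes, a shared queue stands in for the master-to-worker channel, and the names `worker` and `parallel_vacuum` are hypothetical. The workers only record the VACUUM command they would issue; nothing is sent to a server.

```python
import queue
import threading

def worker(tasks, done):
    """Wait for table names from the master; stop on the None sentinel."""
    while True:
        table = tasks.get()          # blocks while the worker is idle
        if table is None:            # master has no more work
            break
        # A real worker would run this on its own connection (assumption:
        # here we only record the command text).
        done.put((threading.current_thread().name, f"VACUUM {table};"))

def parallel_vacuum(tables, num_workers):
    """Start all workers first, then hand out one table at a time."""
    tasks = queue.Queue()
    done = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(tasks, done), name=f"worker-{i}")
        for i in range(num_workers)
    ]
    for t in threads:                # step 2: all workers start up front
        t.start()
    for table in tables:             # steps 3/4: next table to next idle worker
        tasks.put(table)
    for _ in threads:                # one stop sentinel per worker
        tasks.put(None)
    for t in threads:
        t.join()
    # Every table produced exactly one result before the workers exited.
    return [done.get() for _ in tables]

if __name__ == "__main__":
    for name, command in parallel_vacuum(["t1", "t2", "t3", "t4"], 2):
        print(name, command)
```

Whichever worker is free picks up the next table, so the order of completion and the table-to-worker mapping are not fixed; only the set of issued commands is.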

Performance Data by parallel vacuumdb:

Machine Configuration:

Core : 8

RAM: 24GB

Test Scenario:

16 tables, each with 4M records. [Many records are deleted and inserted using some pattern (the files are attached to this mail).]

Test Result

{Base Code}

Time(s)   %CPU Usage   Avg Read(kB/s)   Avg Write(kB/s)
521       3%           12000            20000

{With Parallel Vacuum Patch}

Workers   Time(s)   %CPU Usage   Avg Read(kB/s)   Avg Write(kB/s)
1         518       3%           12000            20000    (takes the same path as base code)
2         390       5%           14000            30000
8         235       7%           18000            40000
16        197       8%           20000            50000

Conclusion:

Running vacuumdb in parallel increases CPU and I/O throughput and can give a >50% performance improvement: the run time drops from 521 s with the base code to 235 s with 8 workers and 197 s with 16 workers.
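As a quick check of the ">50%" figure, the time reduction can be computed from the reported timings. The snippet below is only arithmetic on the numbers in the test result; the helper name `reduction_percent` is made up for illustration.

```python
# Timings in seconds, copied from the test result above.
BASE_SECONDS = 521
PATCHED_SECONDS = {1: 518, 2: 390, 8: 235, 16: 197}

def reduction_percent(base, patched):
    """Percentage of run time saved relative to the base code."""
    return 100.0 * (base - patched) / base

if __name__ == "__main__":
    for workers, seconds in PATCHED_SECONDS.items():
        saved = reduction_percent(BASE_SECONDS, seconds)
        speedup = BASE_SECONDS / seconds
        print(f"{workers:>2} workers: {seconds} s, "
              f"{saved:.1f}% less time, {speedup:.2f}x speedup")
```

With these numbers, 8 workers save about 55% of the run time and 16 workers about 62%, while 2 workers save about 25%.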

Work to be Done:

1. Documentation of the new command.

2. Parallel support for vacuuming all databases.

Should the common code for parallel operation in pg_dump and vacuumdb be moved to one place and reused?

A prototype patch is attached to this mail; please provide your feedback and suggestions.

Thanks & Regards,

Dilip Kumar

