Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers
| From | Daniil Davydov |
|---|---|
| Subject | Re: POC: Parallel processing of indexes in autovacuum |
| Date | |
| Msg-id | CAJDiXgi73x7h0=UoXriFjskRB6htZ-uqXKqvWN3RefuxbP93gA@mail.gmail.com |
| In response to | Re: POC: Parallel processing of indexes in autovacuum (SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com>) |
| Responses | Re: POC: Parallel processing of indexes in autovacuum<br>Re: POC: Parallel processing of indexes in autovacuum<br>Re: POC: Parallel processing of indexes in autovacuum |
| List | pgsql-hackers |
Hi,

On Mon, Mar 30, 2026 at 7:17 AM SATYANARAYANA NARLAPURAM <satyanarlapuram@gmail.com> wrote:
>
> Thank you for working on this, very useful feature. Sharing a few thoughts:
>
> 1. Shouldn't we also cap by max_parallel_workers to avoid wasting DSM resources in parallel_vacuum_compute_workers?

Actually, autovacuum_max_parallel_workers is already limited by max_parallel_workers. It is not clear to me why we allow setting this GUC higher than max_parallel_workers, but if this happens, I think it is a user's misconfiguration.

> 2. Is it intentional that other autovacuum workers do not yield cost limits to the parallel autovacuum workers? Cost limits are first distributed equally to the autovacuum workers,
> and then they share that. Therefore, parallel workers will be heavily throttled. IIUC, this problem doesn't exist with manual vacuum.
> If we don't fix this, at least we should document this.

Parallel a/v workers inherit cost-based parameters (including vacuum_cost_limit) from the leader worker. Do you mean that this can be too low a value for parallel operation? If so, the user can manually increase the vacuum_cost_limit reloption for those tables where parallel a/v sleeps too much (due to cost delay).
BTW, describing the propagation of the cost limit to parallel a/v workers is worth mentioning in the documentation. I'll add it in the next patch version.

> 3. Additionally, is there a point where, based on the cost limits, launching additional workers becomes counterproductive compared to running fewer workers, and should we prevent it?

I don't think that we can possibly find a universal limit that will be appropriate for all possible configurations. By now we are using a pretty simple formula for parallel degree calculation. Since the user has several ways to affect this formula, I guess that there will be no problems with it (except my concerns about the opt-out style).

> 4. Would it make sense to add a table-level override to disable parallelism or set the parallel worker count?
We already have the "autovacuum_parallel_workers" reloption, which is used as an additional limit on the number of parallel workers. In particular, this reloption can be used to disable parallelism entirely.

> > I ran some perf tests to show the improvements with parallel vacuum and shared below.

Thank you very much!

> Observations:
>
> 1. Parallel autovacuum provides consistent speedup. With cost_limit=200 and
> 7 workers, vacuum completes 1.41x faster (71s -> 50s). With cost_limit=60,
> the speedup is 1.25x (194s -> 154s).
> 2. I see the benefit comes from parallelizing index vacuum. With 8 indexes totaling
> ~530 MB, parallel workers scan indexes concurrently instead of the leader
> scanning them one by one. The leader's CPU user time drops from ~3s to
> ~0.8s as index work is offloaded.

A 1.41x speedup with 7 parallel workers may not seem like a great win, but it is the whole time of the autovacuum operation (not only index bulkdel/cleanup) with pretty small indexes. May I ask you to run the same test with a larger table (several dozen gigabytes)? I think the results will be more "expressive".

--
Best regards,
Daniil Davydov
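For reference, the per-table overrides discussed in this thread could look like the sketch below. Note that `autovacuum_parallel_workers` is the reloption introduced by this patch (it is not part of released PostgreSQL), and it is assumed here that setting it to 0 disables parallel index processing; `autovacuum_vacuum_cost_limit` is the existing per-table counterpart of `vacuum_cost_limit`. The table name `big_table` is a placeholder.

```sql
-- Raise the per-table cost limit so parallel a/v workers, which inherit
-- cost-based parameters from the leader, are throttled less on this table.
-- (autovacuum_vacuum_cost_limit is an existing per-table reloption.)
ALTER TABLE big_table SET (autovacuum_vacuum_cost_limit = 2000);

-- Cap (or disable) parallel index processing for one table.
-- autovacuum_parallel_workers is the reloption added by this patch;
-- 0 is assumed here to mean "no parallel workers for this table".
ALTER TABLE big_table SET (autovacuum_parallel_workers = 0);
```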