Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers
| From | SATYANARAYANA NARLAPURAM |
|---|---|
| Subject | Re: POC: Parallel processing of indexes in autovacuum |
| Msg-id | CAHg+QDdRyxC-cBk7CK-=pnfqFNWh6BeFDDnz3CSUPyoTbdUJ+A@mail.gmail.com |
| In response to | Re: POC: Parallel processing of indexes in autovacuum (Daniil Davydov <3danissimo@gmail.com>) |
| Responses | Re: POC: Parallel processing of indexes in autovacuum |
| List | pgsql-hackers |
Hi
On Mon, Mar 30, 2026 at 1:44 AM Daniil Davydov <3danissimo@gmail.com> wrote:
Hi,
On Mon, Mar 30, 2026 at 7:17 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
>
> Thank you for working on this, very useful feature. Sharing a few thoughts:
>
> 1. Shouldn't we also cap by max_parallel_workers to avoid wasting DSM resources in parallel_vacuum_compute_workers?
Actually, autovacuum_max_parallel_workers is already limited by
max_parallel_workers. It is not clear to me why we allow setting this GUC
higher than max_parallel_workers, but if this happens, I think it is a user
misconfiguration.
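For reference, the capped configuration would look something like this
(values illustrative; autovacuum_max_parallel_workers is the GUC from the
patch under discussion):

# postgresql.conf (illustrative values)
max_parallel_workers = 8              # cluster-wide cap on parallel workers
autovacuum_max_parallel_workers = 4   # patch GUC; effectively capped by the above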
> 2. Is it intentional that other autovacuum workers do not yield cost limits to the parallel autovacuum workers? Cost limits are distributed equally among the autovacuum workers first,
> and then they share that. Therefore, parallel workers will be heavily throttled. IIUC, this problem doesn't exist with manual vacuum.
> If we don't fix this, at least we should document it.
Parallel a/v workers inherit cost-based parameters (including
vacuum_cost_limit) from the leader worker. Do you mean that this can be too
low a value for a parallel operation? If so, the user can manually increase the
vacuum_cost_limit reloption for those tables where parallel a/v sleeps too
much (due to cost delay).
BTW, describing the cost limit propagation to the parallel a/v workers is
worth mentioning in the documentation. I'll add it in the next patch version.
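For example, to loosen throttling on a single table (table name and value
illustrative; in core the relevant storage parameter is spelled
autovacuum_vacuum_cost_limit):

ALTER TABLE avtest SET (autovacuum_vacuum_cost_limit = 2000);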
> 3. Additionally, is there a point where, based on the cost limits, launching additional workers becomes counterproductive compared to running fewer workers, and should we prevent that?
I don't think we can find a universal limit that would be appropriate for all
possible configurations. For now we use a pretty simple formula for the
parallel degree calculation. Since the user has several ways to affect this
formula, I don't expect problems with it (apart from my concerns about the
opt-out style).
> 4. Would it make sense to add a table level override to disable parallelism or set parallel worker count?
We already have the "autovacuum_parallel_workers" reloption, which serves as
an additional limit on the number of parallel workers. In particular, this
reloption can be used to disable parallelism entirely.
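For example, disabling parallelism for a single table might look like this
(table name illustrative; assuming the reloption accepts 0 as "disabled",
like other worker-count parameters):

ALTER TABLE avtest SET (autovacuum_parallel_workers = 0);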
>
> I ran some perf tests to show the improvements with parallel vacuum and shared below.
Thank you very much!
> Observations:
>
> 1. Parallel autovacuum provides consistent speedup. With cost_limit=200 and
> 7 workers, vacuum completes 1.41x faster (71s -> 50s). With cost_limit=60,
> the speedup is 1.25x (194s -> 154s).
> 2. I see the benefit comes from parallelizing index vacuum. With 8 indexes totaling
> ~530 MB, parallel workers scan indexes concurrently instead of the leader
> scanning them one by one. The leader's CPU user time drops from ~3s to
> ~0.8s as index work is offloaded.
>
A 1.41x speedup with 7 parallel workers may not seem like a great win, but it
covers the whole autovacuum operation (not only index bulkdel/cleanup), and
with pretty small indexes.
May I ask you to run the same test with a larger table (several dozen
gigabytes)? I think the results will be more "expressive".
I ran it with a billion rows in a table with 8 indexes. The improvement with 7 workers is 1.8x.
Please note that there is a fixed overhead in the other vacuum steps, for example the heap scan.
In environments where cost-based delay is in use (the default), the benefits will be modest
unless vacuum_cost_delay is set to a sufficiently small value.
Hardware:
CPU: Intel Xeon Platinum 8573C, 1 socket × 8 cores × 2 threads = 16 vCPUs
RAM: 128 GB (131,900 MB)
Swap: None
Workload Description
Table Schema:
CREATE TABLE avtest (
id bigint PRIMARY KEY,
col1 int, -- random()*1e9
col2 int, -- random()*1e9
col3 int, -- random()*1e9
col4 int, -- random()*1e9
col5 int, -- random()*1e9
col6 text, -- 'text_' || random()*1e6 (short text ~10 chars)
col7 timestamp, -- now() - random()*365 days
padding text -- repeat('x', 50)
) WITH (fillfactor = 90);
Indexes (8 total):
avtest_pkey — btree on (id) bigint
idx_av_col1 — btree on (col1) int
idx_av_col2 — btree on (col2) int
idx_av_col3 — btree on (col3) int
idx_av_col4 — btree on (col4) int
idx_av_col5 — btree on (col5) int
idx_av_col6 — btree on (col6) text
idx_av_col7 — btree on (col7) timestamp
Dead Tuple Generation:
DELETE FROM avtest WHERE id % 5 IN (1, 2);
This deletes exactly 40% of rows, uniformly distributed across all pages.
Vacuum Trigger:
Autovacuum is triggered naturally by lowering the threshold to 0 and setting
scale_factor to a value that causes immediate launch after the DELETE.
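The per-table settings for that were along these lines (exact values
illustrative, chosen so any dead tuples exceed the trigger threshold):

ALTER TABLE avtest SET (
    autovacuum_vacuum_threshold = 0,
    autovacuum_vacuum_scale_factor = 0
);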
Worker Configurations Tested:
0 workers — leader-only vacuum (baseline, no parallelism)
2 workers — leader + 2 parallel workers (3 processes total)
4 workers — leader + 4 parallel workers (5 processes total)
7 workers — leader + 7 parallel workers (8 processes total, 1 per index)
Dataset:
Rows: 1,000,000,000
Heap size: 139 GB
Total size: 279 GB (heap + 8 indexes)
Dead tuples: 400,000,000 (40%)
Index Sizes:
avtest_pkey 21 GB (bigint)
idx_av_col7 21 GB (timestamp)
idx_av_col1 18 GB (int)
idx_av_col2 18 GB (int)
idx_av_col3 18 GB (int)
idx_av_col4 18 GB (int)
idx_av_col5 18 GB (int)
idx_av_col6 7 GB (text — shorter keys, smaller index)
Total indexes: 139 GB
Server Settings:
shared_buffers = 96GB
maintenance_work_mem = 1GB
max_wal_size = 100GB
checkpoint_timeout = 1h
autovacuum_vacuum_cost_delay = 0ms (NO throttling)
autovacuum_vacuum_cost_limit = 1000
Summary:
Workers Avg(s) Min(s) Max(s) Speedup Time Saved
------- ------ ------ ------ ------- ----------
0 1645.93 1645.01 1646.84 1.00x —
2 1276.35 1275.64 1277.05 1.29x 369.58s (6.2 min)
4 1052.62 1048.92 1056.32 1.56x 593.31s (9.9 min)
7 892.23 886.59 897.86 1.84x 753.70s (12.6 min)
Thanks,
Satya