Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject Re: POC: Parallel processing of indexes in autovacuum
Date
Msg-id CAHg+QDdRyxC-cBk7CK-=pnfqFNWh6BeFDDnz3CSUPyoTbdUJ+A@mail.gmail.com
In response to Re: POC: Parallel processing of indexes in autovacuum  (Daniil Davydov <3danissimo@gmail.com>)
Hi

On Mon, Mar 30, 2026 at 1:44 AM Daniil Davydov <3danissimo@gmail.com> wrote:
Hi,

On Mon, Mar 30, 2026 at 7:17 AM SATYANARAYANA NARLAPURAM
<satyanarlapuram@gmail.com> wrote:
>
> Thank you for working on this, very useful feature. Sharing a few thoughts:
>
> 1. Shouldn't we also cap by max_parallel_workers to avoid wasting DSM resources in parallel_vacuum_compute_workers?

Actually, autovacuum_max_parallel_workers is already limited by
max_parallel_workers. It is not clear to me why we allow setting this GUC
higher than max_parallel_workers, but if that happens, I think it is a user
misconfiguration.
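
For illustration, a configuration sketch of the capping behavior described
above (autovacuum_max_parallel_workers is the GUC from this patch;
max_parallel_workers is the existing GUC):

  # postgresql.conf (sketch)
  max_parallel_workers            = 8
  autovacuum_max_parallel_workers = 16   # effectively capped at 8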

> 2. Is it intentional that other autovacuum workers not yield cost limits to the parallel auto vacuum workers? Cost limits are distributed first equally to the autovacuum workers.
> and then they share that. Therefore, parallel workers will be heavily throttled. IIUC, this problem doesn't exist with manual vacuum.
>  If we don't fix this, at least we should document this.

Parallel a/v workers inherit the cost-based parameters (including
vacuum_cost_limit) from the leader worker. Do you mean that this can be too
low a value for the parallel operation? If so, the user can manually increase
the cost limit reloption for those tables where parallel a/v sleeps too much
(due to cost delay).

BTW, the propagation of the cost limit to parallel a/v workers is worth
mentioning in the documentation. I'll add it in the next patch version.
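
As a sketch of the per-table workaround mentioned above, using the existing
autovacuum cost reloptions (the table name is hypothetical):

  -- Raise the cost limit for a table where parallel a/v is throttled too hard
  ALTER TABLE big_table SET (autovacuum_vacuum_cost_limit = 2000);
  -- Or disable cost-based throttling for this table entirely
  ALTER TABLE big_table SET (autovacuum_vacuum_cost_delay = 0);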

> 3. Additionally, is there a point where, based on the cost limits, launching additional workers becomes counterproductive compared to running fewer workers and preventing it?

I don't think we can find a universal limit that will be appropriate for all
possible configurations. For now we are using a pretty simple formula for the
parallel degree calculation. Since users have several ways to affect this
formula, I guess there will be no problems with it (except my concerns about
the opt-out style).

> 4. Would it make sense to add a table level override to disable parallelism or set parallel worker count?

We already have the "autovacuum_parallel_workers" reloption, which is used as
an additional limit on the number of parallel workers. In particular, this
reloption can be used to disable parallelism entirely.
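
A sketch of that reloption's usage (the reloption name is per this thread's
patch; the table name is hypothetical):

  -- Cap the number of parallel workers for one table
  ALTER TABLE big_table SET (autovacuum_parallel_workers = 2);
  -- Disable parallel autovacuum for this table
  ALTER TABLE big_table SET (autovacuum_parallel_workers = 0);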

>
> I ran some perf tests to show the improvements with parallel vacuum and shared below.

Thank you very much!

> Observations:
>
> 1. Parallel autovacuum provides consistent speedup. With cost_limit=200 and
>    7 workers, vacuum completes 1.41x faster (71s -> 50s). With cost_limit=60,
>    the speedup is 1.25x (194s -> 154s).
> 2. I see the benefit comes from parallelizing index vacuum. With 8 indexes totaling
>    ~530 MB, parallel workers scan indexes concurrently instead of the leader
>    scanning them one by one. The leader's CPU user time drops from ~3s to
>    ~0.8s as index work is offloaded.
>

A 1.41x speedup with 7 parallel workers may not seem like a great win, but
that is the total time of the whole autovacuum operation (not only index
bulkdel/cleanup), and with pretty small indexes.

May I ask you to run the same test with a larger table (several dozen
gigabytes)? I think the results will be more "expressive".

I ran it with a billion rows in a table with 8 indexes. The improvement with
7 workers is 1.8x. Please note that there is a fixed overhead in the other
vacuum steps, for example the heap scan. In environments where cost-based
delay is used (the default), the benefits will be modest unless the cost
limit is raised (or vacuum_cost_delay lowered) sufficiently.

Hardware:
  CPU:     Intel Xeon Platinum 8573C, 1 socket × 8 cores × 2 threads = 16 vCPUs
  RAM:     128 GB (131,900 MB)
  Swap:    None

Workload Description

Table Schema:

  CREATE TABLE avtest (
      id       bigint PRIMARY KEY,
      col1     int,           -- random()*1e9
      col2     int,           -- random()*1e9
      col3     int,           -- random()*1e9
      col4     int,           -- random()*1e9
      col5     int,           -- random()*1e9
      col6     text,          -- 'text_' || random()*1e6  (short text ~10 chars)
      col7     timestamp,     -- now() - random()*365 days
      padding  text           -- repeat('x', 50)
  ) WITH (fillfactor = 90);
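
A population sketch consistent with the column comments above (the actual
load script is not shown in this thread, so this is an assumption):

  INSERT INTO avtest
  SELECT g,
         (random() * 1e9)::int, (random() * 1e9)::int, (random() * 1e9)::int,
         (random() * 1e9)::int, (random() * 1e9)::int,
         'text_' || (random() * 1e6)::int,          -- short text, ~10 chars
         now() - random() * interval '365 days',
         repeat('x', 50)
  FROM generate_series(1, 1000000000) AS g;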

Indexes (8 total):
  avtest_pkey   — btree on (id)        bigint
  idx_av_col1   — btree on (col1)      int
  idx_av_col2   — btree on (col2)      int
  idx_av_col3   — btree on (col3)      int
  idx_av_col4   — btree on (col4)      int
  idx_av_col5   — btree on (col5)      int
  idx_av_col6   — btree on (col6)      text
  idx_av_col7   — btree on (col7)      timestamp

Dead Tuple Generation:
  DELETE FROM avtest WHERE id % 5 IN (1, 2);
  This deletes exactly 40% of rows, uniformly distributed across all pages.

Vacuum Trigger:
  Autovacuum is triggered naturally by lowering the threshold to 0 and setting
  scale_factor to a value that causes immediate launch after the DELETE.
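
Expressed as reloptions, the trigger setup described above looks roughly like
this (the exact scale_factor value is an assumption; anything small enough
triggers autovacuum immediately after the DELETE):

  ALTER TABLE avtest SET (
      autovacuum_vacuum_threshold    = 0,
      autovacuum_vacuum_scale_factor = 0.0001
  );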

Worker Configurations Tested:
  0 workers  — leader-only vacuum (baseline, no parallelism)
  2 workers  — leader + 2 parallel workers (3 processes total)
  4 workers  — leader + 4 parallel workers (5 processes total)
  7 workers  — leader + 7 parallel workers (8 processes total, 1 per index)

Dataset:
  Rows:         1,000,000,000
  Heap size:    139 GB
  Total size:   279 GB (heap + 8 indexes)
  Dead tuples:  400,000,000 (40%)

Index Sizes:
  avtest_pkey    21 GB   (bigint)
  idx_av_col7    21 GB   (timestamp)
  idx_av_col1    18 GB   (int)
  idx_av_col2    18 GB   (int)
  idx_av_col3    18 GB   (int)
  idx_av_col4    18 GB   (int)
  idx_av_col5    18 GB   (int)
  idx_av_col6     7 GB   (text — shorter keys, smaller index)
  Total indexes: 139 GB

Server Settings:
  shared_buffers                = 96GB
  maintenance_work_mem          = 1GB
  max_wal_size                  = 100GB
  checkpoint_timeout            = 1h
  autovacuum_vacuum_cost_delay  = 0ms (NO throttling)
  autovacuum_vacuum_cost_limit  = 1000


Summary:

Workers  Avg(s)    Min(s)    Max(s)    Speedup   Time Saved
-------  ------    ------    ------    -------   ----------
0        1645.93   1645.01   1646.84    1.00x          —
2        1276.35   1275.64   1277.05    1.29x     369.58s (6.2 min)
4        1052.62   1048.92   1056.32    1.56x     593.31s (9.9 min)
7         892.23    886.59    897.86    1.84x     753.70s (12.6 min)


Thanks,
Satya
