Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers

From SATYANARAYANA NARLAPURAM
Subject Re: POC: Parallel processing of indexes in autovacuum
Date
Msg-id CAHg+QDehxaJEd1Yp1MpW8UO71xmbasy7t2GZGvqOYwkr0md8DQ@mail.gmail.com
In response to Re: POC: Parallel processing of indexes in autovacuum  (Daniil Davydov <3danissimo@gmail.com>)
Responses Re: POC: Parallel processing of indexes in autovacuum
List pgsql-hackers
Hi

On Sat, Mar 28, 2026 at 4:11 AM Daniil Davydov <3danissimo@gmail.com> wrote:
Hi,

On Thu, Mar 26, 2026 at 5:43 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> On Wed, Mar 25, 2026 at 12:45 AM Daniil Davydov <3danissimo@gmail.com> wrote:
> >
> >  Searching for arguments in
> > favor of opt-in style, I asked for help from another person who has been
> > managing the setup of highload systems for decades. He promised to share his
> > opinion next week.
>
> Given that we have one and half weeks before the feature freeze, I
> think it's better to complete the project first before waiting for
> his/her comments next week. Even if we finish this feature with the
> opt-out style, we can hear more opinions on it and change the default
> behavior as the change would be trivial. What do you think?
>

Sure, if we can change the default value after the feature freeze, I don't
mind leaving our parameter in opt-out style for now.

> I've squashed all patches except for the documentation patch as I
> assume you're working on it. The attached fixup patch contains several
> changes: using opt-out style, comment improvements, and fixing typos
> etc.
>

Thank you very much for the proposed fixes!
I like the way you have changed the nparallel_workers calculation (autovacuum.c).
Forcing parallel workers to always read the shared cost params the first time
is a good decision. All comment changes also LGTM.

The only place I have changed is reloptions.c:
As you explained, it is not appropriate to use the "overrides" wording
in the reloption's description, so I reverted to the old one.

On Fri, Mar 27, 2026 at 10:54 AM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> On Wed, Mar 25, 2026 at 3:43 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > Given that we have one and half weeks before the feature freeze, I
> > think it's better to complete the project first before waiting for
> > his/her comments next week. Even if we finish this feature with the
> > opt-out style, we can hear more opinions on it and change the default
> > behavior as the change would be trivial. What do you think?
> >
> > I've squashed all patches except for the documentation patch as I
> > assume you're working on it. The attached fixup patch contains several
> > changes: using opt-out style, comment improvements, and fixing typos
> > etc.
>
> +1 for enabling this feature by default. When enough CPU is available,
> vacuuming multiple indexes of a table in parallel in autovacuum
> definitely speeds things up.

Yes, for sure. But I have concerns that enabling parallel a/v for everyone
may cause a shortage of parallel workers during processing of the largest
tables.

> Thank you for sending the latest patches. I quickly reviewed the v31
> patches. Here are some comments.
>
> 1/ +       {"autovacuum_parallel_workers", RELOPT_TYPE_INT,
>
> I haven't looked at the whole thread, but do we all think we need this
> as a relopt? IMHO, we can wait for field experience and introduce this
> later.

I think that we should keep both the reloption and the config parameter.
Getting rid of the reloption would greatly reduce users' ability to tune
this feature. I'm afraid that this may lead to people not using parallel
autovacuum.

> I'm having a hard time finding a use-case where one wants to
> disable the indexes at the table level. If there was already an
> agreement, I agree to commit to that decision.

You can read the discussion from [1] up to the current message in order to
dive into the question.

To make the long story short, I think that the most common use case for this
feature is allowing parallel a/v for 2-3 tables, each of which has ~100
indexes. The rest of the tables do not require parallel processing (at least
it's a much lower priority for them).

At the same time, Masahiko-san thinks that only the system should decide which
tables will be processed in parallel. The system's decision would be based on
the number of indexes and a few other config parameters (e.g.
min_parallel_index_scan_size). Thus, potentially many tables could be
processed in parallel.

(Both opinions are pretty simplified).
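To make the second option concrete, here is a minimal sketch of how such a system-driven decision could look, assuming the policy is: count the indexes at least min_parallel_index_scan_size in size, keep one for the leader, and cap the rest by the worker limit. The function name and exact policy are my assumptions for illustration, not the patch's actual code:

```c
#include <assert.h>

/*
 * Hypothetical sketch of a system-driven choice of parallel worker count
 * for vacuuming one table.  Only indexes at least
 * min_parallel_index_scan_size bytes large participate, and the result
 * is capped by the configured maximum.  Names and policy are assumptions.
 */
static int
compute_parallel_vacuum_workers(const long *index_sizes, int nindexes,
                                long min_parallel_index_scan_size,
                                int max_workers)
{
    int nlarge = 0;
    int nworkers;

    for (int i = 0; i < nindexes; i++)
        if (index_sizes[i] >= min_parallel_index_scan_size)
            nlarge++;

    /* The leader vacuums one index itself, so at most nlarge - 1 helpers. */
    nworkers = nlarge - 1;
    if (nworkers < 0)
        nworkers = 0;
    if (nworkers > max_workers)
        nworkers = max_workers;
    return nworkers;
}
```

Under this sketch, a table with many small indexes gets no parallel workers at all, which is exactly the behavior the size threshold is meant to produce.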

>
> 2/  +   /*
> +    * If 'true' then we are running parallel autovacuum. Otherwise, we are
> +    * running parallel maintenance VACUUM.
> +    */
> +   bool        is_autovacuum;
> +
>
> The variable name looks a bit confusing. How about we rely on
> AmAutoVacuumWorkerProcess() and avoid the bool in shared memory?

This variable is needed for parallel workers, which are taken from the
bgworkers pool, i.e. AmAutoVacuumWorkerProcess() will return 'false' for them.
We need the "is_autovacuum" variable in order to understand exactly why the
process was started (parallel VACUUM or parallel autovacuum).


Thanks everyone for the review!
Please see the updated set of patches.
As promised, I created a dedicated chapter describing Parallel Vacuum.
Both maintenance VACUUM and autovacuum now refer to this chapter.

I am pretty inexperienced in documentation writing, so forgive me if
something is out of style.

[1] https://www.postgresql.org/message-id/CAJDiXggH1bW%3D4n%2B55CGLvs_sRU4SYNXwYLZ37wvJ5H_3yURSPw%40mail.gmail.com

Thank you for working on this very useful feature. Sharing a few thoughts:

1. Shouldn't we also cap the count by max_parallel_workers in
parallel_vacuum_compute_workers, to avoid wasting DSM resources?
2. Is it intentional that other autovacuum workers do not yield cost limits
to the parallel autovacuum workers? Cost limits are first distributed equally
among the autovacuum workers, and the parallel workers then share the leader's
slice. Therefore, parallel workers will be heavily throttled. IIUC, this
problem doesn't exist with manual vacuum. If we don't fix this, we should at
least document it.
3. Additionally, is there a point where, given the cost limits, launching
additional workers becomes counterproductive compared to running fewer
workers? Should we prevent that?
4. Would it make sense to add a table-level override to disable parallelism
or set the parallel worker count?


I ran some perf tests to show the improvements with parallel vacuum; results
are shared below.
System Configuration
--------------------
Hardware:
  CPU: 16 cores
  RAM: 128 GB
  Storage: NVMe SSDs
  OS: Ubuntu Linux

Workload Description
--------------------
Table: avtest
  - 5,000,000 rows
  - 9 columns: id (bigint PK), col1-col5 (int), col6 (text), col7 (timestamp),
    padding (text, 50 bytes)
  - 8 indexes:
      avtest_pkey  (col: id)        107 MB
      idx_av_col7  (col: col7)      107 MB
      idx_av_col2  (col: col2)       56 MB
      idx_av_col4  (col: col4)       56 MB
      idx_av_col5  (col: col5)       56 MB
      idx_av_col1  (col: col1)       56 MB
      idx_av_col3  (col: col3)       56 MB
      idx_av_col6  (col: col6)       35 MB
  - Total size: 1171 MB

Each test iteration:
  1. Delete 2,000,000 rows (40%) using: DELETE WHERE id % 5 IN (1, 2)
  2. CHECKPOINT to flush dirty pages
  3. Trigger autovacuum by setting autovacuum_vacuum_threshold = 100 and
     autovacuum_vacuum_scale_factor = 0 on the table
  4. Wait for autovacuum to complete (detected via server log)
  5. Re-insert the deleted rows and VACUUM to restore the table for the next run


Test Methodology
----------------
Worker configurations tested: 0, 2, 4, 7 parallel workers
  (7 is the maximum: nindexes - 1, since the leader always handles one index)

Two experiments were run with different cost-based vacuum delay settings:

  Experiment A: cost_limit=200, cost_delay=2ms
  Experiment B: cost_limit=60,  cost_delay=2ms

Common server settings for both experiments:
  shared_buffers        = 120 GB  (entire dataset fits in shared buffers)
  maintenance_work_mem  = 1 GB
  max_wal_size          = 100 GB  (prevents checkpoints during vacuum)
  min_wal_size          = 10 GB
  checkpoint_timeout    = 1 hour  (prevents time-based checkpoints)
  wal_buffers           = 128 MB
  max_parallel_workers  = 16
  max_worker_processes  = 24
  autovacuum_naptime    = 1s

Between every single run:
  1. PostgreSQL server is fully stopped (pg_ctl stop -m fast)
  2. OS page cache is dropped (echo 3 > /proc/sys/vm/drop_caches)
  3. Server is restarted with a clean log file
  4. After DELETE and CHECKPOINT, the server is stopped again, OS caches
     dropped again, and the server restarted -- so vacuum starts fully cold
  5. The autovacuum_max_parallel_workers GUC is reloaded via pg_ctl reload

Each configuration was tested for 5 iterations.

Timing is extracted from the PostgreSQL server log "system usage" line that
autovacuum emits at completion. This reports elapsed wall-clock time and CPU
time for the autovacuum worker leader process.
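A minimal sketch of extracting the elapsed wall-clock time from that log line is below. The sample line format in the test is an assumption modeled on typical server output; adjust the pattern to match your PostgreSQL version:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Pull the elapsed seconds out of a "system usage" log line.
 * Returns -1.0 if the line does not contain an "elapsed:" field.
 */
static double
parse_elapsed_seconds(const char *logline)
{
    const char *p = strstr(logline, "elapsed:");
    double secs = -1.0;

    if (p != NULL && sscanf(p, "elapsed: %lf s", &secs) != 1)
        secs = -1.0;
    return secs;
}
```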


Results: Experiment A (cost_limit=200, cost_delay=2ms)
------------------------------------------------------

Workers  Iter1    Iter2    Iter3    Iter4    Iter5    Avg(s)  Speedup
-------  ------   ------   ------   ------   ------   ------  -------
0        66.21    79.11    66.27    77.11    66.30    71.00   1.00x
2        66.55    53.27    52.66    55.74    55.71    56.78   1.25x
4        51.50    51.74    65.07    52.06    70.25    58.12   1.22x
7        50.05    50.35    50.04    50.12    50.07    50.12   1.41x

CPU usage (leader process only):
Workers  Avg CPU user  Avg CPU sys
-------  -----------   ----------
0        3.04s         1.70s
2        1.24s         1.50s
4        0.78s         1.49s
7        0.79s         1.48s


Results: Experiment B (cost_limit=60, cost_delay=2ms)
-----------------------------------------------------

Workers  Iter1    Iter2    Iter3    Iter4    Iter5    Avg(s)  Speedup
-------  ------   ------   ------   ------   ------   ------  -------
0        199.00   195.26   191.44   191.90   191.67   193.85  1.00x
2        160.68   181.33   176.85   167.84   159.47   169.23  1.14x
4        154.02   165.02   174.33   164.16   156.53   162.81  1.19x
7        148.49   158.68   160.66   154.37   149.20   154.28  1.25x

CPU usage (leader process only):
Workers  Avg CPU user  Avg CPU sys
-------  -----------   ----------
0        3.06s         1.90s
2        1.28s         1.72s
4        0.80s         1.69s
7        0.82s         1.68s



Observations:

1. Parallel autovacuum provides consistent speedup. With cost_limit=200 and
   7 workers, vacuum completes 1.41x faster (71s -> 50s). With cost_limit=60,
   the speedup is 1.25x (194s -> 154s).
2. The benefit comes from parallelizing index vacuuming. With 8 indexes
   totaling ~530 MB, parallel workers scan indexes concurrently instead of
   the leader scanning them one by one. The leader's CPU user time drops
   from ~3s to ~0.8s as index work is offloaded.





Thanks,
Satya


