Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: POC: Parallel processing of indexes in autovacuum
Date
Msg-id CAD21AoB7v5tLPXLK=qmtt6PaEC1f+Fb-gh+MwAbXfm6x4eZGNw@mail.gmail.com
Whole thread Raw
In response to Re: POC: Parallel processing of indexes in autovacuum  (Daniil Davydov <3danissimo@gmail.com>)
Responses Re: POC: Parallel processing of indexes in autovacuum
List pgsql-hackers
On Sun, Nov 23, 2025 at 7:02 AM Daniil Davydov <3danissimo@gmail.com> wrote:
>
> Hi,
>
> On Sun, Nov 23, 2025 at 5:51 AM Sami Imseih <samimseih@gmail.com> wrote:
> >
> > > > nworkers has a double meaning. The return value of
> > > > AutoVacuumReserveParallelWorkers
> > > > is nreserved. I think this should be
> > > >
> > > > ```
> > > > nreserved = AutoVacuumReserveParallelWorkers(nworkers);
> > > > ```
> > > >
> > > > and nreserved becomes the authoritative value for the number of parallel
> > > > workers after that point.
> >
> > I could not find this pattern being used in the code base.
> > I think it will be better and more in-line without what we generally do
> > and pass-by-reference and update the value inside
> > AutoVacuumReserveParallelWorkers:
> >
> > ```
> > AutoVacuumReserveParallelWorkers(&nworkers).
> > ```
>
> Maybe I just don't like functions with side effects, but this function will
> have ones anyway. I'll add logic with passing by reference as you
> suggested.
>
> >
> > >> ---
> > >> { name => 'autovacuum_max_parallel_workers', type => 'int', context =>
> > >> 'PGC_SIGHUP', group => 'VACUUM_AUTOVACUUM',
> > >>   short_desc => 'Maximum number of parallel autovacuum workers, that
> > >> can be taken from bgworkers pool.',
> > >>   long_desc => 'This parameter is capped by "max_worker_processes"
> > >> (not by "autovacuum_max_workers"!).',
> > >>   variable => 'autovacuum_max_parallel_workers',
> > >>   boot_val => '0',
> > >>   min => '0',
> > >>   max => 'MAX_BACKENDS',
> > >> },
> > >>
> > >> Parallel vacuum in autovacuum can be used only when users set the
> > >> autovacuum_parallel_workers storage parameter. How about using the
> > >> default value 2 for autovacuum_max_parallel_workers GUC parameter?
> >
> > > Sounds reasonable, +1 for it.
> >
> > v15-0004 has an incorrect default value for `autovacuum_max_parallel_workers`.
> > It should now be 2.
> >
> > +          Sets the maximum number of parallel autovacuum workers that
> > +          can be used for parallel index vacuuming at one time. Is capped by
> > +          <xref linkend="guc-max-worker-processes"/>. The default is 0,
> > +          which means no parallel index vacuuming.
>
> Thanks for noticing it! Fixed.
>
> I am sending an updated set of patches.

Thank you for updating the patch! I've reviewed the 0001 patch and
here are some comments:

---
+        /* Remember how many workers we have reserved. */
+        av_nworkers_reserved += *nworkers;

I think we can simply assign *nworkers to av_nworkers_reserved instead
of incrementing it as we're sure that av_nworkers_reserved is 0 at the
beginning of this function.

---
+static void
+AutoVacuumReleaseAllParallelWorkers(void)
+{
+        /* Only leader worker can call this function. */
+        Assert(AmAutoVacuumWorkerProcess() && !IsParallelWorker());
+
+        if (av_nworkers_reserved > 0)
+                AutoVacuumReleaseParallelWorkers(av_nworkers_reserved);
+}

We can put an assertion at the end of the function to verify that this
worker doesn't reserve any worker.

---
+static void
+autovacuum_worker_before_shmem_exit(int code, Datum arg)
+{
+        if (code != 0)
+                AutoVacuumReleaseAllParallelWorkers();
+}

I think it would be more future-proof if we call
AutoVacuumReleaseAllParallelWorkers() regardless of the code if there
is no strong reason why we check the code there.

---
+        before_shmem_exit(autovacuum_worker_before_shmem_exit, 0);

     /*
      * Create a per-backend PGPROC struct in shared memory.  We must do this
      * before we can use LWLocks or access any shared memory.
      */
     InitProcess();

I think it's better to register the
autovacuum_worker_before_shmem_exit() after the process
initialization. The function could use LWLocks to release the reserved
workers. Given that AutoVacuumReleaseAllParallelWorkers() doesn't try
to release the reserved worker when av_nworkers_reserved == 0, but it
would be more future-proof to do that after the basic process
initialization processes.

How about renaming autovacuum_worker_before_shmem_exit() to
autovacuum_worker_onexit()?

---
IIUC the patch needs to implement some logic to propagate the updates
of vacuum delay parameters to parallel vacuum workers. Are you still
working on it? Or shall I draft this part on top of the 0001 patch?

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Newly created replication slot may be invalidated by checkpoint
Next
From: Jacob Champion
Date:
Subject: Re: Custom oauth validator options