Home > mailing lists

Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers

From	Daniil Davydov
Subject	Re: POC: Parallel processing of indexes in autovacuum
Date	May 2 21:12:49
Msg-id	CAJDiXgi6ZQOoSEqj9RyZMEh+HHBtmW0+PHD85UNPtKch8ubvdg@mail.gmail.com Whole thread Raw
In response to	Re: POC: Parallel processing of indexes in autovacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses	Re: POC: Parallel processing of indexes in autovacuum
List	pgsql-hackers

Tree view

On Thu, May 1, 2025 at 8:03 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> As I understand it, we initially disabled parallel vacuum for
> autovacuum because their objectives are somewhat contradictory.
> Parallel vacuum aims to accelerate the process by utilizing additional
> resources, while autovacuum is designed to perform cleaning operations
> with minimal impact on foreground transaction processing (e.g.,
> through vacuum delay).
>
Yep, we also decided that we must not create more a/v workers for
index processing.
In current implementation, the leader process sends a signal to the
a/v launcher, and the launcher tries to launch all requested workers.
But the number of workers never exceeds `autovacuum_max_workers`.
Thus, we will never have more a/v workers than in the standard case
(without this feature).

> Nevertheless, I see your point about the potential benefits of using
> parallel vacuum within autovacuum in specific scenarios. The crucial
> consideration is determining appropriate criteria for triggering
> parallel vacuum in autovacuum. Given that we currently support only
> parallel index processing, suitable candidates might be autovacuum
> operations on large tables that have a substantial number of
> sufficiently large indexes and a high volume of garbage tuples.
>
> Although the actual number of parallel workers ultimately depends on
> the number of eligible indexes, it might be beneficial to introduce a
> storage parameter, say parallel_vacuum_workers, that allows control
> over the number of parallel vacuum workers on a per-table basis.
>
For now, we have three GUC variables for this purpose:
max_parallel_index_autovac_workers, autovac_idx_parallel_min_rows,
autovac_idx_parallel_min_indexes.
That is, everything is as you said. But we are still conducting
research on this issue. I would like to get rid of some of these
parameters.

> Regarding implementation: I notice the WIP patch implements its own
> parallel vacuum mechanism for autovacuum. Have you considered simply
> setting at_params.nworkers to a value greater than zero?
>
About `at_params.nworkers = N` - that's exactly what we're doing (you
can see it in the `vacuum_rel` function). But we cannot fully reuse
code of VACUUM PARALLEL, because it creates its own processes via
dynamic bgworkers machinery.
As I said above - we don't want to consume additional resources. Also
we don't want to complicate communication between processes (the idea
is that a/v workers can only send signals to the a/v launcher).
As a result, we created our own implementation of parallel index
processing control - see changes in vacuumparallel.c and autovacuum.c.

--
Best regards,
Daniil Davydov

pgsql-hackers by date:

From: Nathan Bossart
Date: 02 May, 20:59:12
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER

From: Peter Geoghegan
Date: 02 May, 21:22:07
Subject: Re: Adding skip scan (including MDAM style range skip scan) to nbtree

Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers

Previous

Next