Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers

From Daniil Davydov
Subject Re: POC: Parallel processing of indexes in autovacuum
Date
Msg-id CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com
In response to Re: POC: Parallel processing of indexes in autovacuum  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Sat, May 3, 2025 at 5:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > In current implementation, the leader process sends a signal to the
> > a/v launcher, and the launcher tries to launch all requested workers.
> > But the number of workers never exceeds `autovacuum_max_workers`.
> > Thus, we will never have more a/v workers than in the standard case
> > (without this feature).
>
> I have concerns about this design. When autovacuuming on a single
> table consumes all available autovacuum_max_workers slots with
> parallel vacuum workers, the system becomes incapable of processing
> other tables. This means that when determining the appropriate
> autovacuum_max_workers value, users must consider not only the number
> of tables to be processed concurrently but also the potential number
> of parallel workers that might be launched. I think it would more make
> sense to maintain the existing autovacuum_max_workers parameter while
> introducing a new parameter that would either control the maximum
> number of parallel vacuum workers per autovacuum worker or set a
> system-wide cap on the total number of parallel vacuum workers.
>

For now we have max_parallel_index_autovac_workers - this GUC limits
the number of parallel a/v workers that can process a single table. I
agree that the scenario you described is problematic.
The proposal to limit the total number of supporting a/v workers seems
attractive to me (I'll implement it as an experiment).
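
To make the two limits concrete, here is a minimal sketch (not code
from the patch) of how a per-table limit and a system-wide cap could
combine. Only max_parallel_index_autovac_workers exists in the current
patch; the function name, the total_cap parameter and the in_use
counter are hypothetical and only illustrate the idea.

```c
#include <assert.h>

/* Hypothetical helper: smaller of two ints. */
static int
min_int(int a, int b)
{
    return (a < b) ? a : b;
}

/*
 * Hypothetical sketch: how many parallel a/v workers to grant for one table.
 *
 * requested       - number of workers the leader asks for
 * per_table_limit - stands in for max_parallel_index_autovac_workers
 * total_cap       - assumed system-wide cap on parallel a/v workers
 * in_use          - parallel a/v workers already running system-wide
 */
int
parallel_autovac_workers_to_grant(int requested, int per_table_limit,
                                  int total_cap, int in_use)
{
    int free_slots;

    assert(requested >= 0 && per_table_limit >= 0);
    assert(total_cap >= 0 && in_use >= 0);

    /* Slots still available under the system-wide cap (never negative). */
    free_slots = (total_cap > in_use) ? total_cap - in_use : 0;

    /* The grant is bounded by both the per-table limit and the cap. */
    return min_int(min_int(requested, per_table_limit), free_slots);
}
```

With such a cap, a single large table could not take every slot, and
autovacuum_max_workers would keep its current meaning.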

It seems to me that this question is becoming a key one. First we need
to determine the role of the user in the whole scheduling mechanism.
Should we allow users to determine priority? Will this priority apply
only within a single vacuuming cycle, or will it be more 'global'?
I don't think I have enough expertise to decide this alone, so I will
be glad to receive any suggestions.

> > About `at_params.nworkers = N` - that's exactly what we're doing (you
> > can see it in the `vacuum_rel` function). But we cannot fully reuse
> > code of VACUUM PARALLEL, because it creates its own processes via
> > dynamic bgworkers machinery.
> > As I said above - we don't want to consume additional resources. Also
> > we don't want to complicate communication between processes (the idea
> > is that a/v workers can only send signals to the a/v launcher).
>
> Could you elaborate on the reasons why you don't want to use
> background workers and avoid complicated communication between
> processes? I'm not sure whether these concerns provide sufficient
> justification for implementing its own parallel index processing.
>

Here are my thoughts on this. An a/v worker has a very simple role - it
is started at the launcher's request and must do exactly one 'task':
vacuum a table or participate in a parallel index vacuum.
We also have a dedicated 'launcher' role, meaning the whole design
implies that only the launcher is able to launch processes.
If we allow an a/v worker to use bgworkers, then:
1) The a/v worker will go far beyond its responsibility.
2) Its functionality will overlap with the functionality of the launcher.
3) Resource consumption can jump dramatically, which is unexpected for
the user. Autovacuum will also become dependent on other resources
(the bgworkers pool). The current design does not imply this.
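
As a rough illustration of this split of responsibilities (a toy model,
not code from the patch): the worker only records a request, and the
launcher alone decides how many processes to start, bounded by
autovacuum_max_workers. The struct, the function names, and the choice
to drop requests that cannot be satisfied are all assumptions made for
this sketch.

```c
#include <assert.h>

/* Hypothetical model of the state shared with the launcher. */
typedef struct LauncherState
{
    int max_workers;        /* stands in for autovacuum_max_workers */
    int running_workers;    /* a/v workers currently running */
    int pending_requests;   /* helpers requested by leaders, not yet served */
} LauncherState;

/* Worker side: only post a request (models "send a signal to the launcher"). */
void
worker_request_helpers(LauncherState *st, int nworkers)
{
    assert(nworkers >= 0);
    st->pending_requests += nworkers;
}

/*
 * Launcher side: start as many requested workers as free slots allow.
 * Returns the number of workers launched. Requests that do not fit are
 * dropped here; this is an assumption of the sketch.
 */
int
launcher_service_requests(LauncherState *st)
{
    int free_slots = st->max_workers - st->running_workers;
    int to_launch;

    if (free_slots < 0)
        free_slots = 0;

    to_launch = (st->pending_requests < free_slots)
        ? st->pending_requests : free_slots;

    st->running_workers += to_launch;
    st->pending_requests = 0;
    return to_launch;
}
```

In this model the total number of a/v processes can never exceed
max_workers, no matter how many helpers a leader asks for.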

I wanted to create a patch that would fit into the existing mechanism
without drastic changes. But if you think that the points above are
not so important, then we can reuse the VACUUM PARALLEL code, which
would simplify the final implementation.

--
Best regards,
Daniil Davydov


