Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers

From: Sami Imseih
Subject: Re: POC: Parallel processing of indexes in autovacuum
Date:
Msg-id: CAA5RZ0vfBg=c_0Sa1Tpxv8tueeBk8C5qTf9TrxKBbXUqPc99Ag@mail.gmail.com
In response to: Re: POC: Parallel processing of indexes in autovacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses: Re: POC: Parallel processing of indexes in autovacuum; Re: POC: Parallel processing of indexes in autovacuum
List: pgsql-hackers

On Sat, May 3, 2025 at 1:10 AM Daniil Davydov <3danissimo@gmail.com> wrote:
>
> On Sat, May 3, 2025 at 5:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >
> > > In current implementation, the leader process sends a signal to the
> > > a/v launcher, and the launcher tries to launch all requested workers.
> > > But the number of workers never exceeds `autovacuum_max_workers`.
> > > Thus, we will never have more a/v workers than in the standard case
> > > (without this feature).
> >
> > I have concerns about this design. When autovacuuming on a single
> > table consumes all available autovacuum_max_workers slots with
> > parallel vacuum workers, the system becomes incapable of processing
> > other tables. This means that when determining the appropriate
> > autovacuum_max_workers value, users must consider not only the number
> > of tables to be processed concurrently but also the potential number
> > of parallel workers that might be launched. I think it would make more
> > sense to maintain the existing autovacuum_max_workers parameter while
> > introducing a new parameter that would either control the maximum
> > number of parallel vacuum workers per autovacuum worker or set a
> > system-wide cap on the total number of parallel vacuum workers.
> >
>
> For now we have max_parallel_index_autovac_workers - this GUC limits
> the number of parallel a/v workers that can process a single table. I
> agree that the scenario you provided is problematic.
> The proposal to limit the total number of supportive a/v workers seems
> attractive to me (I'll implement it as an experiment).
>
> It seems to me that this question is becoming a key one. First we need
> to determine the role of the user in the whole scheduling mechanism.
> Should we allow users to determine priority? Will this priority apply
> only within a single vacuuming cycle, or will it be more 'global'?
> I guess I don't have enough expertise to determine this alone. I will
> be glad to receive any suggestions.

> What I roughly imagined is that we don't need to change the entire
> autovacuum scheduling, but would like autovacuum workers to decide
> whether or not to use parallel vacuum during their vacuum operations
> based on GUC parameters (having a global effect) or storage parameters
> (having an effect on the particular table). The criteria for triggering
> parallel vacuum in autovacuum might need to be somewhat pessimistic so
> that we don't unnecessarily use parallel vacuum on many tables.
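
In other words, the cap could live at the global level, e.g. a GUC along these
lines (the name below is purely hypothetical, nothing the current patch defines):

    -- hypothetical GUC capping parallel vacuum workers launched on behalf of autovacuum
    ALTER SYSTEM SET autovacuum_max_parallel_workers = 4;

or at the table level via a storage parameter, as sketched further down.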

Perhaps we should only provide a reloption, so that only tables the user has
explicitly opted in via that reloption can be autovacuumed in parallel?
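
For illustration, a rough sketch of how that could look on the user side (the
reloption name here is hypothetical, not something the current patch defines):

    -- Opt one large table into parallel index vacuuming by autovacuum
    -- (hypothetical reloption name)
    ALTER TABLE bloated_tbl SET (parallel_index_autovac_workers = 4);

    -- Tables without the reloption keep today's serial autovacuum behavior
    ALTER TABLE bloated_tbl RESET (parallel_index_autovac_workers);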

This gives us a targeted approach. Of course, if several of these opted-in tables
come up for autovacuum at the same time, some may not get all the workers they
ask for, but that is no different from manually running parallel vacuum on those
tables at the same time.
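
For comparison, concurrent manual parallel vacuums already contend for the shared
pool of parallel workers (bounded by max_parallel_maintenance_workers and
max_parallel_workers), and simply fall back to fewer workers when the pool is
exhausted:

    VACUUM (PARALLEL 4) table_a;   -- session 1
    VACUUM (PARALLEL 4) table_b;   -- session 2, may get fewer than 4 workers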

What do you think?

Sami 
