Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: POC: Parallel processing of indexes in autovacuum
Date
Msg-id CAD21AoAxTkpkLtJDgrH9dXg_h+yzOZpOZj3B-4FjW1Mr4qEdbQ@mail.gmail.com
In response to POC: Parallel processing of indexes in autovacuum  (Maxim Orlov <orlovmg@gmail.com>)
List pgsql-hackers
On Thu, May 22, 2025 at 10:48 AM Sami Imseih <samimseih@gmail.com> wrote:
>
> I started looking at the patch but I have some high level thoughts I would
> like to share before looking further.
>
> > > I find that the name "autovacuum_reserved_workers_num" is generic. It
> > > would be better to have a more specific name for parallel vacuum such
> > > as autovacuum_max_parallel_workers. This parameter is related to
> > > neither autovacuum_worker_slots nor autovacuum_max_workers, which
> > > seems fine to me. Also, max_parallel_maintenance_workers doesn't
> > > affect this parameter.
> > > .......
> > > I've also considered some alternative names. If we were to use
> > > parallel_maintenance_workers, it sounds like it controls the parallel
> > > degree for all operations using max_parallel_maintenance_workers,
> > > including CREATE INDEX. Similarly, vacuum_parallel_workers could be
> > > interpreted as affecting both autovacuum and manual VACUUM commands,
> > > suggesting that when users run "VACUUM (PARALLEL) t", the system would
> > > use their specified value for the parallel degree. I prefer
> > > autovacuum_parallel_workers or vacuum_parallel_workers.
> > >
> >
> > This was my headache when I created names for variables. Autovacuum
> > initially implies parallelism, because we have several parallel a/v
> > workers. So I think that parameter like
> > `autovacuum_max_parallel_workers` will confuse somebody.
> > If we want to have a more specific name, I would prefer
> > `max_parallel_index_autovacuum_workers`.
>
> I don't think we should have a separate pool of parallel workers for those
> that are used to support parallel autovacuum. At the end of the day, these
> are parallel workers and they should be capped by max_parallel_workers. I think
> it will be confusing if we claim these are parallel workers, but they
> are coming from a different pool.

I agree that parallel vacuum workers used during autovacuum should be
capped by max_parallel_workers.

>
> I envision we have another GUC such as "max_parallel_autovacuum_workers"
> (which I think is a better name) that matches the behavior of
> "max_parallel_maintenance_workers". Meaning that the autovacuum workers
> still maintain their existing behavior (launching a worker per table),
> and if they do need to vacuum in parallel, they can draw from a pool of
> parallel workers.
>
> With the above said, I therefore think the reloption should actually be
> a number of parallel workers rather than a boolean. Let's take an example
> of a user that has 3 tables they wish (auto)vacuum to process in
> parallel, and, if workers are available, they wish each of these tables
> to be autovacuumed with 4 parallel workers. However, so as not to
> overload the system, they cap 'max_parallel_maintenance_workers' at
> something like 8. If it so happens that all 3 tables are autovacuumed at
> the same time, there may not be enough parallel workers, so one table
> will be the loser and be vacuumed serially.

+1 for making the reloption a number of parallel workers, leaving
aside the question of naming.

> That is acceptable, and a/v logging (and perhaps other stat views)
> should display this behavior: workers planned vs. workers launched.

Agreed. The number of workers planned vs. launched is currently reported
only with the VERBOSE option, so we need to change that so that
autovacuum can at least log it.
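
To make the quoted scenario concrete, here is a toy sketch of the
accounting (3 tables each asking for 4 workers, drawn from a shared pool
capped at 8). This is not PostgreSQL code: the function name, the table
names, and the greedy first-come allocation are all assumptions made for
illustration, and it ignores workers being returned to the pool as
vacuums finish.

```python
def launch_parallel_workers(requests, pool_size):
    """Hypothetical model: hand out workers from one shared pool, in order.

    requests  -- list of (table_name, workers_planned) tuples
    pool_size -- total parallel workers available (the cap)
    Returns a list of (table_name, workers_planned, workers_launched).
    """
    free = pool_size
    results = []
    for table, planned in requests:
        # A table gets what it asked for, or whatever is left in the pool.
        launched = min(planned, free)
        free -= launched
        results.append((table, planned, launched))
    return results


# Three tables autovacuumed at the same time, each wanting 4 workers,
# with the pool capped at 8 (the numbers from the example above).
requests = [("t1", 4), ("t2", 4), ("t3", 4)]
for table, planned, launched in launch_parallel_workers(requests, 8):
    mode = "parallel" if launched > 0 else "serial"
    print(f"{table}: workers planned={planned}, launched={launched} ({mode})")
```

Under these assumptions t1 and t2 each launch 4 workers and t3 launches
none, so t3 is vacuumed serially. The planned/launched pair per table is
the kind of information the autovacuum log would need to expose.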

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


