Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers
From | Sami Imseih |
---|---|
Subject | Re: POC: Parallel processing of indexes in autovacuum |
Date | |
Msg-id | CAA5RZ0vF+Lr-jU1LAZWTGUjboUETk8oLvaNBbA5ozX6dau+how@mail.gmail.com |
In response to | Re: POC: Parallel processing of indexes in autovacuum (Masahiko Sawada <sawada.mshk@gmail.com>) |
List | pgsql-hackers |
> On Mon, May 5, 2025 at 5:21 PM Sami Imseih <samimseih@gmail.com> wrote:
> >
> >
> >> On Sat, May 3, 2025 at 1:10 AM Daniil Davydov <3danissimo@gmail.com> wrote:
> >> >
> >> > On Sat, May 3, 2025 at 5:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> >> > >
> >> > > > In current implementation, the leader process sends a signal to the
> >> > > > a/v launcher, and the launcher tries to launch all requested workers.
> >> > > > But the number of workers never exceeds `autovacuum_max_workers`.
> >> > > > Thus, we will never have more a/v workers than in the standard case
> >> > > > (without this feature).
> >> > >
> >> > > I have concerns about this design. When autovacuuming on a single
> >> > > table consumes all available autovacuum_max_workers slots with
> >> > > parallel vacuum workers, the system becomes incapable of processing
> >> > > other tables. This means that when determining the appropriate
> >> > > autovacuum_max_workers value, users must consider not only the number
> >> > > of tables to be processed concurrently but also the potential number
> >> > > of parallel workers that might be launched. I think it would make more
> >> > > sense to maintain the existing autovacuum_max_workers parameter while
> >> > > introducing a new parameter that would either control the maximum
> >> > > number of parallel vacuum workers per autovacuum worker or set a
> >> > > system-wide cap on the total number of parallel vacuum workers.
> >> > >
> >> >
> >> > For now we have max_parallel_index_autovac_workers - this GUC limits
> >> > the number of parallel a/v workers that can process a single table. I
> >> > agree that the scenario you provided is problematic.
> >> > The proposal to limit the total number of supportive a/v workers seems
> >> > attractive to me (I'll implement it as an experiment).
> >> >
> >> > It seems to me that this question is becoming a key one. First we need
> >> > to determine the role of the user in the whole scheduling mechanism.
> >> > Should we allow users to determine priority? Will this priority affect
> >> > only within a single vacuuming cycle, or it will be more 'global'?
> >> > I guess I don't have enough expertise to determine this alone. I will
> >> > be glad to receive any suggestions.
> >>
> >> What I roughly imagined is that we don't need to change the entire
> >> autovacuum scheduling, but would like autovacuum workers to decide
> >> whether or not to use parallel vacuum during its vacuum operation
> >> based on GUC parameters (having a global effect) or storage parameters
> >> (having an effect on the particular table). The criteria of triggering
> >> parallel vacuum in autovacuum might need to be somewhat pessimistic so
> >> that we don't unnecessarily use parallel vacuum on many tables.
> >
> >
> > Perhaps we should only provide a reloption, therefore only tables specified
> > by the user via the reloption can be autovacuumed in parallel?
> >
> > This gives a targeted approach. Of course if multiple of these allowed tables
> > are to be autovacuumed at the same time, some may not get all the workers,
> > but that’s not different from if you are to manually vacuum in parallel the tables
> > at the same time.
> >
> > What do you think ?
>
> +1. I think that's a good starting point. We can later introduce a new
> GUC parameter that globally controls the maximum number of parallel
> vacuum workers used in autovacuum, if necessary.

And I think this reloption should also apply to parallel heap vacuum in
non-failsafe scenarios. In the failsafe case, however, all tables will be
eligible for parallel vacuum. Anyhow, that discussion could be taken up in
that thread, but I wanted to point it out.

--
Sami Imseih
Amazon Web Services (AWS)
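
For illustration only, the policy discussed above (opt-in through a reloption, every table eligible in failsafe mode, and no guarantee of getting all requested workers) could be sketched roughly as follows. This is not code from the patch: the struct, its fields, and the function name are all hypothetical, and the worker count used in failsafe mode is an assumption, since the thread does not specify one.

```c
#include <stdbool.h>

/*
 * Hypothetical inputs for the decision; none of these names exist in
 * PostgreSQL or in the proposed patch.
 */
typedef struct ParallelAvInput
{
    int  reloption_workers; /* per-table reloption; 0 means not opted in */
    int  failsafe_workers;  /* assumed worker count for failsafe vacuum */
    int  free_workers;      /* parallel workers currently available */
    bool failsafe;          /* true when the failsafe has triggered */
} ParallelAvInput;

/*
 * Return how many parallel workers an autovacuum worker would use
 * for one table under the assumptions stated above.
 */
int
choose_parallel_workers(const ParallelAvInput *in)
{
    int requested = in->reloption_workers;

    /* In failsafe mode every table is eligible, even without the reloption. */
    if (in->failsafe && in->failsafe_workers > requested)
        requested = in->failsafe_workers;

    /* Table not opted in (and not failsafe), or nothing is available. */
    if (requested <= 0 || in->free_workers <= 0)
        return 0;

    /* A table may get fewer workers than it asked for. */
    return requested < in->free_workers ? requested : in->free_workers;
}
```

With this shape, two opted-in tables vacuumed at the same time simply split whatever workers are free, which mirrors what happens when several manual parallel VACUUM commands run concurrently.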