Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers
From: Daniil Davydov
Subject: Re: POC: Parallel processing of indexes in autovacuum
Msg-id: CAJDiXgjnUdE6Sk4M0unmT+9dULyFAxcum2txQKpWTuo4uQ_oXQ@mail.gmail.com
In response to: Re: POC: Parallel processing of indexes in autovacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
List: pgsql-hackers
On Sat, May 3, 2025 at 5:28 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> > In current implementation, the leader process sends a signal to the
> > a/v launcher, and the launcher tries to launch all requested workers.
> > But the number of workers never exceeds `autovacuum_max_workers`.
> > Thus, we will never have more a/v workers than in the standard case
> > (without this feature).
>
> I have concerns about this design. When autovacuuming on a single
> table consumes all available autovacuum_max_workers slots with
> parallel vacuum workers, the system becomes incapable of processing
> other tables. This means that when determining the appropriate
> autovacuum_max_workers value, users must consider not only the number
> of tables to be processed concurrently but also the potential number
> of parallel workers that might be launched. I think it would make more
> sense to maintain the existing autovacuum_max_workers parameter while
> introducing a new parameter that would either control the maximum
> number of parallel vacuum workers per autovacuum worker or set a
> system-wide cap on the total number of parallel vacuum workers.

For now we have max_parallel_index_autovac_workers -- this GUC limits
the number of parallel a/v workers that can process a single table. I
agree that the scenario you described is problematic. The proposal to
limit the total number of supporting a/v workers seems attractive to me
(I'll implement it as an experiment).

It seems to me that this question is becoming a key one. First we need
to determine the role of the user in the whole scheduling mechanism.
Should we allow users to determine priority? Would this priority apply
only within a single vacuuming cycle, or would it be more 'global'? I
don't think I have enough expertise to decide this alone, so I will be
glad to receive any suggestions.

> > About `at_params.nworkers = N` - that's exactly what we're doing (you
> > can see it in the `vacuum_rel` function).
> > But we cannot fully reuse code of VACUUM PARALLEL, because it creates
> > its own processes via dynamic bgworkers machinery.
> > As I said above - we don't want to consume additional resources. Also
> > we don't want to complicate communication between processes (the idea
> > is that a/v workers can only send signals to the a/v launcher).
>
> Could you elaborate on the reasons why you don't want to use
> background workers and avoid complicated communication between
> processes? I'm not sure whether these concerns provide sufficient
> justification for implementing its own parallel index processing.

Here are my thoughts on this. An a/v worker has a very simple role: it
is born after the launcher's request and must do exactly one 'task' --
vacuum a table or participate in a parallel index vacuum. We also have a
dedicated 'launcher' role, meaning the whole design implies that only
the launcher is able to launch processes.

If we allow a/v workers to use bgworkers, then:
1) The a/v worker will go far beyond its responsibility.
2) Its functionality will overlap with the functionality of the launcher.
3) Resource consumption can jump dramatically, which is unexpected for
the user. Autovacuum will also come to depend on other resources (the
bgworkers pool). The current design does not imply this.

I wanted to create a patch that would fit into the existing mechanism
without drastic innovations. But if you think that the above is not so
important, then we can reuse the VACUUM PARALLEL code, and it would
simplify the final implementation)

--
Best regards,
Daniil Davydov
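[Editor's note] To make the capping discussion above concrete, here is a
minimal C sketch of how a leader process could clamp its worker request
under both a per-table GUC and a system-wide cap. All identifiers
(max_total_parallel_autovac_workers, parallel_autovac_workers_in_use,
choose_parallel_autovac_workers) are illustrative assumptions, not names
from the actual patch; in a real implementation the in-use counter would
live in shared memory and be updated under a lock.

```c
#include <assert.h>

/* Illustrative GUCs and shared counter -- names are hypothetical. */
static int max_parallel_index_autovac_workers = 4;  /* per-table cap (from the patch) */
static int max_total_parallel_autovac_workers = 6;  /* hypothetical system-wide cap */
static int parallel_autovac_workers_in_use = 0;     /* would be in shared memory */

/*
 * Decide how many parallel index-vacuum workers a leader may request for
 * one table: honor the per-table cap, then the remaining system-wide
 * budget, and never go below zero.
 */
static int
choose_parallel_autovac_workers(int nworkers_wanted)
{
    int nworkers = nworkers_wanted;

    if (nworkers > max_parallel_index_autovac_workers)
        nworkers = max_parallel_index_autovac_workers;

    if (nworkers > max_total_parallel_autovac_workers - parallel_autovac_workers_in_use)
        nworkers = max_total_parallel_autovac_workers - parallel_autovac_workers_in_use;

    return (nworkers > 0) ? nworkers : 0;
}
```

With these numbers, a leader wanting 10 workers gets 4 on an idle
system, but only 1 once 5 parallel workers are already busy -- so one
table can no longer starve the rest of the system.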
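[Editor's note] The design invariant argued for above -- a/v workers
only signal the launcher, and only the launcher starts processes -- can
be sketched as a toy single-process model. Everything here is
illustrative (the queue is a plain counter; StartAutovacuumWorker is
only referenced in a comment); it is not code from PostgreSQL or the
patch, just a model of the control flow being described.

```c
#include <assert.h>

#define AUTOVACUUM_MAX_WORKERS 3    /* stand-in for the autovacuum_max_workers GUC */

static int av_workers_running = 0;
static int pending_parallel_requests = 0;

/*
 * Called by a leader a/v worker: the moral equivalent of "send a signal
 * to the launcher". Note that no process is created here -- the worker
 * never exceeds its single responsibility.
 */
static void
request_parallel_workers(int n)
{
    pending_parallel_requests += n;
}

/*
 * Called by the launcher on wakeup: start requested workers, but never
 * let the total exceed autovacuum_max_workers. Returns how many were
 * actually launched.
 */
static int
launcher_service_requests(void)
{
    int launched = 0;

    while (pending_parallel_requests > 0 &&
           av_workers_running < AUTOVACUUM_MAX_WORKERS)
    {
        av_workers_running++;        /* stand-in for StartAutovacuumWorker() */
        pending_parallel_requests--;
        launched++;
    }
    return launched;
}
```

For example, with one leader already running and a request for 5
helpers, the launcher starts only 2 and leaves the rest pending -- which
is exactly why the total worker count can never exceed the standard
(non-parallel) case in this design.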