Re: POC: Parallel processing of indexes in autovacuum - Mailing list pgsql-hackers

From Daniil Davydov
Subject Re: POC: Parallel processing of indexes in autovacuum
Date
Msg-id CAJDiXgigcF3CMY86oREdQvxUDaUDFihkK9f78rdEyLTLeB0hdA@mail.gmail.com
Whole thread Raw
In response to Re: POC: Parallel processing of indexes in autovacuum  (Sami Imseih <samimseih@gmail.com>)
Responses Re: POC: Parallel processing of indexes in autovacuum
List pgsql-hackers
On Fri, May 2, 2025 at 11:58 PM Sami Imseih <samimseih@gmail.com> wrote:
>
> I am generally -1 on the idea of autovacuum performing parallel
> index vacuum, because I always felt that the parallel option should
> be employed in a targeted manner for a specific table. if you have a bunch
> of large tables, some more important than others, a/c may end
> up using parallel resources on the least important tables and you
> will have to adjust a/v settings per table, etc to get the right table
> to be parallel index vacuumed by a/v.

Hm, this is a good point. I think I should clarify one moment - in
practice, there is a common situation when users have one huge table
among all databases (with 80+ indexes created on it). But, of course,
in general there may be few such tables.
But we can still adjust the autovac_idx_parallel_min_rows parameter.
If a table has a lot of dead tuples => it is actively used => table is
important (?).
Also, if the user can really determine the "importance" of each of the
tables - we can provide an appropriate table option. Tables with this
option set will be processed in parallel in priority order. What do
you think about such an idea?

>
> Also, with the TIDStore improvements for index cleanup, and the practical
> elimination of multi-pass index vacuums, I see this being even less
> convincing as something to add to a/v.

If I understood correctly, then we are talking about the fact that
TIDStore can store so many tuples that in fact a second pass is never
needed.
But the number of passes does not affect the presented optimization in
any way. We must think about a large number of indexes that must be
processed. Even within a single pass we can have a 40% increase in
speed.

>
> Now, If I am going to allocate extra workers to run vacuum in parallel, why
> not just provide more autovacuum workers instead so I can get more tables
> vacuumed within a span of time?

For now, only one process can clean up indexes, so I don't see how
increasing the number of a/v workers will help in the situation that I
mentioned above.
Also, we don't consume additional resources during autovacuum in this
patch - total number of a/v workers always <= autovacuum_max_workers.

BTW, see v2 patch, attached to this letter (bug fixes) :-)

--
Best regards,
Daniil Davydov

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: SQL functions: avoid making a tuplestore unnecessarily
Next
From: Tom Lane
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER