Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers
Date
Msg-id CAA4eK1KLRZOhZyDT0ypUt_f1TKsVe6u612+uNXa9XRGwYotSQA@mail.gmail.com
In response to Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers  (Masahiko Sawada <masahiko.sawada@2ndquadrant.com>)
Responses Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers
List pgsql-hackers
On Sat, Oct 3, 2020 at 6:55 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 3 Oct 2020 at 20:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > > > It would be good if we were consistent with these parallel options.
> > > > Right now max_parallel_workers_per_gather will restrict the
> > > > parallel_workers reloption.  I'd say this
> > > > max_parallel_workers_per_gather is similar to
> > > > max_parallel_maintenance_workers here and the PARALLEL vacuum option
> > > > is like the parallel_workers reloption.
> > > >
> > > > If we want VACUUM's parallel option to work the same way as that then
> > > > max_parallel_maintenance_workers should restrict whatever is mentioned
> > > > in VACUUM PARALLEL.
> > > >
> > > > Or perhaps this is slightly different as the user is explicitly asking
> > > > for this in the command, but you could likely say the same about ALTER
> > > > TABLE <table> SET (parallel_workers = N); too.
> > >
> > > There is a subtle difference between these two cases. In the case of a
> > > query, there may be multiple table scans involved, all under the same
> > > Gather node. So a limit on the Gather node is to some degree a
> > > separate constraint on the overall query plan from the reloption
> > > applied to a particular table. So there is at least some kind of an
> > > argument that it's sensible to combine those limits somehow. I'm not
> > > sure I believe it, though. The user probably wants exactly the number
> > > of workers they specify, not the GUC value.
> > >
> > > However, in the VACUUM case, there's no possibility of distinguishing
> > > between the parallel operation as a whole and the expectations for a
> > > particular table. It's a single operation.
> > >
> >
> >
> > But the same is true for the CREATE INDEX operation as well, where we
> > follow the same rule: we use the number of workers specified in the
> > reloption (parallel_workers), which is then limited by
> > max_parallel_maintenance_workers.
>
> Both opinions have a valid point.
>
> To make the behavior of parallel vacuum more consistent with other
> parallel maintenance commands (i.e., only parallel CREATE INDEX for
> now), as a second idea, can we make use of the parallel_workers
> reloption in the parallel vacuum case as well? That is, when the
> PARALLEL option is specified without an integer, or VACUUM is run
> without the PARALLEL option, the parallel degree is the number of
> indexes that support parallel vacuum and are bigger than
> min_parallel_index_scan_size. If the parallel_workers reloption of
> the table is set, we use it instead. In both cases, the parallel
> degree is capped by max_parallel_maintenance_workers. OTOH, when the
> PARALLEL option is specified with an integer, the parallel degree is
> the specified integer value, capped by max_parallel_workers and by
> the number of indexes that support parallel vacuum and are bigger
> than min_parallel_index_scan_size.
>
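
For reference, the rule proposed above can be sketched as follows. This
is only a minimal illustration in Python of the proposal as quoted, not
PostgreSQL code. The function and parameter names are hypothetical, and
eligible_indexes stands for the number of indexes that support parallel
vacuum and are bigger than min_parallel_index_scan_size.

```python
from typing import Optional


def proposed_parallel_degree(
    requested: Optional[int],                   # N from VACUUM (PARALLEL N); None if no integer given
    reloption_parallel_workers: Optional[int],  # table's parallel_workers reloption; None if unset
    eligible_indexes: int,                      # indexes usable for parallel vacuum (assumption, see above)
    max_parallel_maintenance_workers: int,
    max_parallel_workers: int,
) -> int:
    """Parallel degree under the proposed rule (illustrative sketch only)."""
    if requested is None:
        # PARALLEL without an integer, or no PARALLEL option at all:
        # start from the reloption if it is set, otherwise from the
        # number of eligible indexes, then cap by the maintenance GUC.
        if reloption_parallel_workers is not None:
            degree = reloption_parallel_workers
        else:
            degree = eligible_indexes
        return max(0, min(degree, max_parallel_maintenance_workers))

    # PARALLEL with an explicit integer: the GUC cap here is
    # max_parallel_workers, plus the number of eligible indexes.
    return max(0, min(requested, max_parallel_workers, eligible_indexes))
```

Note that the two branches are capped by different GUCs, and that the
reloption is only consulted when no integer is given.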

This seems more difficult to explain and has more moving parts. One
of the blog posts I recently read about this work [1] (see the section
"Parallel VACUUM & Better Support for Append-only Workloads") explains
the currently implemented behavior (related to the workers) nicely and
in simple words. Unless I or the author of that blog post missed
something, it appears that the currently implemented behavior is
understood even by people who are not directly involved in this work,
which to some extent indicates that users will be able to use it
without difficulty. I think we can keep the current behavior as it is
and wait to see whether we get any complaints from users trying to
use it.
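
The currently implemented behavior, as the PostgreSQL 13 VACUUM
documentation describes it, can be sketched roughly as follows. This is
a minimal Python illustration with hypothetical names, not the actual
implementation; it assumes eligible_indexes is the number of indexes
that can take part in parallel vacuum and ignores details such as the
leader process also doing part of the work.

```python
from typing import Optional


def current_parallel_degree(
    requested: Optional[int],   # N from VACUUM (PARALLEL N); None if not given
    eligible_indexes: int,      # indexes that can be vacuumed in parallel (assumption)
    max_parallel_maintenance_workers: int,
) -> int:
    """Worker count under the documented rule (simplified sketch)."""
    # Start from the number of indexes that can be processed in parallel.
    degree = eligible_indexes

    # An explicit PARALLEL N can only lower that number.
    if requested is not None:
        degree = min(degree, requested)

    # The GUC is the final cap in every case.
    return max(0, min(degree, max_parallel_maintenance_workers))
```

There is a single rule with a single GUC cap, regardless of whether an
integer is given.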

[1] - https://pganalyze.com/blog/postgres13-better-performance-monitoring-usability?utm_source=PostgresWeeklyPrimary09302020

-- 
With Regards,
Amit Kapila.


