Re: parallel vacuum options/syntax - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: parallel vacuum options/syntax
Date
Msg-id CAD21AoDDUagqwbv7MDs-ovvU0sw989LQewkn3surqaqvmkMLzA@mail.gmail.com
Whole thread Raw
In response to Re: parallel vacuum options/syntax  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: parallel vacuum options/syntax  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
List pgsql-hackers
On Sun, Jan 5, 2020 at 7:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Jan 5, 2020 at 6:40 AM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote:
> > >On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com>
> > >wrote:
> > >>
> > >> Hi,
> > >>
> > >> I am starting a new thread for some of the decisions for a parallel
> > >> vacuum in the hope to get feedback from more people.  There are
> > >> mainly two points for which we need some feedback.
> > >>
> > >> 1. Tomas Vondra has pointed out on the main thread [1] that by
> > >> default the parallel vacuum should be enabled similar to what we do
> > >> for Create Index.  As proposed, the patch enables it only when the
> > >> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;).   One of the
> > >> arguments in favor of enabling it by default as mentioned by Tomas is
> > >> "It's pretty much the same thing we did with vacuum throttling - it's
> > >> disabled for explicit vacuum by default, but you can enable it. If
> > >> you're worried about VACUUM causing issues, you should set cost
> > >> delay.".  Some of the arguments against enabling it are that it will
> > >> lead to use of more resources (like CPU, I/O) which users might or
> > >> might like.
> > >>
> > >
> > >I'm a bit wary of making parallel vacuum enabled by default. Single
> > >process vacuum does sequential reads/writes on most of indexes but
> > >parallel vacuum does random access random reads/writes. I've tested
> > >parallel vacuum on HDD and confirmed the performance is good but I'm
> > >concerned that it might be cause of more disk I/O than user expected.
> > >
> >
> > I understand the concern, but it's not clear to me why to apply this
> > defensive approach just to vacuum and not to all commands. Especially
> > when we do have a way to throttle vacuum (unlike pretty much any other
> > command) if I/O really is a scarce resource.
> >
> > As the vacuum workers are separate processes, each generating requests
> > with a sequential pattern, so I'd expect readahead to kick in and keep
> > the efficiency of sequential access pattern.
> >
>
> Right, I also think so.

Okay I understand.

>
> > >> Now, if we want to enable it by default, we need a way to disable it
> > >> as well and along with that, we need a way for users to specify a
> > >> parallel degree.  I have mentioned a few reasons why we need a
> > >> parallel degree for this operation in the email [2] on the main
> > >> thread.
> > >>
> > >> If parallel vacuum is *not* enabled by default, then I think the
> > >> current way to enable is fine which is as follows: Vacuum (Parallel
> > >> 2) <tbl_name>;
> > >>
> > >> Here, if the user doesn't specify parallel_degree, then we internally
> > >> decide based on number of indexes that support a parallel vacuum with
> > >> a maximum of max_parallel_maintenance_workers.
> > >>
> > >> If the parallel vacuum is enabled by default, then I could think of
> > >> the following ways:
> > >>
> > >> (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel
> > >> <parallel_degree>) <tbl_name>;
> > >>
> > >> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user
> > >> specifies parallel_degree as 0, then disable parallelism.
> > >>
> > >> (c) ... Any better ideas?
> > >>
> > >
> > >If parallel vacuum is enabled by default, I would prefer (b) but I
> > >don't think it's a good idea to accept 0 as parallel degree. If we want
> > >to disable parallel vacuum we should max_parallel_maintenance_workers
> > >to 0 instead.
> > >
> >
> > IMO that just makes the interaction between vacuum options and the GUC
> > even more complicated/confusing.
> >
>
> Yeah, I am also not sure if that will be a good idea.
>
> > If we want to have a vacuum option to determine parallel degree, we
> > should probably have a vacuum option to disable parallelism using just a
> > vacuum option. I don't think 0 is too bad, and disable_parallel seems a
> > bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
> > That's what Oracle does, so it's not entirely without a precedent.
> >
>
> We can go either way (using 0 for parallel to indicate disable
> parallelism or by introducing a new option like NOPARALLEL).  I think
> initially we can avoid introducing more options and just go with
> 'Parallel 0' and if we find a lot of people find it inconvenient, then
> we can always introduce a new option later.

Hmm I'm confused. Specifying NOPARALLEL or PARALLEL 0 is the same as
setting max_parallel_maintenance_workers to 0, right? We normally set
max_parallel_workers_per_gather to 0 to disable parallel queries on a
query. So I think that disabling parallel vacuum by setting
max_parallel_maintenance_workers to 0 is the same concept. Regarding
proposed two options we already have storage parameter
parallel_workers and it accepts 0 but PARALLEL 0 looks like
contradicted at a glance. And NOPARALLEL is inconsistent with existing
DISABLE_XXX options and it's a bit awkward to specify like (NOPARALLEL
off).

Regards,

--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: parallel vacuum options/syntax
Next
From: Vik Fearing
Date:
Subject: Re: WIP: System Versioned Temporal Table