Re: parallel vacuum options/syntax - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: parallel vacuum options/syntax
Date
Msg-id CAA4eK1+Hnjj1sKd1hSzRFcOzxcxEAC48gX88hAbUMJ8amU1eYA@mail.gmail.com
Whole thread Raw
In response to Re: parallel vacuum options/syntax  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: parallel vacuum options/syntax  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Re: parallel vacuum options/syntax  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Sun, Jan 5, 2020 at 6:40 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote:
> >On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com>
> >wrote:
> >>
> >> Hi,
> >>
> >> I am starting a new thread for some of the decisions for a parallel
> >> vacuum in the hope to get feedback from more people.  There are
> >> mainly two points for which we need some feedback.
> >>
> >> 1. Tomas Vondra has pointed out on the main thread [1] that by
> >> default the parallel vacuum should be enabled similar to what we do
> >> for Create Index.  As proposed, the patch enables it only when the
> >> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;).   One of the
> >> arguments in favor of enabling it by default as mentioned by Tomas is
> >> "It's pretty much the same thing we did with vacuum throttling - it's
> >> disabled for explicit vacuum by default, but you can enable it. If
> >> you're worried about VACUUM causing issues, you should set cost
> >> delay.".  Some of the arguments against enabling it are that it will
> >> lead to use of more resources (like CPU, I/O) which users might or
> >> might like.
> >>
> >
> >I'm a bit wary of making parallel vacuum enabled by default. Single
> >process vacuum does sequential reads/writes on most of indexes but
> >parallel vacuum does random access random reads/writes. I've tested
> >parallel vacuum on HDD and confirmed the performance is good but I'm
> >concerned that it might be cause of more disk I/O than user expected.
> >
>
> I understand the concern, but it's not clear to me why to apply this
> defensive approach just to vacuum and not to all commands. Especially
> when we do have a way to throttle vacuum (unlike pretty much any other
> command) if I/O really is a scarce resource.
>
> As the vacuum workers are separate processes, each generating requests
> with a sequential pattern, so I'd expect readahead to kick in and keep
> the efficiency of sequential access pattern.
>

Right, I also think so.

> >> Now, if we want to enable it by default, we need a way to disable it
> >> as well and along with that, we need a way for users to specify a
> >> parallel degree.  I have mentioned a few reasons why we need a
> >> parallel degree for this operation in the email [2] on the main
> >> thread.
> >>
> >> If parallel vacuum is *not* enabled by default, then I think the
> >> current way to enable is fine which is as follows: Vacuum (Parallel
> >> 2) <tbl_name>;
> >>
> >> Here, if the user doesn't specify parallel_degree, then we internally
> >> decide based on number of indexes that support a parallel vacuum with
> >> a maximum of max_parallel_maintenance_workers.
> >>
> >> If the parallel vacuum is enabled by default, then I could think of
> >> the following ways:
> >>
> >> (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel
> >> <parallel_degree>) <tbl_name>;
> >>
> >> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user
> >> specifies parallel_degree as 0, then disable parallelism.
> >>
> >> (c) ... Any better ideas?
> >>
> >
> >If parallel vacuum is enabled by default, I would prefer (b) but I
> >don't think it's a good idea to accept 0 as parallel degree. If we want
> >to disable parallel vacuum we should max_parallel_maintenance_workers
> >to 0 instead.
> >
>
> IMO that just makes the interaction between vacuum options and the GUC
> even more complicated/confusing.
>

Yeah, I am also not sure if that will be a good idea.

> If we want to have a vacuum option to determine parallel degree, we
> should probably have a vacuum option to disable parallelism using just a
> vacuum option. I don't think 0 is too bad, and disable_parallel seems a
> bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
> That's what Oracle does, so it's not entirely without a precedent.
>

We can go either way (using 0 for parallel to indicate disable
parallelism or by introducing a new option like NOPARALLEL).  I think
initially we can avoid introducing more options and just go with
'Parallel 0' and if we find a lot of people find it inconvenient, then
we can always introduce a new option later.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Surafel Temesgen
Date:
Subject: Re: WIP: System Versioned Temporal Table
Next
From: Fabien COELHO
Date:
Subject: Re: Patch to document base64 encoding