Thread: parallel vacuum options/syntax

parallel vacuum options/syntax

From
Amit Kapila
Date:
Hi,

I am starting a new thread for some of the decisions for a parallel vacuum in the hope to get feedback from more people.  There are mainly two points for which we need some feedback.

1. Tomas Vondra has pointed out on the main thread [1] that by default the parallel vacuum should be enabled similar to what we do for Create Index.  As proposed, the patch enables it only when the user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;).   One of the arguments in favor of enabling it by default as mentioned by Tomas is "It's pretty much the same thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can enable it. If you're worried about VACUUM causing issues, you should set cost delay.".  Some of the arguments against enabling it are that it will lead to use of more resources (like CPU, I/O) which users might or might like.

Now, if we want to enable it by default, we need a way to disable it as well and along with that, we need a way for users to specify a parallel degree.  I have mentioned a few reasons why we need a parallel degree for this operation in the email [2] on the main thread.

If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows:
Vacuum (Parallel 2) <tbl_name>;

Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support a parallel vacuum with a maximum of max_parallel_maintenance_workers.

If the parallel vacuum is enabled by default, then I could think of the following ways:
(a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel <parallel_degree>) <tbl_name>;
(b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user specifies parallel_degree as 0, then disable parallelism.
(c) ... Any better ideas?

2. The patch provides a FAST option (based on suggestion by Robert) for a parallel vacuum which will make it behave like vacuum_cost_delay = 0 which means it will disable throttling.  So,
VACUUM (PARALLEL n, FAST) <tbl_name> will allow the parallel vacuum to run without resource throttling.  Tomas thinks that we don't need such an option as the same can be served by setting vacuum_cost_delay = 0 which is a valid argument, but OTOH, providing an option to the user which can make his life easier is not a bad idea either.

Thoughts?

[1] - https://www.postgresql.org/message-id/20191229212354.tqivttn23lxjg2jz%40development
[2] - https://www.postgresql.org/message-id/CAA4eK1%2B1o-BaPvJnK7BPThTryx3MRDS%2BmCf9eVVZT%3DSVJ8mwLg%40mail.gmail.com

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Re: parallel vacuum options/syntax

From
Guillaume Lelarge
Date:
Le jeu. 2 janv. 2020 à 13:09, Amit Kapila <amit.kapila16@gmail.com> a écrit :
Hi,

I am starting a new thread for some of the decisions for a parallel vacuum in the hope to get feedback from more people.  There are mainly two points for which we need some feedback.

1. Tomas Vondra has pointed out on the main thread [1] that by default the parallel vacuum should be enabled similar to what we do for Create Index.  As proposed, the patch enables it only when the user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;).   One of the arguments in favor of enabling it by default as mentioned by Tomas is "It's pretty much the same thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can enable it. If you're worried about VACUUM causing issues, you should set cost delay.".  Some of the arguments against enabling it are that it will lead to use of more resources (like CPU, I/O) which users might or might like.

Now, if we want to enable it by default, we need a way to disable it as well and along with that, we need a way for users to specify a parallel degree.  I have mentioned a few reasons why we need a parallel degree for this operation in the email [2] on the main thread.

If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows:
Vacuum (Parallel 2) <tbl_name>;

Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support a parallel vacuum with a maximum of max_parallel_maintenance_workers.

If the parallel vacuum is enabled by default, then I could think of the following ways:
(a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel <parallel_degree>) <tbl_name>;
(b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user specifies parallel_degree as 0, then disable parallelism.
(c) ... Any better ideas?


AFAICT, every parallel-able statement use parallelisation by default, so it wouldn't be consistent if VACUUM behaves some other way.

So, (c) has my vote.

2. The patch provides a FAST option (based on suggestion by Robert) for a parallel vacuum which will make it behave like vacuum_cost_delay = 0 which means it will disable throttling.  So,
VACUUM (PARALLEL n, FAST) <tbl_name> will allow the parallel vacuum to run without resource throttling.  Tomas thinks that we don't need such an option as the same can be served by setting vacuum_cost_delay = 0 which is a valid argument, but OTOH, providing an option to the user which can make his life easier is not a bad idea either.


The user already has an option (the vacuum_cost_delay GUC). So I kinda agree with Tomas on this.


--
Guillaume.

Re: parallel vacuum options/syntax

From
Dilip Kumar
Date:
On Thu, Jan 2, 2020 at 5:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Hi,
>
> I am starting a new thread for some of the decisions for a parallel vacuum in the hope to get feedback from more
people. There are mainly two points for which we need some feedback. 
>
> 1. Tomas Vondra has pointed out on the main thread [1] that by default the parallel vacuum should be enabled similar
towhat we do for Create Index.  As proposed, the patch enables it only when the user specifies it (ex. Vacuum (Parallel
2)<tbl_name>;).   One of the arguments in favor of enabling it by default as mentioned by Tomas is "It's pretty much
thesame thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can enable it. If
you'reworried about VACUUM causing issues, you should set cost delay.".  Some of the arguments against enabling it are
thatit will lead to use of more resources (like CPU, I/O) which users might or might like. 
>
> Now, if we want to enable it by default, we need a way to disable it as well and along with that, we need a way for
usersto specify a parallel degree.  I have mentioned a few reasons why we need a parallel degree for this operation in
theemail [2] on the main thread. 
>
> If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows:
> Vacuum (Parallel 2) <tbl_name>;
>
> Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support
aparallel vacuum with a maximum of max_parallel_maintenance_workers. 
>
> If the parallel vacuum is enabled by default, then I could think of the following ways:
> (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel <parallel_degree>) <tbl_name>;
> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user specifies parallel_degree as 0, then disable
parallelism.
> (c) ... Any better ideas?

IMHO, it's better to keep the parallelism enables by default.  Because
if the user is giving an explicit vacuum then better to keep it fast
by default.  However, I agree that we can provide an option for the
user to disable it and provide the parallel degree with the vacuum
command something like option (b).
>
> 2. The patch provides a FAST option (based on suggestion by Robert) for a parallel vacuum which will make it behave
likevacuum_cost_delay = 0 which means it will disable throttling.  So, 
> VACUUM (PARALLEL n, FAST) <tbl_name> will allow the parallel vacuum to run without resource throttling.  Tomas thinks
thatwe don't need such an option as the same can be served by setting vacuum_cost_delay = 0 which is a valid argument,
butOTOH, providing an option to the user which can make his life easier is not a bad idea either. 

I agree that there is already an option to run it without cost delay
but there is no harm in providing extra power to the user where he can
run a particular vacuum command without IO throttling.  So +1 for the
FAST option.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: parallel vacuum options/syntax

From
Amit Kapila
Date:
On Fri, Jan 3, 2020 at 8:50 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Jan 2, 2020 at 5:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as
follows:
> > Vacuum (Parallel 2) <tbl_name>;
> >
> > Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that
supporta parallel vacuum with a maximum of max_parallel_maintenance_workers.
 
> >
> > If the parallel vacuum is enabled by default, then I could think of the following ways:
> > (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel <parallel_degree>) <tbl_name>;
> > (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user specifies parallel_degree as 0, then disable
parallelism.
> > (c) ... Any better ideas?
>
> IMHO, it's better to keep the parallelism enables by default.  Because
> if the user is giving an explicit vacuum then better to keep it fast
> by default.

Okay.

>  However, I agree that we can provide an option for the
> user to disable it and provide the parallel degree with the vacuum
> command something like option (b).
>

The option (b) has some advantage over (a) that we don't need to
invent multiple options to enable/disable parallelism for vacuum.
However, it might appear awkward to set parallel_degree as 0 (Vacuum
(Parallel 0) tbl_name) to disable parallelism.  Having said that, we
already have some precedence wherein if we set parameters like
statement_timeout, lock_timeout, etc to zero, it disables the timeout.
So, it won't be insane if we choose this option.

Does anyone else have any opinion on what makes sense here?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: parallel vacuum options/syntax

From
Amit Kapila
Date:
On Thu, Jan 2, 2020 at 7:09 PM Guillaume Lelarge <guillaume@lelarge.info> wrote:
>
> Le jeu. 2 janv. 2020 à 13:09, Amit Kapila <amit.kapila16@gmail.com> a écrit :
>>
>> If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows:
>> Vacuum (Parallel 2) <tbl_name>;
>>
>> Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support
aparallel vacuum with a maximum of max_parallel_maintenance_workers. 
>>
>> If the parallel vacuum is enabled by default, then I could think of the following ways:
>> (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel <parallel_degree>) <tbl_name>;
>> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user specifies parallel_degree as 0, then disable
parallelism.
>> (c) ... Any better ideas?
>>
>
> AFAICT, every parallel-able statement use parallelisation by default, so it wouldn't be consistent if VACUUM behaves
someother way. 
>

Fair enough.

> So, (c) has my vote.
>

I don't understand this.  What do you mean by voting (c) option?  Do
you mean that you didn't like any of (a) or (b)?  If so, then feel
free to suggest something else.  One more possibility could be to
allow users to specify parallel degree or disable parallelism via guc
'max_parallel_maintenance_workers'.  Basically, if the user wants to
disable parallelism, it needs to set the value of guc
max_parallel_maintenance_workers as zero and if it wants to increase
the parallel degree than the default value (which is two), then it can
set it via max_parallel_maintenance_workers before running vacuum
command.  Now, this can certainly work, but I feel setting/resetting a
guc before a vacuum
command can be a bit inconvenient for users, but if others prefer that
way, then we can do that.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: parallel vacuum options/syntax

From
Guillaume Lelarge
Date:
Le ven. 3 janv. 2020 à 09:06, Amit Kapila <amit.kapila16@gmail.com> a écrit :
On Thu, Jan 2, 2020 at 7:09 PM Guillaume Lelarge <guillaume@lelarge.info> wrote:
>
> Le jeu. 2 janv. 2020 à 13:09, Amit Kapila <amit.kapila16@gmail.com> a écrit :
>>
>> If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows:
>> Vacuum (Parallel 2) <tbl_name>;
>>
>> Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support a parallel vacuum with a maximum of max_parallel_maintenance_workers.
>>
>> If the parallel vacuum is enabled by default, then I could think of the following ways:
>> (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel <parallel_degree>) <tbl_name>;
>> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user specifies parallel_degree as 0, then disable parallelism.
>> (c) ... Any better ideas?
>>
>
> AFAICT, every parallel-able statement use parallelisation by default, so it wouldn't be consistent if VACUUM behaves some other way.
>

Fair enough.

> So, (c) has my vote.
>

I don't understand this.  What do you mean by voting (c) option?  Do
you mean that you didn't like any of (a) or (b)?

I meant (b), sorry :)

  If so, then feel
free to suggest something else.  One more possibility could be to
allow users to specify parallel degree or disable parallelism via guc
'max_parallel_maintenance_workers'.  Basically, if the user wants to
disable parallelism, it needs to set the value of guc
max_parallel_maintenance_workers as zero and if it wants to increase
the parallel degree than the default value (which is two), then it can
set it via max_parallel_maintenance_workers before running vacuum
command.  Now, this can certainly work, but I feel setting/resetting a
guc before a vacuum
command can be a bit inconvenient for users, but if others prefer that
way, then we can do that.



--
Guillaume.

Re: parallel vacuum options/syntax

From
Mahendra Singh
Date:
On Fri, 3 Jan 2020 at 08:51, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Thu, Jan 2, 2020 at 5:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > Hi,
> >
> > I am starting a new thread for some of the decisions for a parallel vacuum in the hope to get feedback from more
people. There are mainly two points for which we need some feedback. 
> >
> > 1. Tomas Vondra has pointed out on the main thread [1] that by default the parallel vacuum should be enabled
similarto what we do for Create Index.  As proposed, the patch enables it only when the user specifies it (ex. Vacuum
(Parallel2) <tbl_name>;).   One of the arguments in favor of enabling it by default as mentioned by Tomas is "It's
prettymuch the same thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can
enableit. If you're worried about VACUUM causing issues, you should set cost delay.".  Some of the arguments against
enablingit are that it will lead to use of more resources (like CPU, I/O) which users might or might like. 
> >
> > Now, if we want to enable it by default, we need a way to disable it as well and along with that, we need a way for
usersto specify a parallel degree.  I have mentioned a few reasons why we need a parallel degree for this operation in
theemail [2] on the main thread. 
> >
> > If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows:
> > Vacuum (Parallel 2) <tbl_name>;
> >
> > Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that
supporta parallel vacuum with a maximum of max_parallel_maintenance_workers. 
> >
> > If the parallel vacuum is enabled by default, then I could think of the following ways:
> > (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel <parallel_degree>) <tbl_name>;
> > (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user specifies parallel_degree as 0, then disable
parallelism.
> > (c) ... Any better ideas?
>
> IMHO, it's better to keep the parallelism enables by default.  Because
> if the user is giving an explicit vacuum then better to keep it fast
> by default.  However, I agree that we can provide an option for the
> user to disable it and provide the parallel degree with the vacuum
> command something like option (b).

+1

Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



Re: parallel vacuum options/syntax

From
Masahiko Sawada
Date:
On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Hi,
>
> I am starting a new thread for some of the decisions for a parallel vacuum in the hope to get feedback from more
people. There are mainly two points for which we need some feedback. 
>
> 1. Tomas Vondra has pointed out on the main thread [1] that by default the parallel vacuum should be enabled similar
towhat we do for Create Index.  As proposed, the patch enables it only when the user specifies it (ex. Vacuum (Parallel
2)<tbl_name>;).   One of the arguments in favor of enabling it by default as mentioned by Tomas is "It's pretty much
thesame thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can enable it. If
you'reworried about VACUUM causing issues, you should set cost delay.".  Some of the arguments against enabling it are
thatit will lead to use of more resources (like CPU, I/O) which users might or might like. 
>

I'm a bit wary of making parallel vacuum enabled by default. Single
process vacuum does sequential reads/writes on most of indexes but
parallel vacuum does random access random reads/writes. I've tested
parallel vacuum on HDD and confirmed the performance is good but I'm
concerned that it might be cause of more disk I/O than user expected.

> Now, if we want to enable it by default, we need a way to disable it as well and along with that, we need a way for
usersto specify a parallel degree.  I have mentioned a few reasons why we need a parallel degree for this operation in
theemail [2] on the main thread. 
>
> If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows:
> Vacuum (Parallel 2) <tbl_name>;
>
> Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support
aparallel vacuum with a maximum of max_parallel_maintenance_workers. 
>
> If the parallel vacuum is enabled by default, then I could think of the following ways:
> (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel <parallel_degree>) <tbl_name>;
> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user specifies parallel_degree as 0, then disable
parallelism.
> (c) ... Any better ideas?
>

If parallel vacuum is enabled by default, I would prefer (b) but I
don't think it's a good idea to accept 0 as parallel degree. If we
want to disable parallel vacuum we should
max_parallel_maintenance_workers to 0 instead.

Regards,

--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel vacuum options/syntax

From
Tomas Vondra
Date:
On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote:
>On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com>
>wrote:
>>
>> Hi,
>>
>> I am starting a new thread for some of the decisions for a parallel
>> vacuum in the hope to get feedback from more people.  There are
>> mainly two points for which we need some feedback.
>>
>> 1. Tomas Vondra has pointed out on the main thread [1] that by
>> default the parallel vacuum should be enabled similar to what we do
>> for Create Index.  As proposed, the patch enables it only when the
>> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;).   One of the
>> arguments in favor of enabling it by default as mentioned by Tomas is
>> "It's pretty much the same thing we did with vacuum throttling - it's
>> disabled for explicit vacuum by default, but you can enable it. If
>> you're worried about VACUUM causing issues, you should set cost
>> delay.".  Some of the arguments against enabling it are that it will
>> lead to use of more resources (like CPU, I/O) which users might or
>> might like.
>>
>
>I'm a bit wary of making parallel vacuum enabled by default. Single
>process vacuum does sequential reads/writes on most of indexes but
>parallel vacuum does random access random reads/writes. I've tested
>parallel vacuum on HDD and confirmed the performance is good but I'm
>concerned that it might be cause of more disk I/O than user expected.
>

I understand the concern, but it's not clear to me why to apply this
defensive approach just to vacuum and not to all commands. Especially
when we do have a way to throttle vacuum (unlike pretty much any other
command) if I/O really is a scarce resource.

As the vacuum workers are separate processes, each generating requests
with a sequential pattern, so I'd expect readahead to kick in and keep
the efficiency of sequential access pattern.

>> Now, if we want to enable it by default, we need a way to disable it
>> as well and along with that, we need a way for users to specify a
>> parallel degree.  I have mentioned a few reasons why we need a
>> parallel degree for this operation in the email [2] on the main
>> thread.
>>
>> If parallel vacuum is *not* enabled by default, then I think the
>> current way to enable is fine which is as follows: Vacuum (Parallel
>> 2) <tbl_name>;
>>
>> Here, if the user doesn't specify parallel_degree, then we internally
>> decide based on number of indexes that support a parallel vacuum with
>> a maximum of max_parallel_maintenance_workers.
>>
>> If the parallel vacuum is enabled by default, then I could think of
>> the following ways:
>>
>> (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel
>> <parallel_degree>) <tbl_name>;
>>
>> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user
>> specifies parallel_degree as 0, then disable parallelism.
>>
>> (c) ... Any better ideas?
>>
>
>If parallel vacuum is enabled by default, I would prefer (b) but I
>don't think it's a good idea to accept 0 as parallel degree. If we want
>to disable parallel vacuum we should max_parallel_maintenance_workers
>to 0 instead.
>

IMO that just makes the interaction between vacuum options and the GUC
even more complicated/confusing.

If we want to have a vacuum option to determine parallel degree, we
should probably have a vacuum option to disable parallelism using just a
vacuum option. I don't think 0 is too bad, and disable_parallel seems a
bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
That's what Oracle does, so it's not entirely without a precedent.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: parallel vacuum options/syntax

From
Amit Kapila
Date:
On Sun, Jan 5, 2020 at 6:40 AM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
>
> On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote:
> >On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com>
> >wrote:
> >>
> >> Hi,
> >>
> >> I am starting a new thread for some of the decisions for a parallel
> >> vacuum in the hope to get feedback from more people.  There are
> >> mainly two points for which we need some feedback.
> >>
> >> 1. Tomas Vondra has pointed out on the main thread [1] that by
> >> default the parallel vacuum should be enabled similar to what we do
> >> for Create Index.  As proposed, the patch enables it only when the
> >> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;).   One of the
> >> arguments in favor of enabling it by default as mentioned by Tomas is
> >> "It's pretty much the same thing we did with vacuum throttling - it's
> >> disabled for explicit vacuum by default, but you can enable it. If
> >> you're worried about VACUUM causing issues, you should set cost
> >> delay.".  Some of the arguments against enabling it are that it will
> >> lead to use of more resources (like CPU, I/O) which users might or
> >> might like.
> >>
> >
> >I'm a bit wary of making parallel vacuum enabled by default. Single
> >process vacuum does sequential reads/writes on most of indexes but
> >parallel vacuum does random access random reads/writes. I've tested
> >parallel vacuum on HDD and confirmed the performance is good but I'm
> >concerned that it might be cause of more disk I/O than user expected.
> >
>
> I understand the concern, but it's not clear to me why to apply this
> defensive approach just to vacuum and not to all commands. Especially
> when we do have a way to throttle vacuum (unlike pretty much any other
> command) if I/O really is a scarce resource.
>
> As the vacuum workers are separate processes, each generating requests
> with a sequential pattern, so I'd expect readahead to kick in and keep
> the efficiency of sequential access pattern.
>

Right, I also think so.

> >> Now, if we want to enable it by default, we need a way to disable it
> >> as well and along with that, we need a way for users to specify a
> >> parallel degree.  I have mentioned a few reasons why we need a
> >> parallel degree for this operation in the email [2] on the main
> >> thread.
> >>
> >> If parallel vacuum is *not* enabled by default, then I think the
> >> current way to enable is fine which is as follows: Vacuum (Parallel
> >> 2) <tbl_name>;
> >>
> >> Here, if the user doesn't specify parallel_degree, then we internally
> >> decide based on number of indexes that support a parallel vacuum with
> >> a maximum of max_parallel_maintenance_workers.
> >>
> >> If the parallel vacuum is enabled by default, then I could think of
> >> the following ways:
> >>
> >> (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel
> >> <parallel_degree>) <tbl_name>;
> >>
> >> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user
> >> specifies parallel_degree as 0, then disable parallelism.
> >>
> >> (c) ... Any better ideas?
> >>
> >
> >If parallel vacuum is enabled by default, I would prefer (b) but I
> >don't think it's a good idea to accept 0 as parallel degree. If we want
> >to disable parallel vacuum we should max_parallel_maintenance_workers
> >to 0 instead.
> >
>
> IMO that just makes the interaction between vacuum options and the GUC
> even more complicated/confusing.
>

Yeah, I am also not sure if that will be a good idea.

> If we want to have a vacuum option to determine parallel degree, we
> should probably have a vacuum option to disable parallelism using just a
> vacuum option. I don't think 0 is too bad, and disable_parallel seems a
> bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
> That's what Oracle does, so it's not entirely without a precedent.
>

We can go either way (using 0 for parallel to indicate disable
parallelism or by introducing a new option like NOPARALLEL).  I think
initially we can avoid introducing more options and just go with
'Parallel 0' and if we find a lot of people find it inconvenient, then
we can always introduce a new option later.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: parallel vacuum options/syntax

From
Tomas Vondra
Date:
On Sun, Jan 05, 2020 at 03:56:35PM +0530, Amit Kapila wrote:
>>
>> ...
>>
>> >If parallel vacuum is enabled by default, I would prefer (b) but I
>> >don't think it's a good idea to accept 0 as parallel degree. If we want
>> >to disable parallel vacuum we should max_parallel_maintenance_workers
>> >to 0 instead.
>> >
>>
>> IMO that just makes the interaction between vacuum options and the GUC
>> even more complicated/confusing.
>>
>
>Yeah, I am also not sure if that will be a good idea.
>
>> If we want to have a vacuum option to determine parallel degree, we
>> should probably have a vacuum option to disable parallelism using just a
>> vacuum option. I don't think 0 is too bad, and disable_parallel seems a
>> bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
>> That's what Oracle does, so it's not entirely without a precedent.
>>
>
>We can go either way (using 0 for parallel to indicate disable
>parallelism or by introducing a new option like NOPARALLEL).  I think
>initially we can avoid introducing more options and just go with
>'Parallel 0' and if we find a lot of people find it inconvenient, then
>we can always introduce a new option later.
>

I don't think starting with "parallel 0" and then maybe introducing
NOPARALLEL sometime in the future is a good plan, because after adding
NOPARALLEL we'd either have to remove "parallel 0" (breaking backwards
compatibility unnecessarily) or supporting both approaches.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel vacuum options/syntax

From
Masahiko Sawada
Date:
On Sun, Jan 5, 2020 at 7:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Jan 5, 2020 at 6:40 AM Tomas Vondra
> <tomas.vondra@2ndquadrant.com> wrote:
> >
> > On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote:
> > >On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com>
> > >wrote:
> > >>
> > >> Hi,
> > >>
> > >> I am starting a new thread for some of the decisions for a parallel
> > >> vacuum in the hope to get feedback from more people.  There are
> > >> mainly two points for which we need some feedback.
> > >>
> > >> 1. Tomas Vondra has pointed out on the main thread [1] that by
> > >> default the parallel vacuum should be enabled similar to what we do
> > >> for Create Index.  As proposed, the patch enables it only when the
> > >> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;).   One of the
> > >> arguments in favor of enabling it by default as mentioned by Tomas is
> > >> "It's pretty much the same thing we did with vacuum throttling - it's
> > >> disabled for explicit vacuum by default, but you can enable it. If
> > >> you're worried about VACUUM causing issues, you should set cost
> > >> delay.".  Some of the arguments against enabling it are that it will
> > >> lead to use of more resources (like CPU, I/O) which users might or
> > >> might like.
> > >>
> > >
> > >I'm a bit wary of making parallel vacuum enabled by default. Single
> > >process vacuum does sequential reads/writes on most of indexes but
> > >parallel vacuum does random access random reads/writes. I've tested
> > >parallel vacuum on HDD and confirmed the performance is good but I'm
> > >concerned that it might be cause of more disk I/O than user expected.
> > >
> >
> > I understand the concern, but it's not clear to me why to apply this
> > defensive approach just to vacuum and not to all commands. Especially
> > when we do have a way to throttle vacuum (unlike pretty much any other
> > command) if I/O really is a scarce resource.
> >
> > As the vacuum workers are separate processes, each generating requests
> > with a sequential pattern, so I'd expect readahead to kick in and keep
> > the efficiency of sequential access pattern.
> >
>
> Right, I also think so.

Okay I understand.

>
> > >> Now, if we want to enable it by default, we need a way to disable it
> > >> as well and along with that, we need a way for users to specify a
> > >> parallel degree.  I have mentioned a few reasons why we need a
> > >> parallel degree for this operation in the email [2] on the main
> > >> thread.
> > >>
> > >> If parallel vacuum is *not* enabled by default, then I think the
> > >> current way to enable is fine which is as follows: Vacuum (Parallel
> > >> 2) <tbl_name>;
> > >>
> > >> Here, if the user doesn't specify parallel_degree, then we internally
> > >> decide based on number of indexes that support a parallel vacuum with
> > >> a maximum of max_parallel_maintenance_workers.
> > >>
> > >> If the parallel vacuum is enabled by default, then I could think of
> > >> the following ways:
> > >>
> > >> (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel
> > >> <parallel_degree>) <tbl_name>;
> > >>
> > >> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user
> > >> specifies parallel_degree as 0, then disable parallelism.
> > >>
> > >> (c) ... Any better ideas?
> > >>
> > >
> > >If parallel vacuum is enabled by default, I would prefer (b) but I
> > >don't think it's a good idea to accept 0 as parallel degree. If we want
> > >to disable parallel vacuum we should max_parallel_maintenance_workers
> > >to 0 instead.
> > >
> >
> > IMO that just makes the interaction between vacuum options and the GUC
> > even more complicated/confusing.
> >
>
> Yeah, I am also not sure if that will be a good idea.
>
> > If we want to have a vacuum option to determine parallel degree, we
> > should probably have a vacuum option to disable parallelism using just a
> > vacuum option. I don't think 0 is too bad, and disable_parallel seems a
> > bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
> > That's what Oracle does, so it's not entirely without a precedent.
> >
>
> We can go either way (using 0 for parallel to indicate disable
> parallelism or by introducing a new option like NOPARALLEL).  I think
> initially we can avoid introducing more options and just go with
> 'Parallel 0' and if we find a lot of people find it inconvenient, then
> we can always introduce a new option later.

Hmm I'm confused. Specifying NOPARALLEL or PARALLEL 0 is the same as
setting max_parallel_maintenance_workers to 0, right? We normally set
max_parallel_workers_per_gather to 0 to disable parallel queries on a
query. So I think that disabling parallel vacuum by setting
max_parallel_maintenance_workers to 0 is the same concept. Regarding
proposed two options we already have storage parameter
parallel_workers and it accepts 0 but PARALLEL 0 looks like
contradicted at a glance. And NOPARALLEL is inconsistent with existing
DISABLE_XXX options and it's a bit awkward to specify like (NOPARALLEL
off).

Regards,

--
Masahiko Sawada  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel vacuum options/syntax

From
Tomas Vondra
Date:
On Sun, Jan 05, 2020 at 09:17:57PM +0900, Masahiko Sawada wrote:
>On Sun, Jan 5, 2020 at 7:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> ...
>>
>> > If we want to have a vacuum option to determine parallel degree, we
>> > should probably have a vacuum option to disable parallelism using just a
>> > vacuum option. I don't think 0 is too bad, and disable_parallel seems a
>> > bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
>> > That's what Oracle does, so it's not entirely without a precedent.
>> >
>>
>> We can go either way (using 0 for parallel to indicate disable
>> parallelism or by introducing a new option like NOPARALLEL).  I think
>> initially we can avoid introducing more options and just go with
>> 'Parallel 0' and if we find a lot of people find it inconvenient, then
>> we can always introduce a new option later.
>
>Hmm I'm confused. Specifying NOPARALLEL or PARALLEL 0 is the same as
>setting max_parallel_maintenance_workers to 0, right? We normally set
>max_parallel_workers_per_gather to 0 to disable parallel queries on a
>query. So I think that disabling parallel vacuum by setting
>max_parallel_maintenance_workers to 0 is the same concept. Regarding
>proposed two options we already have storage parameter
>parallel_workers and it accepts 0 but PARALLEL 0 looks like
>contradicted at a glance. And NOPARALLEL is inconsistent with existing
>DISABLE_XXX options and it's a bit awkward to specify like (NOPARALLEL
>off).
>

My understanding is the motivation for new vacuum options is a claim
that m_p_m_w is not sufficient/suitable for the vacuum case. I've
expressed my doubts about this, but let's assume it's the right
solution. To me it seems a bit confusing to just fall back to m_p_m_w
when it comes to disabling the parallel vacuum.

So if we think we need an option to determine vacuum parallel degree, we
should have an option to disable parallelism too. I don't care much if
it's called DISABLE_PARALLEL, NOPARALLEL or PARALLEL 0, as long as we
make our mind and don't unnecessarily break it in the next release.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel vacuum options/syntax

From
Masahiko Sawada
Date:
On Sun, 5 Jan 2020 at 22:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Sun, Jan 05, 2020 at 09:17:57PM +0900, Masahiko Sawada wrote:
> >On Sun, Jan 5, 2020 at 7:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >>
> >> ...
> >>
> >> > If we want to have a vacuum option to determine parallel degree, we
> >> > should probably have a vacuum option to disable parallelism using just a
> >> > vacuum option. I don't think 0 is too bad, and disable_parallel seems a
> >> > bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
> >> > That's what Oracle does, so it's not entirely without a precedent.
> >> >
> >>
> >> We can go either way (using 0 for parallel to indicate disable
> >> parallelism or by introducing a new option like NOPARALLEL).  I think
> >> initially we can avoid introducing more options and just go with
> >> 'Parallel 0' and if we find a lot of people find it inconvenient, then
> >> we can always introduce a new option later.
> >
> >Hmm I'm confused. Specifying NOPARALLEL or PARALLEL 0 is the same as
> >setting max_parallel_maintenance_workers to 0, right? We normally set
> >max_parallel_workers_per_gather to 0 to disable parallel queries on a
> >query. So I think that disabling parallel vacuum by setting
> >max_parallel_maintenance_workers to 0 is the same concept. Regarding
> >proposed two options we already have storage parameter
> >parallel_workers and it accepts 0 but PARALLEL 0 looks like
> >contradicted at a glance. And NOPARALLEL is inconsistent with existing
> >DISABLE_XXX options and it's a bit awkward to specify like (NOPARALLEL
> >off).
> >
>
> My understanding is the motivation for new vacuum options is a claim
> that m_p_m_w is not sufficient/suitable for the vacuum case. I've
> expressed my doubts about this, but let's assume it's the right
> solution. To me it seems a bit confusing to just fall back to m_p_m_w
> when it comes to disabling the parallel vacuum.
>
> So if we think we need an option to determine vacuum parallel degree, we
> should have an option to disable parallelism too. I don't care much if
> it's called DISABLE_PARALLEL, NOPARALLEL or PARALLEL 0, as long as we
> make our mind and don't unnecessarily break it in the next release.

Okay I got your point. It's just an idea but how about controlling
parallel vacuum using two options. That is, we have PARALLEL option
that takes a boolean value (true by default) and enables/disables
parallel vacuum. And we have WORKERS option that takes an integer more
than 1 to specify the number of workers. Of course we should raise an
error if only WORKERS option is specified. WORKERS option is optional.
If WORKERS option is omitted the number of workers is determined based
on the number of indexes on the table.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel vacuum options/syntax

From
Amit Kapila
Date:
On Sun, Jan 5, 2020 at 7:38 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sun, 5 Jan 2020 at 22:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> >
> >
> > So if we think we need an option to determine vacuum parallel degree, we
> > should have an option to disable parallelism too. I don't care much if
> > it's called DISABLE_PARALLEL, NOPARALLEL or PARALLEL 0, as long as we
> > make our mind and don't unnecessarily break it in the next release.
> >

Fair point.  I favor parallel 0 as that avoids adding more options and
also it is not very clear whether that is required at all.  Till now,
if I see most people who have shared their opinion seems to favor this
as compared to another idea where we need to introduce more options.

>
> Okay I got your point. It's just an idea but how about controlling
> parallel vacuum using two options. That is, we have PARALLEL option
> that takes a boolean value (true by default) and enables/disables
> parallel vacuum. And we have WORKERS option that takes an integer more
> than 1 to specify the number of workers. Of course we should raise an
> error if only WORKERS option is specified. WORKERS option is optional.
> If WORKERS option is omitted the number of workers is determined based
> on the number of indexes on the table.
>

I think this would add failure modes without serving any additional
purpose.  Sure, it might give the better feeling that we have separate
options to enable/disable parallelism and then specify the number of
workers with a separate option, but we already have various examples
as shared by me previously where setting the value as zero means the
option is disabled, so why to invent something new here?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: parallel vacuum options/syntax

From
Masahiko Sawada
Date:
On Sun, 5 Jan 2020 at 23:28, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Jan 5, 2020 at 7:38 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Sun, 5 Jan 2020 at 22:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> > >
> > >
> > > So if we think we need an option to determine vacuum parallel degree, we
> > > should have an option to disable parallelism too. I don't care much if
> > > it's called DISABLE_PARALLEL, NOPARALLEL or PARALLEL 0, as long as we
> > > make our mind and don't unnecessarily break it in the next release.
> > >
>
> Fair point.  I favor parallel 0 as that avoids adding more options and
> also it is not very clear whether that is required at all.  Till now,
> if I see most people who have shared their opinion seems to favor this
> as compared to another idea where we need to introduce more options.
>
> >
> > Okay I got your point. It's just an idea but how about controlling
> > parallel vacuum using two options. That is, we have PARALLEL option
> > that takes a boolean value (true by default) and enables/disables
> > parallel vacuum. And we have WORKERS option that takes an integer more
> > than 1 to specify the number of workers. Of course we should raise an
> > error if only WORKERS option is specified. WORKERS option is optional.
> > If WORKERS option is omitted the number of workers is determined based
> > on the number of indexes on the table.
> >
>
> I think this would add failure modes without serving any additional
> purpose.  Sure, it might give the better feeling that we have separate
> options to enable/disable parallelism and then specify the number of
> workers with a separate option, but we already have various examples
> as shared by me previously where setting the value as zero means the
> option is disabled, so why to invent something new here?

I just felt it's not intuitive that specifying parallel degree to 0
means to disable parallel vacuum. But since majority of hackers seem
to agree with this syntax I'm not going to insist on that any further.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel vacuum options/syntax

From
Masahiko Sawada
Date:
On Mon, 6 Jan 2020 at 15:27, Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sun, 5 Jan 2020 at 23:28, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sun, Jan 5, 2020 at 7:38 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > On Sun, 5 Jan 2020 at 22:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> > > >
> > > >
> > > > So if we think we need an option to determine vacuum parallel degree, we
> > > > should have an option to disable parallelism too. I don't care much if
> > > > it's called DISABLE_PARALLEL, NOPARALLEL or PARALLEL 0, as long as we
> > > > make our mind and don't unnecessarily break it in the next release.
> > > >
> >
> > Fair point.  I favor parallel 0 as that avoids adding more options and
> > also it is not very clear whether that is required at all.  Till now,
> > if I see most people who have shared their opinion seems to favor this
> > as compared to another idea where we need to introduce more options.
> >
> > >
> > > Okay I got your point. It's just an idea but how about controlling
> > > parallel vacuum using two options. That is, we have PARALLEL option
> > > that takes a boolean value (true by default) and enables/disables
> > > parallel vacuum. And we have WORKERS option that takes an integer more
> > > than 1 to specify the number of workers. Of course we should raise an
> > > error if only WORKERS option is specified. WORKERS option is optional.
> > > If WORKERS option is omitted the number of workers is determined based
> > > on the number of indexes on the table.
> > >
> >
> > I think this would add failure modes without serving any additional
> > purpose.  Sure, it might give the better feeling that we have separate
> > options to enable/disable parallelism and then specify the number of
> > workers with a separate option, but we already have various examples
> > as shared by me previously where setting the value as zero means the
> > option is disabled, so why to invent something new here?
>
> I just felt it's not intuitive that specifying parallel degree to 0
> means to disable parallel vacuum. But since majority of hackers seem
> to agree with this syntax I'm not going to insist on that any further.
>

Okay I'm going to go with enabling parallel vacuum by default and
disabling it by specifying PARALLEL 0.

For combination of VACUUM command options, although parallel vacuum is
enabled by default and VACUUM FULL doesn't support it yet, 'VACUUM
(FULL)' would work. On the other hand 'VACUUM (FULL, PARALLEL)' and
'VACUUM(FULL, PARALLEL 1)' would not work with error. And I think it
is better if 'VACUUM (FULL, PARALLEL 0)' also work but I'd like to
hear opinions.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel vacuum options/syntax

From
Amit Kapila
Date:
On Wed, Jan 8, 2020 at 11:31 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Mon, 6 Jan 2020 at 15:27, Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > I just felt it's not intuitive that specifying parallel degree to 0
> > means to disable parallel vacuum. But since majority of hackers seem
> > to agree with this syntax I'm not going to insist on that any further.
> >
>
> Okay I'm going to go with enabling parallel vacuum by default and
> disabling it by specifying PARALLEL 0.
>

Sounds fine to me.  However, I have already started updating the patch
for that.  I shall post the new version today or tomorrow.  Is that
fine with you?

> For combination of VACUUM command options, although parallel vacuum is
> enabled by default and VACUUM FULL doesn't support it yet, 'VACUUM
> (FULL)' would work. On the other hand 'VACUUM (FULL, PARALLEL)' and
> 'VACUUM(FULL, PARALLEL 1)' would not work with error. And I think it
> is better if 'VACUUM (FULL, PARALLEL 0)' also work but I'd like to
> hear opinions.
>

I agree with all these points.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: parallel vacuum options/syntax

From
Masahiko Sawada
Date:
On Wed, 8 Jan 2020 at 15:31, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Jan 8, 2020 at 11:31 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Mon, 6 Jan 2020 at 15:27, Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > I just felt it's not intuitive that specifying parallel degree to 0
> > > means to disable parallel vacuum. But since majority of hackers seem
> > > to agree with this syntax I'm not going to insist on that any further.
> > >
> >
> > Okay I'm going to go with enabling parallel vacuum by default and
> > disabling it by specifying PARALLEL 0.
> >
>
> Sounds fine to me.  However, I have already started updating the patch
> for that.  I shall post the new version today or tomorrow.  Is that
> fine with you?

Yes, that's fine. Thanks.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: parallel vacuum options/syntax

From
Amit Kapila
Date:
On Wed, Jan 8, 2020 at 12:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
>
> > For combination of VACUUM command options, although parallel vacuum is
> > enabled by default and VACUUM FULL doesn't support it yet, 'VACUUM
> > (FULL)' would work. On the other hand 'VACUUM (FULL, PARALLEL)' and
> > 'VACUUM(FULL, PARALLEL 1)' would not work with error. And I think it
> > is better if 'VACUUM (FULL, PARALLEL 0)' also work but I'd like to
> > hear opinions.
> >

On again thinking about whether we should allow VACUUM (FULL, PARALLEL
0) case, I am not sure, so, for now, the patch [1] is throwing error
for that case, but we can modify it if we want.

[1] - https://www.postgresql.org/message-id/CAA4eK1JxWAYTSM4NpTi7Tz%3DsPetbWBWZPpHKxLoEKb%3DgMi%3DGGA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com