Thread: parallel vacuum options/syntax
Hi,
I am starting a new thread for some of the decisions for a parallel vacuum in the hope to get feedback from more people. There are mainly two points for which we need some feedback.
1. Tomas Vondra has pointed out on the main thread [1] that by default the parallel vacuum should be enabled similar to what we do for Create Index. As proposed, the patch enables it only when the user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;). One of the arguments in favor of enabling it by default as mentioned by Tomas is "It's pretty much the same thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can enable it. If you're worried about VACUUM causing issues, you should set cost delay.". Some of the arguments against enabling it are that it will lead to use of more resources (like CPU, I/O) which users might or might like.
Now, if we want to enable it by default, we need a way to disable it as well and along with that, we need a way for users to specify a parallel degree. I have mentioned a few reasons why we need a parallel degree for this operation in the email [2] on the main thread.
If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows:
Vacuum (Parallel 2) <tbl_name>;
Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support a parallel vacuum with a maximum of max_parallel_maintenance_workers.
If the parallel vacuum is enabled by default, then I could think of the following ways:
(a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel <parallel_degree>) <tbl_name>;
(b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user specifies parallel_degree as 0, then disable parallelism.
(c) ... Any better ideas?
2. The patch provides a FAST option (based on suggestion by Robert) for a parallel vacuum which will make it behave like vacuum_cost_delay = 0 which means it will disable throttling. So,
VACUUM (PARALLEL n, FAST) <tbl_name> will allow the parallel vacuum to run without resource throttling. Tomas thinks that we don't need such an option as the same can be served by setting vacuum_cost_delay = 0 which is a valid argument, but OTOH, providing an option to the user which can make his life easier is not a bad idea either.
Thoughts?
[1] - https://www.postgresql.org/message-id/20191229212354.tqivttn23lxjg2jz%40development
[2] - https://www.postgresql.org/message-id/CAA4eK1%2B1o-BaPvJnK7BPThTryx3MRDS%2BmCf9eVVZT%3DSVJ8mwLg%40mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
I am starting a new thread for some of the decisions for a parallel vacuum in the hope to get feedback from more people. There are mainly two points for which we need some feedback.
1. Tomas Vondra has pointed out on the main thread [1] that by default the parallel vacuum should be enabled similar to what we do for Create Index. As proposed, the patch enables it only when the user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;). One of the arguments in favor of enabling it by default as mentioned by Tomas is "It's pretty much the same thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can enable it. If you're worried about VACUUM causing issues, you should set cost delay.". Some of the arguments against enabling it are that it will lead to use of more resources (like CPU, I/O) which users might or might like.
Now, if we want to enable it by default, we need a way to disable it as well and along with that, we need a way for users to specify a parallel degree. I have mentioned a few reasons why we need a parallel degree for this operation in the email [2] on the main thread.
If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows:
Vacuum (Parallel 2) <tbl_name>;
Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support a parallel vacuum with a maximum of max_parallel_maintenance_workers.
If the parallel vacuum is enabled by default, then I could think of the following ways:
(a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel <parallel_degree>) <tbl_name>;
(b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user specifies parallel_degree as 0, then disable parallelism.
(c) ... Any better ideas?
2. The patch provides a FAST option (based on suggestion by Robert) for a parallel vacuum which will make it behave like vacuum_cost_delay = 0 which means it will disable throttling. So,
VACUUM (PARALLEL n, FAST) <tbl_name> will allow the parallel vacuum to run without resource throttling. Tomas thinks that we don't need such an option as the same can be served by setting vacuum_cost_delay = 0 which is a valid argument, but OTOH, providing an option to the user which can make his life easier is not a bad idea either.
Thoughts?
[1] - https://www.postgresql.org/message-id/20191229212354.tqivttn23lxjg2jz%40development
[2] - https://www.postgresql.org/message-id/CAA4eK1%2B1o-BaPvJnK7BPThTryx3MRDS%2BmCf9eVVZT%3DSVJ8mwLg%40mail.gmail.com
--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Le jeu. 2 janv. 2020 à 13:09, Amit Kapila <amit.kapila16@gmail.com> a écrit :
Hi,
I am starting a new thread for some of the decisions for a parallel vacuum in the hope to get feedback from more people. There are mainly two points for which we need some feedback.
1. Tomas Vondra has pointed out on the main thread [1] that by default the parallel vacuum should be enabled similar to what we do for Create Index. As proposed, the patch enables it only when the user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;). One of the arguments in favor of enabling it by default as mentioned by Tomas is "It's pretty much the same thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can enable it. If you're worried about VACUUM causing issues, you should set cost delay.". Some of the arguments against enabling it are that it will lead to use of more resources (like CPU, I/O) which users might or might like.
Now, if we want to enable it by default, we need a way to disable it as well and along with that, we need a way for users to specify a parallel degree. I have mentioned a few reasons why we need a parallel degree for this operation in the email [2] on the main thread.
If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows:
Vacuum (Parallel 2) <tbl_name>;
Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support a parallel vacuum with a maximum of max_parallel_maintenance_workers.
If the parallel vacuum is enabled by default, then I could think of the following ways:
(a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel <parallel_degree>) <tbl_name>;
(b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user specifies parallel_degree as 0, then disable parallelism.
(c) ... Any better ideas?
AFAICT, every parallel-able statement use parallelisation by default, so it wouldn't be consistent if VACUUM behaves some other way.
So, (c) has my vote.
2. The patch provides a FAST option (based on suggestion by Robert) for a parallel vacuum which will make it behave like vacuum_cost_delay = 0 which means it will disable throttling. So,
VACUUM (PARALLEL n, FAST) <tbl_name> will allow the parallel vacuum to run without resource throttling. Tomas thinks that we don't need such an option as the same can be served by setting vacuum_cost_delay = 0 which is a valid argument, but OTOH, providing an option to the user which can make his life easier is not a bad idea either.
The user already has an option (the vacuum_cost_delay GUC). So I kinda agree with Tomas on this.
--
Guillaume.
On Thu, Jan 2, 2020 at 5:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Hi, > > I am starting a new thread for some of the decisions for a parallel vacuum in the hope to get feedback from more people. There are mainly two points for which we need some feedback. > > 1. Tomas Vondra has pointed out on the main thread [1] that by default the parallel vacuum should be enabled similar towhat we do for Create Index. As proposed, the patch enables it only when the user specifies it (ex. Vacuum (Parallel 2)<tbl_name>;). One of the arguments in favor of enabling it by default as mentioned by Tomas is "It's pretty much thesame thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can enable it. If you'reworried about VACUUM causing issues, you should set cost delay.". Some of the arguments against enabling it are thatit will lead to use of more resources (like CPU, I/O) which users might or might like. > > Now, if we want to enable it by default, we need a way to disable it as well and along with that, we need a way for usersto specify a parallel degree. I have mentioned a few reasons why we need a parallel degree for this operation in theemail [2] on the main thread. > > If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows: > Vacuum (Parallel 2) <tbl_name>; > > Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support aparallel vacuum with a maximum of max_parallel_maintenance_workers. > > If the parallel vacuum is enabled by default, then I could think of the following ways: > (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel <parallel_degree>) <tbl_name>; > (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user specifies parallel_degree as 0, then disable parallelism. > (c) ... Any better ideas? IMHO, it's better to keep the parallelism enables by default. Because if the user is giving an explicit vacuum then better to keep it fast by default. However, I agree that we can provide an option for the user to disable it and provide the parallel degree with the vacuum command something like option (b). > > 2. The patch provides a FAST option (based on suggestion by Robert) for a parallel vacuum which will make it behave likevacuum_cost_delay = 0 which means it will disable throttling. So, > VACUUM (PARALLEL n, FAST) <tbl_name> will allow the parallel vacuum to run without resource throttling. Tomas thinks thatwe don't need such an option as the same can be served by setting vacuum_cost_delay = 0 which is a valid argument, butOTOH, providing an option to the user which can make his life easier is not a bad idea either. I agree that there is already an option to run it without cost delay but there is no harm in providing extra power to the user where he can run a particular vacuum command without IO throttling. So +1 for the FAST option. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
On Fri, Jan 3, 2020 at 8:50 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Jan 2, 2020 at 5:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows: > > Vacuum (Parallel 2) <tbl_name>; > > > > Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that supporta parallel vacuum with a maximum of max_parallel_maintenance_workers. > > > > If the parallel vacuum is enabled by default, then I could think of the following ways: > > (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel <parallel_degree>) <tbl_name>; > > (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user specifies parallel_degree as 0, then disable parallelism. > > (c) ... Any better ideas? > > IMHO, it's better to keep the parallelism enables by default. Because > if the user is giving an explicit vacuum then better to keep it fast > by default. Okay. > However, I agree that we can provide an option for the > user to disable it and provide the parallel degree with the vacuum > command something like option (b). > The option (b) has some advantage over (a) that we don't need to invent multiple options to enable/disable parallelism for vacuum. However, it might appear awkward to set parallel_degree as 0 (Vacuum (Parallel 0) tbl_name) to disable parallelism. Having said that, we already have some precedence wherein if we set parameters like statement_timeout, lock_timeout, etc to zero, it disables the timeout. So, it won't be insane if we choose this option. Does anyone else have any opinion on what makes sense here? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 2, 2020 at 7:09 PM Guillaume Lelarge <guillaume@lelarge.info> wrote: > > Le jeu. 2 janv. 2020 à 13:09, Amit Kapila <amit.kapila16@gmail.com> a écrit : >> >> If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows: >> Vacuum (Parallel 2) <tbl_name>; >> >> Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support aparallel vacuum with a maximum of max_parallel_maintenance_workers. >> >> If the parallel vacuum is enabled by default, then I could think of the following ways: >> (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel <parallel_degree>) <tbl_name>; >> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user specifies parallel_degree as 0, then disable parallelism. >> (c) ... Any better ideas? >> > > AFAICT, every parallel-able statement use parallelisation by default, so it wouldn't be consistent if VACUUM behaves someother way. > Fair enough. > So, (c) has my vote. > I don't understand this. What do you mean by voting (c) option? Do you mean that you didn't like any of (a) or (b)? If so, then feel free to suggest something else. One more possibility could be to allow users to specify parallel degree or disable parallelism via guc 'max_parallel_maintenance_workers'. Basically, if the user wants to disable parallelism, it needs to set the value of guc max_parallel_maintenance_workers as zero and if it wants to increase the parallel degree than the default value (which is two), then it can set it via max_parallel_maintenance_workers before running vacuum command. Now, this can certainly work, but I feel setting/resetting a guc before a vacuum command can be a bit inconvenient for users, but if others prefer that way, then we can do that. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Le ven. 3 janv. 2020 à 09:06, Amit Kapila <amit.kapila16@gmail.com> a écrit :
On Thu, Jan 2, 2020 at 7:09 PM Guillaume Lelarge <guillaume@lelarge.info> wrote:
>
> Le jeu. 2 janv. 2020 à 13:09, Amit Kapila <amit.kapila16@gmail.com> a écrit :
>>
>> If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows:
>> Vacuum (Parallel 2) <tbl_name>;
>>
>> Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support a parallel vacuum with a maximum of max_parallel_maintenance_workers.
>>
>> If the parallel vacuum is enabled by default, then I could think of the following ways:
>> (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel <parallel_degree>) <tbl_name>;
>> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user specifies parallel_degree as 0, then disable parallelism.
>> (c) ... Any better ideas?
>>
>
> AFAICT, every parallel-able statement use parallelisation by default, so it wouldn't be consistent if VACUUM behaves some other way.
>
Fair enough.
> So, (c) has my vote.
>
I don't understand this. What do you mean by voting (c) option? Do
you mean that you didn't like any of (a) or (b)?
I meant (b), sorry :)
If so, then feel
free to suggest something else. One more possibility could be to
allow users to specify parallel degree or disable parallelism via guc
'max_parallel_maintenance_workers'. Basically, if the user wants to
disable parallelism, it needs to set the value of guc
max_parallel_maintenance_workers as zero and if it wants to increase
the parallel degree than the default value (which is two), then it can
set it via max_parallel_maintenance_workers before running vacuum
command. Now, this can certainly work, but I feel setting/resetting a
guc before a vacuum
command can be a bit inconvenient for users, but if others prefer that
way, then we can do that.
--
Guillaume.
On Fri, 3 Jan 2020 at 08:51, Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Thu, Jan 2, 2020 at 5:39 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > Hi, > > > > I am starting a new thread for some of the decisions for a parallel vacuum in the hope to get feedback from more people. There are mainly two points for which we need some feedback. > > > > 1. Tomas Vondra has pointed out on the main thread [1] that by default the parallel vacuum should be enabled similarto what we do for Create Index. As proposed, the patch enables it only when the user specifies it (ex. Vacuum (Parallel2) <tbl_name>;). One of the arguments in favor of enabling it by default as mentioned by Tomas is "It's prettymuch the same thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can enableit. If you're worried about VACUUM causing issues, you should set cost delay.". Some of the arguments against enablingit are that it will lead to use of more resources (like CPU, I/O) which users might or might like. > > > > Now, if we want to enable it by default, we need a way to disable it as well and along with that, we need a way for usersto specify a parallel degree. I have mentioned a few reasons why we need a parallel degree for this operation in theemail [2] on the main thread. > > > > If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows: > > Vacuum (Parallel 2) <tbl_name>; > > > > Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that supporta parallel vacuum with a maximum of max_parallel_maintenance_workers. > > > > If the parallel vacuum is enabled by default, then I could think of the following ways: > > (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel <parallel_degree>) <tbl_name>; > > (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user specifies parallel_degree as 0, then disable parallelism. > > (c) ... Any better ideas? > > IMHO, it's better to keep the parallelism enables by default. Because > if the user is giving an explicit vacuum then better to keep it fast > by default. However, I agree that we can provide an option for the > user to disable it and provide the parallel degree with the vacuum > command something like option (b). +1 Thanks and Regards Mahendra Singh Thalor EnterpriseDB: http://www.enterprisedb.com
On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > Hi, > > I am starting a new thread for some of the decisions for a parallel vacuum in the hope to get feedback from more people. There are mainly two points for which we need some feedback. > > 1. Tomas Vondra has pointed out on the main thread [1] that by default the parallel vacuum should be enabled similar towhat we do for Create Index. As proposed, the patch enables it only when the user specifies it (ex. Vacuum (Parallel 2)<tbl_name>;). One of the arguments in favor of enabling it by default as mentioned by Tomas is "It's pretty much thesame thing we did with vacuum throttling - it's disabled for explicit vacuum by default, but you can enable it. If you'reworried about VACUUM causing issues, you should set cost delay.". Some of the arguments against enabling it are thatit will lead to use of more resources (like CPU, I/O) which users might or might like. > I'm a bit wary of making parallel vacuum enabled by default. Single process vacuum does sequential reads/writes on most of indexes but parallel vacuum does random access random reads/writes. I've tested parallel vacuum on HDD and confirmed the performance is good but I'm concerned that it might be cause of more disk I/O than user expected. > Now, if we want to enable it by default, we need a way to disable it as well and along with that, we need a way for usersto specify a parallel degree. I have mentioned a few reasons why we need a parallel degree for this operation in theemail [2] on the main thread. > > If parallel vacuum is *not* enabled by default, then I think the current way to enable is fine which is as follows: > Vacuum (Parallel 2) <tbl_name>; > > Here, if the user doesn't specify parallel_degree, then we internally decide based on number of indexes that support aparallel vacuum with a maximum of max_parallel_maintenance_workers. > > If the parallel vacuum is enabled by default, then I could think of the following ways: > (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel <parallel_degree>) <tbl_name>; > (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user specifies parallel_degree as 0, then disable parallelism. > (c) ... Any better ideas? > If parallel vacuum is enabled by default, I would prefer (b) but I don't think it's a good idea to accept 0 as parallel degree. If we want to disable parallel vacuum we should max_parallel_maintenance_workers to 0 instead. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote: >On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com> >wrote: >> >> Hi, >> >> I am starting a new thread for some of the decisions for a parallel >> vacuum in the hope to get feedback from more people. There are >> mainly two points for which we need some feedback. >> >> 1. Tomas Vondra has pointed out on the main thread [1] that by >> default the parallel vacuum should be enabled similar to what we do >> for Create Index. As proposed, the patch enables it only when the >> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;). One of the >> arguments in favor of enabling it by default as mentioned by Tomas is >> "It's pretty much the same thing we did with vacuum throttling - it's >> disabled for explicit vacuum by default, but you can enable it. If >> you're worried about VACUUM causing issues, you should set cost >> delay.". Some of the arguments against enabling it are that it will >> lead to use of more resources (like CPU, I/O) which users might or >> might like. >> > >I'm a bit wary of making parallel vacuum enabled by default. Single >process vacuum does sequential reads/writes on most of indexes but >parallel vacuum does random access random reads/writes. I've tested >parallel vacuum on HDD and confirmed the performance is good but I'm >concerned that it might be cause of more disk I/O than user expected. > I understand the concern, but it's not clear to me why to apply this defensive approach just to vacuum and not to all commands. Especially when we do have a way to throttle vacuum (unlike pretty much any other command) if I/O really is a scarce resource. As the vacuum workers are separate processes, each generating requests with a sequential pattern, so I'd expect readahead to kick in and keep the efficiency of sequential access pattern. >> Now, if we want to enable it by default, we need a way to disable it >> as well and along with that, we need a way for users to specify a >> parallel degree. I have mentioned a few reasons why we need a >> parallel degree for this operation in the email [2] on the main >> thread. >> >> If parallel vacuum is *not* enabled by default, then I think the >> current way to enable is fine which is as follows: Vacuum (Parallel >> 2) <tbl_name>; >> >> Here, if the user doesn't specify parallel_degree, then we internally >> decide based on number of indexes that support a parallel vacuum with >> a maximum of max_parallel_maintenance_workers. >> >> If the parallel vacuum is enabled by default, then I could think of >> the following ways: >> >> (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel >> <parallel_degree>) <tbl_name>; >> >> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user >> specifies parallel_degree as 0, then disable parallelism. >> >> (c) ... Any better ideas? >> > >If parallel vacuum is enabled by default, I would prefer (b) but I >don't think it's a good idea to accept 0 as parallel degree. If we want >to disable parallel vacuum we should max_parallel_maintenance_workers >to 0 instead. > IMO that just makes the interaction between vacuum options and the GUC even more complicated/confusing. If we want to have a vacuum option to determine parallel degree, we should probably have a vacuum option to disable parallelism using just a vacuum option. I don't think 0 is too bad, and disable_parallel seems a bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n). That's what Oracle does, so it's not entirely without a precedent. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Jan 5, 2020 at 6:40 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote: > >On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com> > >wrote: > >> > >> Hi, > >> > >> I am starting a new thread for some of the decisions for a parallel > >> vacuum in the hope to get feedback from more people. There are > >> mainly two points for which we need some feedback. > >> > >> 1. Tomas Vondra has pointed out on the main thread [1] that by > >> default the parallel vacuum should be enabled similar to what we do > >> for Create Index. As proposed, the patch enables it only when the > >> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;). One of the > >> arguments in favor of enabling it by default as mentioned by Tomas is > >> "It's pretty much the same thing we did with vacuum throttling - it's > >> disabled for explicit vacuum by default, but you can enable it. If > >> you're worried about VACUUM causing issues, you should set cost > >> delay.". Some of the arguments against enabling it are that it will > >> lead to use of more resources (like CPU, I/O) which users might or > >> might like. > >> > > > >I'm a bit wary of making parallel vacuum enabled by default. Single > >process vacuum does sequential reads/writes on most of indexes but > >parallel vacuum does random access random reads/writes. I've tested > >parallel vacuum on HDD and confirmed the performance is good but I'm > >concerned that it might be cause of more disk I/O than user expected. > > > > I understand the concern, but it's not clear to me why to apply this > defensive approach just to vacuum and not to all commands. Especially > when we do have a way to throttle vacuum (unlike pretty much any other > command) if I/O really is a scarce resource. > > As the vacuum workers are separate processes, each generating requests > with a sequential pattern, so I'd expect readahead to kick in and keep > the efficiency of sequential access pattern. > Right, I also think so. > >> Now, if we want to enable it by default, we need a way to disable it > >> as well and along with that, we need a way for users to specify a > >> parallel degree. I have mentioned a few reasons why we need a > >> parallel degree for this operation in the email [2] on the main > >> thread. > >> > >> If parallel vacuum is *not* enabled by default, then I think the > >> current way to enable is fine which is as follows: Vacuum (Parallel > >> 2) <tbl_name>; > >> > >> Here, if the user doesn't specify parallel_degree, then we internally > >> decide based on number of indexes that support a parallel vacuum with > >> a maximum of max_parallel_maintenance_workers. > >> > >> If the parallel vacuum is enabled by default, then I could think of > >> the following ways: > >> > >> (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel > >> <parallel_degree>) <tbl_name>; > >> > >> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user > >> specifies parallel_degree as 0, then disable parallelism. > >> > >> (c) ... Any better ideas? > >> > > > >If parallel vacuum is enabled by default, I would prefer (b) but I > >don't think it's a good idea to accept 0 as parallel degree. If we want > >to disable parallel vacuum we should max_parallel_maintenance_workers > >to 0 instead. > > > > IMO that just makes the interaction between vacuum options and the GUC > even more complicated/confusing. > Yeah, I am also not sure if that will be a good idea. > If we want to have a vacuum option to determine parallel degree, we > should probably have a vacuum option to disable parallelism using just a > vacuum option. I don't think 0 is too bad, and disable_parallel seems a > bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n). > That's what Oracle does, so it's not entirely without a precedent. > We can go either way (using 0 for parallel to indicate disable parallelism or by introducing a new option like NOPARALLEL). I think initially we can avoid introducing more options and just go with 'Parallel 0' and if we find a lot of people find it inconvenient, then we can always introduce a new option later. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sun, Jan 05, 2020 at 03:56:35PM +0530, Amit Kapila wrote: >> >> ... >> >> >If parallel vacuum is enabled by default, I would prefer (b) but I >> >don't think it's a good idea to accept 0 as parallel degree. If we want >> >to disable parallel vacuum we should max_parallel_maintenance_workers >> >to 0 instead. >> > >> >> IMO that just makes the interaction between vacuum options and the GUC >> even more complicated/confusing. >> > >Yeah, I am also not sure if that will be a good idea. > >> If we want to have a vacuum option to determine parallel degree, we >> should probably have a vacuum option to disable parallelism using just a >> vacuum option. I don't think 0 is too bad, and disable_parallel seems a >> bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n). >> That's what Oracle does, so it's not entirely without a precedent. >> > >We can go either way (using 0 for parallel to indicate disable >parallelism or by introducing a new option like NOPARALLEL). I think >initially we can avoid introducing more options and just go with >'Parallel 0' and if we find a lot of people find it inconvenient, then >we can always introduce a new option later. > I don't think starting with "parallel 0" and then maybe introducing NOPARALLEL sometime in the future is a good plan, because after adding NOPARALLEL we'd either have to remove "parallel 0" (breaking backwards compatibility unnecessarily) or supporting both approaches. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Jan 5, 2020 at 7:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Jan 5, 2020 at 6:40 AM Tomas Vondra > <tomas.vondra@2ndquadrant.com> wrote: > > > > On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote: > > >On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com> > > >wrote: > > >> > > >> Hi, > > >> > > >> I am starting a new thread for some of the decisions for a parallel > > >> vacuum in the hope to get feedback from more people. There are > > >> mainly two points for which we need some feedback. > > >> > > >> 1. Tomas Vondra has pointed out on the main thread [1] that by > > >> default the parallel vacuum should be enabled similar to what we do > > >> for Create Index. As proposed, the patch enables it only when the > > >> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;). One of the > > >> arguments in favor of enabling it by default as mentioned by Tomas is > > >> "It's pretty much the same thing we did with vacuum throttling - it's > > >> disabled for explicit vacuum by default, but you can enable it. If > > >> you're worried about VACUUM causing issues, you should set cost > > >> delay.". Some of the arguments against enabling it are that it will > > >> lead to use of more resources (like CPU, I/O) which users might or > > >> might like. > > >> > > > > > >I'm a bit wary of making parallel vacuum enabled by default. Single > > >process vacuum does sequential reads/writes on most of indexes but > > >parallel vacuum does random access random reads/writes. I've tested > > >parallel vacuum on HDD and confirmed the performance is good but I'm > > >concerned that it might be cause of more disk I/O than user expected. > > > > > > > I understand the concern, but it's not clear to me why to apply this > > defensive approach just to vacuum and not to all commands. Especially > > when we do have a way to throttle vacuum (unlike pretty much any other > > command) if I/O really is a scarce resource. > > > > As the vacuum workers are separate processes, each generating requests > > with a sequential pattern, so I'd expect readahead to kick in and keep > > the efficiency of sequential access pattern. > > > > Right, I also think so. Okay I understand. > > > >> Now, if we want to enable it by default, we need a way to disable it > > >> as well and along with that, we need a way for users to specify a > > >> parallel degree. I have mentioned a few reasons why we need a > > >> parallel degree for this operation in the email [2] on the main > > >> thread. > > >> > > >> If parallel vacuum is *not* enabled by default, then I think the > > >> current way to enable is fine which is as follows: Vacuum (Parallel > > >> 2) <tbl_name>; > > >> > > >> Here, if the user doesn't specify parallel_degree, then we internally > > >> decide based on number of indexes that support a parallel vacuum with > > >> a maximum of max_parallel_maintenance_workers. > > >> > > >> If the parallel vacuum is enabled by default, then I could think of > > >> the following ways: > > >> > > >> (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel > > >> <parallel_degree>) <tbl_name>; > > >> > > >> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user > > >> specifies parallel_degree as 0, then disable parallelism. > > >> > > >> (c) ... Any better ideas? > > >> > > > > > >If parallel vacuum is enabled by default, I would prefer (b) but I > > >don't think it's a good idea to accept 0 as parallel degree. If we want > > >to disable parallel vacuum we should max_parallel_maintenance_workers > > >to 0 instead. > > > > > > > IMO that just makes the interaction between vacuum options and the GUC > > even more complicated/confusing. > > > > Yeah, I am also not sure if that will be a good idea. > > > If we want to have a vacuum option to determine parallel degree, we > > should probably have a vacuum option to disable parallelism using just a > > vacuum option. I don't think 0 is too bad, and disable_parallel seems a > > bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n). > > That's what Oracle does, so it's not entirely without a precedent. > > > > We can go either way (using 0 for parallel to indicate disable > parallelism or by introducing a new option like NOPARALLEL). I think > initially we can avoid introducing more options and just go with > 'Parallel 0' and if we find a lot of people find it inconvenient, then > we can always introduce a new option later. Hmm I'm confused. Specifying NOPARALLEL or PARALLEL 0 is the same as setting max_parallel_maintenance_workers to 0, right? We normally set max_parallel_workers_per_gather to 0 to disable parallel queries on a query. So I think that disabling parallel vacuum by setting max_parallel_maintenance_workers to 0 is the same concept. Regarding proposed two options we already have storage parameter parallel_workers and it accepts 0 but PARALLEL 0 looks like contradicted at a glance. And NOPARALLEL is inconsistent with existing DISABLE_XXX options and it's a bit awkward to specify like (NOPARALLEL off). Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Jan 05, 2020 at 09:17:57PM +0900, Masahiko Sawada wrote: >On Sun, Jan 5, 2020 at 7:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> ... >> >> > If we want to have a vacuum option to determine parallel degree, we >> > should probably have a vacuum option to disable parallelism using just a >> > vacuum option. I don't think 0 is too bad, and disable_parallel seems a >> > bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n). >> > That's what Oracle does, so it's not entirely without a precedent. >> > >> >> We can go either way (using 0 for parallel to indicate disable >> parallelism or by introducing a new option like NOPARALLEL). I think >> initially we can avoid introducing more options and just go with >> 'Parallel 0' and if we find a lot of people find it inconvenient, then >> we can always introduce a new option later. > >Hmm I'm confused. Specifying NOPARALLEL or PARALLEL 0 is the same as >setting max_parallel_maintenance_workers to 0, right? We normally set >max_parallel_workers_per_gather to 0 to disable parallel queries on a >query. So I think that disabling parallel vacuum by setting >max_parallel_maintenance_workers to 0 is the same concept. Regarding >proposed two options we already have storage parameter >parallel_workers and it accepts 0 but PARALLEL 0 looks like >contradicted at a glance. And NOPARALLEL is inconsistent with existing >DISABLE_XXX options and it's a bit awkward to specify like (NOPARALLEL >off). > My understanding is the motivation for new vacuum options is a claim that m_p_m_w is not sufficient/suitable for the vacuum case. I've expressed my doubts about this, but let's assume it's the right solution. To me it seems a bit confusing to just fall back to m_p_m_w when it comes to disabling the parallel vacuum. So if we think we need an option to determine vacuum parallel degree, we should have an option to disable parallelism too. I don't care much if it's called DISABLE_PARALLEL, NOPARALLEL or PARALLEL 0, as long as we make our mind and don't unnecessarily break it in the next release. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, 5 Jan 2020 at 22:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > On Sun, Jan 05, 2020 at 09:17:57PM +0900, Masahiko Sawada wrote: > >On Sun, Jan 5, 2020 at 7:26 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> ... > >> > >> > If we want to have a vacuum option to determine parallel degree, we > >> > should probably have a vacuum option to disable parallelism using just a > >> > vacuum option. I don't think 0 is too bad, and disable_parallel seems a > >> > bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n). > >> > That's what Oracle does, so it's not entirely without a precedent. > >> > > >> > >> We can go either way (using 0 for parallel to indicate disable > >> parallelism or by introducing a new option like NOPARALLEL). I think > >> initially we can avoid introducing more options and just go with > >> 'Parallel 0' and if we find a lot of people find it inconvenient, then > >> we can always introduce a new option later. > > > >Hmm I'm confused. Specifying NOPARALLEL or PARALLEL 0 is the same as > >setting max_parallel_maintenance_workers to 0, right? We normally set > >max_parallel_workers_per_gather to 0 to disable parallel queries on a > >query. So I think that disabling parallel vacuum by setting > >max_parallel_maintenance_workers to 0 is the same concept. Regarding > >proposed two options we already have storage parameter > >parallel_workers and it accepts 0 but PARALLEL 0 looks like > >contradicted at a glance. And NOPARALLEL is inconsistent with existing > >DISABLE_XXX options and it's a bit awkward to specify like (NOPARALLEL > >off). > > > > My understanding is the motivation for new vacuum options is a claim > that m_p_m_w is not sufficient/suitable for the vacuum case. I've > expressed my doubts about this, but let's assume it's the right > solution. To me it seems a bit confusing to just fall back to m_p_m_w > when it comes to disabling the parallel vacuum. > > So if we think we need an option to determine vacuum parallel degree, we > should have an option to disable parallelism too. I don't care much if > it's called DISABLE_PARALLEL, NOPARALLEL or PARALLEL 0, as long as we > make our mind and don't unnecessarily break it in the next release. Okay I got your point. It's just an idea but how about controlling parallel vacuum using two options. That is, we have PARALLEL option that takes a boolean value (true by default) and enables/disables parallel vacuum. And we have WORKERS option that takes an integer more than 1 to specify the number of workers. Of course we should raise an error if only WORKERS option is specified. WORKERS option is optional. If WORKERS option is omitted the number of workers is determined based on the number of indexes on the table. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Jan 5, 2020 at 7:38 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sun, 5 Jan 2020 at 22:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > > > > So if we think we need an option to determine vacuum parallel degree, we > > should have an option to disable parallelism too. I don't care much if > > it's called DISABLE_PARALLEL, NOPARALLEL or PARALLEL 0, as long as we > > make our mind and don't unnecessarily break it in the next release. > > Fair point. I favor parallel 0 as that avoids adding more options and also it is not very clear whether that is required at all. Till now, if I see most people who have shared their opinion seems to favor this as compared to another idea where we need to introduce more options. > > Okay I got your point. It's just an idea but how about controlling > parallel vacuum using two options. That is, we have PARALLEL option > that takes a boolean value (true by default) and enables/disables > parallel vacuum. And we have WORKERS option that takes an integer more > than 1 to specify the number of workers. Of course we should raise an > error if only WORKERS option is specified. WORKERS option is optional. > If WORKERS option is omitted the number of workers is determined based > on the number of indexes on the table. > I think this would add failure modes without serving any additional purpose. Sure, it might give the better feeling that we have separate options to enable/disable parallelism and then specify the number of workers with a separate option, but we already have various examples as shared by me previously where setting the value as zero means the option is disabled, so why to invent something new here? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Sun, 5 Jan 2020 at 23:28, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sun, Jan 5, 2020 at 7:38 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Sun, 5 Jan 2020 at 22:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > > > > > > > So if we think we need an option to determine vacuum parallel degree, we > > > should have an option to disable parallelism too. I don't care much if > > > it's called DISABLE_PARALLEL, NOPARALLEL or PARALLEL 0, as long as we > > > make our mind and don't unnecessarily break it in the next release. > > > > > Fair point. I favor parallel 0 as that avoids adding more options and > also it is not very clear whether that is required at all. Till now, > if I see most people who have shared their opinion seems to favor this > as compared to another idea where we need to introduce more options. > > > > > Okay I got your point. It's just an idea but how about controlling > > parallel vacuum using two options. That is, we have PARALLEL option > > that takes a boolean value (true by default) and enables/disables > > parallel vacuum. And we have WORKERS option that takes an integer more > > than 1 to specify the number of workers. Of course we should raise an > > error if only WORKERS option is specified. WORKERS option is optional. > > If WORKERS option is omitted the number of workers is determined based > > on the number of indexes on the table. > > > > I think this would add failure modes without serving any additional > purpose. Sure, it might give the better feeling that we have separate > options to enable/disable parallelism and then specify the number of > workers with a separate option, but we already have various examples > as shared by me previously where setting the value as zero means the > option is disabled, so why to invent something new here? I just felt it's not intuitive that specifying parallel degree to 0 means to disable parallel vacuum. But since majority of hackers seem to agree with this syntax I'm not going to insist on that any further. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, 6 Jan 2020 at 15:27, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sun, 5 Jan 2020 at 23:28, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sun, Jan 5, 2020 at 7:38 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > On Sun, 5 Jan 2020 at 22:39, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > > > > > > > > > > > So if we think we need an option to determine vacuum parallel degree, we > > > > should have an option to disable parallelism too. I don't care much if > > > > it's called DISABLE_PARALLEL, NOPARALLEL or PARALLEL 0, as long as we > > > > make our mind and don't unnecessarily break it in the next release. > > > > > > > > Fair point. I favor parallel 0 as that avoids adding more options and > > also it is not very clear whether that is required at all. Till now, > > if I see most people who have shared their opinion seems to favor this > > as compared to another idea where we need to introduce more options. > > > > > > > > Okay I got your point. It's just an idea but how about controlling > > > parallel vacuum using two options. That is, we have PARALLEL option > > > that takes a boolean value (true by default) and enables/disables > > > parallel vacuum. And we have WORKERS option that takes an integer more > > > than 1 to specify the number of workers. Of course we should raise an > > > error if only WORKERS option is specified. WORKERS option is optional. > > > If WORKERS option is omitted the number of workers is determined based > > > on the number of indexes on the table. > > > > > > > I think this would add failure modes without serving any additional > > purpose. Sure, it might give the better feeling that we have separate > > options to enable/disable parallelism and then specify the number of > > workers with a separate option, but we already have various examples > > as shared by me previously where setting the value as zero means the > > option is disabled, so why to invent something new here? > > I just felt it's not intuitive that specifying parallel degree to 0 > means to disable parallel vacuum. But since majority of hackers seem > to agree with this syntax I'm not going to insist on that any further. > Okay I'm going to go with enabling parallel vacuum by default and disabling it by specifying PARALLEL 0. For combination of VACUUM command options, although parallel vacuum is enabled by default and VACUUM FULL doesn't support it yet, 'VACUUM (FULL)' would work. On the other hand 'VACUUM (FULL, PARALLEL)' and 'VACUUM(FULL, PARALLEL 1)' would not work with error. And I think it is better if 'VACUUM (FULL, PARALLEL 0)' also work but I'd like to hear opinions. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 8, 2020 at 11:31 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Mon, 6 Jan 2020 at 15:27, Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > I just felt it's not intuitive that specifying parallel degree to 0 > > means to disable parallel vacuum. But since majority of hackers seem > > to agree with this syntax I'm not going to insist on that any further. > > > > Okay I'm going to go with enabling parallel vacuum by default and > disabling it by specifying PARALLEL 0. > Sounds fine to me. However, I have already started updating the patch for that. I shall post the new version today or tomorrow. Is that fine with you? > For combination of VACUUM command options, although parallel vacuum is > enabled by default and VACUUM FULL doesn't support it yet, 'VACUUM > (FULL)' would work. On the other hand 'VACUUM (FULL, PARALLEL)' and > 'VACUUM(FULL, PARALLEL 1)' would not work with error. And I think it > is better if 'VACUUM (FULL, PARALLEL 0)' also work but I'd like to > hear opinions. > I agree with all these points. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On Wed, 8 Jan 2020 at 15:31, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Jan 8, 2020 at 11:31 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > On Mon, 6 Jan 2020 at 15:27, Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > I just felt it's not intuitive that specifying parallel degree to 0 > > > means to disable parallel vacuum. But since majority of hackers seem > > > to agree with this syntax I'm not going to insist on that any further. > > > > > > > Okay I'm going to go with enabling parallel vacuum by default and > > disabling it by specifying PARALLEL 0. > > > > Sounds fine to me. However, I have already started updating the patch > for that. I shall post the new version today or tomorrow. Is that > fine with you? Yes, that's fine. Thanks. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Jan 8, 2020 at 12:01 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > For combination of VACUUM command options, although parallel vacuum is > > enabled by default and VACUUM FULL doesn't support it yet, 'VACUUM > > (FULL)' would work. On the other hand 'VACUUM (FULL, PARALLEL)' and > > 'VACUUM(FULL, PARALLEL 1)' would not work with error. And I think it > > is better if 'VACUUM (FULL, PARALLEL 0)' also work but I'd like to > > hear opinions. > > On again thinking about whether we should allow VACUUM (FULL, PARALLEL 0) case, I am not sure, so, for now, the patch [1] is throwing error for that case, but we can modify it if we want. [1] - https://www.postgresql.org/message-id/CAA4eK1JxWAYTSM4NpTi7Tz%3DsPetbWBWZPpHKxLoEKb%3DgMi%3DGGA%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com