Re: parallel vacuum options/syntax - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: parallel vacuum options/syntax |
Date | |
Msg-id | 20200105011034.ufg6oa2ziyebaild@development Whole thread Raw |
In response to | Re: parallel vacuum options/syntax (Masahiko Sawada <sawada.mshk@gmail.com>) |
Responses |
Re: parallel vacuum options/syntax
|
List | pgsql-hackers |
On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote: >On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com> >wrote: >> >> Hi, >> >> I am starting a new thread for some of the decisions for a parallel >> vacuum in the hope to get feedback from more people. There are >> mainly two points for which we need some feedback. >> >> 1. Tomas Vondra has pointed out on the main thread [1] that by >> default the parallel vacuum should be enabled similar to what we do >> for Create Index. As proposed, the patch enables it only when the >> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;). One of the >> arguments in favor of enabling it by default as mentioned by Tomas is >> "It's pretty much the same thing we did with vacuum throttling - it's >> disabled for explicit vacuum by default, but you can enable it. If >> you're worried about VACUUM causing issues, you should set cost >> delay.". Some of the arguments against enabling it are that it will >> lead to use of more resources (like CPU, I/O) which users might or >> might like. >> > >I'm a bit wary of making parallel vacuum enabled by default. Single >process vacuum does sequential reads/writes on most of indexes but >parallel vacuum does random access random reads/writes. I've tested >parallel vacuum on HDD and confirmed the performance is good but I'm >concerned that it might be cause of more disk I/O than user expected. > I understand the concern, but it's not clear to me why to apply this defensive approach just to vacuum and not to all commands. Especially when we do have a way to throttle vacuum (unlike pretty much any other command) if I/O really is a scarce resource. As the vacuum workers are separate processes, each generating requests with a sequential pattern, so I'd expect readahead to kick in and keep the efficiency of sequential access pattern. >> Now, if we want to enable it by default, we need a way to disable it >> as well and along with that, we need a way for users to specify a >> parallel degree. I have mentioned a few reasons why we need a >> parallel degree for this operation in the email [2] on the main >> thread. >> >> If parallel vacuum is *not* enabled by default, then I think the >> current way to enable is fine which is as follows: Vacuum (Parallel >> 2) <tbl_name>; >> >> Here, if the user doesn't specify parallel_degree, then we internally >> decide based on number of indexes that support a parallel vacuum with >> a maximum of max_parallel_maintenance_workers. >> >> If the parallel vacuum is enabled by default, then I could think of >> the following ways: >> >> (a) Vacuum (disable_parallel) <tbl_name>; Vacuum (Parallel >> <parallel_degree>) <tbl_name>; >> >> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>; If user >> specifies parallel_degree as 0, then disable parallelism. >> >> (c) ... Any better ideas? >> > >If parallel vacuum is enabled by default, I would prefer (b) but I >don't think it's a good idea to accept 0 as parallel degree. If we want >to disable parallel vacuum we should max_parallel_maintenance_workers >to 0 instead. > IMO that just makes the interaction between vacuum options and the GUC even more complicated/confusing. If we want to have a vacuum option to determine parallel degree, we should probably have a vacuum option to disable parallelism using just a vacuum option. I don't think 0 is too bad, and disable_parallel seems a bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n). That's what Oracle does, so it's not entirely without a precedent. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: