Re: parallel vacuum options/syntax - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: parallel vacuum options/syntax
Date
Msg-id 20200105011034.ufg6oa2ziyebaild@development
Whole thread Raw
In response to Re: parallel vacuum options/syntax  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: parallel vacuum options/syntax
List pgsql-hackers
On Sun, Jan 05, 2020 at 08:54:15AM +0900, Masahiko Sawada wrote:
>On Thu, Jan 2, 2020 at 9:09 PM Amit Kapila <amit.kapila16@gmail.com>
>wrote:
>>
>> Hi,
>>
>> I am starting a new thread for some of the decisions for a parallel
>> vacuum in the hope to get feedback from more people.  There are
>> mainly two points for which we need some feedback.
>>
>> 1. Tomas Vondra has pointed out on the main thread [1] that by
>> default the parallel vacuum should be enabled similar to what we do
>> for Create Index.  As proposed, the patch enables it only when the
>> user specifies it (ex. Vacuum (Parallel 2) <tbl_name>;).   One of the
>> arguments in favor of enabling it by default as mentioned by Tomas is
>> "It's pretty much the same thing we did with vacuum throttling - it's
>> disabled for explicit vacuum by default, but you can enable it. If
>> you're worried about VACUUM causing issues, you should set cost
>> delay.".  Some of the arguments against enabling it are that it will
>> lead to use of more resources (like CPU, I/O) which users might or
>> might like.
>>
>
>I'm a bit wary of making parallel vacuum enabled by default. Single
>process vacuum does sequential reads/writes on most of indexes but
>parallel vacuum does random access random reads/writes. I've tested
>parallel vacuum on HDD and confirmed the performance is good but I'm
>concerned that it might be cause of more disk I/O than user expected.
>

I understand the concern, but it's not clear to me why to apply this
defensive approach just to vacuum and not to all commands. Especially
when we do have a way to throttle vacuum (unlike pretty much any other
command) if I/O really is a scarce resource.

As the vacuum workers are separate processes, each generating requests
with a sequential pattern, so I'd expect readahead to kick in and keep
the efficiency of sequential access pattern.

>> Now, if we want to enable it by default, we need a way to disable it
>> as well and along with that, we need a way for users to specify a
>> parallel degree.  I have mentioned a few reasons why we need a
>> parallel degree for this operation in the email [2] on the main
>> thread.
>>
>> If parallel vacuum is *not* enabled by default, then I think the
>> current way to enable is fine which is as follows: Vacuum (Parallel
>> 2) <tbl_name>;
>>
>> Here, if the user doesn't specify parallel_degree, then we internally
>> decide based on number of indexes that support a parallel vacuum with
>> a maximum of max_parallel_maintenance_workers.
>>
>> If the parallel vacuum is enabled by default, then I could think of
>> the following ways:
>>
>> (a) Vacuum (disable_parallel) <tbl_name>;  Vacuum (Parallel
>> <parallel_degree>) <tbl_name>;
>>
>> (b) Vacuum (Parallel <parallel_degree>) <tbl_name>;  If user
>> specifies parallel_degree as 0, then disable parallelism.
>>
>> (c) ... Any better ideas?
>>
>
>If parallel vacuum is enabled by default, I would prefer (b) but I
>don't think it's a good idea to accept 0 as parallel degree. If we want
>to disable parallel vacuum we should max_parallel_maintenance_workers
>to 0 instead.
>

IMO that just makes the interaction between vacuum options and the GUC
even more complicated/confusing.

If we want to have a vacuum option to determine parallel degree, we
should probably have a vacuum option to disable parallelism using just a
vacuum option. I don't think 0 is too bad, and disable_parallel seems a
bit awkward. Maybe we could use NOPARALLEL (in addition to PARALLEL n).
That's what Oracle does, so it's not entirely without a precedent.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: pgsql: Add basic TAP tests for psql's tab-completion logic.
Next
From: Noah Misch
Date:
Subject: Re: Spurious "apparent wraparound" via SimpleLruTruncate() rounding