Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From: Tomas Vondra
Subject: Re: [HACKERS] Block level parallel vacuum
Date: 2019-12-30 13:07:14
Msg-id: 20191230130714.5kc62bwwrcuj6gtk@development
In response to: Re: [HACKERS] Block level parallel vacuum (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: [HACKERS] Block level parallel vacuum
List: pgsql-hackers
On Mon, Dec 30, 2019 at 10:40:39AM +0530, Amit Kapila wrote:
>On Mon, Dec 30, 2019 at 2:53 AM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Sun, Dec 29, 2019 at 10:06:23PM +0900, Masahiko Sawada wrote:
>> >>
>> >> v40-0003-Add-FAST-option-to-vacuum-command.patch
>> >> ------------------------------------------------
>> >>
>> >> I do have a bit of an issue with this part - I'm not quite convinced we
>> >> actually need a FAST option, and I actually suspect we'll come to regret
>> >> it sooner than later. AFAIK it pretty much does exactly the same thing
>> >> as setting vacuum_cost_delay to 0, and IMO it's confusing to provide
>> >> multiple ways to do the same thing - I do expect reports from confused
>> >> users on pgsql-bugs etc. Why is setting vacuum_cost_delay directly not a
>> >> sufficient solution?
>> >
>> >I think the motivation of this option is similar to FREEZE. It's
>> >sometimes a good idea to have a shortcut for a popular usage, with a
>> >name corresponding to its job. From that perspective I think having a
>> >FAST option would make sense, but maybe we need more discussion on
>> >combining parallel vacuum and vacuum delay.
>> >
>>
>> OK. I think it's a mostly independent piece, so maybe we should move it to
>> a separate patch. It's more likely to get attention/feedback when not
>> buried in this thread.
>>
>
>+1.  It is already a separate patch, and I think we can discuss it
>further in a new thread once the main patch is committed - or do you
>think we should reach a conclusion about it right now?  To me, this
>option appears to be an extension of the main feature which can be
>useful for some users, and people might like to have a separate
>option, so we can discuss it and get broader feedback after the main
>patch is committed.
>

I don't think it's an extension of the main feature - it does not
depend on it and could be committed before or after the parallel vacuum
patch (with some conflicts, but the feature itself is not affected).

My point was that by moving it into a separate thread we're more likely
to get feedback on it, e.g. from people who don't feel like reviewing
the parallel vacuum feature and/or feel intimidated by the 100+
messages in this thread.
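
For reference, the two spellings being compared would look like this
(the FAST syntax is the one proposed in v40-0003, nothing committed,
and "mytable" is just a placeholder):

    -- existing way: disable cost-based delays for the session
    SET vacuum_cost_delay = 0;
    VACUUM mytable;

    -- proposed shortcut from the patch
    VACUUM (FAST) mytable;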

>> >>
>> >> The same thing applies to the PARALLEL flag, added in 0002, BTW. Why do
>> >> we need a separate VACUUM option, instead of just using the existing
>> >> max_parallel_maintenance_workers GUC?
>> >>
>
>How will users specify the parallel degree?  The parallel degree is helpful
>because in some cases users can decide how many workers should be
>launched based on size and type of indexes.
>

By setting max_parallel_maintenance_workers.
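
I.e. the GUC-driven spelling, same as for CREATE INDEX, versus a
per-command option (the PARALLEL syntax below is from the 0002 patch,
shown only for comparison):

    -- driven by the existing GUC
    SET max_parallel_maintenance_workers = 4;
    VACUUM mytable;

    -- explicit per-command degree, as the patch proposes
    VACUUM (PARALLEL 4) mytable;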

>> >> It's good enough for CREATE INDEX
>> >> so why not here?
>> >
>
>That is a different feature, and I think here users can make a better
>judgment based on the size of indexes.  Moreover, users have an option
>to control the parallel degree for CREATE INDEX via ALTER TABLE
><tbl_name> SET (parallel_workers = <n>), which I am not sure is a good
>idea for parallel vacuum, as the parallelism is derived more from the
>size and type of indexes.  Now, we can think of a similar parameter at
>the table/index level for parallel vacuum, but I don't see it being
>equally useful in this case.
>

I'm a bit skeptical about users being able to pick a good parallel
degree. If we (i.e. experienced developers/hackers with quite a bit of
knowledge) can't come up with a reasonable heuristic, how likely is it
that a regular user will come up with something better?

I'm not sure I understand why "parallel_workers" would not be suitable
for parallel vacuum. I mean, even for CREATE INDEX the size/type of the
indexes certainly matters, no?
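
After all, the per-table knob already exists and CREATE INDEX honors
it, so (assuming we wanted the same mechanism) it might simply be:

    -- caps the parallel degree CREATE INDEX uses for this table
    ALTER TABLE mytable SET (parallel_workers = 4);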

I may be wrong in both cases, of course.

>> >AFAIR there has been no such discussion so far, but I think one
>> >reason could be that parallel vacuum should be disabled by default.
>> >If parallel vacuum used max_parallel_maintenance_workers (2 by
>> >default) rather than having its own option, it would run in parallel
>> >with the default settings, but I think that could have a big impact
>> >on users because the disk access could become random reads and
>> >writes when some indexes are on the same tablespace.
>> >
>>
>> I'm not quite convinced VACUUM should have parallelism disabled by
>> default. I know some people argued we should do that because making
>> vacuum faster may put pressure on other parts of the system. Which is
>> true, but I don't think the solution is to make vacuum slower by
>> default. IMHO we should do the opposite - have it parallel by default
>> (as driven by max_parallel_maintenance_workers), and have an option
>> to disable parallelism.
>>
>
>I think driving parallelism for vacuum by
>max_parallel_maintenance_workers might not be sufficient.  We need to
>give finer control as it depends a lot on the size of indexes. Also,
>unlike CREATE INDEX, VACUUM can be performed on an entire database and
>it is quite possible that some tables/indexes are relatively smaller
>and forcing parallelism on them by default might slow down the
>operation.
>

Why wouldn't it be sufficient? Why couldn't this use similar logic to
what we have in plan_create_index_workers for CREATE INDEX?
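
AFAIK for CREATE INDEX the heuristic already computes the degree from
the relation size and then bounds it by the existing knobs, with no new
command syntax needed - something like:

    -- degree computed from relation size, then capped by these
    SET max_parallel_maintenance_workers = 2;   -- default
    SET min_parallel_table_scan_size = '8MB';   -- default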

Sure, it may be useful to give power users a way to override the default
logic, but I very much doubt users can make reliable judgments about
parallelism.

Also, it's not like the risks are comparable in the two cases. If you
have a very large table with a lot of indexes, the gains from parallel
vacuum are pretty much bound to be significant, possibly 10x or more.
OTOH if the table is small, parallelism may not give you much and it may
even be less efficient, but I doubt it's going to be 10x slower. And
min_parallel_index_scan_size already protects us against this, at least
partially.
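
That threshold is an ordinary GUC, so the protection is already
tunable:

    -- indexes smaller than this are not processed in parallel
    -- (512kB is the default)
    SET min_parallel_index_scan_size = '512kB';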


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


