Thread: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Peter Eisentraut
Date:
If I read the code correctly, the VACUUM PARALLEL option is capped by 
the active max_parallel_maintenance_workers setting.  So if I write 
VACUUM (PARALLEL 5), it will still only do 2 by default.  Is that 
correct?  The documentation (VACUUM man page) seems to indicate a 
different behavior.

I haven't been able to set up something to test or verify this either way.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Amit Kapila
Date:
On Sat, Sep 19, 2020 at 1:58 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> If I read the code correctly, the VACUUM PARALLEL option is capped by
> the active max_parallel_maintenance_workers setting.  So if I write
> VACUUM (PARALLEL 5), it will still only do 2 by default.  Is that
> correct?

Yeah, but there is another factor also which is the number of indexes
that support parallel vacuum operation.

>  The documentation (VACUUM man page) seems to indicate a
> different behavior.
>

I think we can change the documentation for parallel option to explain
it better. How about: "Perform index vacuum and index cleanup phases
of VACUUM in parallel using integer background workers (for the
details of each vacuum phase, please refer to Table 27.37). The number
of workers is determined based on the number of indexes on the
relation that support parallel vacuum operation which is limited by
number of workers specified with PARALLEL option if any which is
further limited by max_parallel_maintenance_workers." instead of what
is currently there?

-- 
With Regards,
Amit Kapila.



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Peter Eisentraut
Date:
On 2020-09-19 11:37, Amit Kapila wrote:
> I think we can change the documentation for parallel option to explain
> it better. How about: "Perform index vacuum and index cleanup phases
> of VACUUM in parallel using integer background workers (for the
> details of each vacuum phase, please refer to Table 27.37). The number
> of workers is determined based on the number of indexes on the
> relation that support parallel vacuum operation which is limited by
> number of workers specified with PARALLEL option if any which is
> further limited by max_parallel_maintenance_workers." instead of what
> is currently there?

I think the implemented behavior is wrong.  The VACUUM PARALLEL option 
should override the max_parallel_maintenance_worker setting.

Otherwise, what's the point of the command option?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Amit Kapila
Date:
On Sat, Sep 19, 2020 at 4:28 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-09-19 11:37, Amit Kapila wrote:
> > I think we can change the documentation for parallel option to explain
> > it better. How about: "Perform index vacuum and index cleanup phases
> > of VACUUM in parallel using integer background workers (for the
> > details of each vacuum phase, please refer to Table 27.37). The number
> > of workers is determined based on the number of indexes on the
> > relation that support parallel vacuum operation which is limited by
> > number of workers specified with PARALLEL option if any which is
> > further limited by max_parallel_maintenance_workers." instead of what
> > is currently there?
>
> I think the implemented behavior is wrong.
>

It is the same as what we do for other parallel operations, for
example, we limit the number of parallel workers for parallel create
index by 'max_parallel_maintenance_workers' and parallel scan
operations are limited by 'max_parallel_workers_per_gather'.

>  The VACUUM PARALLEL option
> should override the max_parallel_maintenance_worker setting.
>
> Otherwise, what's the point of the command option?
>

It is for the cases where the user has a better idea of workload. We
can launch only a limited number of parallel workers
'max_parallel_workers' in the system, so sometimes users would like to
use it as per their requirement. If the user omits this option, then
we internally compute the required number of workers but again those
are limited by max_* guc's.

-- 
With Regards,
Amit Kapila.



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Peter Eisentraut
Date:
On 2020-09-19 13:24, Amit Kapila wrote:
>> I think the implemented behavior is wrong.
> 
> It is the same as what we do for other parallel operations, for
> example, we limit the number of parallel workers for parallel create
> index by 'max_parallel_maintenance_workers' and parallel scan
> operations are limited by 'max_parallel_workers_per_gather'.

But in those cases we don't provide user-visible options to specify a 
per-command setting, so it's not the same thing, is it?

>>   The VACUUM PARALLEL option
>> should override the max_parallel_maintenance_worker setting.
>>
>> Otherwise, what's the point of the command option?
> 
> It is for the cases where the user has a better idea of workload. We
> can launch only a limited number of parallel workers
> 'max_parallel_workers' in the system, so sometimes users would like to
> use it as per their requirement.

Right, but my point is, it doesn't actually do that correctly.  I can't 
just say, oh, I have a maintenance window, I'd like to run a really fast 
VACUUM.  The PARALLEL option is capped by the setting you'd normally use 
anyway, so specifying it is useless.

The only thing it can do right now is if you want to run a manual VACUUM 
less parallel than by default.  But I don't see how that is often useful.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Amit Kapila
Date:
On Sun, Sep 20, 2020 at 7:15 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-09-19 13:24, Amit Kapila wrote:
> >> I think the implemented behavior is wrong.
> >
> > It is the same as what we do for other parallel operations, for
> > example, we limit the number of parallel workers for parallel create
> > index by 'max_parallel_maintenance_workers' and parallel scan
> > operations are limited by 'max_parallel_workers_per_gather'.
>
> But in those cases we don't provide user-visible options to specify a
> per-command setting, so it's not the same thing, is it?
>

Not exactly but there also we have a way for the user to set the value
(using 'parallel_workers' during Create Table or Alter Table) which
will guide the parallel scans.

> >>   The VACUUM PARALLEL option
> >> should override the max_parallel_maintenance_worker setting.
> >>
> >> Otherwise, what's the point of the command option?
> >
> > It is for the cases where the user has a better idea of workload. We
> > can launch only a limited number of parallel workers
> > 'max_parallel_workers' in the system, so sometimes users would like to
> > use it as per their requirement.
>
> Right, but my point is, it doesn't actually do that correctly.  I can't
> just say, oh, I have a maintenance window, I'd like to run a really fast
> VACUUM.  The PARALLEL option is capped by the setting you'd normally use
> anyway, so specifying it is useless.
>

Yeah, because by default we choose the maximum number of possible
workers for Vacuum.

> The only thing it can do right now is if you want to run a manual VACUUM
> less parallel than by default.  But I don't see how that is often useful.
>

Say when indexes that support parallel scan are not very big then we
don't need the default behavior because it will use more resources
while providing not much additional benefit.

What according to you should be the behavior here and how will it be
better than current?

-- 
With Regards,
Amit Kapila.



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Peter Eisentraut
Date:
On 2020-09-21 05:48, Amit Kapila wrote:
> What according to you should be the behavior here and how will it be
> better than current?

I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers 
(up to the number of indexes), even if max_parallel_maintenance_workers 
is 2.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Amit Kapila
Date:
On Mon, Sep 21, 2020 at 12:45 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-09-21 05:48, Amit Kapila wrote:
> > What according to you should be the behavior here and how will it be
> > better than current?
>
> I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers
> (up to the number of indexes), even if max_parallel_maintenance_workers
> is 2.
>

So you want it to disregard max_parallel_maintenance_workers but all
parallel operations have to regard one of the max_parallel_* option so
that it can respect max_parallel_workers beyond which the system won't
allow more parallel workers. Now, if we won't respect one of the
max_parallel_* option, it will unnecessarily try to register those
many workers even though it won't be able to start those many workers.
I think it is better to keep the limit for workers for scans and
maintenance operations separately so that the user is allowed to
perform different parallel operations in the system.

-- 
With Regards,
Amit Kapila.



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
David Rowley
Date:
On Mon, 21 Sep 2020 at 19:15, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-09-21 05:48, Amit Kapila wrote:
> > What according to you should be the behavior here and how will it be
> > better than current?
>
> I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers
> (up to the number of indexes), even if max_parallel_maintenance_workers
> is 2.

It would be good if we were consistent with these parallel options.
Right now max_parallel_workers_per_gather will restrict the
parallel_workers reloption.  I'd say this
max_parallel_workers_per_gather is similar to
max_parallel_maintenance_workers here and the PARALLEL vacuum option
is like the parallel_workers reloption.

If we want VACUUM's parallel option to work the same way as that then
max_parallel_maintenance_workers should restrict whatever is mentioned
in VACUUM PARALLEL.

Or perhaps this is slightly different as the user is explicitly asking
for this in the command, but you could likely say the same about ALTER
TABLE <table> SET (parallel_workers = N); too.

David



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Amit Kapila
Date:
On Tue, Sep 22, 2020 at 12:50 PM David Rowley <dgrowleyml@gmail.com> wrote:
>
> On Mon, 21 Sep 2020 at 19:15, Peter Eisentraut
> <peter.eisentraut@2ndquadrant.com> wrote:
> >
> > On 2020-09-21 05:48, Amit Kapila wrote:
> > > What according to you should be the behavior here and how will it be
> > > better than current?
> >
> > I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers
> > (up to the number of indexes), even if max_parallel_maintenance_workers
> > is 2.
>
> It would be good if we were consistent with these parallel options.
> Right now max_parallel_workers_per_gather will restrict the
> parallel_workers reloption.  I'd say this
> max_parallel_workers_per_gather is similar to
> max_parallel_maintenance_workers here and the PARALLEL vacuum option
> is like the parallel_workers reloption.
>
> If we want VACUUM's parallel option to work the same way as that then
> max_parallel_maintenance_workers should restrict whatever is mentioned
> in VACUUM PARALLEL.
>
> Or perhaps this is slightly different as the user is explicitly asking
> for this in the command, but you could likely say the same about ALTER
> TABLE <table> SET (parallel_workers = N); too.
>

This is exactly my feeling too. But how about changing documentation a
bit as proposed above [1] to make it precise.

[1] - https://www.postgresql.org/message-id/CAA4eK1LQWXS_4RwLo%2BWT7jusGnBkUvXO73xQOCsydWLYBpLBEg%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Peter Eisentraut
Date:
On 2020-09-26 07:32, Amit Kapila wrote:
> This is exactly my feeling too. But how about changing documentation a
> bit as proposed above [1] to make it precise.
> 
> [1] - https://www.postgresql.org/message-id/CAA4eK1LQWXS_4RwLo%2BWT7jusGnBkUvXO73xQOCsydWLYBpLBEg%40mail.gmail.com

Yes, making the documentation more precise would be good.  Right now, 
it's a bit confusing and unclear (using phrases like "based on"). 
Someone who wants to the the VACUUM PARALLEL option presumably  wants 
precise control, so specifying the exact rules would be desirable.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Amit Kapila
Date:
On Tue, Sep 29, 2020 at 3:13 PM Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
>
> On 2020-09-26 07:32, Amit Kapila wrote:
> > This is exactly my feeling too. But how about changing documentation a
> > bit as proposed above [1] to make it precise.
> >
> > [1] - https://www.postgresql.org/message-id/CAA4eK1LQWXS_4RwLo%2BWT7jusGnBkUvXO73xQOCsydWLYBpLBEg%40mail.gmail.com
>
> Yes, making the documentation more precise would be good.  Right now,
> it's a bit confusing and unclear (using phrases like "based on").
> Someone who wants to the the VACUUM PARALLEL option presumably  wants
> precise control, so specifying the exact rules would be desirable.
>

I have changed the docs to make this clear. Let me know what you think?

-- 
With Regards,
Amit Kapila.

Attachment

Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Robert Haas
Date:
On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
> It would be good if we were consistent with these parallel options.
> Right now max_parallel_workers_per_gather will restrict the
> parallel_workers reloption.  I'd say this
> max_parallel_workers_per_gather is similar to
> max_parallel_maintenance_workers here and the PARALLEL vacuum option
> is like the parallel_workers reloption.
>
> If we want VACUUM's parallel option to work the same way as that then
> max_parallel_maintenance_workers should restrict whatever is mentioned
> in VACUUM PARALLEL.
>
> Or perhaps this is slightly different as the user is explicitly asking
> for this in the command, but you could likely say the same about ALTER
> TABLE <table> SET (parallel_workers = N); too.

There is a subtle difference between these two cases. In the case of a
query, there may be multiple table scans involved, all under the same
Gather node. So a limit on the Gather node is to some degree a
separate constraint on the overall query plan from the reloption
applied to a particular table. So there is at least some kind of an
argument that it's sensible to combine those limits somehow. I'm not
sure I believe it, though. The user probably wants exactly the number
of workers they specify, not the GUC value.

However, in the VACUUM case, there's no possibility of distinguishing
between the parallel operation as a whole and the expectations for a
particular table. It's a single operation. It doesn't have a Gather
node that might be subject to one limit and a Seq Scan that can be
subjected to some other limit. So to me the idea that the command line
option doesn't override the GUC is completely strange, especially
because OTHER parameters to VACUUM *do* have precisely the effect of
overriding GUCs or reloptions. In particular, FREEZE overrides
vacuum_freeze_min_age and vacuum_freeze_table_age, and TRUNCATE and
INDEX_CLEANUP override the vacuum_truncate and vacuum_index_cleanup
reloptions.

So I agree with Peter that the implemented behavior is wrong.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Amit Kapila
Date:
On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > It would be good if we were consistent with these parallel options.
> > Right now max_parallel_workers_per_gather will restrict the
> > parallel_workers reloption.  I'd say this
> > max_parallel_workers_per_gather is similar to
> > max_parallel_maintenance_workers here and the PARALLEL vacuum option
> > is like the parallel_workers reloption.
> >
> > If we want VACUUM's parallel option to work the same way as that then
> > max_parallel_maintenance_workers should restrict whatever is mentioned
> > in VACUUM PARALLEL.
> >
> > Or perhaps this is slightly different as the user is explicitly asking
> > for this in the command, but you could likely say the same about ALTER
> > TABLE <table> SET (parallel_workers = N); too.
>
> There is a subtle difference between these two cases. In the case of a
> query, there may be multiple table scans involved, all under the same
> Gather node. So a limit on the Gather node is to some degree a
> separate constraint on the overall query plan from the reloption
> applied to a particular table. So there is at least some kind of an
> argument that it's sensible to combine those limits somehow. I'm not
> sure I believe it, though. The user probably wants exactly the number
> of workers they specify, not the GUC value.
>
> However, in the VACUUM case, there's no possibility of distinguishing
> between the parallel operation as a whole and the expectations for a
> particular table. It's a single operation.
>


But the same is true for the 'Create Index' operation as well where we
follow the same thing. We will use the number of workers as specified
in reloption (parallel_workers) which is then limited by
max_parallel_maintenance_workers.

-- 
With Regards,
Amit Kapila.



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Masahiko Sawada
Date:
On Sat, 3 Oct 2020 at 20:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
> >
> > On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > > It would be good if we were consistent with these parallel options.
> > > Right now max_parallel_workers_per_gather will restrict the
> > > parallel_workers reloption.  I'd say this
> > > max_parallel_workers_per_gather is similar to
> > > max_parallel_maintenance_workers here and the PARALLEL vacuum option
> > > is like the parallel_workers reloption.
> > >
> > > If we want VACUUM's parallel option to work the same way as that then
> > > max_parallel_maintenance_workers should restrict whatever is mentioned
> > > in VACUUM PARALLEL.
> > >
> > > Or perhaps this is slightly different as the user is explicitly asking
> > > for this in the command, but you could likely say the same about ALTER
> > > TABLE <table> SET (parallel_workers = N); too.
> >
> > There is a subtle difference between these two cases. In the case of a
> > query, there may be multiple table scans involved, all under the same
> > Gather node. So a limit on the Gather node is to some degree a
> > separate constraint on the overall query plan from the reloption
> > applied to a particular table. So there is at least some kind of an
> > argument that it's sensible to combine those limits somehow. I'm not
> > sure I believe it, though. The user probably wants exactly the number
> > of workers they specify, not the GUC value.
> >
> > However, in the VACUUM case, there's no possibility of distinguishing
> > between the parallel operation as a whole and the expectations for a
> > particular table. It's a single operation.
> >
>
>
> But the same is true for the 'Create Index' operation as well where we
> follow the same thing. We will use the number of workers as specified
> in reloption (parallel_workers) which is then limited by
> max_parallel_maintenance_workers.

Both opinions have a valid point.

To make the behavior of parallel vacuum more consistent with other
parallel maintenance commands (i.g., only parallel INDEX CREATE for
now), as a second idea, can we make use of parallel_workers reloption
in parallel vacuum case as well? That is, when PARALLEL option without
an integer is specified or VACUUM command without PARALLEL option, the
parallel degree is the number of indexes that support parallel vacuum
and are bigger than min_parallel_index_scan_size. If the
parallel_workers reloption of the table is set we use it instead. In
both cases, the parallel degree is capped by
max_parallel_maintenance_workers. OTOH when PARALLEL option with an
integer is specified, the parallel degree is the specified integer
value and it's capped by max_parallel_workers and the number of
indexes that support parallel vacuum and are bigger than
min_parallel_index_scan_size.

That way the default behavior and the behavior of PARALLEL option
without an integer is similar to parallel CREATE INDEX. In addition to
it, VACUUM command has an additional way to control the parallel
degree beyond max_parallel_maintenance_workers limit by using the
command option.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Robert Haas
Date:
On Sat, Oct 3, 2020 at 7:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> But the same is true for the 'Create Index' operation as well where we
> follow the same thing. We will use the number of workers as specified
> in reloption (parallel_workers) which is then limited by
> max_parallel_maintenance_workers.

Well, that seems pretty weird to me too, but surely we want them both
to be consistent.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Robert Haas
Date:
On Sat, Oct 3, 2020 at 9:25 AM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
> To make the behavior of parallel vacuum more consistent with other
> parallel maintenance commands (i.g., only parallel INDEX CREATE for
> now), as a second idea, can we make use of parallel_workers reloption
> in parallel vacuum case as well?

That seems like a terrible idea to me. I don't see why the number of
workers that some user thinks should be used to perform a scan on the
table as part of the query should be the same as the number of workers
that should be used for a maintenance operation. We get in trouble
every time we try to reuse a setting for an unrelated purpose.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Amit Kapila
Date:
On Sat, Oct 3, 2020 at 6:55 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Sat, 3 Oct 2020 at 20:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > > > It would be good if we were consistent with these parallel options.
> > > > Right now max_parallel_workers_per_gather will restrict the
> > > > parallel_workers reloption.  I'd say this
> > > > max_parallel_workers_per_gather is similar to
> > > > max_parallel_maintenance_workers here and the PARALLEL vacuum option
> > > > is like the parallel_workers reloption.
> > > >
> > > > If we want VACUUM's parallel option to work the same way as that then
> > > > max_parallel_maintenance_workers should restrict whatever is mentioned
> > > > in VACUUM PARALLEL.
> > > >
> > > > Or perhaps this is slightly different as the user is explicitly asking
> > > > for this in the command, but you could likely say the same about ALTER
> > > > TABLE <table> SET (parallel_workers = N); too.
> > >
> > > There is a subtle difference between these two cases. In the case of a
> > > query, there may be multiple table scans involved, all under the same
> > > Gather node. So a limit on the Gather node is to some degree a
> > > separate constraint on the overall query plan from the reloption
> > > applied to a particular table. So there is at least some kind of an
> > > argument that it's sensible to combine those limits somehow. I'm not
> > > sure I believe it, though. The user probably wants exactly the number
> > > of workers they specify, not the GUC value.
> > >
> > > However, in the VACUUM case, there's no possibility of distinguishing
> > > between the parallel operation as a whole and the expectations for a
> > > particular table. It's a single operation.
> > >
> >
> >
> > But the same is true for the 'Create Index' operation as well where we
> > follow the same thing. We will use the number of workers as specified
> > in reloption (parallel_workers) which is then limited by
> > max_parallel_maintenance_workers.
>
> Both opinions have a valid point.
>
> To make the behavior of parallel vacuum more consistent with other
> parallel maintenance commands (i.g., only parallel INDEX CREATE for
> now), as a second idea, can we make use of parallel_workers reloption
> in parallel vacuum case as well? That is, when PARALLEL option without
> an integer is specified or VACUUM command without PARALLEL option, the
> parallel degree is the number of indexes that support parallel vacuum
> and are bigger than min_parallel_index_scan_size. If the
> parallel_workers reloption of the table is set we use it instead. In
> both cases, the parallel degree is capped by
> max_parallel_maintenance_workers. OTOH when PARALLEL option with an
> integer is specified, the parallel degree is the specified integer
> value and it's capped by max_parallel_workers and the number of
> indexes that support parallel vacuum and are bigger than
> min_parallel_index_scan_size.
>

This seems more difficult to explain and have more variable parts. I
think one of the blogs I recently read about this work [1] (see
section:
Parallel VACUUM & Better Support for Append-only Workloads) explains
the currently implemented behavior (related to the workers) nicely and
in simple words. Now unless I or the person who wrote that blog missed
something it appears to me that the current implemented behavior is
understood by others who might not be even directly involved in this
work which to some extent indicates that users will be able to use
currently implemented behavior without difficulty. I think we can keep
the current behavior as it is and wait to see if we see any complaints
from the users trying to use it.

[1] -
https://pganalyze.com/blog/postgres13-better-performance-monitoring-usability?utm_source=PostgresWeeklyPrimary09302020

-- 
With Regards,
Amit Kapila.



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Kyotaro Horiguchi
Date:
At Sat, 3 Oct 2020 22:25:14 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in 
> On Sat, 3 Oct 2020 at 20:03, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > >
> > > On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote:
> > > > It would be good if we were consistent with these parallel options.
> > > > Right now max_parallel_workers_per_gather will restrict the
> > > > parallel_workers reloption.  I'd say this
> > > > max_parallel_workers_per_gather is similar to
> > > > max_parallel_maintenance_workers here and the PARALLEL vacuum option
> > > > is like the parallel_workers reloption.
> > > >
> > > > If we want VACUUM's parallel option to work the same way as that then
> > > > max_parallel_maintenance_workers should restrict whatever is mentioned
> > > > in VACUUM PARALLEL.
> > > >
> > > > Or perhaps this is slightly different as the user is explicitly asking
> > > > for this in the command, but you could likely say the same about ALTER
> > > > TABLE <table> SET (parallel_workers = N); too.
> > >
> > > There is a subtle difference between these two cases. In the case of a
> > > query, there may be multiple table scans involved, all under the same
> > > Gather node. So a limit on the Gather node is to some degree a
> > > separate constraint on the overall query plan from the reloption
> > > applied to a particular table. So there is at least some kind of an
> > > argument that it's sensible to combine those limits somehow. I'm not
> > > sure I believe it, though. The user probably wants exactly the number
> > > of workers they specify, not the GUC value.
> > >
> > > However, in the VACUUM case, there's no possibility of distinguishing
> > > between the parallel operation as a whole and the expectations for a
> > > particular table. It's a single operation.
> > >
> >
> >
> > But the same is true for the 'Create Index' operation as well where we
> > follow the same thing. We will use the number of workers as specified
> > in reloption (parallel_workers) which is then limited by
> > max_parallel_maintenance_workers.
> 
> Both opinions have a valid point.

I think the purpose of the variable is to cap the number of workers
that the system *automatically determines*. It seems reasolable to
ignore the limit as far as it is commanded by a super user. But I
don't think a non-superuser doesn't have such a pvigilege. On the
other hand I'm not sure whether it's the right thing to allow super
users to exhaust the reserved capacity and whether it's worth that
complexity.

> To make the behavior of parallel vacuum more consistent with other
> parallel maintenance commands (i.g., only parallel INDEX CREATE for
> now), as a second idea, can we make use of parallel_workers reloption
> in parallel vacuum case as well? That is, when PARALLEL option without

The varialble is thougt as the number of workers for paralle-scan of
create index. It is totally different characteristcs from that for
parallel vacuum.  If we had parallel_maintenance_workers, it'd be
usable for vacuum, but I don't want to add that reloption too much..

> an integer is specified or VACUUM command without PARALLEL option, the
> parallel degree is the number of indexes that support parallel vacuum
> and are bigger than min_parallel_index_scan_size. If the
> parallel_workers reloption of the table is set we use it instead. In
> both cases, the parallel degree is capped by
> max_parallel_maintenance_workers. OTOH when PARALLEL option with an
> integer is specified, the parallel degree is the specified integer
> value and it's capped by max_parallel_workers and the number of
> indexes that support parallel vacuum and are bigger than
> min_parallel_index_scan_size.
> 
> That way the default behavior and the behavior of PARALLEL option
> without an integer is similar to parallel CREATE INDEX. In addition to
> it, VACUUM command has an additional way to control the parallel
> degree beyond max_parallel_maintenance_workers limit by using the
> command option.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Masahiko Sawada
Date:
On Mon, 5 Oct 2020 at 11:21, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Oct 3, 2020 at 9:25 AM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> > To make the behavior of parallel vacuum more consistent with other
> > parallel maintenance commands (i.g., only parallel INDEX CREATE for
> > now), as a second idea, can we make use of parallel_workers reloption
> > in parallel vacuum case as well?
>
> That seems like a terrible idea to me. I don't see why the number of
> workers that some user thinks should be used to perform a scan on the
> table as part of the query should be the same as the number of workers
> that should be used for a maintenance operation.

Agreed. But the same is true for parallel REINDEX? It's also a
maintenance operation.

In any case, the thing would get more complex if lazy vacuum or vacuum
full were to support parallel operation on table scan in the future.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Amit Kapila
Date:
On Mon, Oct 5, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sat, Oct 3, 2020 at 6:55 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > To make the behavior of parallel vacuum more consistent with other
> > parallel maintenance commands (i.g., only parallel INDEX CREATE for
> > now), as a second idea, can we make use of parallel_workers reloption
> > in parallel vacuum case as well? That is, when PARALLEL option without
> > an integer is specified or VACUUM command without PARALLEL option, the
> > parallel degree is the number of indexes that support parallel vacuum
> > and are bigger than min_parallel_index_scan_size. If the
> > parallel_workers reloption of the table is set we use it instead. In
> > both cases, the parallel degree is capped by
> > max_parallel_maintenance_workers. OTOH when PARALLEL option with an
> > integer is specified, the parallel degree is the specified integer
> > value and it's capped by max_parallel_workers and the number of
> > indexes that support parallel vacuum and are bigger than
> > min_parallel_index_scan_size.
> >
>
> This seems more difficult to explain and have more variable parts. I
> think one of the blogs I recently read about this work [1] (see
> section:
> Parallel VACUUM & Better Support for Append-only Workloads) explains
> the currently implemented behavior (related to the workers) nicely and
> in simple words. Now unless I or the person who wrote that blog missed
> something it appears to me that the current implemented behavior is
> understood by others who might not be even directly involved in this
> work which to some extent indicates that users will be able to use
> currently implemented behavior without difficulty. I think we can keep
> the current behavior as it is and wait to see if we see any complaints
> from the users trying to use it.
>

I am planning to commit the patch (by early next week) posted above
thread [1] to make the docs consistent with what we have in code. Do
let me know if you think otherwise or if you have better ideas? We can
think of improving the code as a separate patch if we think it is
important.

[1] - https://www.postgresql.org/message-id/CAA4eK1Km5VVmdPpdMNSA414uRFJKVw8r_A7ORpL-0pHnYfGpLw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.



Re: VACUUM PARALLEL option vs. max_parallel_maintenance_workers

From
Amit Kapila
Date:
On Thu, Oct 15, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Oct 5, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Sat, Oct 3, 2020 at 6:55 PM Masahiko Sawada
> > <masahiko.sawada@2ndquadrant.com> wrote:
> > >
> > > To make the behavior of parallel vacuum more consistent with other
> > > parallel maintenance commands (i.g., only parallel INDEX CREATE for
> > > now), as a second idea, can we make use of parallel_workers reloption
> > > in parallel vacuum case as well? That is, when PARALLEL option without
> > > an integer is specified or VACUUM command without PARALLEL option, the
> > > parallel degree is the number of indexes that support parallel vacuum
> > > and are bigger than min_parallel_index_scan_size. If the
> > > parallel_workers reloption of the table is set we use it instead. In
> > > both cases, the parallel degree is capped by
> > > max_parallel_maintenance_workers. OTOH when PARALLEL option with an
> > > integer is specified, the parallel degree is the specified integer
> > > value and it's capped by max_parallel_workers and the number of
> > > indexes that support parallel vacuum and are bigger than
> > > min_parallel_index_scan_size.
> > >
> >
> > This seems more difficult to explain and have more variable parts. I
> > think one of the blogs I recently read about this work [1] (see
> > section:
> > Parallel VACUUM & Better Support for Append-only Workloads) explains
> > the currently implemented behavior (related to the workers) nicely and
> > in simple words. Now unless I or the person who wrote that blog missed
> > something it appears to me that the current implemented behavior is
> > understood by others who might not be even directly involved in this
> > work which to some extent indicates that users will be able to use
> > currently implemented behavior without difficulty. I think we can keep
> > the current behavior as it is and wait to see if we see any complaints
> > from the users trying to use it.
> >
>
> I am planning to commit the patch (by early next week) posted above
> thread [1] to make the docs consistent with what we have in code.
>

Pushed.

-- 
With Regards,
Amit Kapila.