Thread: VACUUM PARALLEL option vs. max_parallel_maintenance_workers
If I read the code correctly, the VACUUM PARALLEL option is capped by the active max_parallel_maintenance_workers setting. So if I write VACUUM (PARALLEL 5), it will still only do 2 by default. Is that correct? The documentation (VACUUM man page) seems to indicate a different behavior. I haven't been able to set up something to test or verify this either way. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Sep 19, 2020 at 1:58 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > If I read the code correctly, the VACUUM PARALLEL option is capped by > the active max_parallel_maintenance_workers setting. So if I write > VACUUM (PARALLEL 5), it will still only do 2 by default. Is that > correct? Yeah, but there is another factor also which is the number of indexes that support parallel vacuum operation. > The documentation (VACUUM man page) seems to indicate a > different behavior. > I think we can change the documentation for parallel option to explain it better. How about: "Perform index vacuum and index cleanup phases of VACUUM in parallel using integer background workers (for the details of each vacuum phase, please refer to Table 27.37). The number of workers is determined based on the number of indexes on the relation that support parallel vacuum operation which is limited by number of workers specified with PARALLEL option if any which is further limited by max_parallel_maintenance_workers." instead of what is currently there? -- With Regards, Amit Kapila.
On 2020-09-19 11:37, Amit Kapila wrote: > I think we can change the documentation for parallel option to explain > it better. How about: "Perform index vacuum and index cleanup phases > of VACUUM in parallel using integer background workers (for the > details of each vacuum phase, please refer to Table 27.37). The number > of workers is determined based on the number of indexes on the > relation that support parallel vacuum operation which is limited by > number of workers specified with PARALLEL option if any which is > further limited by max_parallel_maintenance_workers." instead of what > is currently there? I think the implemented behavior is wrong. The VACUUM PARALLEL option should override the max_parallel_maintenance_worker setting. Otherwise, what's the point of the command option? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Sep 19, 2020 at 4:28 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-09-19 11:37, Amit Kapila wrote: > > I think we can change the documentation for parallel option to explain > > it better. How about: "Perform index vacuum and index cleanup phases > > of VACUUM in parallel using integer background workers (for the > > details of each vacuum phase, please refer to Table 27.37). The number > > of workers is determined based on the number of indexes on the > > relation that support parallel vacuum operation which is limited by > > number of workers specified with PARALLEL option if any which is > > further limited by max_parallel_maintenance_workers." instead of what > > is currently there? > > I think the implemented behavior is wrong. > It is the same as what we do for other parallel operations, for example, we limit the number of parallel workers for parallel create index by 'max_parallel_maintenance_workers' and parallel scan operations are limited by 'max_parallel_workers_per_gather'. > The VACUUM PARALLEL option > should override the max_parallel_maintenance_worker setting. > > Otherwise, what's the point of the command option? > It is for the cases where the user has a better idea of workload. We can launch only a limited number of parallel workers 'max_parallel_workers' in the system, so sometimes users would like to use it as per their requirement. If the user omits this option, then we internally compute the required number of workers but again those are limited by max_* guc's. -- With Regards, Amit Kapila.
On 2020-09-19 13:24, Amit Kapila wrote: >> I think the implemented behavior is wrong. > > It is the same as what we do for other parallel operations, for > example, we limit the number of parallel workers for parallel create > index by 'max_parallel_maintenance_workers' and parallel scan > operations are limited by 'max_parallel_workers_per_gather'. But in those cases we don't provide user-visible options to specify a per-command setting, so it's not the same thing, is it? >> The VACUUM PARALLEL option >> should override the max_parallel_maintenance_worker setting. >> >> Otherwise, what's the point of the command option? > > It is for the cases where the user has a better idea of workload. We > can launch only a limited number of parallel workers > 'max_parallel_workers' in the system, so sometimes users would like to > use it as per their requirement. Right, but my point is, it doesn't actually do that correctly. I can't just say, oh, I have a maintenance window, I'd like to run a really fast VACUUM. The PARALLEL option is capped by the setting you'd normally use anyway, so specifying it is useless. The only thing it can do right now is if you want to run a manual VACUUM less parallel than by default. But I don't see how that is often useful. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sun, Sep 20, 2020 at 7:15 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-09-19 13:24, Amit Kapila wrote: > >> I think the implemented behavior is wrong. > > > > It is the same as what we do for other parallel operations, for > > example, we limit the number of parallel workers for parallel create > > index by 'max_parallel_maintenance_workers' and parallel scan > > operations are limited by 'max_parallel_workers_per_gather'. > > But in those cases we don't provide user-visible options to specify a > per-command setting, so it's not the same thing, is it? > Not exactly but there also we have a way for the user to set the value (using 'parallel_workers' during Create Table or Alter Table) which will guide the parallel scans. > >> The VACUUM PARALLEL option > >> should override the max_parallel_maintenance_worker setting. > >> > >> Otherwise, what's the point of the command option? > > > > It is for the cases where the user has a better idea of workload. We > > can launch only a limited number of parallel workers > > 'max_parallel_workers' in the system, so sometimes users would like to > > use it as per their requirement. > > Right, but my point is, it doesn't actually do that correctly. I can't > just say, oh, I have a maintenance window, I'd like to run a really fast > VACUUM. The PARALLEL option is capped by the setting you'd normally use > anyway, so specifying it is useless. > Yeah, because by default we choose the maximum number of possible workers for Vacuum. > The only thing it can do right now is if you want to run a manual VACUUM > less parallel than by default. But I don't see how that is often useful. > Say when indexes that support parallel scan are not very big then we don't need the default behavior because it will use more resources while providing not much additional benefit. What according to you should be the behavior here and how will it be better than current? -- With Regards, Amit Kapila.
On 2020-09-21 05:48, Amit Kapila wrote: > What according to you should be the behavior here and how will it be > better than current? I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers (up to the number of indexes), even if max_parallel_maintenance_workers is 2. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Sep 21, 2020 at 12:45 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-09-21 05:48, Amit Kapila wrote: > > What according to you should be the behavior here and how will it be > > better than current? > > I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers > (up to the number of indexes), even if max_parallel_maintenance_workers > is 2. > So you want it to disregard max_parallel_maintenance_workers but all parallel operations have to regard one of the max_parallel_* option so that it can respect max_parallel_workers beyond which the system won't allow more parallel workers. Now, if we won't respect one of the max_parallel_* option, it will unnecessarily try to register those many workers even though it won't be able to start those many workers. I think it is better to keep the limit for workers for scans and maintenance operations separately so that the user is allowed to perform different parallel operations in the system. -- With Regards, Amit Kapila.
On Mon, 21 Sep 2020 at 19:15, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-09-21 05:48, Amit Kapila wrote: > > What according to you should be the behavior here and how will it be > > better than current? > > I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers > (up to the number of indexes), even if max_parallel_maintenance_workers > is 2. It would be good if we were consistent with these parallel options. Right now max_parallel_workers_per_gather will restrict the parallel_workers reloption. I'd say this max_parallel_workers_per_gather is similar to max_parallel_maintenance_workers here and the PARALLEL vacuum option is like the parallel_workers reloption. If we want VACUUM's parallel option to work the same way as that then max_parallel_maintenance_workers should restrict whatever is mentioned in VACUUM PARALLEL. Or perhaps this is slightly different as the user is explicitly asking for this in the command, but you could likely say the same about ALTER TABLE <table> SET (parallel_workers = N); too. David
On Tue, Sep 22, 2020 at 12:50 PM David Rowley <dgrowleyml@gmail.com> wrote: > > On Mon, 21 Sep 2020 at 19:15, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > > > On 2020-09-21 05:48, Amit Kapila wrote: > > > What according to you should be the behavior here and how will it be > > > better than current? > > > > I think if I write VACUUM (PARALLEL 5), it should use up to 5 workers > > (up to the number of indexes), even if max_parallel_maintenance_workers > > is 2. > > It would be good if we were consistent with these parallel options. > Right now max_parallel_workers_per_gather will restrict the > parallel_workers reloption. I'd say this > max_parallel_workers_per_gather is similar to > max_parallel_maintenance_workers here and the PARALLEL vacuum option > is like the parallel_workers reloption. > > If we want VACUUM's parallel option to work the same way as that then > max_parallel_maintenance_workers should restrict whatever is mentioned > in VACUUM PARALLEL. > > Or perhaps this is slightly different as the user is explicitly asking > for this in the command, but you could likely say the same about ALTER > TABLE <table> SET (parallel_workers = N); too. > This is exactly my feeling too. But how about changing documentation a bit as proposed above [1] to make it precise. [1] - https://www.postgresql.org/message-id/CAA4eK1LQWXS_4RwLo%2BWT7jusGnBkUvXO73xQOCsydWLYBpLBEg%40mail.gmail.com -- With Regards, Amit Kapila.
On 2020-09-26 07:32, Amit Kapila wrote: > This is exactly my feeling too. But how about changing documentation a > bit as proposed above [1] to make it precise. > > [1] - https://www.postgresql.org/message-id/CAA4eK1LQWXS_4RwLo%2BWT7jusGnBkUvXO73xQOCsydWLYBpLBEg%40mail.gmail.com Yes, making the documentation more precise would be good. Right now, it's a bit confusing and unclear (using phrases like "based on"). Someone who wants to the the VACUUM PARALLEL option presumably wants precise control, so specifying the exact rules would be desirable. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Sep 29, 2020 at 3:13 PM Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > > On 2020-09-26 07:32, Amit Kapila wrote: > > This is exactly my feeling too. But how about changing documentation a > > bit as proposed above [1] to make it precise. > > > > [1] - https://www.postgresql.org/message-id/CAA4eK1LQWXS_4RwLo%2BWT7jusGnBkUvXO73xQOCsydWLYBpLBEg%40mail.gmail.com > > Yes, making the documentation more precise would be good. Right now, > it's a bit confusing and unclear (using phrases like "based on"). > Someone who wants to the the VACUUM PARALLEL option presumably wants > precise control, so specifying the exact rules would be desirable. > I have changed the docs to make this clear. Let me know what you think? -- With Regards, Amit Kapila.
Attachment
On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote: > It would be good if we were consistent with these parallel options. > Right now max_parallel_workers_per_gather will restrict the > parallel_workers reloption. I'd say this > max_parallel_workers_per_gather is similar to > max_parallel_maintenance_workers here and the PARALLEL vacuum option > is like the parallel_workers reloption. > > If we want VACUUM's parallel option to work the same way as that then > max_parallel_maintenance_workers should restrict whatever is mentioned > in VACUUM PARALLEL. > > Or perhaps this is slightly different as the user is explicitly asking > for this in the command, but you could likely say the same about ALTER > TABLE <table> SET (parallel_workers = N); too. There is a subtle difference between these two cases. In the case of a query, there may be multiple table scans involved, all under the same Gather node. So a limit on the Gather node is to some degree a separate constraint on the overall query plan from the reloption applied to a particular table. So there is at least some kind of an argument that it's sensible to combine those limits somehow. I'm not sure I believe it, though. The user probably wants exactly the number of workers they specify, not the GUC value. However, in the VACUUM case, there's no possibility of distinguishing between the parallel operation as a whole and the expectations for a particular table. It's a single operation. It doesn't have a Gather node that might be subject to one limit and a Seq Scan that can be subjected to some other limit. So to me the idea that the command line option doesn't override the GUC is completely strange, especially because OTHER parameters to VACUUM *do* have precisely the effect of overriding GUCs or reloptions. In particular, FREEZE overrides vacuum_freeze_min_age and vacuum_freeze_table_age, and TRUNCATE and INDEX_CLEANUP override the vacuum_truncate and vacuum_index_cleanup reloptions. So I agree with Peter that the implemented behavior is wrong. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote: > > It would be good if we were consistent with these parallel options. > > Right now max_parallel_workers_per_gather will restrict the > > parallel_workers reloption. I'd say this > > max_parallel_workers_per_gather is similar to > > max_parallel_maintenance_workers here and the PARALLEL vacuum option > > is like the parallel_workers reloption. > > > > If we want VACUUM's parallel option to work the same way as that then > > max_parallel_maintenance_workers should restrict whatever is mentioned > > in VACUUM PARALLEL. > > > > Or perhaps this is slightly different as the user is explicitly asking > > for this in the command, but you could likely say the same about ALTER > > TABLE <table> SET (parallel_workers = N); too. > > There is a subtle difference between these two cases. In the case of a > query, there may be multiple table scans involved, all under the same > Gather node. So a limit on the Gather node is to some degree a > separate constraint on the overall query plan from the reloption > applied to a particular table. So there is at least some kind of an > argument that it's sensible to combine those limits somehow. I'm not > sure I believe it, though. The user probably wants exactly the number > of workers they specify, not the GUC value. > > However, in the VACUUM case, there's no possibility of distinguishing > between the parallel operation as a whole and the expectations for a > particular table. It's a single operation. > But the same is true for the 'Create Index' operation as well where we follow the same thing. We will use the number of workers as specified in reloption (parallel_workers) which is then limited by max_parallel_maintenance_workers. -- With Regards, Amit Kapila.
On Sat, 3 Oct 2020 at 20:03, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote: > > > It would be good if we were consistent with these parallel options. > > > Right now max_parallel_workers_per_gather will restrict the > > > parallel_workers reloption. I'd say this > > > max_parallel_workers_per_gather is similar to > > > max_parallel_maintenance_workers here and the PARALLEL vacuum option > > > is like the parallel_workers reloption. > > > > > > If we want VACUUM's parallel option to work the same way as that then > > > max_parallel_maintenance_workers should restrict whatever is mentioned > > > in VACUUM PARALLEL. > > > > > > Or perhaps this is slightly different as the user is explicitly asking > > > for this in the command, but you could likely say the same about ALTER > > > TABLE <table> SET (parallel_workers = N); too. > > > > There is a subtle difference between these two cases. In the case of a > > query, there may be multiple table scans involved, all under the same > > Gather node. So a limit on the Gather node is to some degree a > > separate constraint on the overall query plan from the reloption > > applied to a particular table. So there is at least some kind of an > > argument that it's sensible to combine those limits somehow. I'm not > > sure I believe it, though. The user probably wants exactly the number > > of workers they specify, not the GUC value. > > > > However, in the VACUUM case, there's no possibility of distinguishing > > between the parallel operation as a whole and the expectations for a > > particular table. It's a single operation. > > > > > But the same is true for the 'Create Index' operation as well where we > follow the same thing. We will use the number of workers as specified > in reloption (parallel_workers) which is then limited by > max_parallel_maintenance_workers. Both opinions have a valid point. To make the behavior of parallel vacuum more consistent with other parallel maintenance commands (i.g., only parallel INDEX CREATE for now), as a second idea, can we make use of parallel_workers reloption in parallel vacuum case as well? That is, when PARALLEL option without an integer is specified or VACUUM command without PARALLEL option, the parallel degree is the number of indexes that support parallel vacuum and are bigger than min_parallel_index_scan_size. If the parallel_workers reloption of the table is set we use it instead. In both cases, the parallel degree is capped by max_parallel_maintenance_workers. OTOH when PARALLEL option with an integer is specified, the parallel degree is the specified integer value and it's capped by max_parallel_workers and the number of indexes that support parallel vacuum and are bigger than min_parallel_index_scan_size. That way the default behavior and the behavior of PARALLEL option without an integer is similar to parallel CREATE INDEX. In addition to it, VACUUM command has an additional way to control the parallel degree beyond max_parallel_maintenance_workers limit by using the command option. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Oct 3, 2020 at 7:03 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > But the same is true for the 'Create Index' operation as well where we > follow the same thing. We will use the number of workers as specified > in reloption (parallel_workers) which is then limited by > max_parallel_maintenance_workers. Well, that seems pretty weird to me too, but surely we want them both to be consistent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Oct 3, 2020 at 9:25 AM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > To make the behavior of parallel vacuum more consistent with other > parallel maintenance commands (i.g., only parallel INDEX CREATE for > now), as a second idea, can we make use of parallel_workers reloption > in parallel vacuum case as well? That seems like a terrible idea to me. I don't see why the number of workers that some user thinks should be used to perform a scan on the table as part of the query should be the same as the number of workers that should be used for a maintenance operation. We get in trouble every time we try to reuse a setting for an unrelated purpose. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Oct 3, 2020 at 6:55 PM Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote: > > On Sat, 3 Oct 2020 at 20:03, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote: > > > > It would be good if we were consistent with these parallel options. > > > > Right now max_parallel_workers_per_gather will restrict the > > > > parallel_workers reloption. I'd say this > > > > max_parallel_workers_per_gather is similar to > > > > max_parallel_maintenance_workers here and the PARALLEL vacuum option > > > > is like the parallel_workers reloption. > > > > > > > > If we want VACUUM's parallel option to work the same way as that then > > > > max_parallel_maintenance_workers should restrict whatever is mentioned > > > > in VACUUM PARALLEL. > > > > > > > > Or perhaps this is slightly different as the user is explicitly asking > > > > for this in the command, but you could likely say the same about ALTER > > > > TABLE <table> SET (parallel_workers = N); too. > > > > > > There is a subtle difference between these two cases. In the case of a > > > query, there may be multiple table scans involved, all under the same > > > Gather node. So a limit on the Gather node is to some degree a > > > separate constraint on the overall query plan from the reloption > > > applied to a particular table. So there is at least some kind of an > > > argument that it's sensible to combine those limits somehow. I'm not > > > sure I believe it, though. The user probably wants exactly the number > > > of workers they specify, not the GUC value. > > > > > > However, in the VACUUM case, there's no possibility of distinguishing > > > between the parallel operation as a whole and the expectations for a > > > particular table. It's a single operation. > > > > > > > > > But the same is true for the 'Create Index' operation as well where we > > follow the same thing. We will use the number of workers as specified > > in reloption (parallel_workers) which is then limited by > > max_parallel_maintenance_workers. > > Both opinions have a valid point. > > To make the behavior of parallel vacuum more consistent with other > parallel maintenance commands (i.g., only parallel INDEX CREATE for > now), as a second idea, can we make use of parallel_workers reloption > in parallel vacuum case as well? That is, when PARALLEL option without > an integer is specified or VACUUM command without PARALLEL option, the > parallel degree is the number of indexes that support parallel vacuum > and are bigger than min_parallel_index_scan_size. If the > parallel_workers reloption of the table is set we use it instead. In > both cases, the parallel degree is capped by > max_parallel_maintenance_workers. OTOH when PARALLEL option with an > integer is specified, the parallel degree is the specified integer > value and it's capped by max_parallel_workers and the number of > indexes that support parallel vacuum and are bigger than > min_parallel_index_scan_size. > This seems more difficult to explain and have more variable parts. I think one of the blogs I recently read about this work [1] (see section: Parallel VACUUM & Better Support for Append-only Workloads) explains the currently implemented behavior (related to the workers) nicely and in simple words. Now unless I or the person who wrote that blog missed something it appears to me that the current implemented behavior is understood by others who might not be even directly involved in this work which to some extent indicates that users will be able to use currently implemented behavior without difficulty. I think we can keep the current behavior as it is and wait to see if we see any complaints from the users trying to use it. [1] - https://pganalyze.com/blog/postgres13-better-performance-monitoring-usability?utm_source=PostgresWeeklyPrimary09302020 -- With Regards, Amit Kapila.
At Sat, 3 Oct 2020 22:25:14 +0900, Masahiko Sawada <masahiko.sawada@2ndquadrant.com> wrote in > On Sat, 3 Oct 2020 at 20:03, Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Wed, Sep 30, 2020 at 9:23 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > On Tue, Sep 22, 2020 at 3:20 AM David Rowley <dgrowleyml@gmail.com> wrote: > > > > It would be good if we were consistent with these parallel options. > > > > Right now max_parallel_workers_per_gather will restrict the > > > > parallel_workers reloption. I'd say this > > > > max_parallel_workers_per_gather is similar to > > > > max_parallel_maintenance_workers here and the PARALLEL vacuum option > > > > is like the parallel_workers reloption. > > > > > > > > If we want VACUUM's parallel option to work the same way as that then > > > > max_parallel_maintenance_workers should restrict whatever is mentioned > > > > in VACUUM PARALLEL. > > > > > > > > Or perhaps this is slightly different as the user is explicitly asking > > > > for this in the command, but you could likely say the same about ALTER > > > > TABLE <table> SET (parallel_workers = N); too. > > > > > > There is a subtle difference between these two cases. In the case of a > > > query, there may be multiple table scans involved, all under the same > > > Gather node. So a limit on the Gather node is to some degree a > > > separate constraint on the overall query plan from the reloption > > > applied to a particular table. So there is at least some kind of an > > > argument that it's sensible to combine those limits somehow. I'm not > > > sure I believe it, though. The user probably wants exactly the number > > > of workers they specify, not the GUC value. > > > > > > However, in the VACUUM case, there's no possibility of distinguishing > > > between the parallel operation as a whole and the expectations for a > > > particular table. It's a single operation. > > > > > > > > > But the same is true for the 'Create Index' operation as well where we > > follow the same thing. We will use the number of workers as specified > > in reloption (parallel_workers) which is then limited by > > max_parallel_maintenance_workers. > > Both opinions have a valid point. I think the purpose of the variable is to cap the number of workers that the system *automatically determines*. It seems reasolable to ignore the limit as far as it is commanded by a super user. But I don't think a non-superuser doesn't have such a pvigilege. On the other hand I'm not sure whether it's the right thing to allow super users to exhaust the reserved capacity and whether it's worth that complexity. > To make the behavior of parallel vacuum more consistent with other > parallel maintenance commands (i.g., only parallel INDEX CREATE for > now), as a second idea, can we make use of parallel_workers reloption > in parallel vacuum case as well? That is, when PARALLEL option without The varialble is thougt as the number of workers for paralle-scan of create index. It is totally different characteristcs from that for parallel vacuum. If we had parallel_maintenance_workers, it'd be usable for vacuum, but I don't want to add that reloption too much.. > an integer is specified or VACUUM command without PARALLEL option, the > parallel degree is the number of indexes that support parallel vacuum > and are bigger than min_parallel_index_scan_size. If the > parallel_workers reloption of the table is set we use it instead. In > both cases, the parallel degree is capped by > max_parallel_maintenance_workers. OTOH when PARALLEL option with an > integer is specified, the parallel degree is the specified integer > value and it's capped by max_parallel_workers and the number of > indexes that support parallel vacuum and are bigger than > min_parallel_index_scan_size. > > That way the default behavior and the behavior of PARALLEL option > without an integer is similar to parallel CREATE INDEX. In addition to > it, VACUUM command has an additional way to control the parallel > degree beyond max_parallel_maintenance_workers limit by using the > command option. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, 5 Oct 2020 at 11:21, Robert Haas <robertmhaas@gmail.com> wrote: > > On Sat, Oct 3, 2020 at 9:25 AM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > To make the behavior of parallel vacuum more consistent with other > > parallel maintenance commands (i.g., only parallel INDEX CREATE for > > now), as a second idea, can we make use of parallel_workers reloption > > in parallel vacuum case as well? > > That seems like a terrible idea to me. I don't see why the number of > workers that some user thinks should be used to perform a scan on the > table as part of the query should be the same as the number of workers > that should be used for a maintenance operation. Agreed. But the same is true for parallel REINDEX? It's also a maintenance operation. In any case, the thing would get more complex if lazy vacuum or vacuum full were to support parallel operation on table scan in the future. Regards, -- Masahiko Sawada http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Mon, Oct 5, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Sat, Oct 3, 2020 at 6:55 PM Masahiko Sawada > <masahiko.sawada@2ndquadrant.com> wrote: > > > > To make the behavior of parallel vacuum more consistent with other > > parallel maintenance commands (i.g., only parallel INDEX CREATE for > > now), as a second idea, can we make use of parallel_workers reloption > > in parallel vacuum case as well? That is, when PARALLEL option without > > an integer is specified or VACUUM command without PARALLEL option, the > > parallel degree is the number of indexes that support parallel vacuum > > and are bigger than min_parallel_index_scan_size. If the > > parallel_workers reloption of the table is set we use it instead. In > > both cases, the parallel degree is capped by > > max_parallel_maintenance_workers. OTOH when PARALLEL option with an > > integer is specified, the parallel degree is the specified integer > > value and it's capped by max_parallel_workers and the number of > > indexes that support parallel vacuum and are bigger than > > min_parallel_index_scan_size. > > > > This seems more difficult to explain and have more variable parts. I > think one of the blogs I recently read about this work [1] (see > section: > Parallel VACUUM & Better Support for Append-only Workloads) explains > the currently implemented behavior (related to the workers) nicely and > in simple words. Now unless I or the person who wrote that blog missed > something it appears to me that the current implemented behavior is > understood by others who might not be even directly involved in this > work which to some extent indicates that users will be able to use > currently implemented behavior without difficulty. I think we can keep > the current behavior as it is and wait to see if we see any complaints > from the users trying to use it. > I am planning to commit the patch (by early next week) posted above thread [1] to make the docs consistent with what we have in code. Do let me know if you think otherwise or if you have better ideas? We can think of improving the code as a separate patch if we think it is important. [1] - https://www.postgresql.org/message-id/CAA4eK1Km5VVmdPpdMNSA414uRFJKVw8r_A7ORpL-0pHnYfGpLw%40mail.gmail.com -- With Regards, Amit Kapila.
On Thu, Oct 15, 2020 at 9:02 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Oct 5, 2020 at 8:11 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > > > On Sat, Oct 3, 2020 at 6:55 PM Masahiko Sawada > > <masahiko.sawada@2ndquadrant.com> wrote: > > > > > > To make the behavior of parallel vacuum more consistent with other > > > parallel maintenance commands (i.g., only parallel INDEX CREATE for > > > now), as a second idea, can we make use of parallel_workers reloption > > > in parallel vacuum case as well? That is, when PARALLEL option without > > > an integer is specified or VACUUM command without PARALLEL option, the > > > parallel degree is the number of indexes that support parallel vacuum > > > and are bigger than min_parallel_index_scan_size. If the > > > parallel_workers reloption of the table is set we use it instead. In > > > both cases, the parallel degree is capped by > > > max_parallel_maintenance_workers. OTOH when PARALLEL option with an > > > integer is specified, the parallel degree is the specified integer > > > value and it's capped by max_parallel_workers and the number of > > > indexes that support parallel vacuum and are bigger than > > > min_parallel_index_scan_size. > > > > > > > This seems more difficult to explain and have more variable parts. I > > think one of the blogs I recently read about this work [1] (see > > section: > > Parallel VACUUM & Better Support for Append-only Workloads) explains > > the currently implemented behavior (related to the workers) nicely and > > in simple words. Now unless I or the person who wrote that blog missed > > something it appears to me that the current implemented behavior is > > understood by others who might not be even directly involved in this > > work which to some extent indicates that users will be able to use > > currently implemented behavior without difficulty. I think we can keep > > the current behavior as it is and wait to see if we see any complaints > > from the users trying to use it. > > > > I am planning to commit the patch (by early next week) posted above > thread [1] to make the docs consistent with what we have in code. > Pushed. -- With Regards, Amit Kapila.