Thread: doc review for parallel vacuum

doc review for parallel vacuum

From
Justin Pryzby
Date:
Original, long thread

https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f

diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index ab1b8c2398..140637983a 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -237,15 +237,15 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <term><literal>PARALLEL</literal></term>
     <listitem>
      <para>
-      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
+      Perform vacuum index and index cleanup phases of <command>VACUUM</command>
       in parallel using <replaceable class="parameter">integer</replaceable>
-      background workers (for the detail of each vacuum phases, please
+      background workers (for the detail of each vacuum phase, please
       refer to <xref linkend="vacuum-phases"/>).  If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
       of the index is more than <xref linkend="guc-min-parallel-index-scan-size"/>.
       Please note that it is not guaranteed that the number of parallel workers
       specified in <replaceable class="parameter">integer</replaceable> will
@@ -253,7 +253,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
       workers than specified, or even with no workers at all.  Only one worker
       can be used per index.  So parallel workers are launched only when there
       are at least <literal>2</literal> indexes in the table.  Workers for
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and terminate at the end of
       the phase.  These behaviors might change in a future release.  This
       option can't be used with the <literal>FULL</literal> option.
      </para>
@@ -372,7 +372,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ <replaceable class="paramet
     <command>VACUUM</command> causes a substantial increase in I/O traffic,
     which might cause poor performance for other active sessions.  Therefore,
     it is sometimes advisable to use the cost-based vacuum delay feature.  For
-    parallel vacuum, each worker sleeps proportional to the work done by that
+    parallel vacuum, each worker sleeps proportionally to the work done by that
     worker.  See <xref linkend="runtime-config-resource-vacuum-cost"/> for
     details.
    </para>

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index e017db4446..0ab0bea312 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -194,7 +194,7 @@ typedef struct LVShared
      * live tuples in the index vacuum case or the new live tuples in the
      * index cleanup case.
      *
-     * estimated_count is true if the reltuples is an estimated value.
+     * estimated_count is true if reltuples is an estimated value.
      */
     double        reltuples;
     bool        estimated_count;
@@ -757,7 +757,7 @@ skip_blocks(Relation onerel, VacuumParams *params, BlockNumber *next_unskippable
  *        to reclaim dead line pointers.
  *
  *        If the table has at least two indexes, we execute both index vacuum
- *        and index cleanup with parallel workers unless the parallel vacuum is
+ *        and index cleanup with parallel workers unless parallel vacuum is
  *        disabled.  In a parallel vacuum, we enter parallel mode and then
  *        create both the parallel context and the DSM segment before starting
  *        heap scan so that we can record dead tuples to the DSM segment.  All
@@ -836,7 +836,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
     vacrelstats->latestRemovedXid = InvalidTransactionId;
 
     /*
-     * Initialize the state for a parallel vacuum.  As of now, only one worker
+     * Initialize state for a parallel vacuum.  As of now, only one worker
      * can be used for an index, so we invoke parallelism only if there are at
      * least two indexes on a table.
      */
@@ -864,7 +864,7 @@ lazy_scan_heap(Relation onerel, VacuumParams *params, LVRelStats *vacrelstats,
     }
 
     /*
-     * Allocate the space for dead tuples in case the parallel vacuum is not
+     * Allocate the space for dead tuples in case parallel vacuum is not
      * initialized.
      */
     if (!ParallelVacuumIsActive(lps))
@@ -2111,7 +2111,7 @@ parallel_vacuum_index(Relation *Irel, IndexBulkDeleteResult **stats,
         shared_indstats = get_indstats(lvshared, idx);
 
         /*
-         * Skip processing indexes that doesn't participate in parallel
+         * Skip processing indexes that don't participate in parallel
          * operation
          */
         if (shared_indstats == NULL ||
@@ -2223,7 +2223,7 @@ vacuum_one_index(Relation indrel, IndexBulkDeleteResult **stats,
         shared_indstats->updated = true;
 
         /*
-         * Now that the stats[idx] points to the DSM segment, we don't need
+         * Now that stats[idx] points to the DSM segment, we don't need
          * the locally allocated results.
          */
         pfree(*stats);
@@ -2329,7 +2329,7 @@ lazy_vacuum_index(Relation indrel, IndexBulkDeleteResult **stats,
  *    lazy_cleanup_index() -- do post-vacuum cleanup for one index relation.
  *
  *        reltuples is the number of heap tuples and estimated_count is true
- *        if the reltuples is an estimated value.
+ *        if reltuples is an estimated value.
  */
 static void
 lazy_cleanup_index(Relation indrel,
@@ -2916,9 +2916,9 @@ heap_page_is_all_visible(Relation rel, Buffer buf,
 /*
  * Compute the number of parallel worker processes to request.  Both index
  * vacuum and index cleanup can be executed with parallel workers.  The index
- * is eligible for parallel vacuum iff it's size is greater than
+ * is eligible for parallel vacuum iff its size is greater than
  * min_parallel_index_scan_size as invoking workers for very small indexes
- * can hurt the performance.
+ * can hurt performance.
  *
  * nrequested is the number of parallel workers that user requested.  If
  * nrequested is 0, we compute the parallel degree based on nindexes, that is
@@ -2937,7 +2937,7 @@ compute_parallel_vacuum_workers(Relation *Irel, int nindexes, int nrequested,
     int            i;
 
     /*
-     * We don't allow to perform parallel operation in standalone backend or
+     * We don't allow performing parallel operation in standalone backend or
      * when parallelism is disabled.
      */
     if (!IsUnderPostmaster || max_parallel_maintenance_workers == 0)
@@ -3010,7 +3010,7 @@ prepare_index_statistics(LVShared *lvshared, bool *can_parallel_vacuum,
 }
 
 /*
- * Update index statistics in pg_class if the statistics is accurate.
+ * Update index statistics in pg_class if the statistics are accurate.
  */
 static void
 update_index_statistics(Relation *Irel, IndexBulkDeleteResult **stats,
@@ -3181,7 +3181,7 @@ begin_parallel_vacuum(Oid relid, Relation *Irel, LVRelStats *vacrelstats,
 /*
  * Destroy the parallel context, and end parallel mode.
  *
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the
  * updated index statistics from DSM in local memory and then later use that
  * to update the index statistics.  One might think that we can exit from
  * parallel mode, update the index statistics and then destroy parallel
@@ -3288,7 +3288,7 @@ skip_parallel_vacuum_index(Relation indrel, LVShared *lvshared)
  * Perform work within a launched parallel process.
  *
  * Since parallel vacuum workers perform only index vacuum or index cleanup,
- * we don't need to report the progress information.
+ * we don't need to report progress information.
  */
 void
 parallel_vacuum_main(dsm_segment *seg, shm_toc *toc)
diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index df06e7d174..ac348b312c 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -493,7 +493,7 @@ ReinitializeParallelDSM(ParallelContext *pcxt)
 
 /*
  * Reinitialize parallel workers for a parallel context such that we could
- * launch the different number of workers.  This is required for cases where
+ * launch a different number of workers.  This is required for cases where
  * we need to reuse the same DSM segment, but the number of workers can
  * vary from run-to-run.
  */
diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c
index d625d17bf4..76d33b1ba2 100644
--- a/src/backend/commands/vacuum.c
+++ b/src/backend/commands/vacuum.c
@@ -2034,23 +2034,23 @@ vacuum_delay_point(void)
 /*
  * Computes the vacuum delay for parallel workers.
  *
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to force
+ * each worker to sleep in proportion to the share of work it's done.  We achieve this
  * by allowing all parallel vacuum workers including the leader process to
  * have a shared view of cost related parameters (mainly VacuumCostBalance).
- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and
  * then based on that decide whether it needs to sleep.  We compute the time
  * to sleep for a worker based on the cost it has incurred
  * (VacuumCostBalanceLocal) and then reduce the VacuumSharedCostBalance by
- * that amount.  This avoids letting the workers sleep who have done less or
- * no I/O as compared to other workers and therefore can ensure that workers
- * who are doing more I/O got throttled more.
+ * that amount.  This avoids putting to sleep those workers which have done less
+ * I/O than other workers and therefore ensure that workers
+ * which are doing more I/O got throttled more.
  *
- * We allow any worker to sleep only if it has performed the I/O above a
+ * We force a worker to sleep only if it has performed I/O above a
  * certain threshold, which is calculated based on the number of active
  * workers (VacuumActiveNWorkers), and the overall cost balance is more than
- * VacuumCostLimit set by the system.  The testing reveals that we achieve
- * the required throttling if we allow a worker that has done more than 50%
+ * VacuumCostLimit set by the system.  Testing reveals that we achieve
+ * the required throttling if we force a worker that has done more than 50%
  * of its share of work to sleep.
  */
 static double
diff --git a/src/include/commands/vacuum.h b/src/include/commands/vacuum.h
index 2779bea5c9..a4cd721400 100644
--- a/src/include/commands/vacuum.h
+++ b/src/include/commands/vacuum.h
@@ -225,7 +225,7 @@ typedef struct VacuumParams
 
     /*
      * The number of parallel vacuum workers.  0 by default which means choose
-     * based on the number of indexes.  -1 indicates a parallel vacuum is
+     * based on the number of indexes.  -1 indicates parallel vacuum is
      * disabled.
      */
     int            nworkers;
-- 
2.17.0


Attachment

Re: doc review for parallel vacuum

From
Amit Kapila
Date:
On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> Original, long thread
>
https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f
>

I have comments/questions on the patches:
doc changes
-------------------------
1.
      <para>
-      Perform vacuum index and cleanup index phases of
<command>VACUUM</command>
+      Perform vacuum index and index cleanup phases of
<command>VACUUM</command>

Why is the proposed text improvement over what is already there?

2.
If the
-      <literal>PARALLEL</literal> option is omitted, then
-      <command>VACUUM</command> decides the number of workers based on number
-      of indexes that support parallel vacuum operation on the relation which
-      is further limited by <xref
linkend="guc-max-parallel-workers-maintenance"/>.
-      The index can participate in a parallel vacuum if and only if the size
+      <literal>PARALLEL</literal> option is omitted, then the number of workers
+      is determined based on the number of indexes that support parallel vacuum
+      operation on the relation, and is further limited by <xref
+      linkend="guc-max-parallel-workers-maintenance"/>.
+      An index can participate in parallel vacuum if and only if the size
       of the index is more than <xref
linkend="guc-min-parallel-index-scan-size"/>.

Here, it is not clear to me why the proposed text is better than what
we already have?

3.
..
-      vacuum launches before starting each phase and exit at the end of
+      vacuum are launched before the start of each phase and
terminate at the end of

It is better to use 'exit' instead of 'ternimate' as we are not
forcing the workers to end rather we wait for them to exit.



Patch changes
-------------------------
1.
- * and index cleanup with parallel workers unless the parallel vacuum is
+ * and index cleanup with parallel workers unless parallel vacuum is

As per my understanding 'parallel vacuum' is a noun phrase, so we
should have a determiner before it.

2.
- * Since writes are not allowed during the parallel mode, so we copy the
+ * Since writes are not allowed during parallel mode, copy the

Similar to above, I think here also we should have a determiner before
'parallel mode'.

3.
- * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
- * each worker to sleep proportional to the work done by it.  We achieve this
+ * The basic idea of a cost-based delay for parallel vacuum is to force
+ * each worker to sleep in proportion to the share of work it's done.
We achieve this

I am not sure if it is an improvement to use 'to force' instead of 'to
allow' because there is another criteria as well which decides whether
the worker will sleep or not.  I am also not sure if the second change
(share of work it's) in this sentence is a clear improvement.

4.
- * We allow each worker to update it as and when it has incurred any cost and
+ * We allow each worker to update it AS AND WHEN it has incurred any cost and

I don't see why it is necessary to make this part bold?  We are using
it at one other place in the code in the way it is used here.

5.
- * We allow any worker to sleep only if it has performed the I/O above a
+ * We force a worker to sleep only if it has performed I/O above a
  * certain threshold

Hmm, again I am not sure if we should use 'force' here instead of 'allow'.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: doc review for parallel vacuum

From
Amit Kapila
Date:
On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > Original, long thread
> >
https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f
> >
>
> I have comments/questions on the patches:
> doc changes
> -------------------------
> 1.
>       <para>
> -      Perform vacuum index and cleanup index phases of
> <command>VACUUM</command>
> +      Perform vacuum index and index cleanup phases of
> <command>VACUUM</command>
>
> Why is the proposed text improvement over what is already there?
>

I have kept the existing text as it is for this case.

> 2.
> If the
> -      <literal>PARALLEL</literal> option is omitted, then
> -      <command>VACUUM</command> decides the number of workers based on number
> -      of indexes that support parallel vacuum operation on the relation which
> -      is further limited by <xref
> linkend="guc-max-parallel-workers-maintenance"/>.
> -      The index can participate in a parallel vacuum if and only if the size
> +      <literal>PARALLEL</literal> option is omitted, then the number of workers
> +      is determined based on the number of indexes that support parallel vacuum
> +      operation on the relation, and is further limited by <xref
> +      linkend="guc-max-parallel-workers-maintenance"/>.
> +      An index can participate in parallel vacuum if and only if the size
>        of the index is more than <xref
> linkend="guc-min-parallel-index-scan-size"/>.
>
> Here, it is not clear to me why the proposed text is better than what
> we already have?
>

Changed as per your suggestion.

> 3.
> ..
> -      vacuum launches before starting each phase and exit at the end of
> +      vacuum are launched before the start of each phase and
> terminate at the end of
>
> It is better to use 'exit' instead of 'ternimate' as we are not
> forcing the workers to end rather we wait for them to exit.
>

I have used 'exit' instead of 'terminate' here.

>
>
> Patch changes
> -------------------------
> 1.
> - * and index cleanup with parallel workers unless the parallel vacuum is
> + * and index cleanup with parallel workers unless parallel vacuum is
>
> As per my understanding 'parallel vacuum' is a noun phrase, so we
> should have a determiner before it.
>
> 2.
> - * Since writes are not allowed during the parallel mode, so we copy the
> + * Since writes are not allowed during parallel mode, copy the
>
> Similar to above, I think here also we should have a determiner before
> 'parallel mode'.
>

Changed as per your suggestion.

> 3.
> - * The basic idea of a cost-based vacuum delay for parallel vacuum is to allow
> - * each worker to sleep proportional to the work done by it.  We achieve this
> + * The basic idea of a cost-based delay for parallel vacuum is to force
> + * each worker to sleep in proportion to the share of work it's done.
> We achieve this
>
> I am not sure if it is an improvement to use 'to force' instead of 'to
> allow' because there is another criteria as well which decides whether
> the worker will sleep or not.  I am also not sure if the second change
> (share of work it's) in this sentence is a clear improvement.
>

I have used 'to allow' in above text, otherwise, acceptted your suggestions.

> 4.
> - * We allow each worker to update it as and when it has incurred any cost and
> + * We allow each worker to update it AS AND WHEN it has incurred any cost and
>
> I don't see why it is necessary to make this part bold?  We are using
> it at one other place in the code in the way it is used here.
>

Kept the existing text as it is.

> 5.
> - * We allow any worker to sleep only if it has performed the I/O above a
> + * We force a worker to sleep only if it has performed I/O above a
>   * certain threshold
>
> Hmm, again I am not sure if we should use 'force' here instead of 'allow'.
>

Kept the usage of 'allow'.

I have combined both of your patches.  Let me know if you have any
more suggestions, otherwise, I am planning to push this tomorrow.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: doc review for parallel vacuum

From
Justin Pryzby
Date:
On Tue, Apr 07, 2020 at 09:57:46AM +0530, Amit Kapila wrote:
> On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> > > Original, long thread
> > >
https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f
> > >
> >
> > I have comments/questions on the patches:
> > doc changes
> > -------------------------
> > 1.
> >       <para>
> > -      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
> > +      Perform vacuum index and index cleanup phases of <command>VACUUM</command>
> >
> > Why is the proposed text improvement over what is already there?
> 
> I have kept the existing text as it is for this case.

Probably it should say "index vacuum and index cleanup", which is also what the
comment in vacuumlazy.c says.

> > 2.
> > If the
> > -      <literal>PARALLEL</literal> option is omitted, then
> > -      <command>VACUUM</command> decides the number of workers based on number
> > -      of indexes that support parallel vacuum operation on the relation which
> > -      is further limited by <xref
> > linkend="guc-max-parallel-workers-maintenance"/>.
> > -      The index can participate in a parallel vacuum if and only if the size
> > +      <literal>PARALLEL</literal> option is omitted, then the number of workers
> > +      is determined based on the number of indexes that support parallel vacuum
> > +      operation on the relation, and is further limited by <xref
> > +      linkend="guc-max-parallel-workers-maintenance"/>.
> > +      An index can participate in parallel vacuum if and only if the size
> >        of the index is more than <xref
> > linkend="guc-min-parallel-index-scan-size"/>.
> >
> > Here, it is not clear to me why the proposed text is better than what
> > we already have?
> Changed as per your suggestion.

To answer your question:

"VACUUM decides" doesn't sound consistent with docs.

"based on {+the+} number"
=> here, you're veritably missing an article...

"relation which" technically means that the *relation* is "is further limited
by"...

> > Patch changes
> > -------------------------
> > 1.
> > - * and index cleanup with parallel workers unless the parallel vacuum is
> > + * and index cleanup with parallel workers unless parallel vacuum is
> >
> > As per my understanding 'parallel vacuum' is a noun phrase, so we
> > should have a determiner before it.
> 
> Changed as per your suggestion.

Thanks (I think you meant an "article").

> > - * We allow each worker to update it as and when it has incurred any cost and
> > + * We allow each worker to update it AS AND WHEN it has incurred any cost and
> >
> > I don't see why it is necessary to make this part bold?  We are using
> > it at one other place in the code in the way it is used here.
> >
> 
> Kept the existing text as it is.

I meant this as a question.  I'm not sure what "as and when means" ?  "If and
when" ?

> I have combined both of your patches.  Let me know if you have any
> more suggestions, otherwise, I am planning to push this tomorrow.

In the meantime, I found some more issues, so I rebased on top of your patch so
you can review it.

-- 
Justin

Attachment

Re: doc review for parallel vacuum

From
Amit Kapila
Date:
On Tue, Apr 7, 2020 at 10:25 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Tue, Apr 07, 2020 at 09:57:46AM +0530, Amit Kapila wrote:
> > On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > >
> > > > Original, long thread
> > > >
https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f
> > > >
> > >
> > > I have comments/questions on the patches:
> > > doc changes
> > > -------------------------
> > > 1.
> > >       <para>
> > > -      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
> > > +      Perform vacuum index and index cleanup phases of <command>VACUUM</command>
> > >
> > > Why is the proposed text improvement over what is already there?
> >
> > I have kept the existing text as it is for this case.
>
> Probably it should say "index vacuum and index cleanup", which is also what the
> comment in vacuumlazy.c says.
>

Okay, that makes sense.

>
> > > - * We allow each worker to update it as and when it has incurred any cost and
> > > + * We allow each worker to update it AS AND WHEN it has incurred any cost and
> > >
> > > I don't see why it is necessary to make this part bold?  We are using
> > > it at one other place in the code in the way it is used here.
> > >
> >
> > Kept the existing text as it is.
>
> I meant this as a question.  I'm not sure what "as and when means" ?  "If and
> when" ?
>

It means the "at the time when" worker performed any I/O.  This has
been used at some other place in code as well.

> > I have combined both of your patches.  Let me know if you have any
> > more suggestions, otherwise, I am planning to push this tomorrow.
>
> In the meantime, I found some more issues, so I rebased on top of your patch so
> you can review it.
>

-     The <option>PARALLEL</option> option is used only for vacuum purpose.
-     Even if this option is specified with <option>ANALYZE</option> option
-     it does not affect <option>ANALYZE</option>.
+     The <option>PARALLEL</option> option is used only for vacuum operations.
+     If specified along with <option>ANALYZE</option>, the behavior during
+     <literal>ANALYZE</literal> is unchanged.

I think the proposed text makes the above text unclear especially "the
behavior during ANALYZE is unchanged.".  Basically this option has
nothing to do with the behavior of vacuum or analyze.  I think we
should be more specific as the current text.

 * Copy the index bulk-deletion result returned from ambulkdelete and
- * amvacuumcleanup to the DSM segment if it's the first time to get it
- * from them, because they allocate it locally and it's possible that an
- * index will be vacuumed by the different vacuum process at the next
- * time.  The copying of the result normally happens only after the first
- * time of index vacuuming.  From the second time, we pass the result on
- * the DSM segment so that they then update it directly.
+ * amvacuumcleanup to the DSM segment if it's the first time to get a result
+ * from a worker, because workers allocate BulkDeleteResults locally,
and it's possible that an
+ * index will be vacuumed by a different vacuum process the next
+ * time.

This can be done by the leader backend as well, so we can't use
workers terminology here.  Also, I don't see the need to mention
BulkDeleteResults.  I will include some changes from this text.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: doc review for parallel vacuum

From
Masahiko Sawada
Date:
On Tue, 7 Apr 2020 at 13:55, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Tue, Apr 07, 2020 at 09:57:46AM +0530, Amit Kapila wrote:
> > On Mon, Mar 23, 2020 at 10:34 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > On Sun, Mar 22, 2020 at 7:48 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > > >
> > > > Original, long thread
> > > >
https://www.postgresql.org/message-id/flat/CAA4eK1%2Bnw1FBK3_sDnW%2B7kB%2Bx4qbDJqetgqwYW8k2xv82RZ%2BKw%40mail.gmail.com#b1745ee853b137043e584b500b41300f
> > > >
> > >
> > > I have comments/questions on the patches:
> > > doc changes
> > > -------------------------
> > > 1.
> > >       <para>
> > > -      Perform vacuum index and cleanup index phases of <command>VACUUM</command>
> > > +      Perform vacuum index and index cleanup phases of <command>VACUUM</command>
> > >
> > > Why is the proposed text improvement over what is already there?
> >
> > I have kept the existing text as it is for this case.
>
> Probably it should say "index vacuum and index cleanup", which is also what the
> comment in vacuumlazy.c says.
>
> > > 2.
> > > If the
> > > -      <literal>PARALLEL</literal> option is omitted, then
> > > -      <command>VACUUM</command> decides the number of workers based on number
> > > -      of indexes that support parallel vacuum operation on the relation which
> > > -      is further limited by <xref
> > > linkend="guc-max-parallel-workers-maintenance"/>.
> > > -      The index can participate in a parallel vacuum if and only if the size
> > > +      <literal>PARALLEL</literal> option is omitted, then the number of workers
> > > +      is determined based on the number of indexes that support parallel vacuum
> > > +      operation on the relation, and is further limited by <xref
> > > +      linkend="guc-max-parallel-workers-maintenance"/>.
> > > +      An index can participate in parallel vacuum if and only if the size
> > >        of the index is more than <xref
> > > linkend="guc-min-parallel-index-scan-size"/>.
> > >
> > > Here, it is not clear to me why the proposed text is better than what
> > > we already have?
> > Changed as per your suggestion.
>
> To answer your question:
>
> "VACUUM decides" doesn't sound consistent with docs.
>
> "based on {+the+} number"
> => here, you're veritably missing an article...
>
> "relation which" technically means that the *relation* is "is further limited
> by"...
>
> > > Patch changes
> > > -------------------------
> > > 1.
> > > - * and index cleanup with parallel workers unless the parallel vacuum is
> > > + * and index cleanup with parallel workers unless parallel vacuum is
> > >
> > > As per my understanding 'parallel vacuum' is a noun phrase, so we
> > > should have a determiner before it.
> >
> > Changed as per your suggestion.
>
> Thanks (I think you meant an "article").
>
> > > - * We allow each worker to update it as and when it has incurred any cost and
> > > + * We allow each worker to update it AS AND WHEN it has incurred any cost and
> > >
> > > I don't see why it is necessary to make this part bold?  We are using
> > > it at one other place in the code in the way it is used here.
> > >
> >
> > Kept the existing text as it is.
>
> I meant this as a question.  I'm not sure what "as and when means" ?  "If and
> when" ?
>
> > I have combined both of your patches.  Let me know if you have any
> > more suggestions, otherwise, I am planning to push this tomorrow.
>
> In the meantime, I found some more issues, so I rebased on top of your patch so
> you can review it.
>

I don't have comments on your change other than the comments Amit
already sent. Thank you for reviewing this part!

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: doc review for parallel vacuum

From
Amit Kapila
Date:
On Wed, Apr 8, 2020 at 12:49 PM Masahiko Sawada
<masahiko.sawada@2ndquadrant.com> wrote:
>
> On Tue, 7 Apr 2020 at 13:55, Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
>
> I don't have comments on your change other than the comments Amit
> already sent. Thank you for reviewing this part!
>

I have made the modifications as per my comments.  What do you think
about the attached?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: doc review for parallel vacuum

From
Masahiko Sawada
Date:
On Fri, 10 Apr 2020 at 16:26, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Apr 8, 2020 at 12:49 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 7 Apr 2020 at 13:55, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> >
> > I don't have comments on your change other than the comments Amit
> > already sent. Thank you for reviewing this part!
> >
>
> I have made the modifications as per my comments.  What do you think
> about the attached?

Thank you for updating the patch! Looks good to me.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: doc review for parallel vacuum

From
Justin Pryzby
Date:
On Fri, Apr 10, 2020 at 12:56:08PM +0530, Amit Kapila wrote:
> On Wed, Apr 8, 2020 at 12:49 PM Masahiko Sawada
> <masahiko.sawada@2ndquadrant.com> wrote:
> >
> > On Tue, 7 Apr 2020 at 13:55, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > >
> >
> > I don't have comments on your change other than the comments Amit
> > already sent. Thank you for reviewing this part!
> >
> 
> I have made the modifications as per my comments.  What do you think
> about the attached?

Couple more changes (in bold):

-     The <option>PARALLEL</option> option is used only for vacuum PURPOSES.
-     Even if this option is specified with THE <option>ANALYZE</option> option

Also, this part still doesn't read well:

-        * amvacuumcleanup to the DSM segment if it's the first time to get it?
-        * from them? because they? allocate it locally and it's possible that an
-        * index will be vacuumed by the different vacuum process at the next

If you change "it" and "them" and "it" and say "*a* different", then it'll be
ok.

-- 
Justin



Re: doc review for parallel vacuum

From
Amit Kapila
Date:
On Fri, Apr 10, 2020 at 7:16 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> Also, this part still doesn't read well:
>
> -        * amvacuumcleanup to the DSM segment if it's the first time to get it?
> -        * from them? because they? allocate it locally and it's possible that an
> -        * index will be vacuumed by the different vacuum process at the next
>
> If you change "it" and "them" and "it" and say "*a* different", then it'll be
> ok.
>

I am not sure if I follow how exactly you want to change it but still
let me know what you think about if we change it like: "Copy the index
bulk-deletion result returned from ambulkdelete and amvacuumcleanup to
the DSM segment if it's the first time because they allocate locally
and it's possible that an index will be vacuumed by the different
vacuum process at the next time."

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: doc review for parallel vacuum

From
Justin Pryzby
Date:
On Mon, Apr 13, 2020 at 10:44:42AM +0530, Amit Kapila wrote:
> On Fri, Apr 10, 2020 at 7:16 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > Also, this part still doesn't read well:
> >
> > -        * amvacuumcleanup to the DSM segment if it's the first time to get it?
> > -        * from them? because they? allocate it locally and it's possible that an
> > -        * index will be vacuumed by the different vacuum process at the next
> >
> > If you change "it" and "them" and "it" and say "*a* different", then it'll be
> > ok.
> >
> 
> I am not sure if I follow how exactly you want to change it but still
> let me know what you think about if we change it like: "Copy the index
> bulk-deletion result returned from ambulkdelete and amvacuumcleanup to
> the DSM segment if it's the first time because they allocate locally
> and it's possible that an index will be vacuumed by the different
> vacuum process at the next time."

I changed "the" to "a" and removed "at":

|Copy the index
|bulk-deletion result returned from ambulkdelete and amvacuumcleanup to
|the DSM segment if it's the first time [???] because they allocate locally
|and it's possible that an index will be vacuumed by a different
|vacuum process the next time."

Is it correct to say: "..if it's the first iteration" and "different process on
the next iteration" ?  Or "cycle" ?

-- 
Justin



Re: doc review for parallel vacuum

From
Amit Kapila
Date:
On Mon, Apr 13, 2020 at 2:00 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> |Copy the index
> |bulk-deletion result returned from ambulkdelete and amvacuumcleanup to
> |the DSM segment if it's the first time [???] because they allocate locally
> |and it's possible that an index will be vacuumed by a different
> |vacuum process the next time."
>
> Is it correct to say: "..if it's the first iteration" and "different process on
> the next iteration" ?  Or "cycle" ?
>

"cycle" sounds better.  I have changed the patch as per your latest
comments.  Let me know what you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: doc review for parallel vacuum

From
Justin Pryzby
Date:
On Mon, Apr 13, 2020 at 03:22:06PM +0530, Amit Kapila wrote:
> On Mon, Apr 13, 2020 at 2:00 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> >
> > |Copy the index
> > |bulk-deletion result returned from ambulkdelete and amvacuumcleanup to
> > |the DSM segment if it's the first time [???] because they allocate locally
> > |and it's possible that an index will be vacuumed by a different
> > |vacuum process the next time."
> >
> > Is it correct to say: "..if it's the first iteration" and "different process on
> > the next iteration" ?  Or "cycle" ?
> >
> 
> "cycle" sounds better.  I have changed the patch as per your latest
> comments.  Let me know what you think?

Looks good.  One more change:

[-Even if-]{+If+} this option is specified with the <option>ANALYZE</option> [-option-]{+option,+}

Remove "even" and add comma.

Thanks,
-- 
Justin



Re: doc review for parallel vacuum

From
Amit Kapila
Date:
On Tue, Apr 14, 2020 at 2:54 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> Looks good.  One more change:
>
> [-Even if-]{+If+} this option is specified with the <option>ANALYZE</option> [-option-]{+option,+}
>
> Remove "even" and add comma.
>

Pushed after making this change.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com