Home > mailing lists
Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: [HACKERS] Block level parallel vacuum
Date	February 27, 2019 20:44:23
Msg-id	CA+TgmoZbvyOq2S=bZKdM-SdpwUbJnW-LKo00uXViusn8SjbZ7A@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] Block level parallel vacuum (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses	Re: [HACKERS] Block level parallel vacuum
List	pgsql-hackers
Tree view
On Thu, Feb 14, 2019 at 5:17 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:
> Thank you. Attached the rebased patch.

Here are some review comments.

+         started by a single utility command.  Currently, the parallel
+         utility commands that support the use of parallel workers are
+         <command>CREATE INDEX</command> and <command>VACUUM</command>
+         without <literal>FULL</literal> option, and only when building
+         a B-tree index.  Parallel workers are taken from the pool of

That sentence is garbled.  The end part about b-tree indexes applies
only to CREATE INDEX, not to VACUUM, since VACUUM does build indexes.

+      Vacuum index and cleanup index in parallel
+      <replaceable class="parameter">N</replaceable> background
workers (for the detail
+      of each vacuum phases, please refer to <xref
linkend="vacuum-phases"/>. If the

I have two problems with this.  One is that I can't understand the
English very well. I think you mean something like: "Perform the
'vacuum index' and 'cleanup index' phases of VACUUM in parallel using
N background workers," but I'm not entirely sure.  The other is that
if that is what you mean, I don't think it's a sufficient description.
Users need to understand whether, for example, only one worker can be
used per index, or whether the work for a single index can be split
across workers.

+      parallel degree <replaceable class="parameter">N</replaceable>
is omitted,
+      then <command>VACUUM</command> decides the number of workers based on
+      number of indexes on the relation which further limited by
+      <xref linkend="guc-max-parallel-workers-maintenance"/>. Also if
this option

Now this makes it sound like it's one worker per index, but you could
be more explicit about it.

+      is specified multile times, the last parallel degree
+      <replaceable class="parameter">N</replaceable> is considered
into the account.

Typo, but I'd just delete this sentence altogether; the behavior if
the option is multiply specified seems like a triviality that need not
be documented.

+    Setting a value for <literal>parallel_workers</literal> via
+    <xref linkend="sql-altertable"/> also controls how many parallel
+    worker processes will be requested by a <command>VACUUM</command>
+    against the table. This setting is overwritten by setting
+    <replaceable class="parameter">N</replaceable> of
<literal>PARALLEL</literal>
+    option.

I wonder if we really want this behavior.  Should a setting that
controls the degree of parallelism when scanning the table also affect
VACUUM?  I tend to think that we probably don't ever want VACUUM of a
table to be parallel by default, but rather something that the user
must explicitly request.  Happy to hear other opinions.  If we do want
this behavior, I think this should be written differently, something
like this: The PARALLEL N option to VACUUM takes precedence over this
option.

+ * parallel mode nor destories the parallel context. For updating the index

Spelling.

+/* DSM keys for parallel lazy vacuum */
+#define PARALLEL_VACUUM_KEY_SHARED UINT64CONST(0xFFFFFFFFFFF00001)
+#define PARALLEL_VACUUM_KEY_DEAD_TUPLES UINT64CONST(0xFFFFFFFFFFF00002)
+#define PARALLEL_VACUUM_KEY_QUERY_TEXT UINT64CONST(0xFFFFFFFFFFF00003)

Any special reason not to use just 1, 2, 3 here?  The general
infrastructure stuff uses high numbers to avoid conflicting with
plan_node_id values, but end clients of the parallel infrastructure
can generally just use small integers.

+ bool updated; /* is the stats updated? */

is -> are

+ * LVDeadTuples controls the dead tuple TIDs collected during heap scan.

what do you mean by "controls", exactly? stores?

+ * This is allocated in a dynamic shared memory segment when parallel
+ * lazy vacuum mode, or allocated in a local memory.

If this is in DSM, then max_tuples is a wart, I think.  We can't grow
the segment at that point.  I'm suspicious that we need a better
design here.  It looks like you gather all of the dead tuples in
backend-local memory and then allocate an equal amount of DSM to copy
them.  But that means that we are using twice as much memory, which
seems pretty bad.  You'd have to do that at least momentarily no
matter what, but it's not obvious that the backend-local copy is ever
freed.  There's another patch kicking around to allocate memory for
vacuum in chunks rather than preallocating the whole slab of memory at
once; we might want to think about getting that committed first and
then having this build on top of it.  At least we need something
smarter than this.

-heap_vacuum_rel(Relation onerel, int options, VacuumParams *params,
+heap_vacuum_rel(Relation onerel, VacuumOptions options, VacuumParams *params,

We generally avoid passing a struct by value; copying the struct can
be expensive and having multiple shallow copies of the same data
sometimes leads to surprising results.  I think it might be a good
idea to propose a preliminary refactoring patch that invents
VacuumOptions and gives it just a single 'int' member and refactors
everything to use it, and then that can be committed first.  It should
pass a pointer, though, not the actual struct.

+ LVState    *lvstate;

It's not clear to me why we need this new LVState thing.  What's the
motivation for that?  If it's a good idea, could it be done as a
separate, preparatory patch?  It seems to be responsible for a lot of
code churn in this patch.   It also leads to strange stuff like this:

  ereport(elevel,
- (errmsg("scanned index \"%s\" to remove %d row versions",
+ (errmsg("scanned index \"%s\" to remove %d row versions %s",
  RelationGetRelationName(indrel),
- vacrelstats->num_dead_tuples),
+ dead_tuples->num_tuples,
+ IsParallelWorker() ? "by parallel vacuum worker" : ""),

This doesn't seem to be great grammar, and translation guidelines
generally discourage this sort of incremental message construction
quite strongly.  Since the user can probably infer what happened by a
suitable choice of log_line_prefix, I'm not totally sure this is worth
doing in the first place, but if we're going to do it, it should
probably have two completely separate message strings and pick between
them using IsParallelWorker(), rather than building it up
incrementally like this.

+compute_parallel_workers(Relation rel, int nrequests, int nindexes)

I think 'nrequets' is meant to be 'nrequested'.  It isn't the number
of requests; it's the number of workers that were requested.

+ /* quick exit if no workers are prepared, e.g. under serializable isolation */

That comment makes very little sense in this context.

+ /* Report parallel vacuum worker information */
+ initStringInfo(&buf);
+ appendStringInfo(&buf,
+ ngettext("launched %d parallel vacuum worker %s (planned: %d",
+   "launched %d parallel vacuum workers %s (planned: %d",
+   lvstate->pcxt->nworkers_launched),
+ lvstate->pcxt->nworkers_launched,
+ for_cleanup ? "for index cleanup" : "for index vacuum",
+ lvstate->pcxt->nworkers);
+ if (lvstate->options.nworkers > 0)
+ appendStringInfo(&buf, ", requested %d", lvstate->options.nworkers);
+
+ appendStringInfo(&buf, ")");
+ ereport(elevel, (errmsg("%s", buf.data)));

This is another example of incremental message construction, again
violating translation guidelines.

+ WaitForParallelWorkersToAttach(lvstate->pcxt);

Why?

+ /*
+ * If there is already-updated result in the shared memory we use it.
+ * Otherwise we pass NULL to index AMs, meaning it's first time call,
+ * and copy the result to the shared memory segment.
+ */

I'm probably missing something here, but isn't the intention that we
only do each index once?  If so, how would there be anything there
already?  Once from for_cleanup = false and once for for_cleanup =
true?

+ if (a->options.flags != b->options.flags)
+ return false;
+ if (a->options.nworkers != b->options.nworkers)
+ return false;

You could just do COMPARE_SCALAR_FIELD(options.flags);
COMPARE_SCALAR_FIELD(options.nworkers);

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
pgsql-hackers by date:
From: Tomas Vondra
Date: 27 February 2019, 20:41:01
Subject: Re: Index INCLUDE vs. Bitmap Index Scan
From: "Bossart, Nathan"
Date: 27 February 2019, 20:45:47
Subject: Re: New vacuum option to do only freezing
Re: [HACKERS] Block level parallel vacuum - Mailing list pgsql-hackers

Previous

Next