Re: Duplicate Workers entries in some EXPLAIN plans - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Duplicate Workers entries in some EXPLAIN plans
Date
Msg-id 15300.1579918495@sss.pgh.pa.us
Whole thread Raw
In response to Re: Duplicate Workers entries in some EXPLAIN plans  (Maciek Sakrejda <m.sakrejda@gmail.com>)
Responses Re: Duplicate Workers entries in some EXPLAIN plans  (Maciek Sakrejda <m.sakrejda@gmail.com>)
List pgsql-hackers
Maciek Sakrejda <m.sakrejda@gmail.com> writes:
> Okay. Does not getting as many workers as it asks for include
> sometimes getting zero, completely changing the actual output?

Yup :-(.  We could possibly avoid that by running the explain
test by itself rather than in a parallel group, but I don't
especially want to add yet more non-parallelizable tests.

Meanwhile, I spent some time looking at the code, and wasn't very happy
with it.  I'm on board with the general plan of redirecting EXPLAIN
output into per-worker buffers that we eventually recombine, but I think
that this particular coding is pretty unmaintainable/unextensible.
In particular, I'm really unhappy that the code is ignoring EXPLAIN's
grouping_state stack.  That means that when it's formatting fields that
belong to the per-worker group, it's using the state-stack entry that
corresponds to the plan node's main level.  This seems to accidentally
work, but that fact depends on a number of shaky assumptions:

* Before messing with any per-worker data, we've always emitted at
least one field in the plan node's main level, so that the state-stack
entry isn't at its initial state for the level.

* Before actually emitting the shunted-aside data, we've always emitted
a "Worker Number" field in correct format within the per-worker group,
so that the formatting state is now correct for a non-initial field.

* There is no formatting difference between any non-first fields in
a level (ie the state stack entries are basically just booleans),
so that it doesn't matter how many plan-node fields we emitted before
starting the per-worker data, so long as there was at least one, nor
does transiently abusing the plan node level's stack entry like this
break the formatting of subsequent plan-node-level fields.

Now maybe we'll never implement an output format that breaks that
last assumption, and maybe we'll never rearrange the EXPLAIN code
in a way that breaks either of the first two.  But I don't like those
assumptions too much.  I also didn't like the code's assumption that
all the non-text formats interpret es->indent the same.

I also noted an actual bug, which is that the patch fails regression
testing under force_parallel_mode = regress.  This isn't really your
fault, because the issue is in this obscure and poorly-commented hack
in show_sort_info:

     * You might think we should just skip this stanza entirely when
     * es->hide_workers is true, but then we'd get no sort-method output at
     * all.  We have to make it look like worker 0's data is top-level data.
     * Currently, we only bother with that for text-format output.

Nonetheless, it's broken.

So I spent some time hacking on this and came up with the attached.
It's noticeably more verbose than your patch, but it keeps the
output-format-aware code at what seems to me to be a maintainable
arm's-length distance from the parallel-worker hacking.  TEXT is
still a special case of course :-(.

This patch just covers the code, I'm not taking any position yet
about what to do about the tests.  I did tweak the code to eliminate
the one formatting difference in select_parallel (ie put two spaces
after "Worker N:", which I think reads better anyhow), so it
passes check-world as it stands.

Thoughts?

            regards, tom lane

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d..9e965d2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -123,11 +123,20 @@ static void ExplainSubPlans(List *plans, List *ancestors,
                             const char *relationship, ExplainState *es);
 static void ExplainCustomChildren(CustomScanState *css,
                                   List *ancestors, ExplainState *es);
+static ExplainWorkersState *ExplainCreateWorkersState(int num_workers);
+static void ExplainOpenWorker(int n, ExplainState *es);
+static void ExplainCloseWorker(int n, ExplainState *es);
+static void ExplainFlushWorkersState(ExplainState *es);
 static void ExplainProperty(const char *qlabel, const char *unit,
                             const char *value, bool numeric, ExplainState *es);
+static void ExplainOpenSetAsideGroup(const char *objtype, const char *labelname,
+                                     bool labeled, int depth, ExplainState *es);
+static void ExplainSaveGroup(ExplainState *es, int depth, int *state_save);
+static void ExplainRestoreGroup(ExplainState *es, int depth, int *state_save);
 static void ExplainDummyGroup(const char *objtype, const char *labelname,
                               ExplainState *es);
 static void ExplainXMLTag(const char *tagname, int flags, ExplainState *es);
+static void ExplainIndentText(ExplainState *es);
 static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
@@ -827,7 +836,7 @@ ExplainPrintJIT(ExplainState *es, int jit_flags,
     /* for higher density, open code the text output format */
     if (es->format == EXPLAIN_FORMAT_TEXT)
     {
-        appendStringInfoSpaces(es->str, es->indent * 2);
+        ExplainIndentText(es);
         if (for_workers)
             appendStringInfo(es->str, "JIT for worker %u:\n", worker_num);
         else
@@ -836,7 +845,7 @@ ExplainPrintJIT(ExplainState *es, int jit_flags,

         ExplainPropertyInteger("Functions", NULL, ji->created_functions, es);

-        appendStringInfoSpaces(es->str, es->indent * 2);
+        ExplainIndentText(es);
         appendStringInfo(es->str, "Options: %s %s, %s %s, %s %s, %s %s\n",
                          "Inlining", jit_flags & PGJIT_INLINE ? "true" : "false",
                          "Optimization", jit_flags & PGJIT_OPT3 ? "true" : "false",
@@ -845,7 +854,7 @@ ExplainPrintJIT(ExplainState *es, int jit_flags,

         if (es->analyze && es->timing)
         {
-            appendStringInfoSpaces(es->str, es->indent * 2);
+            ExplainIndentText(es);
             appendStringInfo(es->str,
                              "Timing: %s %.3f ms, %s %.3f ms, %s %.3f ms, %s %.3f ms, %s %.3f ms\n",
                              "Generation", 1000.0 * INSTR_TIME_GET_DOUBLE(ji->generation_counter),
@@ -1074,9 +1083,10 @@ ExplainPreScanNode(PlanState *planstate, Bitmapset **rels_used)
  * optional name to be attached to the node.
  *
  * In text format, es->indent is controlled in this function since we only
- * want it to change at plan-node boundaries.  In non-text formats, es->indent
- * corresponds to the nesting depth of logical output groups, and therefore
- * is controlled by ExplainOpenGroup/ExplainCloseGroup.
+ * want it to change at plan-node boundaries (but a few subroutines will
+ * transiently increment it).  In non-text formats, es->indent corresponds
+ * to the nesting depth of logical output groups, and therefore is controlled
+ * by ExplainOpenGroup/ExplainCloseGroup.
  */
 static void
 ExplainNode(PlanState *planstate, List *ancestors,
@@ -1090,9 +1100,20 @@ ExplainNode(PlanState *planstate, List *ancestors,
     const char *partialmode = NULL;
     const char *operation = NULL;
     const char *custom_name = NULL;
+    ExplainWorkersState *save_workers_state = es->workers_state;
     int            save_indent = es->indent;
     bool        haschildren;

+    /*
+     * Prepare per-worker output buffers, if needed.  We'll append the data in
+     * these to the main output string further down.
+     */
+    if (planstate->worker_instrument && es->analyze && !es->hide_workers)
+        es->workers_state = ExplainCreateWorkersState(planstate->worker_instrument->num_workers);
+    else
+        es->workers_state = NULL;
+
+    /* Identify plan node type, and print generic details */
     switch (nodeTag(plan))
     {
         case T_Result:
@@ -1324,13 +1345,13 @@ ExplainNode(PlanState *planstate, List *ancestors,
     {
         if (plan_name)
         {
-            appendStringInfoSpaces(es->str, es->indent * 2);
+            ExplainIndentText(es);
             appendStringInfo(es->str, "%s\n", plan_name);
             es->indent++;
         }
         if (es->indent)
         {
-            appendStringInfoSpaces(es->str, es->indent * 2);
+            ExplainIndentText(es);
             appendStringInfoString(es->str, "->  ");
             es->indent += 2;
         }
@@ -1574,6 +1595,56 @@ ExplainNode(PlanState *planstate, List *ancestors,
     if (es->format == EXPLAIN_FORMAT_TEXT)
         appendStringInfoChar(es->str, '\n');

+    /* prepare per-worker general execution details */
+    if (es->workers_state && es->verbose)
+    {
+        WorkerInstrumentation *w = planstate->worker_instrument;
+
+        for (int n = 0; n < w->num_workers; n++)
+        {
+            Instrumentation *instrument = &w->instrument[n];
+            double        nloops = instrument->nloops;
+            double        startup_ms;
+            double        total_ms;
+            double        rows;
+
+            if (nloops <= 0)
+                continue;
+            startup_ms = 1000.0 * instrument->startup / nloops;
+            total_ms = 1000.0 * instrument->total / nloops;
+            rows = instrument->ntuples / nloops;
+
+            ExplainOpenWorker(n, es);
+
+            if (es->format == EXPLAIN_FORMAT_TEXT)
+            {
+                ExplainIndentText(es);
+                if (es->timing)
+                    appendStringInfo(es->str,
+                                     "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+                                     startup_ms, total_ms, rows, nloops);
+                else
+                    appendStringInfo(es->str,
+                                     "actual rows=%.0f loops=%.0f\n",
+                                     rows, nloops);
+            }
+            else
+            {
+                if (es->timing)
+                {
+                    ExplainPropertyFloat("Actual Startup Time", "ms",
+                                         startup_ms, 3, es);
+                    ExplainPropertyFloat("Actual Total Time", "ms",
+                                         total_ms, 3, es);
+                }
+                ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+                ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+            }
+
+            ExplainCloseWorker(n, es);
+        }
+    }
+
     /* target list */
     if (es->verbose)
         show_plan_tlist(planstate, ancestors, es);
@@ -1885,76 +1956,30 @@ ExplainNode(PlanState *planstate, List *ancestors,
     if (es->buffers && planstate->instrument)
         show_buffer_usage(es, &planstate->instrument->bufusage);

-    /* Show worker detail */
-    if (es->analyze && es->verbose && !es->hide_workers &&
-        planstate->worker_instrument)
+    /* Prepare per-worker buffer usage */
+    if (es->workers_state && es->buffers && es->verbose)
     {
         WorkerInstrumentation *w = planstate->worker_instrument;
-        bool        opened_group = false;
-        int            n;

-        for (n = 0; n < w->num_workers; ++n)
+        for (int n = 0; n < w->num_workers; n++)
         {
             Instrumentation *instrument = &w->instrument[n];
             double        nloops = instrument->nloops;
-            double        startup_ms;
-            double        total_ms;
-            double        rows;

             if (nloops <= 0)
                 continue;
-            startup_ms = 1000.0 * instrument->startup / nloops;
-            total_ms = 1000.0 * instrument->total / nloops;
-            rows = instrument->ntuples / nloops;

-            if (es->format == EXPLAIN_FORMAT_TEXT)
-            {
-                appendStringInfoSpaces(es->str, es->indent * 2);
-                appendStringInfo(es->str, "Worker %d: ", n);
-                if (es->timing)
-                    appendStringInfo(es->str,
-                                     "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
-                                     startup_ms, total_ms, rows, nloops);
-                else
-                    appendStringInfo(es->str,
-                                     "actual rows=%.0f loops=%.0f\n",
-                                     rows, nloops);
-                es->indent++;
-                if (es->buffers)
-                    show_buffer_usage(es, &instrument->bufusage);
-                es->indent--;
-            }
-            else
-            {
-                if (!opened_group)
-                {
-                    ExplainOpenGroup("Workers", "Workers", false, es);
-                    opened_group = true;
-                }
-                ExplainOpenGroup("Worker", NULL, true, es);
-                ExplainPropertyInteger("Worker Number", NULL, n, es);
-
-                if (es->timing)
-                {
-                    ExplainPropertyFloat("Actual Startup Time", "ms",
-                                         startup_ms, 3, es);
-                    ExplainPropertyFloat("Actual Total Time", "ms",
-                                         total_ms, 3, es);
-                }
-                ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
-                ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
-
-                if (es->buffers)
-                    show_buffer_usage(es, &instrument->bufusage);
-
-                ExplainCloseGroup("Worker", NULL, true, es);
-            }
+            ExplainOpenWorker(n, es);
+            show_buffer_usage(es, &instrument->bufusage);
+            ExplainCloseWorker(n, es);
         }
-
-        if (opened_group)
-            ExplainCloseGroup("Workers", "Workers", false, es);
     }

+    /* Show per-worker details, then pop that stack */
+    if (es->workers_state)
+        ExplainFlushWorkersState(es);
+    es->workers_state = save_workers_state;
+
     /* Get ready to display the child plans */
     haschildren = planstate->initPlan ||
         outerPlanState(planstate) ||
@@ -2525,7 +2550,7 @@ show_tablesample(TableSampleClause *tsc, PlanState *planstate,
     {
         bool        first = true;

-        appendStringInfoSpaces(es->str, es->indent * 2);
+        ExplainIndentText(es);
         appendStringInfo(es->str, "Sampling: %s (", method_name);
         foreach(lc, params)
         {
@@ -2572,7 +2597,7 @@ show_sort_info(SortState *sortstate, ExplainState *es)

         if (es->format == EXPLAIN_FORMAT_TEXT)
         {
-            appendStringInfoSpaces(es->str, es->indent * 2);
+            ExplainIndentText(es);
             appendStringInfo(es->str, "Sort Method: %s  %s: %ldkB\n",
                              sortMethod, spaceType, spaceUsed);
         }
@@ -2588,12 +2613,14 @@ show_sort_info(SortState *sortstate, ExplainState *es)
      * You might think we should just skip this stanza entirely when
      * es->hide_workers is true, but then we'd get no sort-method output at
      * all.  We have to make it look like worker 0's data is top-level data.
-     * Currently, we only bother with that for text-format output.
+     * This is easily done by just skipping the OpenWorker/CloseWorker calls.
+     * Currently, we don't worry about the possibility that there are multiple
+     * workers in such a case; if there are, duplicate output fields will be
+     * emitted.
      */
     if (sortstate->shared_info != NULL)
     {
         int            n;
-        bool        opened_group = false;

         for (n = 0; n < sortstate->shared_info->num_workers; n++)
         {
@@ -2609,32 +2636,26 @@ show_sort_info(SortState *sortstate, ExplainState *es)
             spaceType = tuplesort_space_type_name(sinstrument->spaceType);
             spaceUsed = sinstrument->spaceUsed;

+            if (es->workers_state)
+                ExplainOpenWorker(n, es);
+
             if (es->format == EXPLAIN_FORMAT_TEXT)
             {
-                appendStringInfoSpaces(es->str, es->indent * 2);
-                if (n > 0 || !es->hide_workers)
-                    appendStringInfo(es->str, "Worker %d:  ", n);
+                ExplainIndentText(es);
                 appendStringInfo(es->str,
                                  "Sort Method: %s  %s: %ldkB\n",
                                  sortMethod, spaceType, spaceUsed);
             }
             else
             {
-                if (!opened_group)
-                {
-                    ExplainOpenGroup("Workers", "Workers", false, es);
-                    opened_group = true;
-                }
-                ExplainOpenGroup("Worker", NULL, true, es);
-                ExplainPropertyInteger("Worker Number", NULL, n, es);
                 ExplainPropertyText("Sort Method", sortMethod, es);
                 ExplainPropertyInteger("Sort Space Used", "kB", spaceUsed, es);
                 ExplainPropertyText("Sort Space Type", spaceType, es);
-                ExplainCloseGroup("Worker", NULL, true, es);
             }
+
+            if (es->workers_state)
+                ExplainCloseWorker(n, es);
         }
-        if (opened_group)
-            ExplainCloseGroup("Workers", "Workers", false, es);
     }
 }

@@ -2721,7 +2742,7 @@ show_hash_info(HashState *hashstate, ExplainState *es)
         else if (hinstrument.nbatch_original != hinstrument.nbatch ||
                  hinstrument.nbuckets_original != hinstrument.nbuckets)
         {
-            appendStringInfoSpaces(es->str, es->indent * 2);
+            ExplainIndentText(es);
             appendStringInfo(es->str,
                              "Buckets: %d (originally %d)  Batches: %d (originally %d)  Memory Usage: %ldkB\n",
                              hinstrument.nbuckets,
@@ -2732,7 +2753,7 @@ show_hash_info(HashState *hashstate, ExplainState *es)
         }
         else
         {
-            appendStringInfoSpaces(es->str, es->indent * 2);
+            ExplainIndentText(es);
             appendStringInfo(es->str,
                              "Buckets: %d  Batches: %d  Memory Usage: %ldkB\n",
                              hinstrument.nbuckets, hinstrument.nbatch,
@@ -2758,7 +2779,7 @@ show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
     {
         if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
         {
-            appendStringInfoSpaces(es->str, es->indent * 2);
+            ExplainIndentText(es);
             appendStringInfoString(es->str, "Heap Blocks:");
             if (planstate->exact_pages > 0)
                 appendStringInfo(es->str, " exact=%ld", planstate->exact_pages);
@@ -2894,7 +2915,7 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
         /* Show only positive counter values. */
         if (has_shared || has_local || has_temp)
         {
-            appendStringInfoSpaces(es->str, es->indent * 2);
+            ExplainIndentText(es);
             appendStringInfoString(es->str, "Buffers:");

             if (has_shared)
@@ -2949,7 +2970,7 @@ show_buffer_usage(ExplainState *es, const BufferUsage *usage)
         /* As above, show only positive counter values. */
         if (has_timing)
         {
-            appendStringInfoSpaces(es->str, es->indent * 2);
+            ExplainIndentText(es);
             appendStringInfoString(es->str, "I/O Timings:");
             if (!INSTR_TIME_IS_ZERO(usage->blk_read_time))
                 appendStringInfo(es->str, " read=%0.3f",
@@ -3237,7 +3258,7 @@ show_modifytable_info(ModifyTableState *mtstate, List *ancestors,
              */
             if (es->format == EXPLAIN_FORMAT_TEXT)
             {
-                appendStringInfoSpaces(es->str, es->indent * 2);
+                ExplainIndentText(es);
                 appendStringInfoString(es->str,
                                        fdwroutine ? foperation : operation);
             }
@@ -3427,6 +3448,150 @@ ExplainCustomChildren(CustomScanState *css, List *ancestors, ExplainState *es)
 }

 /*
+ * Create a per-plan-node workspace for collecting per-worker data.
+ */
+static ExplainWorkersState *
+ExplainCreateWorkersState(int num_workers)
+{
+    ExplainWorkersState *wstate;
+
+    wstate = (ExplainWorkersState *) palloc(sizeof(ExplainWorkersState));
+    wstate->num_workers = num_workers;
+    wstate->worker_inited = (bool *) palloc0(num_workers * sizeof(bool));
+    wstate->worker_str = (StringInfoData *)
+        palloc0(num_workers * sizeof(StringInfoData));
+    wstate->worker_state_save = (int *) palloc(num_workers * sizeof(int));
+    return wstate;
+}
+
+/*
+ * Begin or resume output into the set-aside group for worker N.
+ */
+static void
+ExplainOpenWorker(int n, ExplainState *es)
+{
+    ExplainWorkersState *wstate = es->workers_state;
+
+    Assert(wstate);
+    Assert(n >= 0 && n < wstate->num_workers);
+
+    /* Save prior output pointer */
+    wstate->prev_str = es->str;
+
+    if (!wstate->worker_inited[n])
+    {
+        /* First time through, so create the buffer for this worker */
+        initStringInfo(&wstate->worker_str[n]);
+        es->str = &wstate->worker_str[n];
+
+        /*
+         * Push suitable initial formatting state for this worker's field
+         * group.  We allow one extra logical nesting level, since this group
+         * will eventually be wrapped in an outer "Workers" group.
+         */
+        ExplainOpenSetAsideGroup("Worker", NULL, true, 2, es);
+
+        /*
+         * In non-TEXT formats, we always emit a "Worker Number" field even if
+         * there's no other data for this worker.
+         */
+        if (es->format != EXPLAIN_FORMAT_TEXT)
+            ExplainPropertyInteger("Worker Number", NULL, n, es);
+
+        wstate->worker_inited[n] = true;
+    }
+    else
+    {
+        /* Resuming output for a worker we've already emitted some data for */
+        es->str = &wstate->worker_str[n];
+
+        /* Restore formatting state saved by last ExplainCloseWorker() */
+        ExplainRestoreGroup(es, 2, &wstate->worker_state_save[n]);
+    }
+
+    /*
+     * In TEXT format, prefix the first output line for this worker with
+     * "Worker N:".  Then, any additional lines should be indented one more
+     * stop than the "Worker N" line is.
+     */
+    if (es->format == EXPLAIN_FORMAT_TEXT)
+    {
+        if (es->str->len == 0)
+        {
+            ExplainIndentText(es);
+            appendStringInfo(es->str, "Worker %d:  ", n);
+        }
+
+        es->indent++;
+    }
+}
+
+/*
+ * End output for worker N --- must pair with previous ExplainOpenWorker call
+ */
+static void
+ExplainCloseWorker(int n, ExplainState *es)
+{
+    ExplainWorkersState *wstate = es->workers_state;
+
+    Assert(wstate);
+    Assert(n >= 0 && n < wstate->num_workers);
+    Assert(wstate->worker_inited[n]);
+
+    /*
+     * Save formatting state in case we do another ExplainOpenWorker(), then
+     * pop the formatting stack.
+     */
+    ExplainSaveGroup(es, 2, &wstate->worker_state_save[n]);
+
+    /*
+     * In TEXT format, if we didn't actually produce any output line(s) then
+     * truncate off the partial line (this avoids bogus output if, for
+     * instance, show_buffer_usage chooses not to print anything).  Also fix
+     * up the indent level.
+     */
+    if (es->format == EXPLAIN_FORMAT_TEXT)
+    {
+        while (es->str->len > 0 && es->str->data[es->str->len - 1] != '\n')
+            es->str->data[--(es->str->len)] = '\0';
+
+        es->indent--;
+    }
+
+    /* Restore prior output pointer */
+    es->str = wstate->prev_str;
+}
+
+/*
+ * Print per-worker info for current node, then free the ExplainWorkersState.
+ */
+static void
+ExplainFlushWorkersState(ExplainState *es)
+{
+    ExplainWorkersState *wstate = es->workers_state;
+
+    ExplainOpenGroup("Workers", "Workers", false, es);
+    for (int i = 0; i < wstate->num_workers; i++)
+    {
+        if (wstate->worker_inited[i])
+        {
+            /* This must match previous ExplainOpenSetAsideGroup call */
+            ExplainOpenGroup("Worker", NULL, true, es);
+            appendStringInfoString(es->str, wstate->worker_str[i].data);
+            ExplainCloseGroup("Worker", NULL, true, es);
+
+            pfree(wstate->worker_str[i].data);
+        }
+    }
+    ExplainCloseGroup("Workers", "Workers", false, es);
+
+    pfree(wstate->worker_inited);
+    pfree(wstate->worker_str);
+    pfree(wstate->worker_state_save);
+    pfree(wstate);
+}
+
+/*
  * Explain a property, such as sort keys or targets, that takes the form of
  * a list of unlabeled items.  "data" is a list of C strings.
  */
@@ -3439,7 +3604,7 @@ ExplainPropertyList(const char *qlabel, List *data, ExplainState *es)
     switch (es->format)
     {
         case EXPLAIN_FORMAT_TEXT:
-            appendStringInfoSpaces(es->str, es->indent * 2);
+            ExplainIndentText(es);
             appendStringInfo(es->str, "%s: ", qlabel);
             foreach(lc, data)
             {
@@ -3560,7 +3725,7 @@ ExplainProperty(const char *qlabel, const char *unit, const char *value,
     switch (es->format)
     {
         case EXPLAIN_FORMAT_TEXT:
-            appendStringInfoSpaces(es->str, es->indent * 2);
+            ExplainIndentText(es);
             if (unit)
                 appendStringInfo(es->str, "%s: %s %s\n", qlabel, value, unit);
             else
@@ -3752,6 +3917,117 @@ ExplainCloseGroup(const char *objtype, const char *labelname,
 }

 /*
+ * Open a group of related objects, without emitting actual data.
+ *
+ * Prepare the formatting state as though we were beginning a group with
+ * the identified properties, but don't actually emit anything.  Output
+ * subsequent to this call can be redirected into a separate output buffer,
+ * and then eventually appended to the main output buffer after doing a
+ * regular ExplainOpenGroup call (with the same parameters).
+ *
+ * The extra "depth" parameter is the new group's depth compared to current.
+ * It could be more than one, in case the eventual output will be enclosed
+ * in additional nesting group levels.  We assume we don't need to track
+ * formatting state for those levels while preparing this group's output.
+ *
+ * There is no ExplainCloseSetAsideGroup --- in current usage, we always
+ * pop this state with ExplainSaveGroup.
+ */
+static void
+ExplainOpenSetAsideGroup(const char *objtype, const char *labelname,
+                         bool labeled, int depth, ExplainState *es)
+{
+    switch (es->format)
+    {
+        case EXPLAIN_FORMAT_TEXT:
+            /* nothing to do */
+            break;
+
+        case EXPLAIN_FORMAT_XML:
+            es->indent += depth;
+            break;
+
+        case EXPLAIN_FORMAT_JSON:
+            es->grouping_stack = lcons_int(0, es->grouping_stack);
+            es->indent += depth;
+            break;
+
+        case EXPLAIN_FORMAT_YAML:
+            if (labelname)
+                es->grouping_stack = lcons_int(1, es->grouping_stack);
+            else
+                es->grouping_stack = lcons_int(0, es->grouping_stack);
+            es->indent += depth;
+            break;
+    }
+}
+
+/*
+ * Pop one level of grouping state, allowing for a re-push later.
+ *
+ * This is typically used after ExplainOpenSetAsideGroup; pass the
+ * same "depth" used for that.
+ *
+ * This should not emit any output.  If state needs to be saved,
+ * save it at *state_save.  Currently, an integer save area is sufficient
+ * for all formats, but we might need to revisit that someday.
+ */
+static void
+ExplainSaveGroup(ExplainState *es, int depth, int *state_save)
+{
+    switch (es->format)
+    {
+        case EXPLAIN_FORMAT_TEXT:
+            /* nothing to do */
+            break;
+
+        case EXPLAIN_FORMAT_XML:
+            es->indent -= depth;
+            break;
+
+        case EXPLAIN_FORMAT_JSON:
+            es->indent -= depth;
+            *state_save = linitial_int(es->grouping_stack);
+            es->grouping_stack = list_delete_first(es->grouping_stack);
+            break;
+
+        case EXPLAIN_FORMAT_YAML:
+            es->indent -= depth;
+            *state_save = linitial_int(es->grouping_stack);
+            es->grouping_stack = list_delete_first(es->grouping_stack);
+            break;
+    }
+}
+
+/*
+ * Re-push one level of grouping state, undoing the effects of ExplainSaveGroup.
+ */
+static void
+ExplainRestoreGroup(ExplainState *es, int depth, int *state_save)
+{
+    switch (es->format)
+    {
+        case EXPLAIN_FORMAT_TEXT:
+            /* nothing to do */
+            break;
+
+        case EXPLAIN_FORMAT_XML:
+            es->indent += depth;
+            break;
+
+        case EXPLAIN_FORMAT_JSON:
+            es->grouping_stack = lcons_int(*state_save, es->grouping_stack);
+            es->indent += depth;
+            break;
+
+        case EXPLAIN_FORMAT_YAML:
+            es->grouping_stack = lcons_int(*state_save, es->grouping_stack);
+            es->indent += depth;
+            break;
+    }
+}
+
+/*
  * Emit a "dummy" group that never has any members.
  *
  * objtype is the type of the group object, labelname is its label within
@@ -3913,6 +4189,21 @@ ExplainXMLTag(const char *tagname, int flags, ExplainState *es)
 }

 /*
+ * Indent a text-format line.
+ *
+ * We indent by two spaces per indentation level.  However, when emitting
+ * data for a parallel worker there might already be data on the current line
+ * (cf. ExplainOpenWorker); in that case, don't indent any more.
+ */
+static void
+ExplainIndentText(ExplainState *es)
+{
+    Assert(es->format == EXPLAIN_FORMAT_TEXT);
+    if (es->str->len == 0 || es->str->data[es->str->len - 1] == '\n')
+        appendStringInfoSpaces(es->str, es->indent * 2);
+}
+
+/*
  * Emit a JSON line ending.
  *
  * JSON requires a comma after each property but the last.  To facilitate this,
diff --git a/src/include/commands/explain.h b/src/include/commands/explain.h
index 45027a7..853e206 100644
--- a/src/include/commands/explain.h
+++ b/src/include/commands/explain.h
@@ -25,6 +25,15 @@ typedef enum ExplainFormat
     EXPLAIN_FORMAT_YAML
 } ExplainFormat;

+typedef struct ExplainWorkersState
+{
+    int            num_workers;    /* # of worker processes the plan used */
+    bool       *worker_inited;    /* per-worker state-initialized flags */
+    StringInfoData *worker_str; /* per-worker transient output buffers */
+    int           *worker_state_save;    /* per-worker grouping state save areas */
+    StringInfo    prev_str;        /* saved output buffer while redirecting */
+} ExplainWorkersState;
+
 typedef struct ExplainState
 {
     StringInfo    str;            /* output buffer */
@@ -47,6 +56,8 @@ typedef struct ExplainState
     List       *deparse_cxt;    /* context list for deparsing expressions */
     Bitmapset  *printed_subplans;    /* ids of SubPlans we've printed */
     bool        hide_workers;    /* set if we find an invisible Gather */
+    /* state related to the current plan node */
+    ExplainWorkersState *workers_state; /* needed if parallel plan */
 } ExplainState;

 /* Hook for plugins to get control in ExplainOneQuery() */

pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Memory-Bounded Hash Aggregation
Next
From: Melanie Plageman
Date:
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats