Observations in Parallel Append - Mailing list pgsql-hackers

From Amit Kapila
Subject Observations in Parallel Append
Date
Msg-id CAA4eK1+qcbeai3coPpRW=GFCzFeLUsuY4T-AKHqMjxpEGZBPQg@mail.gmail.com
Whole thread Raw
Responses Re: Observations in Parallel Append
List pgsql-hackers
Few observations in Parallel Append commit (ab727167)

1.
+++ b/src/include/nodes/execnodes.h
@@ -21,6 +21,7 @@
 #include "lib/pairingheap.h"
 #include "nodes/params.h"
 #include "nodes/plannodes.h"
+#include "storage/spin.h"
..

There doesn't seem to be any need for including spin.h.  I think some
prior version of the patch might need it.  Patch attached to remove
it.

2.
+ *     choose_next_subplan_for_worker
..
+ *     We start from the first plan and advance through the list;
+ *     when we get back to the end, we loop back to the first
+ *     nonpartial plan.
..
+choose_next_subplan_for_worker(AppendState *node)
{
..
+       if (pstate->pa_next_plan < node->as_nplans - 1)
+       {
+           /* Advance to next plan. */
+           pstate->pa_next_plan++;
+       }
+       else if (append->first_partial_plan < node->as_nplans)
+       {
+           /* Loop back to first partial plan. */
+           pstate->pa_next_plan = append->first_partial_plan;
+       }
..

The code and comment don't seem to match.  The comments indicate that
after reaching the end of the list, we loop back to first nonpartial
plan whereas code indicates that we loop back to first partial plan.
I think one of those needs to be changed unless I am missing something
obvious.

3.
+cost_append(AppendPath *apath)
{
..
+           /*
+            * Apply parallel divisor to non-partial subpaths.  Also add the
+            * cost of partial paths to the total cost, but ignore non-partial
+            * paths for now.
+            */
+           if (i < apath->first_partial_path)
+               apath->path.rows += subpath->rows / parallel_divisor;
+           else
+           {
+               apath->path.rows += subpath->rows;
+               apath->path.total_cost += subpath->total_cost;
+           }
..
}

I think it is better to use clamp_row_est for rows for the case where
we use parallel_divisor so that the value of rows is always sane.
Also, don't we need to use parallel_divisor for partial paths instead
of non-partial paths as those will be actually distributed among
workers?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

Attachment

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: Using ProcSignal to get memory context stats from a running backend
Next
From: Oliver Ford
Date:
Subject: Re: Add RANGE with values and exclusions clauses to the Window Functions