Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parallel Seq Scan
Msg-id CA+Tgmobd6iMdMwLkKz--GPUHNxS3ehrS7CkF+H-nST4u+hPKbg@mail.gmail.com
In response to Re: Parallel Seq Scan  (Amit Kapila <amit.kapila16@gmail.com>)
On Thu, Sep 3, 2015 at 6:21 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> Attached, find the rebased version of patch.  It fixes the comments raised
> by Jeff  Davis and Antonin Houska.  The main changes in this version are
> now it supports sync scan along with parallel sequential scan (refer
> heapam.c)
> and the patch has been split into two parts, first contains the code for
> Funnel node and infrastructure to support the same and second contains
> the code for PartialSeqScan node  and its infrastructure.

+    if (es->analyze && nodeTag(plan) == T_Funnel)

Why not IsA()?
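For reference, the conventional idiom is the IsA() macro, which wraps the same tag comparison. The sketch below uses minimal stand-ins for the real node machinery in nodes/nodes.h (the enum and struct here are illustrative, not the actual definitions):

```c
#include <assert.h>

/* Minimal stand-ins for PostgreSQL's node machinery (illustrative only;
 * the real definitions live in src/include/nodes/nodes.h). */
typedef enum NodeTag { T_Invalid = 0, T_Funnel, T_SeqScan } NodeTag;
typedef struct Node { NodeTag type; } Node;

/* These mirror the shape of the real macros. */
#define nodeTag(nodeptr)     (((const Node *) (nodeptr))->type)
#define IsA(nodeptr, _type_) (nodeTag(nodeptr) == T_##_type_)
```

With that, the quoted test reads as IsA(plan, Funnel) instead of spelling out the tag comparison by hand.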

+        FinishParallelSetupAndAccumStats((FunnelState *)planstate);

Shouldn't there be a space before planstate?

+    /* inform executor to collect buffer usage stats from parallel workers. */
+    estate->total_time = queryDesc->totaltime ? 1 : 0;

Boy, the comment sure doesn't seem to match the code.

+         * Accumulate the stats by parallel workers before stopping the
+         * node.

Suggest: "Accumulate stats from parallel workers before stopping node".

+             * If we are not able to send the tuple, then we assume that
+             * destination has closed and we won't be able to send any more
+             * tuples so we just end the loop.

Suggest: "If we are not able to send the tuple, we assume the
destination has closed and no more tuples can be sent.  If that's the
case, end the loop."

+static void
+EstimateParallelSupportInfoSpace(ParallelContext *pcxt, ParamListInfo params,
+                                 List *serialized_param_exec_vals,
+                                 int instOptions, Size *params_size,
+                                 Size *params_exec_size);
+static void
+StoreParallelSupportInfo(ParallelContext *pcxt, ParamListInfo params,
+                         List *serialized_param_exec_vals,
+                         int instOptions, Size params_size,
+                         Size params_exec_size,
+                         char **inst_options_space,
+                         char **buffer_usage_space);

Whitespace doesn't look like PostgreSQL style.  Maybe run pgindent on
the newly-added files?

+/*
+ * This is required for parallel plan execution to fetch the information
+ * from dsm.
+ */

This comment doesn't really say anything.  Can we get a better one?

+    /*
+     * We expect each worker to populate the BufferUsage structure
+     * allocated by master backend and then master backend will aggregate
+     * all the usage along with it's own, so account it for each worker.
+     */

This also needs improvement.  Especially because...

+    /*
+     * We expect each worker to populate the instrumentation structure
+     * allocated by master backend and then master backend will aggregate
+     * all the information, so account it for each worker.
+     */

...it's almost identical to this one.

+     * Store bind parameter's list in dynamic shared memory.  This is
+     * used for parameters in prepared query.

s/bind parameter's list/bind parameters/.  I think you could drop the
second sentence, too.

+    /*
+     * Store PARAM_EXEC parameters list in dynamic shared memory.  This is
+     * used for evaluation plan->initPlan params.
+     */

So is the previous block for PARAM_EXTERN and this is PARAM_EXEC?  If
so, maybe that could be more clearly laid out.

+GetParallelSupportInfo(shm_toc *toc, ParamListInfo *params,

Could this be a static function?  Will it really be needed outside this file?

And is there any use case for letting some of the arguments be NULL?
Seems kind of an awkward API.

+bool
+ExecParallelBufferUsageAccum(Node *node)
+{
+    if (node == NULL)
+        return false;
+
+    switch (nodeTag(node))
+    {
+        case T_FunnelState:
+            {
+                FinishParallelSetupAndAccumStats((FunnelState*)node);
+                return true;
+            }
+            break;
+        default:
+            break;
+    }
+
+    (void) planstate_tree_walker((Node*)((PlanState *)node)->lefttree, NULL,
+                                 ExecParallelBufferUsageAccum, 0);
+    (void) planstate_tree_walker((Node*)((PlanState *)node)->righttree, NULL,
+                                 ExecParallelBufferUsageAccum, 0);
+    return false;
+}

This seems wacky.  I mean, isn't the point of planstate_tree_walker()
that the callback itself doesn't have to handle recursion like this?
And if not, then this wouldn't be adequate anyway, because some
planstate nodes have children that are not in lefttree or righttree
(cf. explain.c).
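To illustrate the intended division of labor: the walker visits the children and invokes the callback on each, so the callback only acts on the current node and hands recursion back to the walker. This is a toy sketch with hypothetical stand-in types (a real PlanState has more child kinds than lefttree/righttree, which is exactly the point above):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy binary plan-state tree standing in for PlanState (illustrative). */
typedef struct PlanStateStub
{
    bool is_funnel;                 /* stands in for nodeTag() == T_FunnelState */
    int  accumulated;               /* set when "stats are accumulated" */
    struct PlanStateStub *lefttree;
    struct PlanStateStub *righttree;
} PlanStateStub;

typedef bool (*walker_fn) (PlanStateStub *node, void *context);

/* The walker owns the recursion: it visits each child and calls the
 * callback on it, so the callback never walks children itself. */
static bool
planstate_tree_walker_stub(PlanStateStub *node, walker_fn walker, void *context)
{
    if (node == NULL)
        return false;
    if (node->lefttree && walker(node->lefttree, context))
        return true;
    if (node->righttree && walker(node->righttree, context))
        return true;
    return false;
}

/* Callback in the expected style: act on the current node, then let the
 * walker descend -- no explicit lefttree/righttree calls here. */
static bool
accum_stats_walker(PlanStateStub *node, void *context)
{
    if (node == NULL)
        return false;
    if (node->is_funnel)
        node->accumulated = 1;      /* FinishParallelSetupAndAccumStats() */
    return planstate_tree_walker_stub(node, accum_stats_walker, context);
}
```

The payoff is that when the walker grows support for additional child links, every callback written in this style picks them up for free.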

+    currentRelation = ExecOpenScanRelation(estate,
+                                           ((SeqScan *) node->ss.ps.plan)->scanrelid,
+                                           eflags);

I can't see how this can possibly be remotely correct.  The funnel
node shouldn't be limited to scanning a baserel (cf. fdw_scan_tlist).

+void ExecAccumulateInstInfo(FunnelState *node)

Another place where pgindent would help.  There are a bunch of others
I noticed too, but I'm just mentioning a few here to make the point.

+            buffer_usage_worker = (BufferUsage *)(buffer_usage + (i * sizeof(BufferUsage)));

Cast it to a BufferUsage * first.  Then you can use &foo[i] to find
the i'th element.
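That is, cast the raw byte pointer to the element type once and let array indexing do the address arithmetic. A small self-contained sketch with a hypothetical stand-in struct (the real BufferUsage has more fields):

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for BufferUsage (illustrative only). */
typedef struct BufferUsageStub
{
    long shared_blks_hit;
    long shared_blks_read;
} BufferUsageStub;

/* Given a raw byte pointer to an array of per-worker structs, cast once
 * and index, instead of doing i * sizeof() arithmetic on the byte pointer. */
static BufferUsageStub *
nth_buffer_usage(char *raw, int i)
{
    BufferUsageStub *usage = (BufferUsageStub *) raw;

    return &usage[i];               /* same address, much easier to read */
}
```

Both spellings compute the same address; the indexed form just can't get the element size wrong.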

+    /*
+     * Re-initialize the parallel context and workers to perform
+     * rescan of relation.  We want to gracefully shutdown all the
+     * workers so that they should be able to propagate any error
+     * or other information to master backend before dying.
+     */
+    FinishParallelSetupAndAccumStats(node);

Somehow, this makes me feel like that function is badly named.

+/*
+ * _readPlanInvalItem
+ */
+static PlanInvalItem *
+_readPlanInvalItem(void)
+{
+    READ_LOCALS(PlanInvalItem);
+
+    READ_INT_FIELD(cacheId);
+    READ_UINT_FIELD(hashValue);
+
+    READ_DONE();
+}

I don't see why we should need to be able to read PlanInvalItems.  In
fact, it seems like a bad idea.

+#parallel_setup_cost = 0.0  # same scale as above
+#define DEFAULT_PARALLEL_SETUP_COST  0.0

This value is probably a bit on the low side.

+int parallel_seqscan_degree = 0;

I think we should have a GUC for the maximum degree of parallelism in
a query generally, not the maximum degree of parallel sequential scan.

+    if (parallel_seqscan_degree >= MaxConnections)
+    {
+        write_stderr("%s: parallel_scan_degree must be less than max_connections\n", progname);
+        ExitPostmaster(1);
+    }

I think this check is thoroughly unnecessary.  It's comparing to the
wrong thing anyway, because what actually matters is
max_worker_processes, not max_connections.  But in any case there is
no need for the check.  If somebody stupidly tries an unreasonable
value for the maximum degree of parallelism, they won't get that many
workers, but nothing will break.  It's no worse than setting any other
query planner costing parameter to an insane value.

--- a/src/include/access/heapam.h
+++ b/src/include/access/heapam.h
@@ -126,6 +126,7 @@ extern void heap_rescan_set_params(HeapScanDesc scan, ScanKey key,
 extern void heap_endscan(HeapScanDesc scan);
 extern HeapTuple heap_getnext(HeapScanDesc scan, ScanDirection direction);

+extern bool heap_fetch(Relation relation, Snapshot snapshot,

Stray whitespace change.

More later, that's what I noticed on a first read through.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


