Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1JSbow+Fdj6nrQhHP-6PQiFPGSxe5wrNa0pz0pj9nfjxA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel Seq Scan
List pgsql-hackers
On Tue, Sep 29, 2015 at 10:49 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Sep 29, 2015 at 12:39 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Attached patch is a rebased patch based on latest commit (d1b7c1ff)
> > for Gather node.
> >
> > - I have to reorganize the defines in execParallel.h and .c. To keep
> > ParallelExecutorInfo, in GatherState node, we need to include execParallel.h
> > in execnodes.h which was creating compilation issues as execParallel.h
> > also includes execnodes.h, so for now I have defined ParallelExecutorInfo
> > in execnodes.h and instrumentation related structures in instrument.h.
> > - Renamed parallel_seqscan_degree to degree_of_parallelism
> > - Rename Funnel to Gather
> > - Removed PARAM_EXEC parameter handling code, I think we can do this
> > separately.
> >
> > I have to work more on partial seq scan patch for rebasing it and handling
> > review comments for the same, so for now I am sending the first part of
> > patch (which included Gather node functionality and some general support
> > for parallel-query execution).
>
> Thanks for the fast rebase.
>
> This patch needs a bunch of cleanup:
>
> - The formatting for the GatherState node's comment block is unlike
> that of surrounding comment blocks.  It lacks the ------- dividers,
> and the indentation is not the same.  Also, it refers to
> ParallelExecutorInfo by the type name, but the other members by
> structure member name.  The convention is to refer to them by
> structure member name, so please do that.
>
> - The naming of fs_workersReady is inconsistent with the other
> structure members.  The other members use all lower-case names,
> separating words with dashes, but this one uses a capital letter.  The
> other members also don't prefix the names with anything, but this uses
> a "fs_" prefix which I assume is left over from when this was called
> FunnelState.  Finally, this doesn't actually tell you when workers are
> ready, just whether they were launched.  I suggest we rename this to
> "any_worker_launched".
>
> - Instead of moving the declaration of ParallelExecutorInfo, please
> just refer to it as "struct ParallelExecutorInfo" in execnodes.h.
> That way, you're not sucking these includes into all kinds of places
> they don't really need to be.
>
> - Let's not create a new PARALLEL_QUERY category of GUC.  Instead,
> let's the GUC for the number of workers with under resource usage ->
> asynchronous behavior.
>

Changed as per suggestion.

> - I don't believe that shm_toc *toc has any business being part of a
> generic PlanState node.  At most, it should be part of an individual
> type of PlanState, like a GatherState or PartialSeqScanState.  But
> really, I don't see why we need it there at all.
>

We need it for getting parallelheapscan descriptor in case of
partial sequence scan node, it doesn't seem like a good idea
to retrieve it in begining, as we need to dig into plan tree to
get the node_id for getting the value of parallelheapscan descriptor
from toc.

Now, I think we can surely keep it in PartialSeqScanState or any
other node state which might need it later, but I felt this is quite
generic and we might need to fetch node specific information from toc
going forward.


> - I'd like to do some renaming of the new GUCs.  I suggest we rename
> cpu_tuple_comm_cost to parallel_tuple_cost and degree_of_parallelism
> to max_parallel_degree.
>

Changed as per suggestion.

> - I think that a Gather node should inherit from Plan, not Scan.  A
> Gather node really shouldn't have a scanrelid.  Now, admittedly, if
> the only thing under the Gather is a Partial Seq Scan, it wouldn't be
> totally bonkers to think of the Gather as scanning the same relation
> that the Partial Seq Scan is scanning.  But in any more complex case,
> like where it's scanning a join, you're out of luck.  You'll have to
> set scanrelid == 0, I suppose, but then, for example, ExecScanReScan
> is not going to work.  In fact, as far as I can see, the only way
> nodeGather.c is actually using any of the generic scan stuff is by
> calling ExecInitScanTupleSlot, which is all of one line of code.
> ExecEndGather fetches node->ss.ss_currentRelation but then does
> nothing with it.  So I think this is just a holdover from early
> version of this patch where what's now Gather and PartialSeqScan were
> a single node, and I think we should rip it out.
>

makes sense and I think GatherState should also be inherit from PlanState
instead of ScanState which I have changed in patch attached.

> - On a related note, the assertions in cost_gather() are both bogus
> and should be removed. Similarly with create_gather_plan().  As
> previously mentioned, the Gather node should not care what sort of
> thing is under it; I am not interested in restricting it to baserels
> and then undoing that later.
>
> - For the same reason, cost_gather() should refer to it's third
> argument as "rel" not "baserel".
>

Changed as per suggestion.

> - Also, I think this stuff about physical tlists in
> create_gather_plan() is bogus.  use_physical_tlist is ignorant of the
> possibility that the RelOptInfo passed to it might be anything other
> than a baserel, and I think it won't be happy if it gets a joinrel.
> Moreover, I think our plan here is that, at least for now, the
> Gather's tlist will always match the tlist of its child.  If that's
> so, there's no point to this: it will end up with the same tlist
> either way.  If any projection is needed, it should be done by the
> Gather node's child, not the Gather node itself.
>

Yes, Gather node itself doesn't need to do projection, but it
needs the projection info to store the same in Slot after fetching
the tuple from tuple queue.  Now this is not required for Gather
node itself, but it might be required for any node on top of
Gather node.

Here, I think one thing we could do is that use the subplan's target
list as currently is being done for quals.  The only risk is what if
Gating node is added on top of partialseqscan (subplan), but I have checked
that is safe, because Gating plan uses the same target list as it's child.
Also I don't think we need to process any quals at Gather node, so I will
make that as Null, I will do this change in next version unless you see
any problem with it.

Yet another idea is during set_plan_refs(), we can assign leftchild's
target list to parent in case of Gather node (right now it's done in
reverse way which needs to be changed.)

What is your preference?

> - Let's rename DestroyParallelSetupAndAccumStats to
> ExecShutdownGather.  Instead of encasing the entire function in if
> statement, let's start with if (node->pei == NULL || node->pei->pcxt
> == NULL) return.
>
> - ExecParallelBufferUsageAccum should be declared to take an argument
> of type PlanState, not Node.  Then you don't have to cast what you are
> passing to it, and it doesn't have to cast before calling itself. And,
> let's also rename it to ExecShutdownNode and move it to
> execProcnode.c.  Having a "shutdown phase" that stops a node from
> asynchronously consuming additional resources could be useful for
> non-parallel node types - especially ForeignScan and CustomScan.  And
> we could eventually extend this to be called in other situations, like
> when a Limit is filled give everything beneath it a chance to ease up.
> We don't have to do those bits of work right now but it seems well
> worth making this look like a generic facility.
>
> - Calling DestroyParallelSetupAndAccumStats from ExplainNode when we
> actually reach the Gather node is much too late.  We should really be
> shutting down parallel workers at the end of the ExecutorRun phase, or
> certainly no later than ExecutorFinish.  In fact, you have
> standard_ExecutorRun calling ExecParallelBufferUsageAccum() but only
> if queryDesc->totaltime is set.  What I think you should do instead is
> call ExecShutdownNode a few lines earlier, before shutting down the
> tuple receiver, and do so unconditionally.  That way, the workers are
> always shut down in the ExecutorRun phase, which should eliminate the
> need for this bit in explain.c.
>
> - The changes to postmaster.c and postgres.c consist of only
> additional #includes.  Those can, presumably, be reverted.
>

Changed as per suggestion.

Note - You will see one or two unrelated changes due to pgindent, but they
are from to parallel seq scan related files, so I have retained them.  It
might be better to do them separately, however as I was doing pgindent
for those files, so I thought it is okay to retain changed for related files.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Attachment

pgsql-hackers by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: No Issue Tracker - Say it Ain't So!
Next
From: "Joshua D. Drake"
Date:
Subject: Re: No Issue Tracker - Say it Ain't So!