Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: [DESIGN] ParallelAppend
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F8011300E4@BPXM15GP.gisp.nec.co.jp
In response to Re: [DESIGN] ParallelAppend  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [DESIGN] ParallelAppend  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
> On Sat, Aug 1, 2015 at 6:39 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >
> > > On Tue, Jul 28, 2015 at 6:08 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > >
> > > I am not sure, but what problem do you see in putting Funnel node
> > > for one of the relation scans and not for the others.
> > >
> > At this moment, I'm not certain whether a background worker can, or
> > ought to, launch other background workers.
> > If a sub-Funnel node is executed by 10 processes and each of them
> > launches 10 more processes, will 100 processes end up running?
> >
> 
> Yes, that could be more work than at present, but that is not what
> I had in mind; rather, I was thinking that only the master backend
> will kick off workers for the Funnel nodes in the plan.
>
I agree; that is a fair enough approach, which is why I brought up
the pull-up of the Funnel node.

> > > > If we pull Funnel here, I think the plan shall be as follows:
> > > >   Funnel
> > > >    --> SeqScan on rel1
> > > >    --> PartialSeqScan on rel2
> > > >    --> IndexScan on rel3
> > > >
> > >
> > > So if we go this route, then Funnel should have capability
> > > to execute non-parallel part of plan as well, like in this
> > > case it should be able to execute non-parallel IndexScan on
> > > rel3 as well and then it might need to distinguish between
> > > parallel and non-parallel part of plans.  I think this could
> > > make Funnel node complex.
> > >
> > That is different from what I plan now. In the above example, the
> > Funnel node has two non-parallel-aware nodes (rel1 and rel3) and
> > one parallel-aware node (rel2), so three PlannedStmts, one for
> > each, shall be put on the TOC segment. Each background worker
> > picks up one of the three PlannedStmts; only one worker can pick
> > up the PlannedStmt for rel1 or for rel3, whereas the one for rel2
> > can be executed by multiple workers simultaneously.
> 
> Okay, now I get your point, but I think the cost of executing a
> non-parallel node in an additional worker is not small, considering
> the communication cost and the cost of setting up an additional
> worker for each sub-plan (assume the case where, out of 100 child
> nodes, only a few (2 or 3) actually need multiple workers).
>
It is a competition between the traditional Append that takes Funnel
children and the new appendable Funnel that takes both parallel and
non-parallel children. The key factors are probably cpu_tuple_comm_cost,
parallel_setup_cost and the selectivity of the sub-plans.
Each approach has advantages and disadvantages depending on the query,
so we cannot determine which is better without considering both paths.
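
To illustrate why both paths need to be costed, here is a toy comparison
in C. It is only a sketch under my own assumptions, not code from the
patch: the SubPlanEst struct, both function names and the cost formulas
are hypothetical, and only the two parameter names come from the
discussion above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-child estimate; not a PostgreSQL structure. */
typedef struct SubPlanEst
{
    double run_cost;   /* cost to run the child in a single process */
    double tuples;     /* rows the child returns */
    int    workers;    /* 1 = non-parallel child, >1 = parallel-aware child */
} SubPlanEst;

/* Shape A: Append on top, with a Funnel only above parallel-aware children. */
static double
append_over_funnels_cost(const SubPlanEst *c, size_t n,
                         double parallel_setup_cost,
                         double cpu_tuple_comm_cost)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++)
    {
        assert(c[i].workers >= 1);
        if (c[i].workers > 1)
            total += parallel_setup_cost
                + c[i].run_cost / c[i].workers
                + c[i].tuples * cpu_tuple_comm_cost;
        else
            total += c[i].run_cost;   /* runs in the master, no transfer */
    }
    return total;
}

/* Shape B: one appendable Funnel that takes every child (assumed model). */
static double
appendable_funnel_cost(const SubPlanEst *c, size_t n, int pool_workers,
                       double parallel_setup_cost,
                       double cpu_tuple_comm_cost)
{
    double work = 0.0, tuples = 0.0, longest = 0.0, elapsed;

    assert(pool_workers >= 1);
    for (size_t i = 0; i < n; i++)
    {
        double alone;

        assert(c[i].workers >= 1);
        alone = c[i].run_cost / c[i].workers;
        work += c[i].run_cost;
        tuples += c[i].tuples;       /* every tuple crosses a queue */
        if (alone > longest)
            longest = alone;
    }
    /* Children overlap, but no child finishes faster than it can alone. */
    elapsed = work / pool_workers;
    if (elapsed < longest)
        elapsed = longest;
    return parallel_setup_cost + tuples * cpu_tuple_comm_cost + elapsed;
}
```

In this model, shape A pays the setup cost once per parallel child but
transfers no tuples for non-parallel children, while shape B pays the
setup cost once and transfers every tuple. Which one comes out cheaper
depends on the inputs.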

> > > I think for a particular PlannedStmt, number of workers must have
> > > been decided before start of execution, so if those many workers are
> > > available to work on that particular PlannedStmt, then next/new
> > > worker should work on next PlannedStmt.
> > >
> > My concern about a pre-determined number of workers is that it depends
> > on the run-time circumstances of concurrent sessions. Even if the planner
> > wants to assign 10 workers to a particular sub-plan, only 4 workers may
> > be available at run time because other sessions are consuming them.
> > So, I expect only the maximum number of workers is a meaningful setting.
> >
> 
> In that case, there is a possibility that many of the workers are just
> working on one or two of the nodes and execution of the other nodes might
> get starved.  I understand it is a tricky problem to allocate the number
> of workers for different nodes; however, we should try to develop an
> algorithm with some degree of fairness in the allocation of workers
> for different nodes.
>
I would like to agree; however, I also want to keep the first version as
simple as we can. We can develop alternative logic to assign a
suitable number of workers later.
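
As a rough picture of the simple policy, where each sub-plan only has a
maximum number of workers and a worker takes the first sub-plan that
still has room, consider the following C sketch. It is an assumption
for illustration only: SubPlanSlot and claim_subplan() are hypothetical
names, and a real implementation would keep the slots in shared memory
and protect the counters with a lock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bookkeeping for one sub-plan; not taken from the patch. */
typedef struct SubPlanSlot
{
    int max_workers;   /* 1 for a non-parallel sub-plan */
    int cur_workers;   /* workers currently attached */
} SubPlanSlot;

/*
 * Called by a worker that has just started. Returns the index of the
 * first sub-plan that still accepts a worker, or -1 if all are full.
 * Single-process sketch: no locking is done here.
 */
static int
claim_subplan(SubPlanSlot *slots, size_t n)
{
    assert(slots != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        if (slots[i].cur_workers < slots[i].max_workers)
        {
            slots[i].cur_workers++;
            return (int) i;
        }
    }
    return -1;
}
```

With slots for rel1, rel2 and rel3 whose maximums are 1, 3 and 1, the
first worker takes rel1, the next three take rel2, and rel3 is reached
only if a fifth worker is actually available at run time. That is
exactly the starvation risk mentioned above, which a later and fairer
algorithm could address.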

> > > 2. Execution of work by workers and Funnel node and then pass
> > > the results back to upper node.  I think this needs some more
> > > work in addition to ParallelSeqScan patch.
> > >
> > I expect we can utilize the existing infrastructure here. It just picks
> > up the records that come from the underlying workers, then passes them
> > up to the upper node.
> >
> 
> 
> Sure, but you still need some work, at least in the area of making
> workers understand different node types. I am guessing you need
> to modify readfuncs.c to support the new plan node, if any, for this
> work.
> 
Yes, though it was not creative work. :-)
https://github.com/kaigai/sepgsql/blob/fappend/src/backend/nodes/readfuncs.c#L1479

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

