Re: [DESIGN] ParallelAppend

From: Amit Kapila
Subject: Re: [DESIGN] ParallelAppend
Msg-id: CAA4eK1KPJMNLVr7PC5WaNNqkRWqBDQzFuAuAAbQ3u_7Ug7dBxQ@mail.gmail.com
In response to: Re: [DESIGN] ParallelAppend (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses: Re: [DESIGN] ParallelAppend (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List: pgsql-hackers
On Sat, Aug 1, 2015 at 6:39 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
> > On Tue, Jul 28, 2015 at 6:08 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >
> > I am not sure, but what problem do you see in putting a Funnel node
> > for one of the relation scans and not for the others?
> >
> At this moment, I'm not certain whether a background worker can/ought
> to launch further background workers.
> If a sub-Funnel node is executed by 10 processes and each of those also
> launches 10 processes, will 100 processes end up running?
>

Yes, that could be more work than at present, but what I had in mind
is not that way; rather, I was thinking that the master backend would
only kick off workers for the Funnel nodes in the plan.
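
To be concrete, a rough sketch of what I mean (the Funnel node shape
and LaunchParallelWorkers() here are my assumptions for illustration,
not the actual patch): only the master backend walks the finished plan
tree and launches workers, and only at Funnel nodes, so worker counts
add up across Funnel nodes instead of multiplying:

/*
 * Hypothetical sketch: only the master backend runs this walk, so a
 * Funnel nested under another Funnel still gets its workers from the
 * master; workers themselves never launch workers.
 */
static void
StartWorkersForFunnels(Plan *plan)
{
    if (plan == NULL)
        return;

    if (IsA(plan, Funnel))
    {
        Funnel *funnel = (Funnel *) plan;

        /* workers attach to this Funnel's shared-memory segment */
        LaunchParallelWorkers(funnel->pcxt);
    }

    /* keep walking; nested Funnels are still handled by the master */
    StartWorkersForFunnels(plan->lefttree);
    StartWorkersForFunnels(plan->righttree);
}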

> > > If we pull Funnel here, I think the plan shall be as follows:
> > >   Funnel
> > >    --> SeqScan on rel1
> > >    --> PartialSeqScan on rel2
> > >    --> IndexScan on rel3
> > >
> >
> > So if we go this route, then Funnel should have the capability
> > to execute the non-parallel part of the plan as well; in this
> > case it should be able to execute the non-parallel IndexScan on
> > rel3 too, and it would then need to distinguish between the
> > parallel and non-parallel parts of the plan.  I think this could
> > make the Funnel node complex.
> >
> That differs from what I plan now.  In the above example, the
> Funnel node has two non-parallel-aware nodes (rel1 and rel3)
> and one parallel-aware node, so three PlannedStmts, one for
> each, shall be put on the TOC segment.  Background workers each
> pick up one of the three PlannedStmts, but only a single worker
> can pick up the PlannedStmt for rel1 or for rel3, whereas rel2
> can be executed by multiple workers simultaneously.

Okay, now I get your point, but I think the cost of executing a
non-parallel node in an additional worker is not small, considering
the communication cost and the cost of setting up an additional
worker for each sub-plan (assume the case where, out of 100 child
nodes, only a few (2 or 3) actually need multiple workers).
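
For reference, here is that TOC scheme as I understand it, as a rough
sketch using the shm_toc API (the key values, the use of nodeToString()
for serialization, and the function itself are my assumptions, not the
actual patch):

/*
 * Hypothetical sketch: the master serializes one PlannedStmt per child
 * of the Funnel into the TOC of a dynamic shared memory segment; each
 * worker then claims one of them.  rel1's and rel3's plans can each be
 * claimed by only one worker, while rel2's partial plan can be picked
 * up by many.
 */
#define KEY_NPLANS      UINT64CONST(0xF001)         /* illustrative keys */
#define KEY_PLAN(n)     (UINT64CONST(0xF100) + (n))

static void
StorePlansInTOC(shm_toc *toc, List *subplans)
{
    int         nplans = list_length(subplans);
    int        *nplans_ptr;
    ListCell   *lc;
    int         i = 0;

    nplans_ptr = (int *) shm_toc_allocate(toc, sizeof(int));
    *nplans_ptr = nplans;
    shm_toc_insert(toc, KEY_NPLANS, nplans_ptr);

    foreach(lc, subplans)
    {
        char   *plan_str = nodeToString(lfirst(lc));
        char   *dest = (char *) shm_toc_allocate(toc, strlen(plan_str) + 1);

        strcpy(dest, plan_str);
        shm_toc_insert(toc, KEY_PLAN(i), dest);
        i++;
    }
}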

> >
> > I think for a particular PlannedStmt, number of workers must have
> > been decided before start of execution, so if those many workers are
> > available to work on that particular PlannedStmt, then next/new
> > worker should work on next PlannedStmt.
> >
> My concern about a pre-determined number of workers is that it
> depends on the run-time circumstances of concurrent sessions.  Even
> if the planner wants to assign 10 workers to a particular sub-plan,
> only 4 workers may be available at run time because of consumption
> by other sessions.  So I expect that only a maximum number of
> workers is a meaningful configuration.
>

In that case, there is a possibility that many of the workers end up
working on just one or two of the nodes while the execution of other
nodes gets starved.  I understand that allocating numbers of workers
to different nodes is a tricky problem; however, we should try to
develop an algorithm that provides some degree of fairness in the
allocation of workers across nodes, as sketched below.
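
To make that fairness concrete, one possible allocation rule, purely
as an illustrative sketch (the function and its inputs are
hypothetical): hand out the available workers one at a time,
round-robin across the sub-plans, so every sub-plan gets one worker
(when enough are available) before any sub-plan gets a second one:

/*
 * Hypothetical sketch of a fair allocation rule.  planned[i] is the
 * number of workers the planner wanted for sub-plan i; assigned[i]
 * receives the number actually granted out of 'available'.
 */
static void
AssignWorkersFairly(const int *planned, int *assigned,
                    int nplans, int available)
{
    bool        progress = true;
    int         i;

    memset(assigned, 0, nplans * sizeof(int));

    while (available > 0 && progress)
    {
        progress = false;
        for (i = 0; i < nplans && available > 0; i++)
        {
            if (assigned[i] < planned[i])
            {
                assigned[i]++;
                available--;
                progress = true;
            }
        }
    }
}

With 100 child nodes of which only a few planned for multiple workers,
this degrades gracefully: no node is starved while another hoards
workers it cannot use.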


> > 2. Execution of work by workers and Funnel node and then pass
> > the results back to upper node.  I think this needs some more
> > work in addition to ParallelSeqScan patch.
> >
> I expect we can utilize the existing infrastructure here.  It just
> picks up the records that come from the underlying workers, then
> passes them up to the upper node.
>

Sure, but you still need some work, at least in the area of making
workers understand different node types; I am guessing you need to
modify readfuncs.c to support any new plan node introduced for this
work.
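
For what it's worth, the readfuncs.c side could look roughly like the
following, using the READ_* macro conventions of
src/backend/nodes/readfuncs.c (the Funnel node and its field list are
assumptions for illustration, not the actual patch):

/*
 * Hypothetical _readFunnel(), so that a worker can deserialize the
 * Funnel plan node handed to it; a matching case would also need to
 * be added to the node-string dispatch in readfuncs.c.
 */
static Funnel *
_readFunnel(void)
{
    READ_LOCALS(Funnel);

    /* fields inherited from Scan (and from Plan within it) */
    ReadCommonScan(&local_node->scan);

    /* Funnel-specific fields */
    READ_INT_FIELD(num_workers);

    READ_DONE();
}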


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
