Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers

From Kouhei Kaigai
Subject Re: [DESIGN] ParallelAppend
Date
Msg-id 9A28C8860F777E439AA12E8AEA7694F801131F76@BPXM15GP.gisp.nec.co.jp
In response to [DESIGN] ParallelAppend  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses Re: [DESIGN] ParallelAppend  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
> On Fri, Aug 7, 2015 at 2:15 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >
> > > On Sat, Aug 1, 2015 at 6:39 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > > >
> >
> > > > > > If we pull Funnel here, I think the plan shall be as follows:
> > > > > >   Funnel
> > > > > >    --> SeqScan on rel1
> > > > > >    --> PartialSeqScan on rel2
> > > > > >    --> IndexScan on rel3
> > > > > >
> > > > >
> > > > > So if we go this route, then Funnel should have capability
> > > > > to execute non-parallel part of plan as well, like in this
> > > > > case it should be able to execute non-parallel IndexScan on
> > > > > rel3 as well and then it might need to distinguish between
> > > > > parallel and non-parallel part of plans.  I think this could
> > > > > make Funnel node complex.
> > > > >
> > > > It is different from what I plan now. In the above example,
> > > > the Funnel node has two non-parallel-aware nodes (rel1 and rel3)
> > > > and one parallel-aware node, so three PlannedStmts, one for each,
> > > > shall be put on the TOC segment. The background workers each
> > > > pick up one of the three PlannedStmts; only one worker can pick
> > > > up the PlannedStmt for rel1 or for rel3, whereas rel2 can be
> > > > executed by multiple workers simultaneously.
> > >
> > > Okay, now I got your point, but I think the cost of execution
> > > of non-parallel node by additional worker is not small considering
> > > the communication cost and setting up an additional worker for
> > > each sub-plan (assume the case where out of 100 child nodes
> > > only a few (2 or 3) nodes actually need multiple workers).
> > >
> > It is a competition between traditional Append that takes Funnel
> > children and the new appendable Funnel that takes parallel and
> > non-parallel children. Probably, key factors are cpu_tuple_comm_cost,
> > parallel_setup_cost and degree of selectivity of sub-plans.
> > Both cases have advantages and disadvantages depending on the query,
> > so we can never determine which is better without path consideration.
> 
> Sure, that is what we should do; however, the tricky part would be when
> the path for doing a local scan is much cheaper than the path for a parallel
> scan for one of the child nodes.  For such cases, pulling up the Funnel node
> can incur more cost.  I think some of the other possible ways to make this
> work could be to extend Funnel so that it is capable of executing both parallel
> and non-parallel nodes, or to have a new Funnel-like node which has such a
> capability.
>
I think that is the job of a (more intelligent) planner, but not in the first
version. If the subplans of Append are a mixture of nodes that are and are
not worth executing in parallel, the planner could rearrange the
original form:
 Append
  + Scan on rel1 (large)
  + Scan on rel2 (large)
  + Scan on rel3 (middle)
  + Scan on rel4 (tiny)
  + Scan on rel5 (tiny)

into a partially Funnel-aware form:
 Append
  + Funnel
  |  + Scan on rel1 (large)
  |  + Scan on rel2 (large)
  |  + Scan on rel3 (large)
  + Scan on rel4 (tiny)
  + Scan on rel5 (tiny)

It does not require any special functionality of Append/Funnel beyond
what we have discussed, as long as the planner is intelligent enough.
One downside of this approach is that the plan tree tends to become more
complicated, which also complicates the logic to push down joins.
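
As a rough illustration of the rearrangement above (a sketch only, not
PostgreSQL code): the planner would split the Append children by whether
parallel execution is expected to pay off, and wrap only the worthwhile
ones in a Funnel. The names Scan, Funnel, Append and arrange_append, as
well as the page-count threshold, are hypothetical and chosen only for
this example; a real planner would compare path costs instead.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical threshold: children at or above this many pages are assumed
# to be worth running under a Funnel. A real planner would compare path costs.
PARALLEL_THRESHOLD_PAGES = 1000


@dataclass
class Scan:
    rel: str
    pages: int


@dataclass
class Funnel:
    children: List[Scan]


@dataclass
class Append:
    children: List[Union[Scan, Funnel]]


def arrange_append(scans: List[Scan]) -> Append:
    """Group the large scans under one Funnel; keep the small scans local."""
    parallel = [s for s in scans if s.pages >= PARALLEL_THRESHOLD_PAGES]
    local = [s for s in scans if s.pages < PARALLEL_THRESHOLD_PAGES]
    children: List[Union[Scan, Funnel]] = []
    if parallel:
        children.append(Funnel(parallel))
    children.extend(local)
    return Append(children)


# Illustrative sizes only; rel1..rel3 end up under the Funnel,
# rel4 and rel5 stay as direct children of Append.
plan = arrange_append([
    Scan("rel1", 50000),
    Scan("rel2", 40000),
    Scan("rel3", 20000),
    Scan("rel4", 10),
    Scan("rel5", 5),
])
```

When no child crosses the threshold, the sketch emits a plain Append
without any Funnel, which matches the existing non-parallel plan shape.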


Here is one other issue I found. The existing code assumes a TOC segment has
only one entry per node type, so it uses a pre-defined key (like
PARALLEL_KEY_SCAN) per node type. However, that is problematic if we put
multiple PlannedStmt or PartialSeqScan nodes on a TOC segment.
My idea is to enhance the Plan node to have a unique identifier within
a particular plan tree. Once a unique identifier is assigned, we can
put individual information on the TOC segment, even if multiple
PartialSeqScan nodes are packed.
Have we discussed this topic in the past?
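
A minimal sketch of the identifier idea, under stated assumptions: a
Python dict stands in for the TOC segment, and the key for each node is
derived from a per-node-type base plus the node's unique id. The names
PARALLEL_KEY_SCAN_BASE, toc_key, assign_node_ids and plan_node_id, and
the base value itself, are hypothetical and are not taken from the
existing code.

```python
from typing import Dict

# Hypothetical base value standing in for a per-node-type key such as
# PARALLEL_KEY_SCAN; the real constant's value is not assumed here.
PARALLEL_KEY_SCAN_BASE = 0xE000000000000000


def toc_key(base: int, plan_node_id: int) -> int:
    """Derive a per-node TOC key from a node-type base and a plan node id."""
    if not 0 <= plan_node_id < 2**32:
        raise ValueError("plan_node_id out of range")
    return base + plan_node_id


def assign_node_ids(node: dict, next_id: int = 0) -> int:
    """Walk the plan tree depth-first, giving each node a unique id."""
    node["plan_node_id"] = next_id
    next_id += 1
    for child in node.get("children", []):
        next_id = assign_node_ids(child, next_id)
    return next_id


plan = {"type": "Funnel", "children": [
    {"type": "PartialSeqScan", "rel": "rel1"},
    {"type": "PartialSeqScan", "rel": "rel2"},
]}
assign_node_ids(plan)

# A dict stands in for the TOC segment: one entry per scan node.
toc: Dict[int, str] = {}
for child in plan["children"]:
    key = toc_key(PARALLEL_KEY_SCAN_BASE, child["plan_node_id"])
    assert key not in toc  # a single fixed per-type key would collide here
    toc[key] = f"scan state for {child['rel']}"
```

With a single fixed key per node type, the second PartialSeqScan would
overwrite or conflict with the first; deriving the key from the node id
keeps the two entries apart.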

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>

