Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [DESIGN] ParallelAppend
Date
Msg-id CAA4eK1K+qz+TNW52xjc0Kee9Awh2CvbXrMRw-26r21DhfqF3Zg@mail.gmail.com
In response to Re: [DESIGN] ParallelAppend  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List pgsql-hackers
On Fri, Aug 7, 2015 at 2:15 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
> > On Sat, Aug 1, 2015 at 6:39 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > >
>
> > > > > If we pull Funnel here, I think the plan shall be as follows:
> > > > >   Funnel
> > > > >    --> SeqScan on rel1
> > > > >    --> PartialSeqScan on rel2
> > > > >    --> IndexScan on rel3
> > > > >
> > > >
> > > > So if we go this route, then Funnel should have the capability
> > > > to execute the non-parallel part of the plan as well; in this
> > > > case it should be able to execute the non-parallel IndexScan on
> > > > rel3 as well, and then it might need to distinguish between
> > > > parallel and non-parallel parts of plans.  I think this could
> > > > make the Funnel node complex.
> > > >
> > > It is different from what I plan now. In the above example,
> > > the Funnel node has two non-parallel-aware nodes (rel1 and rel3)
> > > and one parallel-aware node, so three PlannedStmts, one for each,
> > > shall be put on the TOC segment. Even though the background
> > > workers pick up a PlannedStmt from the three, only one worker
> > > can pick up each of the PlannedStmts for rel1 and rel3; however,
> > > rel2 can be executed by multiple workers simultaneously.
> >
> > Okay, now I got your point, but I think the cost of executing a
> > non-parallel node in an additional worker is not small, considering
> > the communication cost and the cost of setting up an additional
> > worker for each sub-plan (assume the case where, out of 100 child
> > nodes, only a few (2 or 3) nodes actually need multiple workers).
> >
> It is a competition between the traditional Append that takes Funnel
> children and the new appendable Funnel that takes parallel and
> non-parallel children. Probably, the key factors are cpu_tuple_comm_cost,
> parallel_setup_cost and the selectivity of the sub-plans.
> Both approaches have advantages and disadvantages depending on the query,
> so we cannot determine which is better without considering the paths.
 
Sure, that is what we should do; however, the tricky part is when the
path for a local scan is much cheaper than the path for a parallel scan
for one of the child nodes.  In such cases, pulling up the Funnel node
can incur more cost, because that child's tuples must then be passed
through a worker.  Some other possible ways to make this work could be
to extend Funnel so that it is capable of executing both parallel and
non-parallel nodes, or to have a new Funnel-like node with such a
capability.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
