Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: [DESIGN] ParallelAppend
Date	August 8, 2015 03:06:47
Msg-id	CAA4eK1K+qz+TNW52xjc0Kee9Awh2CvbXrMRw-26r21DhfqF3Zg@mail.gmail.com
In response to	Re: [DESIGN] ParallelAppend (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List	pgsql-hackers

On Fri, Aug 7, 2015 at 2:15 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
> > On Sat, Aug 1, 2015 at 6:39 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > >
>
> > > > > If we pull Funnel here, I think the plan shall be as follows:
> > > > > Funnel
> > > > > --> SeqScan on rel1
> > > > > --> PartialSeqScan on rel2
> > > > > --> IndexScan on rel3
> > > > >
> > > >
> > > > So if we go this route, then Funnel should have capability
> > > > to execute non-parallel part of plan as well, like in this
> > > > case it should be able to execute non-parallel IndexScan on
> > > > rel3 as well and then it might need to distinguish between
> > > > parallel and non-parallel part of plans. I think this could
> > > > make Funnel node complex.
> > > >
> > > It is different from what I plan now. In the above example,
> > > Funnel node has two non-parallel aware nodes (rel1 and rel3)
> > > and one parallel aware node, then three PlannedStmt for each
> > > shall be put on the TOC segment. Even though the background
> > > workers pick up a PlannedStmt from the three, only one worker
> > > can pick up the PlannedStmt for rel1 and rel3, however, rel2
> > > can be executed by multiple workers simultaneously.
> >
> > Okay, now I got your point, but I think the cost of execution
> > of non-parallel node by additional worker is not small considering
> > the communication cost and setting up an additional worker for
> > each sub-plan (assume the case where out of 100-child nodes
> > only few (2 or 3) nodes actually need multiple workers).
> >
> It is a competition between traditional Append that takes Funnel
> children and the new appendable Funnel that takes parallel and
> non-parallel children. Probably, key factors are cpu_tuple_comm_cost,
> parallel_setup_cost and degree of selectivity of sub-plans.
> Both cases have advantages and disadvantages depending on the query,
> so we can never determine which is better without path consideration.

Sure, that is what we should do; however, the tricky part would be when the
path for doing a local scan is much cheaper than the path for a parallel scan
for one of the child nodes. For such cases, pulling up the Funnel node can
incur more cost. I think some of the other possible ways to make this work
could be to extend Funnel so that it is capable of executing both parallel
and non-parallel nodes, or to have a new Funnel-like node which has such a
capability.
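
To make the trade-off concrete, here is a rough sketch of the comparison,
not planner code. It assumes a purely additive toy model in which every
child handed to a worker pays parallel_setup_cost plus cpu_tuple_comm_cost
per returned tuple. The Child fields, the function names, and all numeric
values are hypothetical and chosen only for illustration; a real costing
would also have to account for children running concurrently.

```python
from dataclasses import dataclass

# Hypothetical values for illustration only; not actual planner defaults.
PARALLEL_SETUP_COST = 1000.0
CPU_TUPLE_COMM_COST = 0.1


@dataclass
class Child:
    name: str
    local_cost: float     # cost of scanning this child in the leader
    parallel_cost: float  # scan cost when the child is handed to worker(s)
    rows: int             # tuples sent back to the leader


def worker_cost(child: Child) -> float:
    """Cost of running one child via a worker: setup + scan + tuple transfer."""
    return (PARALLEL_SETUP_COST
            + child.parallel_cost
            + child.rows * CPU_TUPLE_COMM_COST)


def pulled_up_funnel_cost(children: list[Child]) -> float:
    """Funnel pulled above the Append: every child is executed by a worker."""
    return sum(worker_cost(c) for c in children)


def mixed_funnel_cost(children: list[Child]) -> float:
    """Funnel-like node that may run a child locally when that is cheaper."""
    return sum(min(c.local_cost, worker_cost(c)) for c in children)


children = [
    Child("rel1", local_cost=50.0, parallel_cost=50.0, rows=10),
    Child("rel2", local_cost=100000.0, parallel_cost=25000.0, rows=1000),
    Child("rel3", local_cost=20.0, parallel_cost=20.0, rows=5),
]

pulled_up = pulled_up_funnel_cost(children)
mixed = mixed_funnel_cost(children)
```

With these made-up numbers only rel2 is worth sending to workers; for rel1
and rel3 the setup cost alone dwarfs the local scan cost, which is the case
where pulling up Funnel over all children costs more than a node that can
also run children locally.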

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
