Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: [DESIGN] ParallelAppend
Date	November 18, 2015 15:26:06
Msg-id	CAA4eK1LgUxjRbi-CbhpiXE_NMJhup9JVEw=HMp87wfL9EdLUMg@mail.gmail.com
In response to	Re: [DESIGN] ParallelAppend (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: [DESIGN] ParallelAppend (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

On Sat, Nov 14, 2015 at 3:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Nov 12, 2015 at 12:09 AM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > I'm now designing the parallel feature of Append...
> >
> > Here is one challenge. How do we determine whether each sub-plan
> > allows execution in the background worker context?
>
> I've been thinking about these questions for a bit now, and I think we
> can work on improving Append in multiple phases. The attached patch
> shows what I have in mind for phase 1.

Couple of comments and questions regarding this patch:

+/*
+ * add_partial_path
+ * produce the same number of rows. Neither do we need to consider startup
+ * costs: parallelism is only used for plans that will be run to completion.

Don't we need the startup cost in case we need to build partial paths for
joinpaths like mergepath?
Also, I think there are other cases for a single-relation scan where startup
cost can matter, such as when there are pseudoconstants in the qualification
(refer to cost_qual_eval_walker()) or, let us say, if someone has disabled
seq scan (disable_cost is considered as startup cost).

B. I think partial path is an important concept and deserves some
explanation in src/backend/optimizer/README.
There is already a good explanation of Paths there, so I think it
would be better to add an explanation of partial paths as well.

+ * costs: parallelism is only used for plans that will be run to completion.
+ * Therefore, this routine is much simpler than add_path: it needs to
+ * consider only pathkeys and total cost.

There seems to be a spacing issue in the last two lines.

+static void
+create_parallel_paths(PlannerInfo *root, RelOptInfo *rel)
+ int parallel_threshold = 1000;
+ int parallel_degree = 1;
+ /*
+ * If this relation is too small to be worth a parallel scan, just return
+ * without doing anything ... unless it's an inheritance child. In that case,
+ * we want to generate a parallel path here anyway. It might not be worthwhile
+ * just for this relation, but when combined with all of its inheritance siblings
+ * it may well pay off.
+ */
+ if (rel->pages < parallel_threshold && rel->reloptkind == RELOPT_BASEREL)
+     return;

This means that for inheritance child relations whose rel pages are
less than parallel_threshold, it will always consider the cost as shared
between 1 worker and the leader, as per the calculation below in cost_seqscan:

if (path->parallel_degree > 0)
    run_cost = run_cost / (path->parallel_degree + 0.5);

I think this might not be an appropriate cost model even for
non-inheritance relations which have more pages than parallel_threshold,
but it seems to be even worse for inheritance children which have
fewer pages than parallel_threshold.
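
To make the concern concrete, here is a minimal standalone sketch of the divisor quoted above. The function name parallel_run_cost is made up for illustration and is not in the patch; reading the extra 0.5 as the leader's share is my assumption. The point is only that the discount depends on parallel_degree alone, so a tiny child relation gets the same 1/1.5 reduction as a huge one.

```c
#include <assert.h>

/*
 * Illustrative helper (hypothetical name, not part of the patch).
 * Applies the same adjustment as the cost_seqscan fragment quoted above:
 * the run cost is divided by (parallel_degree + 0.5), where the 0.5 is
 * assumed here to stand for the leader's contribution. The relation's
 * size plays no part in the adjustment.
 */
double
parallel_run_cost(double run_cost, int parallel_degree)
{
    assert(parallel_degree >= 0);

    if (parallel_degree > 0)
        run_cost = run_cost / (parallel_degree + 0.5);

    return run_cost;
}
```

With parallel_degree = 1, as the patch sets it, every partial seq scan is costed at two thirds of its serial run cost, whether the relation has ten pages or ten million.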

If none of the inheritance child rels (or very few of them) are big
enough for a parallel scan, is considering the Append node for
parallelism of any use? In other words, wouldn't it be better to
generate the plan as it is done now, without this patch, for such cases,
considering the current execution model of the Gather node?
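
One way to picture the question is the following standalone sketch. It is purely hypothetical: neither function exists in the patch, and the min_large cutoff is an invented knob. It only shows the kind of check I have in mind, namely counting how many children clear parallel_threshold before bothering with a parallel Append.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the patch: count the child relations
 * that have at least parallel_threshold pages.
 */
size_t
count_large_children(const unsigned int *child_pages, size_t nchildren,
                     unsigned int parallel_threshold)
{
    size_t nlarge = 0;

    for (size_t i = 0; i < nchildren; i++)
    {
        if (child_pages[i] >= parallel_threshold)
            nlarge++;
    }
    return nlarge;
}

/*
 * Hypothetical policy: consider a parallel Append only when at least
 * min_large children are big enough to be worth a parallel scan on
 * their own. Returns 1 if so, 0 otherwise.
 */
int
worth_parallel_append(const unsigned int *child_pages, size_t nchildren,
                      unsigned int parallel_threshold, size_t min_large)
{
    assert(child_pages != NULL || nchildren == 0);

    return count_large_children(child_pages, nchildren,
                                parallel_threshold) >= min_large;
}
```

For children of 10, 20 and 30 pages with a threshold of 1000, this would fall back to the existing non-parallel plan. What a sensible min_large would be is exactly the open question.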

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
