Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: [DESIGN] ParallelAppend
Date	November 19, 2015 10:59:29
Msg-id	CAA4eK1J4oNGmxvB4_jY6CfemN5rpHbuMd7FrpTk5Q2gzUBmKqw@mail.gmail.com
In response to	Re: [DESIGN] ParallelAppend (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: [DESIGN] ParallelAppend (Amit Kapila <amit.kapila16@gmail.com>) Re: [DESIGN] ParallelAppend (Robert Haas <robertmhaas@gmail.com>)
List	pgsql-hackers

On Thu, Nov 19, 2015 at 12:27 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Nov 18, 2015 at 7:25 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > Don't we need the startup cost in case we need to build partial paths for
> > joinpaths like mergepath?
> > Also, I think there are other cases for single relation scan where startup
> > cost can matter, like when there are pseudoconstants in the qualification
> > (refer to cost_qual_eval_walker()) or, let us say, if someone has disabled
> > seq scan (disable_cost is considered as startup cost).
>
> I'm not saying that we don't need to compute it. I'm saying we don't
> need to take it into consideration when deciding which paths have
> merit. Note that consider_startup is set this way:
>
> rel->consider_startup = (root->tuple_fraction > 0);
>

Even when consider_startup is false, startup_cost is still used in the cost
calculation. Ignoring it may be okay for partial paths, but it seems worth
thinking about why leaving it out is okay for partial paths even though it
is used in add_path().

+ * We don't generate parameterized partial paths because they seem unlikely
+ * ever to be worthwhile. The only way we could ever use such a path is
+ * by executing a nested loop with a complete path on the outer side - thus,
+ * each worker would scan the entire outer relation - and the partial path
+ * on the inner side - thus, each worker would scan only part of the inner
+ * relation. This is silly: a parameterized path is generally going to be
+ * based on an index scan, and we can't generate a partial path for that.

Won't it be useful to consider parameterized paths for plans like the one
below, where we can push the join tree to the workers, each worker scans
the complete outer relation A, and the rest of the work is divided among
the workers? (Of course there can be other ways to parallelize such joins,
but the way described also seems to be possible.)

NestLoop
  -> Seq Scan on A
  -> Hash Join
       Join Condition: B.Y = C.W
       -> Seq Scan on B
       -> Index Scan using C_Z_IDX on C
            Index Condition: C.Z = A.X

Is the main reason for having add_partial_path() that it does fewer
checks, or is it that the current add_path() would give wrong answers
in some cases?

If there is no case where add_path() can't work, then there is some
advantage in retaining add_path(), at least in terms of maintaining
the code.

+void
+add_partial_path(RelOptInfo *parent_rel, Path *new_path)
{
+ /* Unless pathkeys are incompable, keep just one of the two paths. */

typo - 'incompable'

> > A.
> > This means that for inheritance child relations for which rel pages are
> > less than parallel_threshold, it will always consider the cost shared
> > between 1 worker and leader as per below calc in cost_seqscan:
> > if (path->parallel_degree > 0)
> > run_cost = run_cost / (path->parallel_degree + 0.5);
> >
> > I think this might not be the appropriate cost model even for
> > non-inheritance relations which have more pages than parallel_threshold,
> > but it seems to be even worse for inheritance children which have
> > fewer pages than parallel_threshold
>
> Why?

Because I think the way the code is written, it assumes that for each
inheritance child relation which has fewer pages than the threshold, half
the work will be done by the master backend, which doesn't seem to be the
right distribution. Consider a case where there are three such children,
each having a cost of 100 to scan. They will now be costed as
100/1.5 + 100/1.5 + 100/1.5, which means that for each worker it is
counting 0.5 of the master backend's work, which seems to be wrong.
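To make the arithmetic above concrete, here is a minimal sketch of the per-child discount. It only mirrors the two-line formula quoted from cost_seqscan; the function names are hypothetical helpers for illustration, not PostgreSQL source.

```c
#include <assert.h>

/*
 * Simplified stand-in for the discount quoted from cost_seqscan:
 *     if (parallel_degree > 0)
 *         run_cost = run_cost / (parallel_degree + 0.5);
 * Hypothetical helper, not the real planner function.
 */
double
discounted_run_cost(double run_cost, int parallel_degree)
{
    assert(parallel_degree >= 0);
    if (parallel_degree > 0)
        run_cost = run_cost / (parallel_degree + 0.5);
    return run_cost;
}

/*
 * Total cost when every child scan is discounted on its own, so the
 * leader's 0.5 share is counted once per child.
 */
double
append_cost_per_child_discount(const double *child_costs, int nchildren,
                               int parallel_degree)
{
    double total = 0.0;

    for (int i = 0; i < nchildren; i++)
        total += discounted_run_cost(child_costs[i], parallel_degree);
    return total;
}
```

With three children of cost 100 each and a parallel degree of 1, this sums 100/1.5 three times, giving a total of 200: the leader is assumed to contribute half a worker's effort to each child separately.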

I think for the Append case, we should account for this cost during Append
path creation in create_append_path(). Basically, we can make cost_seqscan
ignore the cost reduction due to parallel_degree for inheritance relations,
then apply that reduction during Append path creation, and also count the
master backend as 0.5 of a work unit with respect to the overall work.
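A rough sketch of what this could look like, under stated assumptions: the children are costed with no parallel discount, and the divisor (parallel_degree + 0.5) is applied once to their sum. The function name and the idea of a single Append-level parallel degree are illustrative assumptions, not part of the patch under review.

```c
#include <assert.h>

/*
 * Hypothetical Append-level costing: sum the undiscounted child costs,
 * then apply the (parallel_degree + 0.5) divisor a single time, so the
 * leader counts as 0.5 of a worker for the Append as a whole.
 */
double
append_cost_single_discount(const double *child_costs, int nchildren,
                            int parallel_degree)
{
    double total = 0.0;

    assert(nchildren >= 0 && parallel_degree >= 0);

    for (int i = 0; i < nchildren; i++)
        total += child_costs[i];

    /* Discount once for the whole Append, not once per child. */
    if (parallel_degree > 0)
        total = total / (parallel_degree + 0.5);
    return total;
}
```

For three children of cost 100 each and an assumed Append-level degree of 3, this gives 300/3.5, roughly 85.7, compared with 100/1.5 + 100/1.5 + 100/1.5 = 200 when each child is discounted separately. Which degree the Append should use is left open here.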

--- a/src/backend/optimizer/README
+++ b/src/backend/optimizer/README
+plan as possible. Expanding the range of cases in which more work can be
+pushed below the Gather (and costly them accurately) is likely to keep us
+busy for a long time to come.

There seems to be a typo in the above text:

/costly/cost

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
