Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers

From: Amit Khandekar
Subject: Re: [HACKERS] Parallel Append implementation
Msg-id: CAJ3gD9da4+EsXVPB4QNJ1fb3YGKiEM-kKVUQponY9ZAukzLWPw@mail.gmail.com
In response to: Re: [HACKERS] Parallel Append implementation (Robert Haas <robertmhaas@gmail.com>)
Responses:
  Re: [HACKERS] Parallel Append implementation (Peter Geoghegan <pg@bowt.ie>)
  Re: [HACKERS] Parallel Append implementation (Amit Khandekar <amitdkhan.pg@gmail.com>)
  Re: [HACKERS] Parallel Append implementation (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On 17 March 2017 at 01:37, Robert Haas <robertmhaas@gmail.com> wrote:
> - You've added a GUC (which is good) but not documented it (which is
> bad) or added it to postgresql.conf.sample (also bad).
>
> - You've used a loop inside a spinlock-protected critical section,
> which is against project policy.  Use an LWLock; define and document a
> new builtin tranche ID.
>
> - The comment for pa_finished claims that it is the number of workers
> executing the subplan, but it's a bool, not a count; I think this
> comment is just out of date.

Yes, agreed. Will fix the above.
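
For the LWLock change, here is a rough sketch of what I have in mind
(the names LWTRANCHE_PARALLEL_APPEND and ParallelAppendState are
placeholders, not final):

/* In lwlock.h, add a new builtin tranche ID to enum BuiltinTrancheIds: */
LWTRANCHE_PARALLEL_APPEND,
/* ... and register its display name in RegisterLWLockTranches() in lwlock.c */

/* The shared state the lock protects: */
typedef struct ParallelAppendState
{
	LWLock		pa_lock;		/* protects pa_finished[] */
	bool		pa_finished[FLEXIBLE_ARRAY_MEMBER];	/* per-subplan flags */
} ParallelAppendState;

/* At shared-memory initialization time: */
LWLockInitialize(&pastate->pa_lock, LWTRANCHE_PARALLEL_APPEND);

/* And the choose-next-subplan loop, no longer under a spinlock: */
LWLockAcquire(&pastate->pa_lock, LW_EXCLUSIVE);
/* ... loop over subplans and pick the next unfinished one ... */
LWLockRelease(&pastate->pa_lock);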

>
> - paths_insert_sorted_by_cost() is a hand-coded insertion sort.  Can't
> we find a way to use qsort() for this instead of hand-coding a slower
> algorithm?  I think we could just create an array of the right length,
> stick each path into it from add_paths_to_append_rel, and then qsort()
> the array based on <is-partial, total-cost>.  Then the result can be
> turned into a list.

Yeah, I was in two minds about whether to do the
copy-to-array-and-qsort thing, or to just write roughly the same
number of lines of code to do an insertion sort manually. I also
searched for an existing linked-list sort, but it seems we don't have
one. Will do the qsort now, since it would be faster.
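
Something like the following is what I will try (an untested sketch;
PathSortItem is just an illustrative helper for pairing each path with
its is-partial flag):

typedef struct
{
	Path	   *path;
	bool		is_partial;
} PathSortItem;

/*
 * qsort comparator: non-partial paths sort before partial ones, and
 * within each group more expensive paths come first, since the
 * scheduling logic wants the expensive non-partial subplans started
 * earliest.
 */
static int
path_sort_cmp(const void *a, const void *b)
{
	const PathSortItem *pa = (const PathSortItem *) a;
	const PathSortItem *pb = (const PathSortItem *) b;

	if (pa->is_partial != pb->is_partial)
		return pa->is_partial ? 1 : -1;
	if (pa->path->total_cost > pb->path->total_cost)
		return -1;
	if (pa->path->total_cost < pb->path->total_cost)
		return 1;
	return 0;
}

/* In add_paths_to_append_rel: fill the array, then: */
qsort(items, nitems, sizeof(PathSortItem), path_sort_cmp);

The sorted array can then be turned back into a list at the end, as
you say.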

>
> - Maybe the new helper functions in nodeAppend.c could get names
> starting with exec_append_, to match the style of
> exec_append_initialize_next().
>
> - There's a superfluous whitespace change in add_paths_to_append_rel.

Will fix this.

>
> - The substantive changes in add_paths_to_append_rel don't look right
> either.  It's not clear why accumulate_partialappend_subpath is
> getting called even in the non-enable_parallelappend case.  I don't
> think the logic for the case where we're not generating a parallel
> append path needs to change at all.

When accumulate_partialappend_subpath() is called for a childrel with
a partial path, it works just like accumulate_append_subpath() does
when enable_parallelappend is false. That's why, for a partial child
path, the same function is called irrespective of the parallel-append
or non-parallel-append case. Maybe mentioning this in the comments
would suffice?

>
> - When parallel append is enabled, I think add_paths_to_append_rel
> should still consider all the same paths that it does today, plus one
> extra.  The new path is a parallel append path where each subpath is
> the cheapest subpath for that childrel, whether partial or
> non-partial.  If !enable_parallelappend, or if all of the cheapest
> subpaths are partial, then skip this.  (If all the cheapest subpaths
> are non-partial, it's still potentially useful.)

In the case of all-partial childrels, the paths are *exactly* the same
as those that would have been created for enable_parallelappend=off.
The extra path is there for enable_parallelappend=on only when one or
more of the child rels do not have partial paths. Does this make
sense?

> In other words,
> don't skip consideration of parallel append just because you have a
> partial path available for every child rel; it could be

I didn't get this. Are you saying that in the patch it is getting
skipped if enable_parallelappend = off? I don't think so. For
all-partial child rels, a partial Append is always created. The only
difference is that in the enable_parallelappend=off case, the Append
path is not parallel_aware, so it executes under Gather just as it
does today, without being parallel-aware.

>
> - I think the way cost_append() works is not right.  What you've got
> assumes that you can just multiply the cost of a partial plan by the
> parallel divisor to recover the total cost, which is not true because
> we don't divide all elements of the plan cost by the parallel divisor
> -- only the ones that seem like they should be divided.

Yes, that was an approximation. For those subpaths that have no
parallel_divisor, we cannot calculate the total cost considering the
number of workers for the subpath. I feel we should somehow take the
per-subpath parallel_workers into account. The Path->total_cost for a
partial path is *always* the per-worker cost, right? I just want to
confirm this assumption of mine.

> Also, it
> could be smarter about what happens with the costs of non-partial
> paths. I suggest the following algorithm instead.
>
> 1. Add up all the costs of the partial paths.  Those contribute
> directly to the final cost of the Append.  This ignores the fact that
> the Append may escalate the parallel degree, but I think we should
> just ignore that problem for now, because we have no real way of
> knowing what the impact of that is going to be.

I wanted to take per-subpath parallel_workers into account for the
total cost of the Append. Suppose the partial subpaths have per-worker
total costs (3, 3, 3) and their parallel_workers are (2, 8, 4), with 2
Append workers available. Then according to what you say, the total
cost is 9. With per-subplan parallel_workers taken into account, the
total cost = (3*2 + 3*8 + 3*4)/2 = 21.

Maybe I didn't follow exactly what you suggested. Is your logic not
taking the number of workers into account? I am assuming you are
calculating the per-worker total cost here.

>
> 2. Next, estimate the cost of the non-partial paths.  To do this, make
> an array of Cost of that length and initialize all the elements to
> zero, then add the total cost of each non-partial plan in turn to the
> element of the array with the smallest cost, and then take the maximum
> of the array elements as the total cost of the non-partial plans.  Add
> this to the result from step 1 to get the total cost.

So with costs (8, 5, 2), add 8 and 5 to 2 so that it becomes (8, 5,
15), and so the max is 15? I am surely misinterpreting this.

Actually, I couldn't come up with a general formula to find the
non-partial paths' total cost, given the per-subplan costs and the
number of workers. I mean, we can manually find out the total cost,
but turning it into a formula seems quite involved. We could even do a
dry-run of workers consuming each of the subplan slots and find the
total time units taken, but finding some approximation seemed OK.

For example, we can manually find the total time units taken for the
following:
costs (8, 2, 2, 2) with 2 workers : 8
costs (6, 6, 4, 1) with 2 workers : 10
costs (6, 6, 4, 1) with 3 workers : 6
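
For what it's worth, the dry-run itself is easy to code as a
simulation; this standalone sketch (illustrative only, not meant for
the patch) reproduces the numbers above, assuming each subplan cost is
handed to whichever worker is least loaded at that point:

#include <stdio.h>

/*
 * Simulate non-partial subplans being picked up by whichever of
 * num_workers becomes free first: each cost is added to the
 * least-loaded slot, and the final maximum load is the total number
 * of time units taken.
 */
static double
total_time_units(const double *costs, int ncosts, int num_workers)
{
	double		load[64] = {0};	/* assumes num_workers <= 64 */
	double		max_load = 0.0;
	int			i, w;

	for (i = 0; i < ncosts; i++)
	{
		int			min_w = 0;

		for (w = 1; w < num_workers; w++)
			if (load[w] < load[min_w])
				min_w = w;
		load[min_w] += costs[i];
	}
	for (w = 0; w < num_workers; w++)
		if (load[w] > max_load)
			max_load = load[w];
	return max_load;
}

int
main(void)
{
	double		a[] = {8, 2, 2, 2};
	double		b[] = {6, 6, 4, 1};

	printf("%g %g %g\n",
		   total_time_units(a, 4, 2),	/* prints 8 */
		   total_time_units(b, 4, 2),	/* prints 10 */
		   total_time_units(b, 4, 3));	/* prints 6 */
	return 0;
}

Interestingly, this least-loaded assignment seems to compute the same
thing as your array description, so maybe I misread it above.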

But coming up with a closed-form algorithm or formula didn't look
worthwhile. So I just took the total cost and divided it by the number
of workers, and in addition took the maximum of the first plan's cost
(since it is the highest) and that average. I understand this would be
too rough an approximation for some cases, but there is another issue
too: we don't know how to take into account some of the workers
shifting to partial plans. That shift may be quite fuzzy, since all
the workers may not shift to partial plans together.

>
> - In get_append_num_workers, instead of the complicated formula with
> log() and 0.693, just add the list lengths and call fls() on the
> result.  Integer arithmetic FTW!

Yeah, fls() could be used. BTW, I just found that costsize.c already
has this defined in the same way I did:
#define LOG2(x)  (log(x) / 0.693147180559945)
Maybe we need to move this to some common header file.
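
For the fls() version, I am thinking of something like this sketch (I
am assuming we would still cap the result with
max_parallel_workers_per_gather):

static int
get_append_num_workers(List *partial_subpaths, List *nonpartial_subpaths)
{
	int			num_subpaths = list_length(partial_subpaths) +
							   list_length(nonpartial_subpaths);

	/*
	 * fls(n) is 1 + floor(log2(n)), so this grows logarithmically with
	 * the number of subplans, using integer arithmetic only.
	 */
	return Min(fls(num_subpaths), max_parallel_workers_per_gather);
}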


