Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers
From: Amit Khandekar
Subject: Re: [HACKERS] Parallel Append implementation
Date:
Msg-id: CAJ3gD9cyn+yCLWW56S+v5s5eVEwWZVrdPT25a+Bv9rDYPz8o1Q@mail.gmail.com
In response to: Re: [HACKERS] Parallel Append implementation (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [HACKERS] Parallel Append implementation
List: pgsql-hackers
On 19 February 2017 at 14:59, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Feb 17, 2017 at 2:56 PM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>> The log2(num_children)+1 formula which you proposed does not take into
>> account the number of workers for each of the subplans, that's why I
>> am a bit more inclined to look for some other logic. May be, treat the
>> children as if they belong to partitions, and accordingly calculate
>> the final number of workers. So for 2 children with 4 and 5 workers
>> respectively, Append parallel_workers would be : log3(3^4 + 3^5) .
>
> In general this will give an answer not different by more than 1 or 2
> from my answer, and often exactly the same. In the case you mention,
> whether we get the same answer depends on which way you round:
> log3(3^4+3^5) is 5 if you round down, 6 if you round up.
>
> My formula is more aggressive when there are many subplans that are
> not parallel or take only 1 worker, because I'll always use at least 5
> workers for an append that has 9-16 children, whereas you might use
> only 2 if you do log3(3^0+3^0+3^0+3^0+3^0+3^0+3^0+3^0+3^0). In that
> case I like my formula better. With lots of separate children, the
> chances of being able to use as many as 5 workers seem good. (Note
> that using 9 workers as Ashutosh seems to be proposing would be a
> waste if the different children have very unequal execution times,
> because the workers that run children with short execution times can
> be reused to run additional subplans while the long ones are still
> running. Running a separate worker for each child only works out if
> the shortest runtime is more than 50% of the longest runtime, which
> may sometimes be true but doesn't seem like a good bet in general.)
>
> Your formula is more aggressive when you have 3 children that all use
> the same number of workers; it'll always decide on <number of workers
> per child>+1, whereas mine won't add the extra worker in that case.
> Possibly your formula is better than mine in that case, but I'm not
> sure. If you have as many as 9 children that all want N workers, your
> formula will decide on N+2 workers, but since my formula guarantees a
> minimum of 5 workers in such cases, I'll probably be within 1 of
> whatever answer you were getting.

Yeah, that seems to be right in most of the cases. The only cases where your
formula seems to give too few workers are ones like (2, 8, 8): for such
subplans, we should allocate at least 8 workers. It turns out that in most
cases with my formula, the number of Append workers allocated is just 1 more
than the max per-subplan worker count; so for (2, 1, 1, 8) it will be a
fraction more than 8. So in the patch, in addition to the log2() formula you
proposed, I have made sure that it allocates at least
max(per-subplan parallel_workers) workers.

>> BTW, there is going to be some logic change in the choose-next-subplan
>> algorithm if we consider giving extra workers to subplans.
>
> I'm not sure that it's going to be useful to make this logic very
> complicated. I think the most important thing is to give 1 worker to
> each plan before we give a second worker to any plan. In general I
> think it's sufficient to assign a worker that becomes available to the
> subplan with the fewest number of workers (or one of them, if there's
> a tie) without worrying too much about the target number of workers
> for that subplan.
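[Editor's illustration] To make the worker-count arithmetic discussed above concrete, here is a minimal sketch in C of combining the log2() heuristic with a floor of max(per-subplan parallel_workers) and an overall clamp. This is not the patch's actual code; the function name append_parallel_workers and its arguments are purely illustrative.

    #include <math.h>

    /*
     * Illustrative sketch only -- not the patch's actual code.  Combine the
     * ceil(log2(num_children)) + 1 heuristic with a floor of the largest
     * per-subplan parallel_workers value, clamped to an overall worker limit.
     */
    static int
    append_parallel_workers(int num_children, const int *subplan_workers,
                            int max_workers_limit)
    {
        int     max_child_workers = 0;
        int     workers;
        int     i;

        for (i = 0; i < num_children; i++)
        {
            if (subplan_workers[i] > max_child_workers)
                max_child_workers = subplan_workers[i];
        }

        /* ceil(log2(n)) + 1: e.g. 9-16 children => 4 + 1 = 5 workers */
        workers = (int) ceil(log2((double) num_children)) + 1;

        /* never allocate fewer workers than the most demanding child wants */
        if (workers < max_child_workers)
            workers = max_child_workers;

        /* respect the overall limit (e.g. max_parallel_workers_per_gather) */
        if (workers > max_workers_limit)
            workers = max_workers_limit;

        return workers;
    }

With this, a child set of (2, 8, 8) gets 8 workers rather than 3, and nine children that are not parallel at all still get 5, matching the behaviour described in the exchange above.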
In the attached v5 patch, the logic of distributing the workers is now kept
simple: it just distributes the workers equally, without considering the
per-subplan parallel_workers value. I have retained the earlier logic of
choosing the plan with the minimum number of current workers. But now that
pa_max_workers is not needed, I have removed it, and instead added a
partial_plans bitmapset to the Append node. Once a worker picks up a
non-partial subplan, it immediately changes that subplan's pa_num_workers to
-1, whereas for partial subplans the worker sets it to -1 only after it
finishes executing the subplan. Effectively, in parallel_append_next(), the
check for whether a subplan is already executing with its maximum
parallel_workers is removed, along with all the code that was using
pa_max_workers.

Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote:
> 10. We should probably move the parallel_safe calculation out of cost_append().
> +   path->parallel_safe = path->parallel_safe &&
> +       subpath->parallel_safe;
>
> 11. This check shouldn't be part of cost_append().
> +   /* All child paths must have same parameterization */
> +   Assert(bms_equal(PATH_REQ_OUTER(subpath), required_outer));

I have moved these two statements out of cost_append(); they are now done
separately in create_append_path(). Also, I have removed some elog()
statements that were inside the spinlock-protected section of
parallel_append_next().

On 17 January 2017 at 11:10, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> I was looking at the executor portion of this patch and I noticed that in
> exec_append_initialize_next():
>
>     if (appendstate->as_padesc)
>         return parallel_append_next(appendstate);
>
>     /*
>      * Not parallel-aware. Fine, just go on to the next subplan in the
>      * appropriate direction.
>      */
>     if (ScanDirectionIsForward(appendstate->ps.state->es_direction))
>         appendstate->as_whichplan++;
>     else
>         appendstate->as_whichplan--;
>
> which seems to mean that executing Append in parallel mode disregards the
> scan direction. I am not immediately sure what implications that has, so
> I checked what heap scan does when executing in parallel mode, and found
> this in heapgettup():
>
>     else if (backward)
>     {
>         /* backward parallel scan not supported */
>         Assert(scan->rs_parallel == NULL);
>
> Perhaps, AppendState.as_padesc would not have been set if scan direction
> is backward, because parallel mode would be disabled for the whole query
> in that case (PlannerGlobal.parallelModeOK = false). Maybe add an
> Assert() similar to one in heapgettup().

Right, thanks for noticing this. I have added a similar Assert in
exec_append_initialize_next().
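[Editor's illustration] For readers following along, here is a rough sketch of the worker-selection behaviour described above: pick the subplan with the fewest workers, claim non-partial subplans immediately by setting their counter to -1, and only count workers onto partial subplans. The field names pa_num_workers and partial_plans come from the mail itself, but the struct layout and the function name parallel_append_next_subplan are reconstructed here and need not match the v5 patch.

    #include "postgres.h"

    #include <limits.h>

    #include "nodes/bitmapset.h"
    #include "storage/spin.h"

    /* Shared state, as described above: one worker counter per subplan. */
    typedef struct ParallelAppendDescData
    {
        slock_t     pa_mutex;       /* protects pa_num_workers[] */
        int         pa_num_workers[FLEXIBLE_ARRAY_MEMBER]; /* -1 = done or claimed */
    } ParallelAppendDescData;

    /*
     * Sketch only: return the index of the subplan this worker should run
     * next, or -1 if nothing is left.  No per-subplan cap is applied any
     * more, since pa_max_workers has been removed.
     */
    static int
    parallel_append_next_subplan(ParallelAppendDescData *padesc,
                                 Bitmapset *partial_plans, int nplans)
    {
        int     chosen = -1;
        int     min_workers = INT_MAX;
        int     i;

        SpinLockAcquire(&padesc->pa_mutex);

        /* choose the still-runnable subplan with the fewest workers on it */
        for (i = 0; i < nplans; i++)
        {
            if (padesc->pa_num_workers[i] >= 0 &&
                padesc->pa_num_workers[i] < min_workers)
            {
                min_workers = padesc->pa_num_workers[i];
                chosen = i;
            }
        }

        if (chosen >= 0)
        {
            if (bms_is_member(chosen, partial_plans))
                padesc->pa_num_workers[chosen]++;       /* partial: count us in */
            else
                padesc->pa_num_workers[chosen] = -1;    /* non-partial: claim it */
        }

        SpinLockRelease(&padesc->pa_mutex);

        return chosen;
    }

A worker running a partial subplan would set its pa_num_workers back to -1 once that subplan is exhausted, so finished plans drop out of the selection loop above.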