Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: [HACKERS] Parallel Append implementation
Date
Msg-id CAJ3gD9f_2ZhW9NrZy0xiL2H2YPGZd2jMKjHF_GSDOw14jdOweQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel Append implementation  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: [HACKERS] Parallel Append implementation  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Re: [HACKERS] Parallel Append implementation  ("Tels" <nospam-pg-abuse@bloodgate.com>)
List pgsql-hackers
On 10 March 2017 at 12:33, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Fri, Mar 10, 2017 at 11:33 AM, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
>>>
>>> But as far as code is concerned, I think the two-list approach will
>>> turn out to be less simple if we derive corresponding two different
>>> arrays in AppendState node. Handling two different arrays during
>>> execution does not look clean. Whereas, the bitmapset that I have used
>>> in Append has turned out to be very simple. I just had to do the below
>>> check (and that is the only location) to see if it's a partial or
>>> non-partial subplan. There is nowhere else any special handling for
>>> non-partial subpath.
>>>
>>> /*
>>> * Increment worker count for the chosen node, if at all we found one.
>>> * For non-partial plans, set it to -1 instead, so that no other workers
>>> * run it.
>>> */
>>> if (min_whichplan != PA_INVALID_PLAN)
>>> {
>>>    if (bms_is_member(min_whichplan,
>>> ((Append*)state->ps.plan)->partial_subplans_set))
>>>            padesc->pa_info[min_whichplan].pa_num_workers++;
>>>    else
>>>            padesc->pa_info[min_whichplan].pa_num_workers = -1;
>>> }
>>>
>>> Now, since Bitmapset field is used during execution with such
>>> simplicity, why not have this same data structure in AppendPath, and
>>> re-use bitmapset field in Append plan node without making a copy of
>>> it. Otherwise, if we have two lists in AppendPath, and a bitmap in
>>> Append, again there is going to be code for data structure conversion.
>>>
>>
>> I think there is some merit in separating out non-parallel and
>> parallel plans within the same array or outside it. The current logic
>> to assign plan to a worker looks at all the plans, unnecessarily
>> hopping over the un-parallel ones after they are given to a worker. If
>> we separate those two, we can keep assigning new workers to the
>> non-parallel plans first and then iterate over the parallel ones when
>> a worker needs a plan to execute. We might eliminate the need for
>> special value -1 for num workers. You may separate those two kinds in
>> two different arrays or within the same array and remember the
>> smallest index of a parallel plan.

Do you think we might get performance benefit with this ? I am looking
more towards logic simplicity. non-parallel plans would be mostly
likely be there only in case of UNION ALL queries, and not partitioned
tables. And UNION ALL queries probably would have far lesser number of
subplans, there won't be too many unnecessary hops. The need for
num_workers=-1 will still be there for partial plans, because we need
to set it to -1 once a worker finishes a plan.

>
> Further to that, with this scheme and the scheme to distribute workers
> equally irrespective of the maximum workers per plan, you don't need
> to "scan" the subplans to find the one with minimum workers. If you
> treat the array of parallel plans as a circular queue, the plan to be
> assigned next to a worker will always be the plan next to the one
> which got assigned to the given worker.

> Once you have assigned workers
> to non-parallel plans, intialize a shared variable next_plan to point
> to the first parallel plan. When a worker comes asking for a plan,
> assign the plan pointed by next_plan and update it to the next plan in
> the circular queue.

At some point of time, this logic may stop working. Imagine plans are
running with (1, 1, 1). Next worker goes to plan 1, so they run with
(2, 1, 1). So now the next_plan points to plan 2. Now suppose worker
on plan 2 finishes. It should not again take plan 2, even though
next_plan points to 2. It should take plan 3, or whichever is not
finished. May be a worker that finishes a plan should do this check
before directly going to the next_plan. But if this is turning out as
simple as the finding-min-worker-plan, we can use this logic. But will
have to check. We can anyway consider this even when we have a single
list.

>
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



pgsql-hackers by date:

Previous
From: Andreas Joseph Krogh
Date:
Subject: Re: [HACKERS] Gather Merge
Next
From: Jim Nasby
Date:
Subject: Re: [HACKERS] Need a builtin way to run all tests faster manner