Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] Parallel Append implementation
Date
Msg-id CAA4eK1LOkD5STe4DXCoDpBBndoqgmhBUe41AYuBLh0zH=U9Keg@mail.gmail.com
In response to Re: [HACKERS] Parallel Append implementation  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: [HACKERS] Parallel Append implementation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Wed, Sep 20, 2017 at 10:59 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> On 16 September 2017 at 10:42, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Thu, Sep 14, 2017 at 9:41 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Mon, Sep 11, 2017 at 9:25 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> I think the patch stores only non-partial paths in decreasing order
>>>> of cost; what if partial paths with higher costs follow those paths?
>>>
>>> The general picture here is that we don't want the leader to get stuck
>>> inside some long-running operation because then it won't be available
>>> to read tuples from the workers.  On the other hand, we don't want to
>>> just have the leader do no work because that might be slow.  And in
>>> most cases, the leader will be the first participant to arrive at
>>> the Append node, because of the worker startup time.  So the idea is
>>> that the workers should pick expensive things first, and the leader
>>> should pick cheap things first.
>>>
>>
>> At a broader level, the idea is good, but I think it won't turn out
>> exactly like that, considering your paragraph below, which indicates
>> that it is okay if the leader picks a partial path that is costly
>> among the other partial paths, as the leader won't be locked into it.
>> Assuming this is a good design for Parallel Append, the question is:
>> do we really need the worker and the leader to follow separate
>> strategies for choosing the next path?  I think the patch will be
>> simpler if we can come up with a way for the worker and the leader to
>> use the same strategy to pick the next path to process.  How about we
>> arrange the list of paths such that all partial paths come first,
>> followed by the non-partial paths, probably both in decreasing order
>> of cost?  Then both the leader and the workers can start from the
>> beginning of the list.  In most cases, the leader will start at the
>> first partial path and will only ever need to scan a non-partial path
>> if there is no other partial path left.  This is not bulletproof, as
>> it is possible that some worker starts before the leader, in which
>> case the leader might scan a non-partial path before all partial paths
>> are finished, but I think we can avoid that as well if we are too
>> worried about such cases.
>
> If there are no partial subpaths, then again the leader is likely to
> take up the expensive subpath.
>

I think in general the non-partial paths should be cheaper than the
partial paths, since that is the reason the planner chose not to make a
partial plan in the first place.  I think the idea the patch is using
will help because the leader will choose to execute a partial path in
most cases (when there is a mix of partial and non-partial paths), and
in that case the leader is not bound to complete the execution of that
path.  However, if all the paths are non-partial, then I am not sure how
much difference it would make to choose one path over another.
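A minimal sketch of the single-list ordering proposed above, written in
Python for brevity.  SubPath, order_subpaths and choose_next_subpath are
hypothetical names, not PostgreSQL's Path structures or the patch's code,
and the shared state a real implementation would keep in shared memory
under a lock is reduced to a plain list:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SubPath:
    """Hypothetical stand-in for an Append subpath (not PostgreSQL's Path)."""
    name: str
    total_cost: float
    partial: bool           # can several participants scan it together?
    finished: bool = False  # True once no participant should pick it again


def order_subpaths(paths: List[SubPath]) -> List[SubPath]:
    # Partial paths first, then non-partial ones; each group by decreasing cost.
    return sorted(paths, key=lambda p: (not p.partial, -p.total_cost))


def choose_next_subpath(paths: List[SubPath]) -> Optional[int]:
    """Same rule for the leader and every worker: take the first
    unfinished subpath in list order.

    A non-partial subpath is run by exactly one participant, so it is
    marked finished as soon as it is claimed.  A partial subpath stays
    available until its scan is exhausted, which the caller records by
    setting `finished`.  No locking is modeled here.
    """
    for i, p in enumerate(paths):
        if p.finished:
            continue
        if not p.partial:
            p.finished = True  # claimed: nobody else may start it
        return i
    return None  # nothing left to do


# Usage: two partial and two non-partial subpaths with made-up costs.
paths = order_subpaths([
    SubPath("a", 10.0, partial=False),
    SubPath("b", 50.0, partial=True),
    SubPath("c", 30.0, partial=False),
    SubPath("d", 80.0, partial=True),
])
print([p.name for p in paths])     # ['d', 'b', 'c', 'a']
print(choose_next_subpath(paths))  # 0: the leader joins partial path d
print(choose_next_subpath(paths))  # 0: a worker joins d as well
paths[0].finished = True           # d's parallel scan is exhausted
print(choose_next_subpath(paths))  # 1: b
```

This sketch does not handle the case mentioned above where a worker
arrives before the leader: whichever participant calls
choose_next_subpath first simply gets the first open entry.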

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers