Re: Parallel Append implementation - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: Parallel Append implementation
Date
Msg-id CAJ3gD9crnBW=apd7n=RynX08EzrLSnyzgfAordEuHHufDfTKhA@mail.gmail.com
In response to Re: [HACKERS] Parallel Append implementation  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel Append implementation  (Amit Khandekar <amitdkhan.pg@gmail.com>)
List pgsql-hackers
Thanks, Andres, for your review comments. I will get back on the other
comments, but meanwhile I have some questions about the particular
comment below ...

On 4 April 2017 at 10:17, Andres Freund <andres@anarazel.de> wrote:
> On 2017-04-03 22:13:18 -0400, Robert Haas wrote:
>> On Mon, Apr 3, 2017 at 4:17 PM, Andres Freund <andres@anarazel.de> wrote:
>> > Hm.  I'm not really convinced by the logic here.  Wouldn't it be better
>> > to try to compute the minimum total cost across all workers for
>> > 1..#max_workers for the plans in an iterative manner?  I.e. try to map
>> > each of the subplans to 1 (if non-partial) or N workers (partial) using
>> > some fitting algorithm (e.g. always choosing the worker(s) that currently
>> > have the least work assigned).  I think the current algorithm doesn't
>> > lead to useful #workers for e.g. cases with a lot of non-partial,
>> > high-startup plans - imo a quite reasonable scenario.

I think I might not have understood this part exactly. Are you saying
we need to consider the per-subplan parallel_workers to calculate the
total number of workers for Append? I also didn't follow the point
about non-partial subplans. Can you please explain how many workers
you think should be expected with, say, 7 subplans out of which 3 are
non-partial?

>>
>> Well, that'd be totally unlike what we do in any other case.  We only
>> generate a Parallel Seq Scan plan for a given table with one # of
>> workers, and we cost it based on that.  We have no way to re-cost it
>> if we changed our mind later about how many workers to use.
>> Eventually, we should probably have something like what you're
>> describing here, but in general, not just for this specific case.  One
>> problem, of course, is to avoid having a larger number of workers
>> always look better than a smaller number, which with the current
>> costing model would probably happen a lot.
>
> I don't think the parallel seqscan is comparable in complexity with the
> parallel append case.  Each worker there does the same kind of work, and
> if one of them is behind, it'll just do less.  But correct sizing will
> be more important with parallel-append, because with non-partial
> subplans the work is absolutely *not* uniform.
>
> Greetings,
>
> Andres Freund



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


