Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: [HACKERS] Parallel Append implementation
Date
Msg-id CAJ3gD9eRn6MJkJYUmAbvm62b1fVyUARhv9EiXiAjVRqBvHuKFw@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel Append implementation  ("Tels" <nospam-pg-abuse@bloodgate.com>)
List pgsql-hackers
On 12 March 2017 at 19:31, Tels <nospam-pg-abuse@bloodgate.com> wrote:
> Moin,
>
> On Sat, March 11, 2017 11:29 pm, Robert Haas wrote:
>> On Fri, Mar 10, 2017 at 6:01 AM, Tels <nospam-pg-abuse@bloodgate.com>
>> wrote:
>>> Just a question for me to understand the implementation details vs. the
>>> strategy:
>>>
>>> Have you considered how the scheduling decision might impact performance
>>> due to "inter-plan parallelism vs. in-plan parallelism"?
>>>
>>> So what would be the scheduling strategy? And should there be a fixed
>>> one
>>> or user-influencable? And what could be good ones?
>>>
>>> A simple example:
>>>
>>> E.g. if we have 5 subplans, and each can have at most 5 workers and we
>>> have 5 workers overall.
>>>
>>> So, do we:
>>>
>>>   Assign 5 workers to plan 1. Let it finish.
>>>   Then assign 5 workers to plan 2. Let it finish.
>>>   and so on
>>>
>>> or:
>>>
>>>   Assign 1 workers to each plan until no workers are left?
>>
>> Currently, we do the first of those, but I'm pretty sure the second is
>> way better.  For example, suppose each subplan has a startup cost.  If
>> you have all the workers pile on each plan in turn, every worker pays
>> the startup cost for every subplan.  If you spread them out, then
>> subplans can get finished without being visited by all workers, and
>> then the other workers never pay those costs.  Moreover, you reduce
>> contention for spinlocks, condition variables, etc.  It's not
>> impossible to imagine a scenario where having all workers pile on one
>> subplan at a time works out better: for example, suppose you have a
>> table with lots of partitions all of which are on the same disk, and
>> it's actually one physical spinning disk, not an SSD or a disk array
>> or anything, and the query is completely I/O-bound.  Well, it could
>> be, in that scenario, that spreading out the workers is going to turn
>> sequential I/O into random I/O and that might be terrible.  In most
>> cases, though, I think you're going to be better off.  If the
>> partitions are on different spindles or if there's some slack I/O
>> capacity for prefetching, you're going to come out ahead, maybe way
>> ahead.  If you come out behind, then you're evidently totally I/O
>> bound and have no capacity for I/O parallelism; in that scenario, you
>> should probably just turn parallel query off altogether, because
>> you're not going to benefit from it.
>
> I agree with the proposition that both strategies can work well, or not,
> depending on system-setup, the tables and data layout. I'd be a bit more
> worried about turning it into the "random-io-case", but that's still just
> a feeling and guesswork.
>
> So which one will be better seems speculative, hence the question for
> benchmarking different strategies.
>
> So, I'd like to see the scheduler be out in a single place, maybe a
> function that get's called with the number of currently running workers,
> the max. number of workers to be expected, the new worker, the list of
> plans still todo, and then schedules that single worker to one of these
> plans by strategy X.
>
> That would make it easier to swap out X for Y and see how it fares,
> wouldn't it?

Yes, actually pretty much the scheduler logic is all in one single
function parallel_append_next().

>
>
> However, I don't think the patch needs to select the optimal strategy
> right from the beginning (if that even exists, maybe it's a mixed
> strategy), even "not so optimal" parallelism will be better than doing all
> things sequentially.
>
> Best regards,
>
> Tels



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



pgsql-hackers by date:

Previous
From: Amit Khandekar
Date:
Subject: Re: [HACKERS] Parallel Append implementation
Next
From: ilmari@ilmari.org (Dagfinn Ilmari Mannsåker)
Date:
Subject: Re: [HACKERS] make check-world output