Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers

From Tels
Subject Re: [HACKERS] Parallel Append implementation
Date
Msg-id 6d7a3a24b7389b306a79b7b2f372f78f.squirrel@sm.webmail.pair.com
In response to Re: [HACKERS] Parallel Append implementation  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Parallel Append implementation  (Amit Khandekar <amitdkhan.pg@gmail.com>)
List pgsql-hackers
Moin,

On Sat, March 11, 2017 11:29 pm, Robert Haas wrote:
> On Fri, Mar 10, 2017 at 6:01 AM, Tels <nospam-pg-abuse@bloodgate.com>
> wrote:
>> Just a question for me to understand the implementation details vs. the
>> strategy:
>>
>> Have you considered how the scheduling decision might impact performance
>> due to "inter-plan parallelism vs. in-plan parallelism"?
>>
>> So what would be the scheduling strategy? And should there be a fixed
>> one
>> or user-influencable? And what could be good ones?
>>
>> A simple example:
>>
>> E.g. if we have 5 subplans, and each can have at most 5 workers and we
>> have 5 workers overall.
>>
>> So, do we:
>>
>>   Assign 5 workers to plan 1. Let it finish.
>>   Then assign 5 workers to plan 2. Let it finish.
>>   and so on
>>
>> or:
>>
>>   Assign 1 workers to each plan until no workers are left?
>
> Currently, we do the first of those, but I'm pretty sure the second is
> way better.  For example, suppose each subplan has a startup cost.  If
> you have all the workers pile on each plan in turn, every worker pays
> the startup cost for every subplan.  If you spread them out, then
> subplans can get finished without being visited by all workers, and
> then the other workers never pay those costs.  Moreover, you reduce
> contention for spinlocks, condition variables, etc.  It's not
> impossible to imagine a scenario where having all workers pile on one
> subplan at a time works out better: for example, suppose you have a
> table with lots of partitions all of which are on the same disk, and
> it's actually one physical spinning disk, not an SSD or a disk array
> or anything, and the query is completely I/O-bound.  Well, it could
> be, in that scenario, that spreading out the workers is going to turn
> sequential I/O into random I/O and that might be terrible.  In most
> cases, though, I think you're going to be better off.  If the
> partitions are on different spindles or if there's some slack I/O
> capacity for prefetching, you're going to come out ahead, maybe way
> ahead.  If you come out behind, then you're evidently totally I/O
> bound and have no capacity for I/O parallelism; in that scenario, you
> should probably just turn parallel query off altogether, because
> you're not going to benefit from it.

I agree with the proposition that both strategies can work well, or not,
depending on the system setup, the tables, and the data layout. I'd be a
bit more worried about ending up in the random-I/O case, but that's still
just a feeling and guesswork.

So which one will be better is speculative, hence the suggestion to
benchmark different strategies.

So, I'd like to see the scheduler live in a single place, maybe a
function that gets called with the number of currently running workers,
the max number of workers to expect, the new worker, and the list of
plans still to do, and that then schedules that single worker onto one of
these plans by strategy X.

That would make it easier to swap out X for Y and see how it fares,
wouldn't it?


However, I don't think the patch needs to pick the optimal strategy right
from the start (if such a thing even exists; maybe it's a mixed
strategy). Even "not so optimal" parallelism will be better than doing
everything sequentially.

Best regards,

Tels


