Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers
From: Amit Khandekar
Subject: Re: [HACKERS] Parallel Append implementation
Date:
Msg-id: CAJ3gD9eRn6MJkJYUmAbvm62b1fVyUARhv9EiXiAjVRqBvHuKFw@mail.gmail.com
In response to: Re: [HACKERS] Parallel Append implementation ("Tels" <nospam-pg-abuse@bloodgate.com>)
List: pgsql-hackers
On 12 March 2017 at 19:31, Tels <nospam-pg-abuse@bloodgate.com> wrote:
> Moin,
>
> On Sat, March 11, 2017 11:29 pm, Robert Haas wrote:
>> On Fri, Mar 10, 2017 at 6:01 AM, Tels <nospam-pg-abuse@bloodgate.com> wrote:
>>> Just a question for me to understand the implementation details vs. the
>>> strategy:
>>>
>>> Have you considered how the scheduling decision might impact performance
>>> due to "inter-plan parallelism vs. in-plan parallelism"?
>>>
>>> So what would be the scheduling strategy? And should there be a fixed one
>>> or a user-influenceable one? And what could be good ones?
>>>
>>> A simple example:
>>>
>>> E.g. if we have 5 subplans, and each can have at most 5 workers and we
>>> have 5 workers overall.
>>>
>>> So, do we:
>>>
>>> Assign 5 workers to plan 1. Let it finish.
>>> Then assign 5 workers to plan 2. Let it finish.
>>> and so on
>>>
>>> or:
>>>
>>> Assign 1 worker to each plan until no workers are left?
>>
>> Currently, we do the first of those, but I'm pretty sure the second is
>> way better. For example, suppose each subplan has a startup cost. If
>> you have all the workers pile on each plan in turn, every worker pays
>> the startup cost for every subplan. If you spread them out, then
>> subplans can get finished without being visited by all workers, and
>> then the other workers never pay those costs. Moreover, you reduce
>> contention for spinlocks, condition variables, etc. It's not
>> impossible to imagine a scenario where having all workers pile on one
>> subplan at a time works out better: for example, suppose you have a
>> table with lots of partitions all of which are on the same disk, and
>> it's actually one physical spinning disk, not an SSD or a disk array
>> or anything, and the query is completely I/O-bound. Well, it could
>> be, in that scenario, that spreading out the workers is going to turn
>> sequential I/O into random I/O, and that might be terrible. In most
>> cases, though, I think you're going to be better off. If the
>> partitions are on different spindles or if there's some slack I/O
>> capacity for prefetching, you're going to come out ahead, maybe way
>> ahead. If you come out behind, then you're evidently totally I/O
>> bound and have no capacity for I/O parallelism; in that scenario, you
>> should probably just turn parallel query off altogether, because
>> you're not going to benefit from it.
>
> I agree with the proposition that both strategies can work well, or not,
> depending on the system setup, the tables, and the data layout. I'd be a bit
> more worried about turning it into the "random I/O case", but that's still
> just a feeling and guesswork.
>
> So which one will be better seems speculative, hence the question about
> benchmarking different strategies.
>
> So, I'd like to see the scheduler live in a single place, maybe a
> function that gets called with the number of currently running workers,
> the max. number of workers to be expected, the new worker, and the list of
> plans still to do, and that then schedules that single worker to one of
> these plans by strategy X.
>
> That would make it easier to swap out X for Y and see how it fares,
> wouldn't it?

Yes, actually pretty much all of the scheduler logic is in one single
function, parallel_append_next().

> However, I don't think the patch needs to select the optimal strategy
> right from the beginning (if that even exists, maybe it's a mixed
> strategy); even "not so optimal" parallelism will be better than doing
> all things sequentially.
>
> Best regards,
>
> Tels

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
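
[Editor's note] For readers following the thread, below is a minimal standalone sketch of the "spread the workers out" strategy Robert argues for above. It is not the patch's parallel_append_next(): the real scheduler has to coordinate among backends through shared state and handle details (such as subplans that cannot be shared by multiple workers) that are not modelled here. The struct and function names (SubplanState, choose_next_subplan) are invented purely for illustration.

/*
 * Sketch only: each arriving worker is pointed at the unfinished subplan
 * that currently has the fewest workers and still has room, so no subplan
 * is saturated while others sit idle.
 */
#include <stdio.h>
#include <stdbool.h>

typedef struct SubplanState
{
    int  nworkers;      /* workers currently executing this subplan */
    int  max_workers;   /* per-subplan worker limit */
    bool finished;      /* subplan has produced all of its tuples */
} SubplanState;

/*
 * Pick a subplan for a newly arriving worker.  Returns the subplan index,
 * or -1 if every unfinished subplan is already at its worker limit.
 */
static int
choose_next_subplan(SubplanState *subplans, int nplans)
{
    int best = -1;

    for (int i = 0; i < nplans; i++)
    {
        if (subplans[i].finished)
            continue;
        if (subplans[i].nworkers >= subplans[i].max_workers)
            continue;
        if (best < 0 || subplans[i].nworkers < subplans[best].nworkers)
            best = i;
    }

    if (best >= 0)
        subplans[best].nworkers++;
    return best;
}

int
main(void)
{
    /* Tels's example: 5 subplans, each allowing up to 5 workers. */
    SubplanState subplans[5] = {
        {0, 5, false}, {0, 5, false}, {0, 5, false}, {0, 5, false}, {0, 5, false}
    };

    /* 5 workers arrive one at a time; each lands on a different subplan. */
    for (int w = 0; w < 5; w++)
        printf("worker %d -> subplan %d\n", w, choose_next_subplan(subplans, 5));

    return 0;
}

Run on Tels's example of 5 subplans and 5 available workers, each arriving worker lands on a different subplan, which is the "assign 1 worker to each plan" behaviour discussed in the thread; swapping the selection loop for a "fill the first unfinished subplan" rule would give the other strategy, which is roughly what Tels's pluggable "strategy X" suggestion is about.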