Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers
From | Amit Khandekar |
---|---|
Subject | Re: [HACKERS] Parallel Append implementation |
Date | |
Msg-id | CAJ3gD9ddT4AgbGvZkvHxmFK2T4aiJzcicS2qZ9XXC8rYmyct4Q@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Parallel Append implementation (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Parallel Append implementation
|
List | pgsql-hackers |
On 16 February 2017 at 20:37, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Feb 16, 2017 at 1:34 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote: >>> What I was thinking about is something like this: >>> >>> 1. First, take the maximum parallel_workers value from among all the children. >>> >>> 2. Second, compute log2(num_children)+1 and round up. So, for 1 >>> child, 1; for 2 children, 2; for 3-4 children, 3; for 5-8 children, 4; >>> for 9-16 children, 5, and so on. >>> >>> 3. Use as the number of parallel workers for the children the maximum >>> of the value computed in step 1 and the value computed in step 2. >> >> Ah, now that I closely look at compute_parallel_worker(), I see what >> you are getting at. >> >> For plain unpartitioned table, parallel_workers is calculated as >> roughly equal to log(num_pages) (actually it is log3). So if the table >> size is n, the workers will be log(n). So if it is partitioned into p >> partitions of size n/p each, still the number of workers should be >> log(n). Whereas, in the patch, it is calculated as (total of all the >> child workers) i.e. n * log(n/p) for this case. But log(n) != p * >> log(x/p). For e.g. log(1000) is much less than log(300) + log(300) + >> log(300). >> >> That means, the way it is calculated in the patch turns out to be much >> larger than if it were calculated using log(total of sizes of all >> children). So I think for the step 2 above, log(total_rel_size) >> formula seems to be appropriate. What do you think ? For >> compute_parallel_worker(), it is actually log3 by the way. >> >> BTW this formula is just an extension of how parallel_workers is >> calculated for an unpartitioned table. > > log(total_rel_size) would be a reasonable way to estimate workers when > we're scanning an inheritance hierarchy, but I'm hoping Parallel > Append is also going to apply to UNION ALL queries, where there's no > concept of the total rel size. Yes ParallelAppend also gets used in UNION ALL. > For that we need something else, which > is why the algorithm that I proposed upthread doesn't rely on it. The log2(num_children)+1 formula which you proposed does not take into account the number of workers for each of the subplans, that's why I am a bit more inclined to look for some other logic. May be, treat the children as if they belong to partitions, and accordingly calculate the final number of workers. So for 2 children with 4 and 5 workers respectively, Append parallel_workers would be : log3(3^4 + 3^5) . > >>> The decision to use fewer workers for a smaller scan isn't really >>> because we think that using more workers will cause a regression. >>> It's because we think it may not help very much, and because it's not >>> worth firing up a ton of workers for a relatively small scan given >>> that workers are a limited resource. I think once we've got a bunch >>> of workers started, we might as well try to use them. >> >> One possible side-effect I see due to this is : Other sessions might >> not get a fair share of workers due to this. But again, there might be >> counter argument that, because Append is now focussing all the workers >> on a last subplan, it may finish faster, and release *all* of its >> workers earlier. > > Right. I think in general it's pretty clear that there are possible > fairness problems with parallel query. The first process that comes > along seizes however many workers it thinks it should use, and > everybody else can use whatever (if anything) is left. In the long > run, I think it would be cool to have a system where workers can leave > one parallel query in progress and join a different one (or exit and > spawn a new worker to join a different one), automatically rebalancing > as the number of parallel queries in flight fluctuates. But that's > clearly way beyond anything we can do right now. I think we should > assume that any parallel workers our process has obtained are ours to > use for the duration of the query, and use them as best we can. > Note that even if the Parallel Append tells one of the workers that there > are no more tuples and it should go away, some higher level of the > query plan could make a different choice anyway; there might be > another Append elsewhere in the plan tree. Yeah, that looks good enough to justify not losing the workers -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: