Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: [HACKERS] Parallel Append implementation
Date
Msg-id CAJ3gD9ddT4AgbGvZkvHxmFK2T4aiJzcicS2qZ9XXC8rYmyct4Q@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel Append implementation  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Parallel Append implementation
List pgsql-hackers
On 16 February 2017 at 20:37, Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Feb 16, 2017 at 1:34 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> What I was thinking about is something like this:
>>>
>>> 1. First, take the maximum parallel_workers value from among all the children.
>>>
>>> 2. Second, compute log2(num_children)+1 and round up.  So, for 1
>>> child, 1; for 2 children, 2; for 3-4 children, 3; for 5-8 children, 4;
>>> for 9-16 children, 5, and so on.
>>>
>>> 3. Use as the number of parallel workers for the children the maximum
>>> of the value computed in step 1 and the value computed in step 2.
>>
>> Ah, now that I closely look at compute_parallel_worker(), I see what
>> you are getting at.
>>
>> For plain unpartitioned table, parallel_workers is calculated as
>> roughly equal to log(num_pages) (actually it is log3). So if the table
>> size is n, the workers will be log(n). So if it is partitioned into p
>> partitions of size n/p each, still the number of workers should be
>> log(n). Whereas, in the patch, it is calculated as (total of all the
>> child workers) i.e. n * log(n/p) for this case. But log(n) != p *
>> log(x/p). For e.g. log(1000) is much less than log(300) + log(300) +
>> log(300).
>>
>> That means, the way it is calculated in the patch turns out to be much
>> larger than if it were calculated using log(total of sizes of all
>> children). So I think for the step 2 above, log(total_rel_size)
>> formula seems to be appropriate. What do you think ? For
>> compute_parallel_worker(), it is actually log3 by the way.
>>
>> BTW this formula is just an extension of how parallel_workers is
>> calculated for an unpartitioned table.
>
> log(total_rel_size) would be a reasonable way to estimate workers when
> we're scanning an inheritance hierarchy, but I'm hoping Parallel
> Append is also going to apply to UNION ALL queries, where there's no
> concept of the total rel size.
Yes ParallelAppend also gets used in UNION ALL.

> For that we need something else, which
> is why the algorithm that I proposed upthread doesn't rely on it.

The log2(num_children)+1 formula which you proposed does not take into
account the number of workers for each of the subplans, that's why I
am a bit more inclined to look for some other logic. May be, treat the
children as if they belong to partitions, and accordingly calculate
the final number of workers. So for 2 children with 4 and 5 workers
respectively, Append parallel_workers would be : log3(3^4 + 3^5) .

>
>>> The decision to use fewer workers for a smaller scan isn't really
>>> because we think that using more workers will cause a regression.
>>> It's because we think it may not help very much, and because it's not
>>> worth firing up a ton of workers for a relatively small scan given
>>> that workers are a limited resource.  I think once we've got a bunch
>>> of workers started, we might as well try to use them.
>>
>> One possible side-effect I see due to this is : Other sessions might
>> not get a fair share of workers due to this. But again, there might be
>> counter argument that, because Append is now focussing all the workers
>> on a last subplan, it may finish faster, and release *all* of its
>> workers earlier.
>
> Right.  I think in general it's pretty clear that there are possible
> fairness problems with parallel query.  The first process that comes
> along seizes however many workers it thinks it should use, and
> everybody else can use whatever (if anything) is left.  In the long
> run, I think it would be cool to have a system where workers can leave
> one parallel query in progress and join a different one (or exit and
> spawn a new worker to join a different one), automatically rebalancing
> as the number of parallel queries in flight fluctuates.  But that's
> clearly way beyond anything we can do right now.  I think we should
> assume that any parallel workers our process has obtained are ours to
> use for the duration of the query, and use them as best we can.

> Note that even if the Parallel Append tells one of the workers that there
> are no more tuples and it should go away, some higher level of the
> query plan could make a different choice anyway; there might be
> another Append elsewhere in the plan tree.
Yeah, that looks good enough to justify not losing the workers

-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



pgsql-hackers by date:

Previous
From: Alexander Korotkov
Date:
Subject: Re: [HACKERS] Should we cacheline align PGXACT?
Next
From: Ashutosh Sharma
Date:
Subject: Re: [HACKERS] Should we cacheline align PGXACT?