Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers

From Amit Khandekar
Subject Re: [HACKERS] Parallel Append implementation
Msg-id CAJ3gD9f5nXMDGdZiT_ij0v+Y6X3R=ceJ0vCFqu-_=huy1R2ZQg@mail.gmail.com
In response to Re: [HACKERS] Parallel Append implementation  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 14 February 2017 at 22:35, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Feb 6, 2017 at 12:36 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> Now that I think of that, I think for implementing above, we need to
>>> keep track of per-subplan max_workers in the Append path; and with
>>> that, the bitmap will be redundant. Instead, it can be replaced with
>>> max_workers. Let me check if it is easy to do that. We don't want to
>>> have the bitmap if we are sure it would be replaced by some other data
>>> structure.
>>
>> Attached is v2 patch, which implements above. Now Append plan node
>> stores a list of per-subplan max worker count, rather than the
>> Bitmapset. But still Bitmapset turned out to be necessary for
>> AppendPath. More details are in the subsequent comments.
>
> Keep in mind that, for a non-partial path, the cap of 1 worker for
> that subplan is a hard limit.  Anything more will break the world.
> But for a partial plan, the limit -- whether 1 or otherwise -- is a
> soft limit.  It may not help much to route more workers to that node,
> and conceivably it could even hurt, but it shouldn't yield any
> incorrect result.  I'm not sure it's a good idea to conflate those two
> things.

Yes, the logic I used in the patch assumes that the
"Path->parallel_workers field not only suggests how many workers to
allocate, but also prevents allocation of too many workers for that
path". For a seqscan path, this field is calculated from the
relation's page count. I believe the theory is that too many workers
might even slow down the parallel scan, and the same theory would
apply when calculating parallel_workers for other low-level paths
such as index scans.
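
To make the page-count point concrete, here is a rough Python sketch of
the kind of heuristic I mean: one more worker each time the relation
triples in size. It is a simplified illustration, not the planner code
itself; the function name, the 1024-page threshold and the cap of 8 are
assumptions chosen for the example.

```python
def seqscan_parallel_workers(heap_pages, min_scan_pages=1024, max_workers=8):
    """Simplified sketch: size the worker count from the relation's pages.

    min_scan_pages and max_workers are assumed values for illustration,
    not settings read from a real server.
    """
    if heap_pages < min_scan_pages:
        return 0  # relation too small to be worth a parallel scan

    workers = 1
    threshold = min_scan_pages
    # Add one worker each time the relation is three times larger again,
    # so the worker count grows logarithmically with the table size.
    while heap_pages >= threshold * 3:
        workers += 1
        threshold *= 3

    # Never suggest more workers than the configured cap.
    return min(workers, max_workers)
```

The logarithmic growth is what makes parallel_workers look like an
upper bound: doubling the workers on the same table is not expected to
double the scan speed.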

The only reason I combined the soft limit and the hard limit is
that two separate fields did not seem necessary. But of course this
again rests on the assumption that allocating more than
parallel_workers workers would never improve the speed, and could
even slow it down.

Do we currently have a case where the actual number of workers
launched turns out to be *more* than Path->parallel_workers?

> For example, suppose that I have a scan of two children, one
> of which has parallel_workers of 4, and the other of which has
> parallel_workers of 3.  If I pick parallel_workers of 7 for the
> Parallel Append, that's probably too high.  Had those two tables been
> a single unpartitioned table, I would have picked 4 or 5 workers, not
> 7.  On the other hand, if I pick parallel_workers of 4 or 5 for the
> Parallel Append, and I finish with the larger table first, I think I
> might as well throw all 4 of those workers at the smaller table even
> though it would normally have only used 3 workers.

> Having the extra 1-2 workers exit does not seem better.

This is where I didn't understand why exactly we would want to
assign these extra workers to a subplan that tells us it is
already being run by 'parallel_workers' workers.
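
To pin down the two behaviours being compared, here is a rough Python
sketch of how a free worker might choose its next subplan. It is only
an illustration under my own assumptions, not code from the patch: the
Subplan class, its fields and pick_subplan() are hypothetical names.
With soft_limit_is_hard=True the worker exits once every partial
subplan has reached parallel_workers (what the patch does now); with
False it joins such a subplan anyway (what Robert suggests). In both
modes a non-partial subplan is never given a second worker.

```python
from dataclasses import dataclass


@dataclass
class Subplan:
    parallel_workers: int   # planner's estimate for this subplan
    partial: bool           # False: non-partial, one worker at most, ever
    running: int = 0        # workers currently executing this subplan
    finished: bool = False  # True once the subplan is exhausted


def pick_subplan(subplans, soft_limit_is_hard):
    """Return the index of the subplan a free worker should join, or None
    if the worker should exit. Hypothetical helper for illustration."""
    overflow = None
    for i, sp in enumerate(subplans):
        if sp.finished:
            continue
        if not sp.partial:
            # Hard limit: a second worker here would produce wrong results.
            if sp.running == 0:
                return i
            continue
        if sp.running < sp.parallel_workers:
            return i  # still below the planner's estimate
        if overflow is None:
            overflow = i  # partial subplan already at its soft limit
    # Every remaining subplan is at its limit.
    return None if soft_limit_is_hard else overflow
```

In Robert's example (children with 4 and 3 workers, the larger one
finished, 3 workers already on the smaller one), the first mode returns
None and the extra workers exit, while the second mode returns the
smaller child's index.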


>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company



-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company


