Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers
From | Amit Khandekar
---|---
Subject | Re: [HACKERS] Parallel Append implementation
Date |
Msg-id | CAJ3gD9f5nXMDGdZiT_ij0v+Y6X3R=ceJ0vCFqu-_=huy1R2ZQg@mail.gmail.com
In response to | Re: [HACKERS] Parallel Append implementation (Robert Haas <robertmhaas@gmail.com>)
Responses | Re: [HACKERS] Parallel Append implementation; Re: [HACKERS] Parallel Append implementation
List | pgsql-hackers
On 14 February 2017 at 22:35, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Feb 6, 2017 at 12:36 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
>>> Now that I think of that, I think for implementing above, we need to
>>> keep track of per-subplan max_workers in the Append path; and with
>>> that, the bitmap will be redundant. Instead, it can be replaced with
>>> max_workers. Let me check if it is easy to do that. We don't want to
>>> have the bitmap if we are sure it would be replaced by some other
>>> data structure.
>>
>> Attached is v2 patch, which implements above. Now Append plan node
>> stores a list of per-subplan max worker count, rather than the
>> Bitmapset. But still Bitmapset turned out to be necessary for
>> AppendPath. More details are in the subsequent comments.
>
> Keep in mind that, for a non-partial path, the cap of 1 worker for
> that subplan is a hard limit. Anything more will break the world.
> But for a partial plan, the limit -- whether 1 or otherwise -- is a
> soft limit. It may not help much to route more workers to that node,
> and conceivably it could even hurt, but it shouldn't yield any
> incorrect result. I'm not sure it's a good idea to conflate those two
> things.

Yes, the logic I used in the patch assumes that the
Path->parallel_workers field not only suggests how many workers to
allocate, but also prevents allocating too many workers for that path.
For a seqscan path, this field is calculated from the relation's page
count; I believe the theory is that too many workers might even slow
down the parallel scan, and the same theory would apply when
calculating it for other low-level paths such as index scans.

The only reason I combined the soft limit and the hard limit is that it
did not seem necessary to have two different fields. But of course this
again assumes that allocating more than parallel_workers workers would
never improve the speed, and could in fact slow it down. Do we
currently have a case where the actual number of workers launched turns
out to be *more* than Path->parallel_workers?

> For example, suppose that I have a scan of two children, one
> of which has parallel_workers of 4, and the other of which has
> parallel_workers of 3. If I pick parallel_workers of 7 for the
> Parallel Append, that's probably too high. Had those two tables been
> a single unpartitioned table, I would have picked 4 or 5 workers, not
> 7. On the other hand, if I pick parallel_workers of 4 or 5 for the
> Parallel Append, and I finish with the larger table first, I think I
> might as well throw all 4 of those workers at the smaller table even
> though it would normally have only used 3 workers.
> Having the extra 1-2 workers exit does not seem better.

This is the part I didn't understand: why would we want to assign these
extra workers to a subplan that tells us it is already being run by
'parallel_workers' workers?

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
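To make the hard-limit versus soft-limit distinction above concrete, here is a minimal standalone C sketch of the worker-placement logic being debated. It is not code from the v2 patch; the struct and function names (SubplanSlot, choose_next_subplan) are hypothetical. The sketch assumes that a non-partial subplan may never have more than one worker, that a partial subplan prefers at most its parallel_workers count, and that an otherwise idle worker is routed to an already-saturated partial subplan rather than being asked to exit.

```c
/*
 * Standalone sketch (not the actual patch): how a Parallel Append might
 * pick the next subplan for an incoming worker, treating non-partial
 * subplans as a hard cap of one worker and a partial subplan's
 * preferred worker count as a soft cap that may be exceeded once every
 * subplan has reached its preferred count.  All names are hypothetical.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct SubplanSlot
{
    bool is_partial;   /* partial subplans may accept extra workers */
    int  max_workers;  /* preferred count (hard cap if non-partial) */
    int  cur_workers;  /* workers currently executing this subplan */
    bool finished;     /* no more tuples to produce */
} SubplanSlot;

/*
 * Return the index of the subplan a new worker should attach to,
 * or -1 if no subplan can accept one.
 */
static int
choose_next_subplan(SubplanSlot *subplans, int nsubplans)
{
    int soft_candidate = -1;

    for (int i = 0; i < nsubplans; i++)
    {
        SubplanSlot *s = &subplans[i];

        if (s->finished)
            continue;

        /* Non-partial subplan: strictly at most one worker, ever. */
        if (!s->is_partial)
        {
            if (s->cur_workers == 0)
                return i;
            continue;
        }

        /* Partial subplans below their preferred count get priority. */
        if (s->cur_workers < s->max_workers)
            return i;

        /* Remember a saturated partial subplan as a fallback. */
        if (soft_candidate < 0)
            soft_candidate = i;
    }

    /*
     * Every unfinished partial subplan already has max_workers workers;
     * route the extra worker to one of them anyway (the "soft limit"
     * behaviour) instead of letting it exit.
     */
    return soft_candidate;
}

int
main(void)
{
    SubplanSlot subplans[] = {
        { .is_partial = false, .max_workers = 1, .cur_workers = 1 },
        { .is_partial = true,  .max_workers = 3, .cur_workers = 3 },
        { .is_partial = true,  .max_workers = 4, .cur_workers = 2 },
    };
    int next = choose_next_subplan(subplans, 3);

    printf("next subplan for an incoming worker: %d\n", next); /* prints 2 */
    return 0;
}
```

The fallback to soft_candidate is exactly the point of contention in the thread: it keeps an extra worker busy on a partial subplan that has already reached its preferred worker count, rather than having the worker exit.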