Re: Parallel Append implementation - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Parallel Append implementation
Msg-id 20170406020323.ef6tyffrg6lzdpvw@alap3.anarazel.de
In response to Re: Parallel Append implementation  (Amit Khandekar <amitdkhan.pg@gmail.com>)
List pgsql-hackers
On 2017-04-05 14:52:38 +0530, Amit Khandekar wrote:
> This is what the earlier versions of my patch had done : just add up
> per-subplan parallel_workers (1 for non-partial subplan and
> subpath->parallel_workers for partial subplans) and set this total as
> the Append parallel_workers.
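For comparison, here is a minimal Python sketch of the summing rule quoted above. The `Subpath` class and `append_workers_by_sum` are hypothetical names and are not taken from the patch. The sketch only shows that the result depends on how many subplans there are, not on what they cost.

```python
from dataclasses import dataclass


@dataclass
class Subpath:
    # Hypothetical stand-in for a planner subpath (not a PostgreSQL type).
    partial: bool
    parallel_workers: int = 0


def append_workers_by_sum(subpaths):
    """Sum per-subplan workers: 1 for a non-partial subplan and
    subpath.parallel_workers for a partial one. Cost is ignored entirely."""
    return sum(p.parallel_workers if p.partial else 1 for p in subpaths)


# Four non-partial subplans (one of which may be very expensive) plus one
# partial subplan that wants 2 workers.
paths = [Subpath(False), Subpath(False), Subpath(False), Subpath(False),
         Subpath(True, parallel_workers=2)]
print(append_workers_by_sum(paths))  # 6
```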

I don't think that's great. Consider, e.g., the case where you have one
very expensive query and a bunch of cheaper ones: most of those workers
wouldn't do much while waiting for the expensive query. What I'm
basically thinking we should do is something like the following
pythonesque pseudocode:

def choose_append_workers(nonpartial_costs, partial_costs, startup_cost, max_workers):
    best_nonpartial_cost = -1
    best_nonpartial_nworkers = -1

    for numworkers in range(1, max_workers + 1):
        worker_work = [0.0] * numworkers
        nonpartial_cost = startup_cost * numworkers

        # distribute all nonpartial tasks over workers. Assign each task to
        # the worker with the least amount of work already performed.
        for task_cost in nonpartial_costs:
            least_busy = worker_work.index(min(worker_work))
            worker_work[least_busy] += task_cost

        # the nonpartial cost here is the largest amount any single worker
        # has to perform.
        busiest = max(worker_work)
        nonpartial_cost += busiest

        total_partial_cost = sum(partial_costs)

        # Compute resources needed by partial tasks. First compute how much
        # cost we can distribute to workers that finish earlier than the
        # "busiest" worker doing non-partial tasks.
        remaining_avail_work = sum(busiest - work for work in worker_work)

        # Equally divide the partial work that doesn't fit into that idle
        # time over all workers.
        if remaining_avail_work < total_partial_cost:
            nonpartial_cost += (total_partial_cost - remaining_avail_work) / numworkers

        # check if this is the best number of workers
        if best_nonpartial_cost == -1 or best_nonpartial_cost > nonpartial_cost:
            best_nonpartial_cost = nonpartial_cost
            best_nonpartial_nworkers = numworkers

    return best_nonpartial_nworkers, best_nonpartial_cost

Does that make sense?
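The following small worked example makes the objection about one very expensive query and many cheap ones concrete. It covers only the non-partial part of the cost model above and is a sketch under several assumptions. The costs and `startup_cost = 5` are made-up numbers, the helper names are hypothetical, and partial subplans are left out. Tasks are also handed out largest-first (the usual longest-processing-time heuristic), whereas the pseudocode assigns them in whatever order they arrive.

```python
def assign_greedy(costs, numworkers):
    """Give each task to the least busy worker and return per-worker loads.

    Tasks are taken largest-first (LPT), an added choice; the pseudocode
    does not specify an order.
    """
    loads = [0.0] * numworkers
    for cost in sorted(costs, reverse=True):
        least_busy = loads.index(min(loads))
        loads[least_busy] += cost
    return loads


def pick_workers(costs, startup_cost, max_workers):
    """Return (numworkers, estimate) minimising startup cost + busiest load."""
    best = None
    for n in range(1, max_workers + 1):
        estimate = startup_cost * n + max(assign_greedy(costs, n))
        if best is None or estimate < best[1]:
            best = (n, estimate)
    return best


# Hypothetical costs: one expensive subplan and five cheap ones.
costs = [100, 10, 10, 10, 10, 10]
for n in range(1, 7):
    print(n, max(assign_greedy(costs, n)))
print(pick_workers(costs, startup_cost=5, max_workers=6))  # (2, 110.0)
```

From two workers on, the busiest worker always carries the whole 100-unit subplan, so each additional worker only adds startup cost. Counting one worker per non-partial subplan would instead request six.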


> BTW all of the above points apply only for non-partial plans.

Indeed. But I think that's going to be a pretty common type of plan,
especially if we get partitionwise joins.


Greetings,

Andres Freund


