Re: [HACKERS] Increasing parallel workers at runtime - Mailing list pgsql-hackers

From Haribabu Kommi
Subject Re: [HACKERS] Increasing parallel workers at runtime
Date
Msg-id CAJrrPGeaTu_WmRdQ3_fXCaWrbuRC5yYVBEoVEUCRWPfPCyd-mw@mail.gmail.com
In response to Re: [HACKERS] Increasing parallel workers at runtime  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Increasing parallel workers at runtime  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Tue, May 16, 2017 at 1:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, May 15, 2017 at 10:06 AM, Haribabu Kommi
<kommi.haribabu@gmail.com> wrote:
> This still needs some adjustments to handle the cases where
> the main backend also does the scan instead of waiting for
> the workers to finish the job, since the logic for increasing
> the workers shouldn't add any overhead in that case.

I think it would be pretty crazy to try relaunching workers after
every tuple, as this patch does.  The overhead of that will be very
high for queries where the number of tuples passing through the Gather
is large, whereas when the number of tuples passing through Gather is
small, or where tuples are sent all at once at the end of processing,
it will not actually be very effective at getting hold of more
workers.

In the current state of the patch, the main backend tries to start the
extra workers only when no tuples are available from the workers that
are already running, so I feel the attempt to launch more workers does
not happen for every tuple.
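
To make the timing concrete, here is a minimal C sketch of the control flow
I am describing above (illustrative only; the structure and the helpers such
as read_tuple_from_workers() are made-up names, not the actual executor code):

/* Illustrative sketch, not PostgreSQL source. */
#include <stdbool.h>
#include <stddef.h>

typedef struct GatherSketch
{
    int  nworkers_planned;    /* workers requested by the planner */
    int  nworkers_launched;   /* workers actually running */
    bool leader_participates; /* leader also executes the plan */
} GatherSketch;

extern void *read_tuple_from_workers(GatherSketch *gs);   /* NULL if none ready */
extern int   launch_additional_workers(GatherSketch *gs); /* returns workers obtained */
extern void *execute_plan_locally(GatherSketch *gs);

static void *
gather_next_tuple_sketch(GatherSketch *gs)
{
    void *tuple = read_tuple_from_workers(gs);

    if (tuple != NULL)
        return tuple;       /* workers are keeping up; no launch attempt here */

    /* Only on this "no tuple ready" path do we ask for more workers. */
    if (gs->nworkers_launched < gs->nworkers_planned)
        gs->nworkers_launched += launch_additional_workers(gs);

    /* Meanwhile the leader can help out by running the plan itself. */
    if (gs->leader_participates)
        return execute_plan_locally(gs);

    return read_tuple_from_workers(gs);
}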

1. When a large number of tuples are being transferred from the workers,
there is very little chance that the backend is idle enough to start more
workers, because in that case the backend itself may not even need to
execute the plan locally.

2. When the tuples are transferred only at the end of the plan, for example
when the plan involves a sort node or an aggregate, the backend is either
waiting for the tuples to arrive or executing the plan itself along with the
workers, after having tried once to extend the number of workers.

3. When only a small number of tuples are being transferred, the extra-worker
invocation may happen more often than in the other scenarios. But even then,
the small number of transferred tuples is probably due to a complex filter
condition that takes time to evaluate and filters out most of the records. So
in this case too, once the backend has tried to extend the number of workers,
it also participates in executing the plan, and producing a tuple locally takes
it some time as well. By then, there is a good chance that the workers already
have tuples ready.

The problem of repeatedly asking for more workers can really arise only when
a single worker has been allotted to the query execution.

Am I missing something?
 
  A different idea is to have an area in shared memory where
queries can advertise that they didn't get all of the workers they
wanted, plus a background process that periodically tries to launch
workers to help those queries as parallel workers become available.
It can recheck for available workers after some interval, say 10s.
There are some problems there -- the process won't have bgw_notify_pid
pointing at the parallel leader -- but I think it might be best to try
to solve those problems instead of making it the leader's job to try
to grab more workers as we go along.  For one thing, the background
process idea can attempt to achieve fairness.  Suppose there are two
processes that didn't get all of their workers; one got 3 of 4, the
other 1 of 4.  When a worker becomes available, we'd presumably like
to give it to the process that got 1 of 4, rather than having the
leaders race to see who grabs the new worker first.  Similarly if
there are four workers available and two queries that each got 1 of 5
workers they wanted, we'd like to split the workers two and two,
rather than having one leader grab all four of them.  Or at least, I
think that's what we want.

Yes, a background process like that can produce a fair distribution of
workers to the parallel queries. In this case too, the backend should
advertise only when the allotted workers are really not enough: there may
be a case where the planned number of workers is 5, but because of some
other part of the query the main backend is kept fed with tuples by just
2 workers, and then there is no need to provide extra workers.
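
Just to illustrate what such a shared-memory advertisement could carry (a
rough sketch with made-up names, not existing structures), something like:

/* Hypothetical layout of the shared-memory advertisement area. */
#include <sys/types.h>

#define MAX_ADVERTISED_QUERIES 64

typedef struct WorkerShortfallSlot
{
    pid_t leader_pid;        /* backend that wants more workers */
    int   nworkers_planned;  /* what the planner asked for, e.g. 5 */
    int   nworkers_useful;   /* workers that can actually keep the leader fed, e.g. 2 */
    int   nworkers_launched; /* what the query got at startup */
    /* advertise only while nworkers_launched < nworkers_useful */
} WorkerShortfallSlot;

typedef struct WorkerShortfallArea
{
    int                 nslots;
    WorkerShortfallSlot slots[MAX_ADVERTISED_QUERIES];
} WorkerShortfallArea;

The nworkers_useful field is the point above: even if the plan asked for 5
workers, the backend should stop advertising once the workers it already has
are enough to keep it fed.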

The other issue with the background-process approach of waiting for an
interval before reassigning more workers is that it doesn't help queries
that finish before the configured wait time elapses. Maybe we can ignore
those scenarios?
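
For clarity, the loop I have in mind for that background process would be
roughly the following (again only a sketch that builds on the structures
above; the 10s interval and the helper functions are assumptions):

/* Hypothetical main loop of the worker-rebalancing background process. */
#include <unistd.h>

#define RECHECK_INTERVAL_SECS 10

extern int  free_parallel_worker_slots(void);
extern WorkerShortfallSlot *pick_neediest_query(WorkerShortfallArea *area);
extern void launch_worker_for_leader(WorkerShortfallSlot *slot);

static void
worker_rebalancer_main(WorkerShortfallArea *area)
{
    for (;;)
    {
        int navail = free_parallel_worker_slots();

        /*
         * Hand each free slot to the query with the largest shortfall,
         * which gives the fair split described above.
         */
        while (navail-- > 0)
        {
            WorkerShortfallSlot *slot = pick_neediest_query(area);

            if (slot == NULL)
                break;          /* nobody is asking for more workers */
            launch_worker_for_leader(slot);
            slot->nworkers_launched++;
        }

        /* A query that finishes within this interval never benefits. */
        sleep(RECHECK_INTERVAL_SECS);
    }
}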

It also needs some smarter logic to share the details required to start a
worker, since today the worker is started by the main backend itself. But I
feel this approach is useful for the cases where the query doesn't get any
workers at all.

Regards,
Hari Babu
Fujitsu Australia
