Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions - Mailing list pgsql-hackers

From Rafia Sabih
Subject Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions
Date
Msg-id CAOGQiiN4dLZOkrjP2Pta6Kw0wE0jYZCFZ5nsXWbRWoUX5gjyPw@mail.gmail.com
In response to Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers


On Sun, Feb 26, 2017 at 7:09 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> I think I see the problem that you're trying to solve, but I agree
> that this doesn't seem all that elegant.  The reason why we have that
> numberTuples check is because we're afraid that we might be in a
> context like the extended-query protocol, where the caller can ask for
> 1 tuple, and then later ask for another tuple.  That won't work,
> because once we shut down the workers we can't reliably generate the
> rest of the query results.  However, I think it would probably work
> fine to let somebody ask for less than the full number of tuples if
> it's certain that they won't later ask for any more.
>
> So maybe what we ought to do is allow CURSOR_OPT_PARALLEL_OK to be set
> any time we know that ExecutorRun() will be called for the QueryDesc
> at most once rather than (as at present) only where we know it will be
> executed only once with a tuple-count of zero.  Then we could change
> things in ExecutePlan so that it doesn't disable parallel query when
> the tuple-count is non-zero, but does take an extra argument "bool
> execute_only_once", and it disables parallel execution if that is not
> true.  Also, if ExecutorRun() is called a second time for the same
> QueryDesc when execute_only_once is specified as true, it should
> elog(ERROR, ...).  Then exec_execute_message(), for example, can pass
> that argument as false when the tuple-count is non-zero, but other
> places that are going to fetch a limited number of rows could pass it
> as true even though they also pass a row-count.
>
> I'm not sure if that's exactly right, but something along those lines
> seems like it should work.

IIUC, this needs an additional bool, execute_once, in the QueryDesc. It is set to true in standard_ExecutorRun() when the query is detected to be coming from a PL function, or when the provided count is zero (i.e. execute till the end). If execute_once is already true at that point, an error is reported.
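A minimal sketch of the single-execution guard discussed above, written as a self-contained mock rather than against the real executor. It follows the quoted variant in which the caller passes the flag as an argument. The names MockQueryDesc, RunResult, mock_executor_run and the already_executed field are hypothetical stand-ins for illustration; they are not taken from either attached patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QueryDesc: only the state this sketch needs. */
typedef struct MockQueryDesc
{
    bool already_executed;  /* set once the run function has been called */
} MockQueryDesc;

/* Outcome of one simulated call to the executor. */
typedef enum RunResult
{
    RUN_PARALLEL_OK,  /* parallel execution may stay enabled */
    RUN_SERIAL_ONLY,  /* caller may come back for more rows: no parallelism */
    RUN_ERROR         /* stands in for elog(ERROR, ...) */
} RunResult;

/*
 * Simulates the proposed check. The caller promises via execute_once that
 * it will not ask for more tuples later; the row count no longer decides
 * whether parallelism is allowed.
 */
RunResult
mock_executor_run(MockQueryDesc *qd, long count, bool execute_once)
{
    assert(qd != NULL);
    (void) count;  /* a non-zero count does not disable parallelism here */

    /* A second call breaks the promise made by execute_once. */
    if (execute_once && qd->already_executed)
        return RUN_ERROR;

    qd->already_executed = true;

    /* Without the promise the workers cannot be kept, so run serially. */
    return execute_once ? RUN_PARALLEL_OK : RUN_SERIAL_ONLY;
}
```

Starting from a descriptor with already_executed = false, a first call with execute_once = true returns RUN_PARALLEL_OK even for a non-zero count, and a second such call on the same descriptor returns RUN_ERROR. A caller in the position of exec_execute_message() would pass false and get RUN_SERIAL_ONLY on every call.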

> I think that a final patch for this functionality should involve
> adding CURSOR_OPT_PARALLEL_OK to appropriate places in each PL, plus
> maybe some infrastructure changes like the ones mentioned above.
> Maybe it can be divided into two patches, one to make the
> infrastructure changes and a second to add CURSOR_OPT_PARALLEL_OK to
> more places.
 
I have split the patch into two. The first (pl_parallel_opt_support_v1.patch) allows the optimiser to select a parallel plan for queries in PL functions by passing CURSOR_OPT_PARALLEL_OK at the required places.

The second (pl_parallel_exec_support_v1.patch) allows such queries to be executed in parallel mode; it involves infrastructure changes along the lines mentioned upthread.

--
Regards,
Rafia Sabih
