Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions
Date
Msg-id CAA4eK1KHhRwBEgtrgGcTjO+VENYTk_u2UPsPMjfT7KY3in7L2A@mail.gmail.com
In response to Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [HACKERS] Enabling parallelism for queries coming from SQL or other PL functions  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sun, Feb 26, 2017 at 4:14 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Feb 26, 2017 at 6:34 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>> On Sat, Feb 25, 2017 at 9:47 PM, Dilip Kumar <dilipbalaut@gmail.com> wrote:
>>> On Sat, Feb 25, 2017 at 5:12 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> Sure, but that should only happen if the function is *not* declared as
>>>> parallel safe (aka in parallel safe functions, we should not generate
>>>> parallel plans).
>>>
>>> So basically we want to put a restriction that parallel-safe function
>>> can not use the parallel query? This will work but it seems too
>>> restrictive to me. Because by marking function parallel safe we enable
>>> it to be used with the outer parallel query that is fine. But, that
>>> should not restrict the function from using the parallel query if it's
>>> used with the other outer query which is not having the parallel
>>> plan(or function is executed directly).
>>
>> I think if the user is explicitly marking a function as parallel-safe,
>> then it doesn't make much sense to allow parallel query in such
>> functions as it won't be feasible for the planner (or at least it will
>> be quite expensive) to detect the same.  By the way, if the user has
>> any such expectation from a function, then he can mark the function as
>> parallel-restricted or parallel-unsafe.
>
> However, if a query is parallel-safe, it might not end up getting run
> in parallel.  In that case, it could still benefit from parallelism
> internally.  I think we want to allow that.  For example, suppose you
> run a query like:
>
> SELECT x, sum(somewhat_expensive_function(y)) FROM sometab GROUP BY 1;
>
> If sometab isn't very big, it's probably better to use a non-parallel
> plan for this query, because then somewhat_expensive_function() can
> still use parallelism internally, which might be better. However, if
> sometab is large enough, then it might be better to parallelize the
> whole query using a Partial/FinalizeAggregate and force each call to
> somewhat_expensive_function() to run serially.
>

Is there an easy way to find out which way is less expensive?  Even
if we find one, or simply make a rule that when the outer query uses
parallelism the function calls are forced to run serially, how do we
achieve that?  In each worker we can ensure that every individual
statement from a function runs serially (by having a check of
IsParallelWorker() in the Gather node), but a similar check in the
master backend is tricky, or maybe we don't want to care about that
in the master backend.  Do you have any suggestions on how to make it
work?
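
To make the question concrete, here is a minimal standalone sketch of
the two checks, not a patch.  The worker-side test mirrors how
IsParallelWorker() is defined in terms of ParallelWorkerNumber.  The
leader-side flag leader_running_parallel_plan and the function
nested_parallel_query_allowed() are hypothetical names invented for
this illustration; how such a flag would be set and cleared reliably
in the master backend is exactly the open part.

```c
#include <stdbool.h>

/*
 * Stand-in for the backend's worker number: -1 in the master backend,
 * >= 0 inside a parallel worker.  Declared locally so this sketch
 * compiles on its own.
 */
static int ParallelWorkerNumber = -1;
#define IsParallelWorker() (ParallelWorkerNumber >= 0)

/*
 * HYPOTHETICAL: true while the master backend is executing an outer
 * plan that itself uses parallelism.  This is an assumption made for
 * the sketch, not an existing variable.
 */
static bool leader_running_parallel_plan = false;

/*
 * HYPOTHETICAL helper: may a statement run from inside a function
 * launch its own parallel workers?
 */
static bool
nested_parallel_query_allowed(void)
{
    /* Inside a worker: statements from the function run serially. */
    if (IsParallelWorker())
        return false;

    /* In the master backend: serial if the outer query is parallel. */
    if (leader_running_parallel_plan)
        return false;

    /* Outer query is not parallel, so the function may use workers. */
    return true;
}
```

The first test is the easy one; the second depends on state that the
master backend would have to track somehow.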

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


