Re: [HACKERS] parallelize queries containing subplans - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: [HACKERS] parallelize queries containing subplans
Date
Msg-id CAA4eK1LQA-avGD=H3nNfVC8nMahozgRzEHb3KPpeppCTQ-gayg@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] parallelize queries containing subplans  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [HACKERS] parallelize queries containing subplans  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tue, Jan 3, 2017 at 4:19 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Wed, Dec 28, 2016 at 11:47 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> Now, we can further extend this to parallelize queries containing
>> correlated subplans like below:
>>
>> explain select * from t1 where t1.i in (select t2.i from t2 where t2.i=t1.i);
>>                          QUERY PLAN
>> -------------------------------------------------------------
>>  Seq Scan on t1  (cost=0.00..831049.09 rows=8395 width=12)
>>    Filter: (SubPlan 1)
>>    SubPlan 1
>>      ->  Seq Scan on t2  (cost=0.00..97.73 rows=493 width=4)
>>            Filter: (i = t1.i)
>> (5 rows)
>>
>> In the above query, Filter on t2 (i = t1.i) generates Param node which
>> is a parallel-restricted node, so such queries won't be able to use
>> parallelism even with the patch.  I think we can mark Params which
>> refer to same level as parallel-safe and I think we have this
>> information (node-> varlevelsup/ phlevelsup/ agglevelsup) available
>> when we replace correlation vars (SS_replace_correlation_vars).
>>
>
> I have implemented the above idea which will allow same or immediate
> outer level PARAMS as parallel_safe.  The results of above query after
> patch:
>
> postgres=# explain select * from t1 where t1.i in (select t2.i from t2
> where t2.i=t1.i);
>                                 QUERY PLAN
> --------------------------------------------------------------------------
>  Gather  (cost=0.00..488889.88 rows=8395 width=12)
>    Workers Planned: 1
>    ->  Parallel Seq Scan on t1  (cost=0.00..488889.88 rows=4938 width=12)
>          Filter: (SubPlan 1)
>          SubPlan 1
>            ->  Seq Scan on t2  (cost=0.00..97.73 rows=493 width=4)
>                  Filter: (i = t1.i)
> (7 rows)
>

On further evaluation, it seems this patch has one big problem which
is that it will allow forming parallel plans which can't be supported
with current infrastructure.  For ex. marking immediate level params
as parallel safe can generate below type of plan:

Seq Scan on t1  Filter: (SubPlan 1)  SubPlan 1    ->  Gather          Workers Planned: 1          ->  Result
   One-Time Filter: (t1.k = 0)                ->  Parallel Seq Scan on t2
 


In this plan, we can't evaluate one-time filter (that contains
correlated param) unless we have the capability to pass all kind of
PARAM_EXEC param to workers.   I don't want to invest too much time in
this patch unless somebody can see some way using current parallel
infrastructure to implement correlated subplans.

Note that still, the other patch [1] in this thread which implements
parallelism for uncorrelated subplan holds good.


[1] - https://www.postgresql.org/message-id/CAA4eK1J9mDZLcp-OskkdzAf_yT8W4dBSGL9E%3DkoEiJkdpVZsEA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "Seki, Eiji"
Date:
Subject: Re: [HACKERS] Proposal: GetOldestXminExtend for ignoring arbitraryvacuum flags
Next
From: Corey Huinker
Date:
Subject: Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands:\quit_if, \quit_unless)