Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1JpmQJzLLFOP0eUFo5Kv9hvjQBBA59jYJMrsOFkzLaTmA@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Thu, Sep 17, 2015 at 1:40 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Sep 10, 2015 at 12:12 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >> 2. I think it's probably a good idea - at least for now, and maybe
> >> forever - to avoid nesting parallel plans inside of other parallel
> >> plans.  It's hard to imagine that being a win in a case like this, and
> >> it certainly adds a lot more cases to think about.
> >
> > I also think that avoiding nested parallel plans is a good step forward.
>
> Doing that as a part of the assess parallel safety patch was trivial, so I did.
>

As per my understanding, what you have done there will not prohibit such
cases.

+    * For now, we don't try to use parallel mode if we're running inside
+    * a parallel worker.  We might eventually be able to relax this
+    * restriction, but for now it seems best not to have parallel workers
+    * trying to create their own parallel workers.
+    */
+   glob->parallelModeOK = (cursorOptions & CURSOR_OPT_PARALLEL_OK) != 0 &&
+       IsUnderPostmaster && dynamic_shared_memory_type != DSM_IMPL_NONE &&
+       parse->commandType == CMD_SELECT && !parse->hasModifyingCTE &&
+       parse->utilityStmt == NULL && !IsParallelWorker() &&
+       !contain_parallel_unsafe((Node *) parse);


IIUC, your are referring to !IsParallelWorker() check in above code.  If yes,
then I think it won't work because we generate the plan in master backend,
parallel worker will never exercise this code.  I have tested it as well with
below example and it still generates SubPlan as Funnel.

CREATE TABLE t1(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 10000000) g;

CREATE TABLE t2(c1, c2) AS SELECT g, repeat('x', 5) FROM
generate_series(1, 1000000) g;

set parallel_seqscan_degree=2;
set cpu_tuple_comm_cost=0.01;

explain select * from t1 where c1 not in (select c1 from t2 where c2 = 'xxxx');
                                     QUERY PLAN

--------------------------------------------------------------------------------
----
 Funnel on t1  (cost=11536.88..126809.17 rows=3432492 width=36)
   Filter: (NOT (hashed SubPlan 1))
   Number of Workers: 2
   ->  Partial Seq Scan on t1  (cost=11536.88..58159.32 rows=3432492 width=36)
         Filter: (NOT (hashed SubPlan 1))
         SubPlan 1
           ->  Funnel on t2  (cost=0.00..11528.30 rows=3433 width=4)
                 Filter: (c2 = 'xxxx'::text)
                 Number of Workers: 2
                 ->  Partial Seq Scan on t2  (cost=0.00..4662.68 rows=3433 width
=4)
                       Filter: (c2 = 'xxxx'::text)
   SubPlan 1
     ->  Funnel on t2  (cost=0.00..11528.30 rows=3433 width=4)
           Filter: (c2 = 'xxxx'::text)
           Number of Workers: 2
           ->  Partial Seq Scan on t2  (cost=0.00..4662.68 rows=3433 width=4)
                 Filter: (c2 = 'xxxx'::text)
(17 rows)


Here the subplan is generated before the top level plan and while generation
of subplan we can't predict whether it is okay to generate it as Funnel or not,
because it might be that top level plan is non-Funnel.  Also if such a subplan
is actually an InitPlan, then we are safe (as we execute the InitPlans in
master backend and then pass the result to parallel worker) even if top level
plan is Funnel.  I think the place where we can catch this is during the
generation of Funnel path, basically we can evaluate if any nodes beneath
Funnel node has 'filter' or 'targetlist' as another Funnel node, then we have
two options to proceed:
a. Mark such a filter or target list as non-pushable which will indicate that
they need to be executed only in master backend.  If we go with this
option, then we have to make Funnel node capable of evaluating Filter
and Targetlist which is not a big thing.
b. Don't choose the current path as Funnel path.  

I prefer second one as that seems to be simpler as compare to first and
there doesn't seem to be much benefit in going by first.

Any better ideas?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Support for N synchronous standby servers - take 2
Next
From: Amit Kapila
Date:
Subject: Re: a funnel by any other name