Re: Parallel Seq Scan - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Parallel Seq Scan |
Date | |
Msg-id | CAA4eK1K36exy0ssGa9M9oSE7kCdxo7AoRohMq4JwRc-K6--Kxw@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Seq Scan (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Parallel Seq Scan
|
List | pgsql-hackers |
On Fri, Oct 23, 2015 at 5:14 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Oct 13, 2015 at 5:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > - Although the changes in parallelpaths.c are in a good direction, I'm
> > pretty sure this is not yet up to scratch. I am less sure exactly
> > what needs to be fixed, so I'll have to give some more thought to
> > that.
>
> Please find attached a proposed set of changes that I think are
> better. These changes compute a consider_parallel flag for each
> RelOptInfo, which is true if it's a non-temporary relation whose
> baserestrictinfo references no PARAM_EXEC parameters, sublinks, or
> parallel-restricted functions. Actually, I made an effort to set the
> flag correctly even for baserels other than plain tables, and for
> joinrels, though we don't technically need that stuff until we get to
> the point of pushing joins beneath Gather nodes. When we get there,
> it will be important - any joinrel for which consider_parallel = false
> needn't even try to generate parallel paths, while if
> consider_parallel = true then we can consider it, if the costing makes
> sense.
>
> The advantage of this is that the logic is centralized. If we have
> parallel seq scan and also, say, parallel bitmap heap scan, your
> approach would require that we duplicate the logic to check for
> parallel-restricted functions for each path generation function.
>
> On Tue, Oct 13, 2015 at 5:59 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> > - Although the changes in parallelpaths.c are in a good direction, I'm
> > pretty sure this is not yet up to scratch. I am less sure exactly
> > what needs to be fixed, so I'll have to give some more thought to
> > that.
>
> Please find attached a proposed set of changes that I think are
> better. These changes compute a consider_parallel flag for each
> RelOptInfo, which is true if it's a non-temporary relation whose
> baserestrictinfo references no PARAM_EXEC parameters, sublinks, or
> parallel-restricted functions. Actually, I made an effort to set the
> flag correctly even for baserels other than plain tables, and for
> joinrels, though we don't technically need that stuff until we get to
> the point of pushing joins beneath Gather nodes. When we get there,
> it will be important - any joinrel for which consider_parallel = false
> needn't even try to generate parallel paths, while if
> consider_parallel = true then we can consider it, if the costing makes
> sense.
>
Considering parallelism at RelOptInfo level in the way as done in patch,
won't consider the RelOptInfo's for child relations in case of Append node.
Refer build_simple_rel().
Also for cases when parallelism is not enabled like max_parallel_degree = 0,
the current way of doing could add an overhead of traversing the
baserestrictinfo without need. I think one way to avoid that would be check
that while setting parallelModeOK flag.
Another point is that it will consider parallelism for cases where we really
can't parallelize example for foreign table, sample scan.
One thing to note here is that we already have precedent of verifying qual
push down safety while path generation (during subquery path generation),
so it doesn't seem wrong to consider the same for parallel paths and it would
minimize the cases where we need to evaluate parallelism.
> The advantage of this is that the logic is centralized. If we have
> parallel seq scan and also, say, parallel bitmap heap scan, your
> approach would require that we duplicate the logic to check for
> parallel-restricted functions for each path generation function.
>
Don't we anyway need that irrespective of caching it in RelOptInfo?
During bitmappath creation, bitmapqual could contain something
which needs to be evaluated for parallel-safety as it is built based
on index paths which inturn can be based on some join clause. As per
patch, the join clause parallel-safety is checked much later than
generation bitmappath.
+ else if (IsA(node, SubPlan) || IsA(node, SubLink) ||
+ IsA(node, AlternativeSubPlan) || IsA(node, Param))
+ {
+ /*
+ * Since we don't have the ability to push subplans down to workers
+ * at present, we treat subplan references as parallel-restricted.
+ */
+ if (!context->allow_restricted)
+ return true;
+ }
I think it is better to do this for PARAM_EXEC paramkind, as those are
the cases where it would be subplan or initplan.
pgsql-hackers by date: