Thread: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Amit Kapila
Date:
On Wed, Mar 9, 2016 at 11:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 9, 2016 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Gather is a bit weird, because although it can project (and needs to,
> > per the example of needing to compute a non-parallel-safe function),
> > you would rather push down as much work as possible to the child node;
> > and doing so is semantically OK for parallel-safe functions. (Pushing
> > functions down past a Sort node, for a counterexample, is not so OK
> > if you are concerned about function evaluation order, or even number
> > of executions.)
> >
> > In the current code structure it would perhaps be reasonable to teach
> > apply_projection_to_path about that --- although this would require
> > logic to separate parallel-safe and non-parallel-safe subexpressions,
> > which doesn't quite seem like something apply_projection_to_path
> > should be doing.
>
> I think for v1 it would be fine to make this all-or-nothing; that's
> what I had in mind to do. That is, if the entire tlist is
> parallel-safe, push it all down. If not, let the workers just return
> the necessary Vars and have Gather compute the final tlist.
>
>
> On Wed, Mar 9, 2016 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Gather is a bit weird, because although it can project (and needs to,
> > per the example of needing to compute a non-parallel-safe function),
> > you would rather push down as much work as possible to the child node;
> > and doing so is semantically OK for parallel-safe functions. (Pushing
> > functions down past a Sort node, for a counterexample, is not so OK
> > if you are concerned about function evaluation order, or even number
> > of executions.)
> >
> > In the current code structure it would perhaps be reasonable to teach
> > apply_projection_to_path about that --- although this would require
> > logic to separate parallel-safe and non-parallel-safe subexpressions,
> > which doesn't quite seem like something apply_projection_to_path
> > should be doing.
>
> I think for v1 it would be fine to make this all-or-nothing; that's
> what I had in mind to do. That is, if the entire tlist is
> parallel-safe, push it all down. If not, let the workers just return
> the necessary Vars and have Gather compute the final tlist.
>
I find it quite convenient to teach apply_projection_to_path() to push down target-list beneath Gather node, when targetlist contains parallel-safe expression. Attached patch implements pushing targetlist beneath gather node.
Below is output of a simple test which shows the effect of implementation.
Without Patch -
------------------------
postgres=# explain verbose select c1+2 from t1 where c1<10;
QUERY PLAN
-----------------------------------------------------------------------------
Gather (cost=0.00..44420.43 rows=30 width=4)
Output: (c1 + 2)
Number of Workers: 2
-> Parallel Seq Scan on public.t1 (cost=0.00..44420.35 rows=13 width=4)
Output: c1
Filter: (t1.c1 < 10)
(6 rows)
With Patch -
-----------------------
postgres=# explain verbose select c1+2 from t1 where c1<10;
QUERY PLAN
-----------------------------------------------------------------------------
Gather (cost=0.00..45063.75 rows=30 width=4)
Output: ((c1 + 2))
Number of Workers: 1
-> Parallel Seq Scan on public.t1 (cost=0.00..45063.68 rows=18 width=4)
Output: (c1 + 2)
Filter: (t1.c1 < 10)
(6 rows)
In the above plans, you can notice that target list expression (c1 + 2) is pushed beneath Gather node after patch.
Thoughts?
Attachment
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Robert Haas
Date:
On Wed, Mar 16, 2016 at 3:09 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
--
On Wed, Mar 9, 2016 at 11:58 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 9, 2016 at 12:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >
> > Gather is a bit weird, because although it can project (and needs to,
> > per the example of needing to compute a non-parallel-safe function),
> > you would rather push down as much work as possible to the child node;
> > and doing so is semantically OK for parallel-safe functions. (Pushing
> > functions down past a Sort node, for a counterexample, is not so OK
> > if you are concerned about function evaluation order, or even number
> > of executions.)
> >
> > In the current code structure it would perhaps be reasonable to teach
> > apply_projection_to_path about that --- although this would require
> > logic to separate parallel-safe and non-parallel-safe subexpressions,
> > which doesn't quite seem like something apply_projection_to_path
> > should be doing.
>
> I think for v1 it would be fine to make this all-or-nothing; that's
> what I had in mind to do. That is, if the entire tlist is
> parallel-safe, push it all down. If not, let the workers just return
> the necessary Vars and have Gather compute the final tlist.
>I find it quite convenient to teach apply_projection_to_path() to push down target-list beneath Gather node, when targetlist contains parallel-safe expression. Attached patch implements pushing targetlist beneath gather node.
That doesn't update the cost of the subpath, which it probably needs to do. I wonder if this shouldn't be implemented by recursing.
if (IsA(path, GatherPath) && !has_parallel_hazard((Node *) target->exprs, false))
apply_projection_to_path(root, something, ((GatherPath *) path)->subpath, target);
Tom, any comments? I think it would be smart to push this into 9.6.
--
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > That doesn't update the cost of the subpath, which it probably needs to > do. I wonder if this shouldn't be implemented by recursing. > if (IsA(path, GatherPath) && !has_parallel_hazard((Node *) target->exprs, > false)) > apply_projection_to_path(root, something, ((GatherPath *) > path)->subpath, target); > Tom, any comments? I think it would be smart to push this into 9.6. I agree with the idea that this problem should be solved, but there's nothing much to like about the particular details here. If we're going to push down stuff into the workers, I think adding a Result if needed to do that is something we'd definitely want it to do. So more along the lines of if (IsA(path, GatherPath) && !has_parallel_hazard((Node *) target->exprs, false)){ GatherPath *gpath = (GatherPath*) path; gpath->subpath = apply_projection_to_path(root, gpath->subpath->parent, gpath->subpath, target);} However, I continue to doubt that apply_projection_to_path really ought to know about parallel safety. And there is a larger problem with this: I'm not sure that it's appropriate for apply_projection_to_path to assume that the subpath is not shared with any other purposes. If it is shared, and we update the subpath's target in-place, we just broke the other path chains. Now the existing uses of apply_projection_to_path don't have to worry about that because they're always messing with paths whose parent rel isn't going to be used in any other way. Maybe this one doesn't either because the subpath would always be a partial path that won't have any other potential application besides being a child of *this specific* Gather path. But I'd like to see that addressed in a comment, if so. If it's not so, maybe the answer is to always interpose a Result node, in which case just use create_projection_path() in the above fragment. regards, tom lane
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Robert Haas
Date:
On Wed, Mar 16, 2016 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > That doesn't update the cost of the subpath, which it probably needs to > > do. I wonder if this shouldn't be implemented by recursing. > > > if (IsA(path, GatherPath) && !has_parallel_hazard((Node *) target->exprs, > > false)) > > apply_projection_to_path(root, something, ((GatherPath *) > > path)->subpath, target); > > > Tom, any comments? I think it would be smart to push this into 9.6. > > I agree with the idea that this problem should be solved, but there's > nothing much to like about the particular details here. If we're going > to push down stuff into the workers, I think adding a Result if needed > to do that is something we'd definitely want it to do. So more > along the lines of > > if (IsA(path, GatherPath) && > !has_parallel_hazard((Node *) target->exprs, false)) > { > GatherPath *gpath = (GatherPath *) path; > > gpath->subpath = > apply_projection_to_path(root, > gpath->subpath->parent, > gpath->subpath, > target); > } I agree. That's a cleaned-up version of the formula I posted. > However, I continue to doubt that apply_projection_to_path really ought to > know about parallel safety. > > And there is a larger problem with this: I'm not sure that it's > appropriate for apply_projection_to_path to assume that the subpath is not > shared with any other purposes. If it is shared, and we update the > subpath's target in-place, we just broke the other path chains. That's true. I don't see an obvious hazard here, because the Gather's child came from the rel's partial_pathlist, and the only way it gets used from there is to stick the Gather on top of it. So it really can't show up anywhere else. I think. But I agree it's a little scary. (To some lesser extent, apply_projection_to_path is always scary like that.) One idea is to skip generate_gather_paths() for the final rel and then make up the difference later: apply the final rel's target list once we've computed it, and then generate a gather path based on that. But I don't see an obvious way of doing that. > Now the existing uses of apply_projection_to_path don't have to worry > about that because they're always messing with paths whose parent rel > isn't going to be used in any other way. Maybe this one doesn't either > because the subpath would always be a partial path that won't have any > other potential application besides being a child of *this specific* > Gather path. But I'd like to see that addressed in a comment, if so. > > If it's not so, maybe the answer is to always interpose a Result node, > in which case just use create_projection_path() in the above fragment. Mmmph. That seems like a 2-bit solution, but I guess it would work. What if we taught create_projection_plan() to elide the Result node in that case? That is, instead of this: if (tlist_same_exprs(tlist, subplan->targetlist)) We could do this: if (is_projection_capable_path(subplan) || tlist_same_exprs(tlist, subplan->targetlist)) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Mar 16, 2016 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> And there is a larger problem with this: I'm not sure that it's >> appropriate for apply_projection_to_path to assume that the subpath is not >> shared with any other purposes. If it is shared, and we update the >> subpath's target in-place, we just broke the other path chains. > That's true. I don't see an obvious hazard here, because the Gather's > child came from the rel's partial_pathlist, and the only way it gets > used from there is to stick the Gather on top of it. So it really > can't show up anywhere else. I think. The key question I think is could there ever be more than one Gather sharing the same subpath? > (To some lesser extent, apply_projection_to_path is always > scary like that.) Right, that's why there's also create_projection_path for when you aren't sure. > Mmmph. That seems like a 2-bit solution, but I guess it would work. > What if we taught create_projection_plan() to elide the Result node in > that case? Yeah, I was thinking about the same thing. The comment block above where you're looking would need some adjustment. regards, tom lane
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Robert Haas
Date:
On Wed, Mar 16, 2016 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Yeah, I was thinking about the same thing. The comment block above > where you're looking would need some adjustment. OK, how about this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Mar 16, 2016 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Yeah, I was thinking about the same thing. The comment block above >> where you're looking would need some adjustment. > OK, how about this? Looks pretty close. One point is that if we do end up using a Result node, then the parent GatherPath does not get charged for the Result node's cpu_per_tuple overhead. I'm not sure that that's worth changing though. It's probably better to bet that the subpath is projectable and so no cost will ensue, than to bet the other way. regards, tom lane
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Robert Haas
Date:
On Wed, Mar 16, 2016 at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Mar 16, 2016 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Yeah, I was thinking about the same thing. The comment block above >>> where you're looking would need some adjustment. > >> OK, how about this? > > Looks pretty close. One point is that if we do end up using a Result > node, then the parent GatherPath does not get charged for the Result > node's cpu_per_tuple overhead. I'm not sure that that's worth changing > though. It's probably better to bet that the subpath is projectable and > so no cost will ensue, than to bet the other way. I'm almost sure this way is the better bet. I actually think at present the GatherPath is always on top of a scan or join, and those all project. There might be other cases in the future that don't, but I think it'd be fine to leave off worrying about this until we (a) find a case where it happens and (b) failing to charge for the Result causes a problem. The current situation of never projecting in the workers is far worse. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Mar 16, 2016 at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Looks pretty close. One point is that if we do end up using a Result >> node, then the parent GatherPath does not get charged for the Result >> node's cpu_per_tuple overhead. I'm not sure that that's worth changing >> though. It's probably better to bet that the subpath is projectable and >> so no cost will ensue, than to bet the other way. > I'm almost sure this way is the better bet. Actually, we do know what will happen ... so maybe /* * We always use create_projection_path here, even if the subpath is * projection-capable, so as toavoid modifying the subpath in place. * It seems unlikely at present that there could be any other * referencesto the subpath anyway, but better safe than sorry. */ + if (!is_projection_capable_path(gpath->subpath)) + gpath->path.total_cost += cpu_tuple_cost * gpath->subpath->rows; gpath->subpath = (Path *) create_projection_path(root, gpath->subpath->parent, gpath->subpath, target); The comment could use adjustment if you adopt that, to reference the fact that we know create_projection_plan will get rid of the Result if not needed. regards, tom lane
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Amit Kapila
Date:
On Wed, Mar 16, 2016 at 10:57 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Mar 16, 2016 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Yeah, I was thinking about the same thing. The comment block above
> > where you're looking would need some adjustment.
>
> OK, how about this?
>
>
> On Wed, Mar 16, 2016 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Yeah, I was thinking about the same thing. The comment block above
> > where you're looking would need some adjustment.
>
> OK, how about this?
>
+ * children. Alternatively, apply_projection_to_path might have created
+ * a projection path as the subpath of a Gather node even though the
+ * subpath was projection-capable. So, if the subpath is capable of
+ * projection or the desired tlist is the same expression-wise as the
+ * subplan's, just jam it in there. We'll have charged for a Result that
+ * doesn't actually appear in the plan, but that's better than having a
+ * Result we don't need.
*/
- if (tlist_same_exprs(tlist, subplan->targetlist))
+ if (is_projection_capable_path(best_path->subpath) ||
+ tlist_same_exprs(tlist, subplan->targetlist))
While reading above code changes, it looks like it is assuming that subpath and subplan will always be same (as it is verifying projection capability of subpath and attaching the tlist to subplan), but I think it is possible that subpath and subplan correspond to different nodes when gating Result node is added on to top of scan plan by create_scan_plan(). I think it might be better to explain in comments, why it is safe to rely on projection capability of subpath to attach tlist to subplan.
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Tom Lane
Date:
Amit Kapila <amit.kapila16@gmail.com> writes: > While reading above code changes, it looks like it is assuming that subpath > and subplan will always be same (as it is verifying projection capability > of subpath and attaching the tlist to subplan), but I think it is possible > that subpath and subplan correspond to different nodes when gating Result > node is added on to top of scan plan by create_scan_plan(). A little more thought will show you that that's not actually relevant, because the tlist computation would have happened (or not) below the gating Result. If gating Results had an impact on apply_projection_to_path's decisions we'd have had to do something about that before this. regards, tom lane
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Amit Kapila
Date:
On Thu, Mar 17, 2016 at 7:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > While reading above code changes, it looks like it is assuming that subpath
> > and subplan will always be same (as it is verifying projection capability
> > of subpath and attaching the tlist to subplan), but I think it is possible
> > that subpath and subplan correspond to different nodes when gating Result
> > node is added on to top of scan plan by create_scan_plan().
>
> A little more thought will show you that that's not actually relevant,
> because the tlist computation would have happened (or not) below the
> gating Result. If gating Results had an impact on
> apply_projection_to_path's decisions we'd have had to do something about
> that before this.
>
>
> Amit Kapila <amit.kapila16@gmail.com> writes:
> > While reading above code changes, it looks like it is assuming that subpath
> > and subplan will always be same (as it is verifying projection capability
> > of subpath and attaching the tlist to subplan), but I think it is possible
> > that subpath and subplan correspond to different nodes when gating Result
> > node is added on to top of scan plan by create_scan_plan().
>
> A little more thought will show you that that's not actually relevant,
> because the tlist computation would have happened (or not) below the
> gating Result. If gating Results had an impact on
> apply_projection_to_path's decisions we'd have had to do something about
> that before this.
>
I understand that gating Results won't impact it, but it was just not apparent from looking at the code I had referred. If you think it is quite obvious thing, then we can leave the comment as it is.
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Robert Haas
Date:
On Wed, Mar 16, 2016 at 2:25 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Mar 16, 2016 at 2:02 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Looks pretty close. One point is that if we do end up using a Result >>> node, then the parent GatherPath does not get charged for the Result >>> node's cpu_per_tuple overhead. I'm not sure that that's worth changing >>> though. It's probably better to bet that the subpath is projectable and >>> so no cost will ensue, than to bet the other way. > >> I'm almost sure this way is the better bet. > > Actually, we do know what will happen ... so maybe > > /* > * We always use create_projection_path here, even if the subpath is > * projection-capable, so as to avoid modifying the subpath in place. > * It seems unlikely at present that there could be any other > * references to the subpath anyway, but better safe than sorry. > */ > + if (!is_projection_capable_path(gpath->subpath)) > + gpath->path.total_cost += cpu_tuple_cost * gpath->subpath->rows; > gpath->subpath = (Path *) > create_projection_path(root, > gpath->subpath->parent, > gpath->subpath, > target); > > The comment could use adjustment if you adopt that, to reference the fact > that we know create_projection_plan will get rid of the Result if not > needed. OK, I've committed something along those lines. Thanks for the advice, and feel free to whack it around if you have an idea for improving it further - though IMHO this is good enough for 9.6. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > OK, I've committed something along those lines. Thanks for the > advice, and feel free to whack it around if you have an idea for > improving it further - though IMHO this is good enough for 9.6. The committed patch looks fine to me. WRT your commit-message comment about pushing down only parallel-safe tlist expressions, I would strongly advise leaving that for another day. I have some irons in the fire concerning ways to push down computation of index expressions (so that we can consistently get f(x) out of an index on f(x) rather than recomputing it), and managing parallel-safe expressions seems like it'd benefit from that infrastucture too. regards, tom lane
Re: Pushdown target list below gather node (WAS Re: WIP: Upper planner pathification)
From
Robert Haas
Date:
On Fri, Mar 18, 2016 at 10:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> OK, I've committed something along those lines. Thanks for the >> advice, and feel free to whack it around if you have an idea for >> improving it further - though IMHO this is good enough for 9.6. > > The committed patch looks fine to me. WRT your commit-message comment > about pushing down only parallel-safe tlist expressions, I would strongly > advise leaving that for another day. I have some irons in the fire > concerning ways to push down computation of index expressions (so that > we can consistently get f(x) out of an index on f(x) rather than > recomputing it), and managing parallel-safe expressions seems like it'd > benefit from that infrastucture too. Roger. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company