Re: pgsql: Try again to fix the way the scanjoin_target is used with partia - Mailing list pgsql-committers
| From | Tom Lane |
|---|---|
| Subject | Re: pgsql: Try again to fix the way the scanjoin_target is used with partia |
| Date | |
| Msg-id | 16117.1466537904@sss.pgh.pa.us Whole thread Raw |
| In response to | Re: pgsql: Try again to fix the way the scanjoin_target is used with partia (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-committers |
I wrote:
> I had hoped that this would result in simplifying create_projection_plan
> so that it just makes a Result or not according to what
> create_projection_path decided, but there's one regression test case that
> fails (in the sense of showing a Result in the plan that isn't really
> needed). That happens because create_merge_append_plan adds sort columns
> to the tlist and so a tlist match is possible after that happens when it
> didn't match before. For the moment I kluged create_projection_plan so
> that that keeps working, but I wonder if it'd be better to just accept an
> extra Result in that case.
After further thought about that, I realized the issue is bigger than just
MergeAppend nodes: in general, if createplan.c alters the tlist of a node
by adding resjunk columns, that could break the earlier decision about
whether the tlist expressions are equal, *in either direction*. So we
have to consider that create_projection_path's decision is just tentative
anytime we are dealing with a non-projection-capable subpath. I think
this means that apply_projection_to_path is actually broken right now,
because it supposes that any tlist equality that it sees can't change
later. (It might be that we don't use it in a way that would cause that
to manifest, but I don't have a lot of faith in that, since this code is
all pretty new.) Revised patch attached.
regards, tom lane
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index b675447..b0a28f5 100644
*** a/src/backend/nodes/outfuncs.c
--- b/src/backend/nodes/outfuncs.c
*************** _outProjectionPath(StringInfo str, const
*** 1809,1814 ****
--- 1809,1815 ----
_outPathInfo(str, (const Path *) node);
WRITE_NODE_FIELD(subpath);
+ WRITE_BOOL_FIELD(dummypp);
}
static void
diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c
index ab8df76..b2db6e8 100644
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
*************** create_gather_plan(PlannerInfo *root, Ga
*** 1409,1416 ****
/*
* create_projection_plan
*
! * Create a Result node to do a projection step and (recursively) plans
! * for its subpaths.
*/
static Plan *
create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
--- 1409,1417 ----
/*
* create_projection_plan
*
! * Create a plan tree to do a projection step and (recursively) plans
! * for its subpaths. We may need a Result node for the projection,
! * but sometimes we can just let the subplan do the work.
*/
static Plan *
create_projection_plan(PlannerInfo *root, ProjectionPath *best_path)
*************** create_projection_plan(PlannerInfo *root
*** 1425,1455 ****
tlist = build_path_tlist(root, &best_path->path);
/*
! * We might not really need a Result node here. There are several ways
! * that this can happen. For example, MergeAppend doesn't project, so we
! * would have thought that we needed a projection to attach resjunk sort
! * columns to its output ... but create_merge_append_plan might have added
! * those same resjunk sort columns to both MergeAppend and its children.
! * Alternatively, apply_projection_to_path might have created a projection
! * path as the subpath of a Gather node even though the subpath was
! * projection-capable. So, if the subpath is capable of projection or the
! * desired tlist is the same expression-wise as the subplan's, just jam it
! * in there. We'll have charged for a Result that doesn't actually appear
! * in the plan, but that's better than having a Result we don't need.
*/
if (is_projection_capable_path(best_path->subpath) ||
tlist_same_exprs(tlist, subplan->targetlist))
{
plan = subplan;
plan->targetlist = tlist;
! /* Adjust cost to match what we thought during planning */
plan->startup_cost = best_path->path.startup_cost;
plan->total_cost = best_path->path.total_cost;
/* ... but be careful not to munge subplan's parallel-aware flag */
}
else
{
plan = (Plan *) make_result(tlist, NULL, subplan);
copy_generic_path_info(plan, (Path *) best_path);
--- 1426,1462 ----
tlist = build_path_tlist(root, &best_path->path);
/*
! * We might not really need a Result node here, either because the subplan
! * can project or because it's returning the right list of expressions
! * anyway. Usually create_projection_path will have detected that and set
! * dummypp if we don't need a Result; but its decision can't be final,
! * because some createplan.c routines change the tlists of their nodes.
! * (An example is that create_merge_append_plan might add resjunk sort
! * columns to a MergeAppend.) So we have to recheck here. If we do
! * arrive at a different answer than create_projection_path did, we'll
! * have made slightly wrong cost estimates; but label the plan with the
! * cost estimates we actually used, not "corrected" ones. (XXX this could
! * be cleaned up if we moved more of the sortcolumn setup logic into Path
! * creation, but that would add expense to creating Paths we might end up
! * not using.)
*/
if (is_projection_capable_path(best_path->subpath) ||
tlist_same_exprs(tlist, subplan->targetlist))
{
+ /* Don't need a separate Result, just assign tlist to subplan */
plan = subplan;
plan->targetlist = tlist;
! /* Label plan with the estimated costs we actually used */
plan->startup_cost = best_path->path.startup_cost;
plan->total_cost = best_path->path.total_cost;
+ plan->plan_rows = best_path->path.rows;
+ plan->plan_width = best_path->path.pathtarget->width;
/* ... but be careful not to munge subplan's parallel-aware flag */
}
else
{
+ /* We need a Result node */
plan = (Plan *) make_result(tlist, NULL, subplan);
copy_generic_path_info(plan, (Path *) best_path);
diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c
index 0434a5a..cefec7b 100644
*** a/src/backend/optimizer/plan/planagg.c
--- b/src/backend/optimizer/plan/planagg.c
*************** build_minmax_path(PlannerInfo *root, Min
*** 465,472 ****
* cheapest path.)
*/
sorted_path = apply_projection_to_path(subroot, final_rel, sorted_path,
! create_pathtarget(subroot, tlist),
! false);
/*
* Determine cost to get just the first row of the presorted path.
--- 465,471 ----
* cheapest path.)
*/
sorted_path = apply_projection_to_path(subroot, final_rel, sorted_path,
! create_pathtarget(subroot, tlist));
/*
* Determine cost to get just the first row of the presorted path.
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 094c708..2372311 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*************** grouping_planner(PlannerInfo *root, bool
*** 1500,1506 ****
PathTarget *grouping_target;
PathTarget *scanjoin_target;
bool have_grouping;
- bool scanjoin_target_parallel_safe = false;
WindowFuncLists *wflists = NULL;
List *activeWindows = NIL;
List *rollup_lists = NIL;
--- 1500,1505 ----
*************** grouping_planner(PlannerInfo *root, bool
*** 1731,1744 ****
scanjoin_target = grouping_target;
/*
- * Check whether scan/join target is parallel safe ... unless there
- * are no partial paths, in which case we don't care.
- */
- if (current_rel->partial_pathlist &&
- !has_parallel_hazard((Node *) scanjoin_target->exprs, false))
- scanjoin_target_parallel_safe = true;
-
- /*
* Forcibly apply scan/join target to all the Paths for the scan/join
* rel.
*
--- 1730,1735 ----
*************** grouping_planner(PlannerInfo *root, bool
*** 1756,1763 ****
Assert(subpath->param_info == NULL);
path = apply_projection_to_path(root, current_rel,
! subpath, scanjoin_target,
! scanjoin_target_parallel_safe);
/* If we had to add a Result, path is different from subpath */
if (path != subpath)
{
--- 1747,1753 ----
Assert(subpath->param_info == NULL);
path = apply_projection_to_path(root, current_rel,
! subpath, scanjoin_target);
/* If we had to add a Result, path is different from subpath */
if (path != subpath)
{
*************** grouping_planner(PlannerInfo *root, bool
*** 1774,1788 ****
* partial pathlist will expect partial paths for that rel to produce
* the same output as complete paths ... and we just changed the
* output for the complete paths, so we'll need to do the same thing
! * for partial paths.
*/
! if (scanjoin_target_parallel_safe)
{
! /*
! * Apply the scan/join target to each partial path. Otherwise,
! * anything that attempts to use the partial paths for further
! * upper planning may go wrong.
! */
foreach(lc, current_rel->partial_pathlist)
{
Path *subpath = (Path *) lfirst(lc);
--- 1764,1776 ----
* partial pathlist will expect partial paths for that rel to produce
* the same output as complete paths ... and we just changed the
* output for the complete paths, so we'll need to do the same thing
! * for partial paths. But only parallel-safe expressions can be
! * computed by partial paths.
*/
! if (current_rel->partial_pathlist &&
! !has_parallel_hazard((Node *) scanjoin_target->exprs, false))
{
! /* Apply the scan/join target to each partial path */
foreach(lc, current_rel->partial_pathlist)
{
Path *subpath = (Path *) lfirst(lc);
*************** grouping_planner(PlannerInfo *root, bool
*** 1792,1827 ****
Assert(subpath->param_info == NULL);
/*
! * We can't use apply_projection_to_path() here, because there
! * could already be pointers to these paths, and therefore we
! * dare not modify them in place. Instead, we must use
! * create_projection_path() unconditionally.
*/
newpath = (Path *) create_projection_path(root,
current_rel,
subpath,
scanjoin_target);
-
- /*
- * Although create_projection_path() inserts a ProjectionPath
- * unconditionally, create_projection_plan() will only insert
- * a Result node if the subpath is not projection-capable, so
- * we should discount the cost of that node if it will not
- * actually get inserted. (This is pretty grotty but we can
- * improve it later if it seems important.)
- */
- if (equal(scanjoin_target->exprs, subpath->pathtarget->exprs))
- {
- /* at most we need a relabeling of the subpath */
- newpath->startup_cost = subpath->startup_cost;
- newpath->total_cost = subpath->total_cost;
- }
- else if (is_projection_capable_path(subpath))
- {
- /* need to project, but we don't need a Result */
- newpath->total_cost -= cpu_tuple_cost * subpath->rows;
- }
-
lfirst(lc) = newpath;
}
}
--- 1780,1793 ----
Assert(subpath->param_info == NULL);
/*
! * Don't use apply_projection_to_path() here, because there
! * could be other pointers to these paths, and therefore we
! * mustn't modify them in place.
*/
newpath = (Path *) create_projection_path(root,
current_rel,
subpath,
scanjoin_target);
lfirst(lc) = newpath;
}
}
*************** create_ordered_paths(PlannerInfo *root,
*** 4231,4237 ****
/* Add projection step if needed */
if (path->pathtarget != target)
path = apply_projection_to_path(root, ordered_rel,
! path, target, false);
add_path(ordered_rel, path);
}
--- 4197,4203 ----
/* Add projection step if needed */
if (path->pathtarget != target)
path = apply_projection_to_path(root, ordered_rel,
! path, target);
add_path(ordered_rel, path);
}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index 30975e0..552b756 100644
*** a/src/backend/optimizer/prep/prepunion.c
--- b/src/backend/optimizer/prep/prepunion.c
*************** recurse_set_operations(Node *setOp, Plan
*** 325,332 ****
refnames_tlist);
path = apply_projection_to_path(root, rel, path,
! create_pathtarget(root, tlist),
! false);
/* Return the fully-fledged tlist to caller, too */
*pTargetList = tlist;
--- 325,331 ----
refnames_tlist);
path = apply_projection_to_path(root, rel, path,
! create_pathtarget(root, tlist));
/* Return the fully-fledged tlist to caller, too */
*pTargetList = tlist;
*************** recurse_set_operations(Node *setOp, Plan
*** 395,402 ****
path->parent,
path,
create_pathtarget(root,
! *pTargetList),
! false);
}
return path;
}
--- 394,400 ----
path->parent,
path,
create_pathtarget(root,
! *pTargetList));
}
return path;
}
diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c
index 0ff353f..8fd933f 100644
*** a/src/backend/optimizer/util/pathnode.c
--- b/src/backend/optimizer/util/pathnode.c
*************** create_projection_path(PlannerInfo *root
*** 2168,2173 ****
--- 2168,2174 ----
PathTarget *target)
{
ProjectionPath *pathnode = makeNode(ProjectionPath);
+ PathTarget *oldtarget = subpath->pathtarget;
pathnode->path.pathtype = T_Result;
pathnode->path.parent = rel;
*************** create_projection_path(PlannerInfo *root
*** 2184,2196 ****
pathnode->subpath = subpath;
/*
! * The Result node's cost is cpu_tuple_cost per row, plus the cost of
! * evaluating the tlist. There is no qual to worry about.
*/
! pathnode->path.rows = subpath->rows;
! pathnode->path.startup_cost = subpath->startup_cost + target->cost.startup;
! pathnode->path.total_cost = subpath->total_cost + target->cost.startup +
! (cpu_tuple_cost + target->cost.per_tuple) * subpath->rows;
return pathnode;
}
--- 2185,2230 ----
pathnode->subpath = subpath;
/*
! * We might not need a separate Result node. If the input plan node type
! * can project, we can just tell it to project something else. Or, if it
! * can't project but the desired target has the same expression list as
! * what the input will produce anyway, we can still give it the desired
! * tlist (possibly changing its ressortgroupref labels, but nothing else).
! * Note: in the latter case, create_projection_plan has to recheck our
! * conclusion; see comments therein.
*/
! if (is_projection_capable_path(subpath) ||
! equal(oldtarget->exprs, target->exprs))
! {
! /* No separate Result node needed */
! pathnode->dummypp = true;
!
! /*
! * Set cost of plan as subpath's cost, adjusted for tlist replacement.
! */
! pathnode->path.rows = subpath->rows;
! pathnode->path.startup_cost = subpath->startup_cost +
! (target->cost.startup - oldtarget->cost.startup);
! pathnode->path.total_cost = subpath->total_cost +
! (target->cost.startup - oldtarget->cost.startup) +
! (target->cost.per_tuple - oldtarget->cost.per_tuple) * subpath->rows;
! }
! else
! {
! /* We really do need the Result node */
! pathnode->dummypp = false;
!
! /*
! * The Result node's cost is cpu_tuple_cost per row, plus the cost of
! * evaluating the tlist. There is no qual to worry about.
! */
! pathnode->path.rows = subpath->rows;
! pathnode->path.startup_cost = subpath->startup_cost +
! target->cost.startup;
! pathnode->path.total_cost = subpath->total_cost +
! target->cost.startup +
! (cpu_tuple_cost + target->cost.per_tuple) * subpath->rows;
! }
return pathnode;
}
*************** create_projection_path(PlannerInfo *root
*** 2199,2236 ****
* apply_projection_to_path
* Add a projection step, or just apply the target directly to given path.
*
! * Most plan types include ExecProject, so we can implement a new projection
! * without an extra plan node: just replace the given path's pathtarget with
! * the desired one. If the given path can't project, add a ProjectionPath.
*
! * We can also short-circuit cases where the targetlist expressions are
! * actually equal; this is not an uncommon case, since it may arise from
! * trying to apply a PathTarget with sortgroupref labeling to a derived
! * path without such labeling.
*
! * This requires knowing that the source path won't be referenced for other
! * purposes (e.g., other possible paths), since we modify it in-place. Note
! * also that we mustn't change the source path's parent link; so when it is
* add_path'd to "rel" things will be a bit inconsistent. So far that has
* not caused any trouble.
*
* 'rel' is the parent relation associated with the result
* 'path' is the path representing the source of data
* 'target' is the PathTarget to be computed
- * 'target_parallel' indicates that target expressions are all parallel-safe
*/
Path *
apply_projection_to_path(PlannerInfo *root,
RelOptInfo *rel,
Path *path,
! PathTarget *target,
! bool target_parallel)
{
QualCost oldcost;
! /* Make a separate ProjectionPath if needed */
! if (!is_projection_capable_path(path) &&
! !equal(path->pathtarget->exprs, target->exprs))
return (Path *) create_projection_path(root, rel, path, target);
/*
--- 2233,2269 ----
* apply_projection_to_path
* Add a projection step, or just apply the target directly to given path.
*
! * This has the same net effect as create_projection_path(), except that if
! * a separate Result plan node isn't needed, we just replace the given path's
! * pathtarget with the desired one. This must be used only when the caller
! * knows that the given path isn't referenced elsewhere and so can be modified
! * in-place.
*
! * If the input path is a GatherPath, we try to push the new target down to
! * its input as well; this is a yet more invasive modification of the input
! * path, which create_projection_path() can't do.
*
! * Note that we mustn't change the source path's parent link; so when it is
* add_path'd to "rel" things will be a bit inconsistent. So far that has
* not caused any trouble.
*
* 'rel' is the parent relation associated with the result
* 'path' is the path representing the source of data
* 'target' is the PathTarget to be computed
*/
Path *
apply_projection_to_path(PlannerInfo *root,
RelOptInfo *rel,
Path *path,
! PathTarget *target)
{
QualCost oldcost;
! /*
! * If given path can't project, we might need a Result node, so make a
! * separate ProjectionPath.
! */
! if (!is_projection_capable_path(path))
return (Path *) create_projection_path(root, rel, path, target);
/*
*************** apply_projection_to_path(PlannerInfo *ro
*** 2247,2256 ****
/*
* If the path happens to be a Gather path, we'd like to arrange for the
* subpath to return the required target list so that workers can help
! * project. But if there is something that is not parallel-safe in the
* target expressions, then we can't.
*/
! if (IsA(path, GatherPath) &&target_parallel)
{
GatherPath *gpath = (GatherPath *) path;
--- 2280,2290 ----
/*
* If the path happens to be a Gather path, we'd like to arrange for the
* subpath to return the required target list so that workers can help
! * project. But if there is something that is not parallel-safe in the
* target expressions, then we can't.
*/
! if (IsA(path, GatherPath) &&
! !has_parallel_hazard((Node *) target->exprs, false))
{
GatherPath *gpath = (GatherPath *) path;
*************** apply_projection_to_path(PlannerInfo *ro
*** 2258,2271 ****
* We always use create_projection_path here, even if the subpath is
* projection-capable, so as to avoid modifying the subpath in place.
* It seems unlikely at present that there could be any other
! * references to the subpath anyway, but better safe than sorry.
! * (create_projection_plan will only insert a Result node if the
! * subpath is not projection-capable, so we only include the cost of
! * that node if it will actually be inserted. This is a bit grotty
! * but we can improve it later if it seems important.)
*/
- if (!is_projection_capable_path(gpath->subpath))
- gpath->path.total_cost += cpu_tuple_cost * gpath->subpath->rows;
gpath->subpath = (Path *)
create_projection_path(root,
gpath->subpath->parent,
--- 2292,2303 ----
* We always use create_projection_path here, even if the subpath is
* projection-capable, so as to avoid modifying the subpath in place.
* It seems unlikely at present that there could be any other
! * references to the subpath, but better safe than sorry.
! *
! * Note that we don't change the GatherPath's cost estimates; it might
! * be appropriate to do so, to reflect the fact that the bulk of the
! * target evaluation will happen in workers.
*/
gpath->subpath = (Path *)
create_projection_path(root,
gpath->subpath->parent,
diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h
index 57747fc..9470df6 100644
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
*************** typedef struct HashPath
*** 1274,1288 ****
/*
* ProjectionPath represents a projection (that is, targetlist computation)
*
! * This path node represents using a Result plan node to do a projection.
! * It's only needed atop a node that doesn't support projection (such as
! * Sort); otherwise we just jam the new desired PathTarget into the lower
! * path node, and adjust that node's estimated cost accordingly.
*/
typedef struct ProjectionPath
{
Path path;
Path *subpath; /* path representing input source */
} ProjectionPath;
/*
--- 1274,1295 ----
/*
* ProjectionPath represents a projection (that is, targetlist computation)
*
! * Nominally, this path node represents using a Result plan node to do a
! * projection step. However, if the input plan node supports projection,
! * we can just modify its output targetlist to do the required calculations
! * directly, and not need a Result. In some places in the planner we can just
! * jam the desired PathTarget into the input path node (and adjust its cost
! * accordingly), so we don't need a ProjectionPath. But in other places
! * it's necessary to not modify the input path node, so we need a separate
! * ProjectionPath node, which is marked dummy to indicate that we intend to
! * assign the work to the input plan node. The estimated cost for the
! * ProjectionPath node will account for whether a Result will be used or not.
*/
typedef struct ProjectionPath
{
Path path;
Path *subpath; /* path representing input source */
+ bool dummypp; /* true if no separate Result is needed */
} ProjectionPath;
/*
diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h
index 586ecdd..5de4c34 100644
*** a/src/include/optimizer/pathnode.h
--- b/src/include/optimizer/pathnode.h
*************** extern ProjectionPath *create_projection
*** 143,150 ****
extern Path *apply_projection_to_path(PlannerInfo *root,
RelOptInfo *rel,
Path *path,
! PathTarget *target,
! bool target_parallel);
extern SortPath *create_sort_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
--- 143,149 ----
extern Path *apply_projection_to_path(PlannerInfo *root,
RelOptInfo *rel,
Path *path,
! PathTarget *target);
extern SortPath *create_sort_path(PlannerInfo *root,
RelOptInfo *rel,
Path *subpath,
pgsql-committers by date: