Re: pgsql: Try again to fix the way the scanjoin_target is used with partia - Mailing list pgsql-committers
From | Tom Lane |
---|---|
Subject | Re: pgsql: Try again to fix the way the scanjoin_target is used with partia |
Date | |
Msg-id | 8536.1466530136@sss.pgh.pa.us Whole thread Raw |
In response to | Re: pgsql: Try again to fix the way the scanjoin_target is used with partia (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: pgsql: Try again to fix the way the scanjoin_target is used with partia
|
List | pgsql-committers |
I wrote: > If we keep it like this, we definitely ought to refactor things so that > fewer places are aware of the possibility of the Result getting omitted. > Maybe push that logic into create_projection_path? If we did, we might > not need a separate apply_projection_to_path function at all? Too tired > to decide about it right now. After some contemplation, it seems to me that the right thing is to make ProjectionPath explicitly aware of the possibility that it's just a placeholder used for the purpose of not modifying the input Path node, and have create_projection_path calculate the correct costing either way. In this design, the principal difference between create_projection_path and apply_projection_to_path is that the latter assumes it can scribble directly on the given Path while the former doesn't modify that Path. There is one other difference: I did not do anything about making create_projection_path push the target below a Gather. In principle, it could do that by cloning the given GatherPath, but right now I think that'd be unreachable code so I did not bother. I had hoped that this would result in simplifying create_projection_plan so that it just makes a Result or not according to what create_projection_path decided, but there's one regression test case that fails (in the sense of showing a Result in the plan that isn't really needed). That happens because create_merge_append_plan adds sort columns to the tlist and so a tlist match is possible after that happens when it didn't match before. For the moment I kluged create_projection_plan so that that keeps working, but I wonder if it'd be better to just accept an extra Result in that case. Also, I got rid of the target_parallel argument to apply_projection_to_path, as I thought that that was just way too much interconnection between apply_projection_to_path and its callers than is justified for what it saves (namely one call of has_parallel_hazard when planning a Gather). In general, having that argument could *add* extra calls of has_parallel_hazard, since callers might have to do that computation whether or not a Gather is present. Any objections? regards, tom lane diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index b675447..b0a28f5 100644 *** a/src/backend/nodes/outfuncs.c --- b/src/backend/nodes/outfuncs.c *************** _outProjectionPath(StringInfo str, const *** 1809,1814 **** --- 1809,1815 ---- _outPathInfo(str, (const Path *) node); WRITE_NODE_FIELD(subpath); + WRITE_BOOL_FIELD(dummypp); } static void diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index ab8df76..1050e5e 100644 *** a/src/backend/optimizer/plan/createplan.c --- b/src/backend/optimizer/plan/createplan.c *************** create_gather_plan(PlannerInfo *root, Ga *** 1409,1416 **** /* * create_projection_plan * ! * Create a Result node to do a projection step and (recursively) plans ! * for its subpaths. */ static Plan * create_projection_plan(PlannerInfo *root, ProjectionPath *best_path) --- 1409,1417 ---- /* * create_projection_plan * ! * Create a plan tree to do a projection step and (recursively) plans ! * for its subpaths. We may need a Result node for the projection, ! * but sometimes we can just let the subplan do the work. */ static Plan * create_projection_plan(PlannerInfo *root, ProjectionPath *best_path) *************** create_projection_plan(PlannerInfo *root *** 1425,1460 **** tlist = build_path_tlist(root, &best_path->path); /* ! * We might not really need a Result node here. There are several ways ! * that this can happen. For example, MergeAppend doesn't project, so we ! * would have thought that we needed a projection to attach resjunk sort ! * columns to its output ... but create_merge_append_plan might have added ! * those same resjunk sort columns to both MergeAppend and its children. ! * Alternatively, apply_projection_to_path might have created a projection ! * path as the subpath of a Gather node even though the subpath was ! * projection-capable. So, if the subpath is capable of projection or the ! * desired tlist is the same expression-wise as the subplan's, just jam it ! * in there. We'll have charged for a Result that doesn't actually appear ! * in the plan, but that's better than having a Result we don't need. */ ! if (is_projection_capable_path(best_path->subpath) || ! tlist_same_exprs(tlist, subplan->targetlist)) { plan = subplan; plan->targetlist = tlist; ! ! /* Adjust cost to match what we thought during planning */ ! plan->startup_cost = best_path->path.startup_cost; ! plan->total_cost = best_path->path.total_cost; ! /* ... but be careful not to munge subplan's parallel-aware flag */ } else { plan = (Plan *) make_result(tlist, NULL, subplan); - - copy_generic_path_info(plan, (Path *) best_path); } return plan; } --- 1426,1461 ---- tlist = build_path_tlist(root, &best_path->path); /* ! * We might not really need a Result node here. Usually, ! * create_projection_path will have detected that and set dummypp if we ! * don't need one, but there is one exception. MergeAppend doesn't ! * project, so we would have thought that we needed a projection to attach ! * resjunk sort columns to its output ... but create_merge_append_plan ! * might have added those same resjunk sort columns to both MergeAppend ! * and its children. So, if dummypp is true or the desired tlist is the ! * same expression-wise as the subplan's, just jam it in there. In the ! * MergeAppend special case, we'll have charged for a Result that doesn't ! * actually appear in the plan, but that's better than having a Result we ! * don't need. (XXX this could be cleaned up if we moved more of the ! * sortcolumn setup logic into Path creation, but that would add expense ! * to creating Paths we might end up not using.) */ ! if (best_path->dummypp || ! (IsA(subplan, MergeAppend) && ! tlist_same_exprs(tlist, subplan->targetlist))) { + /* Don't need a separate Result, just assign tlist to subplan */ plan = subplan; plan->targetlist = tlist; ! /* We'll change the costs below too, to account for the new tlist */ } else { plan = (Plan *) make_result(tlist, NULL, subplan); } + copy_generic_path_info(plan, (Path *) best_path); + return plan; } diff --git a/src/backend/optimizer/plan/planagg.c b/src/backend/optimizer/plan/planagg.c index 0434a5a..cefec7b 100644 *** a/src/backend/optimizer/plan/planagg.c --- b/src/backend/optimizer/plan/planagg.c *************** build_minmax_path(PlannerInfo *root, Min *** 465,472 **** * cheapest path.) */ sorted_path = apply_projection_to_path(subroot, final_rel, sorted_path, ! create_pathtarget(subroot, tlist), ! false); /* * Determine cost to get just the first row of the presorted path. --- 465,471 ---- * cheapest path.) */ sorted_path = apply_projection_to_path(subroot, final_rel, sorted_path, ! create_pathtarget(subroot, tlist)); /* * Determine cost to get just the first row of the presorted path. diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 094c708..2372311 100644 *** a/src/backend/optimizer/plan/planner.c --- b/src/backend/optimizer/plan/planner.c *************** grouping_planner(PlannerInfo *root, bool *** 1500,1506 **** PathTarget *grouping_target; PathTarget *scanjoin_target; bool have_grouping; - bool scanjoin_target_parallel_safe = false; WindowFuncLists *wflists = NULL; List *activeWindows = NIL; List *rollup_lists = NIL; --- 1500,1505 ---- *************** grouping_planner(PlannerInfo *root, bool *** 1731,1744 **** scanjoin_target = grouping_target; /* - * Check whether scan/join target is parallel safe ... unless there - * are no partial paths, in which case we don't care. - */ - if (current_rel->partial_pathlist && - !has_parallel_hazard((Node *) scanjoin_target->exprs, false)) - scanjoin_target_parallel_safe = true; - - /* * Forcibly apply scan/join target to all the Paths for the scan/join * rel. * --- 1730,1735 ---- *************** grouping_planner(PlannerInfo *root, bool *** 1756,1763 **** Assert(subpath->param_info == NULL); path = apply_projection_to_path(root, current_rel, ! subpath, scanjoin_target, ! scanjoin_target_parallel_safe); /* If we had to add a Result, path is different from subpath */ if (path != subpath) { --- 1747,1753 ---- Assert(subpath->param_info == NULL); path = apply_projection_to_path(root, current_rel, ! subpath, scanjoin_target); /* If we had to add a Result, path is different from subpath */ if (path != subpath) { *************** grouping_planner(PlannerInfo *root, bool *** 1774,1788 **** * partial pathlist will expect partial paths for that rel to produce * the same output as complete paths ... and we just changed the * output for the complete paths, so we'll need to do the same thing ! * for partial paths. */ ! if (scanjoin_target_parallel_safe) { ! /* ! * Apply the scan/join target to each partial path. Otherwise, ! * anything that attempts to use the partial paths for further ! * upper planning may go wrong. ! */ foreach(lc, current_rel->partial_pathlist) { Path *subpath = (Path *) lfirst(lc); --- 1764,1776 ---- * partial pathlist will expect partial paths for that rel to produce * the same output as complete paths ... and we just changed the * output for the complete paths, so we'll need to do the same thing ! * for partial paths. But only parallel-safe expressions can be ! * computed by partial paths. */ ! if (current_rel->partial_pathlist && ! !has_parallel_hazard((Node *) scanjoin_target->exprs, false)) { ! /* Apply the scan/join target to each partial path */ foreach(lc, current_rel->partial_pathlist) { Path *subpath = (Path *) lfirst(lc); *************** grouping_planner(PlannerInfo *root, bool *** 1792,1827 **** Assert(subpath->param_info == NULL); /* ! * We can't use apply_projection_to_path() here, because there ! * could already be pointers to these paths, and therefore we ! * dare not modify them in place. Instead, we must use ! * create_projection_path() unconditionally. */ newpath = (Path *) create_projection_path(root, current_rel, subpath, scanjoin_target); - - /* - * Although create_projection_path() inserts a ProjectionPath - * unconditionally, create_projection_plan() will only insert - * a Result node if the subpath is not projection-capable, so - * we should discount the cost of that node if it will not - * actually get inserted. (This is pretty grotty but we can - * improve it later if it seems important.) - */ - if (equal(scanjoin_target->exprs, subpath->pathtarget->exprs)) - { - /* at most we need a relabeling of the subpath */ - newpath->startup_cost = subpath->startup_cost; - newpath->total_cost = subpath->total_cost; - } - else if (is_projection_capable_path(subpath)) - { - /* need to project, but we don't need a Result */ - newpath->total_cost -= cpu_tuple_cost * subpath->rows; - } - lfirst(lc) = newpath; } } --- 1780,1793 ---- Assert(subpath->param_info == NULL); /* ! * Don't use apply_projection_to_path() here, because there ! * could be other pointers to these paths, and therefore we ! * mustn't modify them in place. */ newpath = (Path *) create_projection_path(root, current_rel, subpath, scanjoin_target); lfirst(lc) = newpath; } } *************** create_ordered_paths(PlannerInfo *root, *** 4231,4237 **** /* Add projection step if needed */ if (path->pathtarget != target) path = apply_projection_to_path(root, ordered_rel, ! path, target, false); add_path(ordered_rel, path); } --- 4197,4203 ---- /* Add projection step if needed */ if (path->pathtarget != target) path = apply_projection_to_path(root, ordered_rel, ! path, target); add_path(ordered_rel, path); } diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c index 30975e0..552b756 100644 *** a/src/backend/optimizer/prep/prepunion.c --- b/src/backend/optimizer/prep/prepunion.c *************** recurse_set_operations(Node *setOp, Plan *** 325,332 **** refnames_tlist); path = apply_projection_to_path(root, rel, path, ! create_pathtarget(root, tlist), ! false); /* Return the fully-fledged tlist to caller, too */ *pTargetList = tlist; --- 325,331 ---- refnames_tlist); path = apply_projection_to_path(root, rel, path, ! create_pathtarget(root, tlist)); /* Return the fully-fledged tlist to caller, too */ *pTargetList = tlist; *************** recurse_set_operations(Node *setOp, Plan *** 395,402 **** path->parent, path, create_pathtarget(root, ! *pTargetList), ! false); } return path; } --- 394,400 ---- path->parent, path, create_pathtarget(root, ! *pTargetList)); } return path; } diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index 0ff353f..13d28ab 100644 *** a/src/backend/optimizer/util/pathnode.c --- b/src/backend/optimizer/util/pathnode.c *************** create_projection_path(PlannerInfo *root *** 2168,2173 **** --- 2168,2174 ---- PathTarget *target) { ProjectionPath *pathnode = makeNode(ProjectionPath); + PathTarget *oldtarget = subpath->pathtarget; pathnode->path.pathtype = T_Result; pathnode->path.parent = rel; *************** create_projection_path(PlannerInfo *root *** 2184,2196 **** pathnode->subpath = subpath; /* ! * The Result node's cost is cpu_tuple_cost per row, plus the cost of ! * evaluating the tlist. There is no qual to worry about. */ ! pathnode->path.rows = subpath->rows; ! pathnode->path.startup_cost = subpath->startup_cost + target->cost.startup; ! pathnode->path.total_cost = subpath->total_cost + target->cost.startup + ! (cpu_tuple_cost + target->cost.per_tuple) * subpath->rows; return pathnode; } --- 2185,2231 ---- pathnode->subpath = subpath; /* ! * We might not need a separate Result node. If the input plan node type ! * can project, we can just tell it to project something else. Or, if it ! * can't project but the desired target has the same expression list as ! * what the input will produce anyway, we can still give it the desired ! * tlist (possibly changing its ressortgroupref labels, but nothing else). */ ! if (is_projection_capable_path(subpath) || ! equal(oldtarget->exprs, target->exprs)) ! { ! /* No separate Result node needed */ ! pathnode->dummypp = true; ! ! /* ! * Set cost of plan as subpath's cost, adjusted for tlist replacement. ! */ ! pathnode->path.rows = subpath->rows; ! pathnode->path.startup_cost = subpath->startup_cost + ! (target->cost.startup - oldtarget->cost.startup); ! pathnode->path.total_cost = subpath->total_cost + ! (target->cost.startup - oldtarget->cost.startup) + ! (target->cost.per_tuple - oldtarget->cost.per_tuple) * subpath->rows; ! ! /* Make sure create_projection_plan doesn't clobber parallel_aware */ ! pathnode->path.parallel_aware = subpath->parallel_aware; ! } ! else ! { ! /* We really do need the Result node */ ! pathnode->dummypp = false; ! ! /* ! * The Result node's cost is cpu_tuple_cost per row, plus the cost of ! * evaluating the tlist. There is no qual to worry about. ! */ ! pathnode->path.rows = subpath->rows; ! pathnode->path.startup_cost = subpath->startup_cost + ! target->cost.startup; ! pathnode->path.total_cost = subpath->total_cost + ! target->cost.startup + ! (cpu_tuple_cost + target->cost.per_tuple) * subpath->rows; ! } return pathnode; } *************** create_projection_path(PlannerInfo *root *** 2199,2234 **** * apply_projection_to_path * Add a projection step, or just apply the target directly to given path. * ! * Most plan types include ExecProject, so we can implement a new projection ! * without an extra plan node: just replace the given path's pathtarget with ! * the desired one. If the given path can't project, add a ProjectionPath. * ! * We can also short-circuit cases where the targetlist expressions are ! * actually equal; this is not an uncommon case, since it may arise from ! * trying to apply a PathTarget with sortgroupref labeling to a derived ! * path without such labeling. * ! * This requires knowing that the source path won't be referenced for other ! * purposes (e.g., other possible paths), since we modify it in-place. Note ! * also that we mustn't change the source path's parent link; so when it is * add_path'd to "rel" things will be a bit inconsistent. So far that has * not caused any trouble. * * 'rel' is the parent relation associated with the result * 'path' is the path representing the source of data * 'target' is the PathTarget to be computed - * 'target_parallel' indicates that target expressions are all parallel-safe */ Path * apply_projection_to_path(PlannerInfo *root, RelOptInfo *rel, Path *path, ! PathTarget *target, ! bool target_parallel) { QualCost oldcost; ! /* Make a separate ProjectionPath if needed */ if (!is_projection_capable_path(path) && !equal(path->pathtarget->exprs, target->exprs)) return (Path *) create_projection_path(root, rel, path, target); --- 2234,2269 ---- * apply_projection_to_path * Add a projection step, or just apply the target directly to given path. * ! * This has the same net effect as create_projection_path(), except that if ! * a separate Result plan node isn't needed, we just replace the given path's ! * pathtarget with the desired one. This must be used only when the caller ! * knows that the given path isn't referenced elsewhere and so can be modified ! * in-place. * ! * If the input path is a GatherPath, we try to push the new target down to ! * its input as well; this is a yet more invasive modification of the input ! * path, which create_projection_path() can't do. * ! * Note that we mustn't change the source path's parent link; so when it is * add_path'd to "rel" things will be a bit inconsistent. So far that has * not caused any trouble. * * 'rel' is the parent relation associated with the result * 'path' is the path representing the source of data * 'target' is the PathTarget to be computed */ Path * apply_projection_to_path(PlannerInfo *root, RelOptInfo *rel, Path *path, ! PathTarget *target) { QualCost oldcost; ! /* ! * Make a separate ProjectionPath if a Result node is needed (see comments ! * in create_projection_path). ! */ if (!is_projection_capable_path(path) && !equal(path->pathtarget->exprs, target->exprs)) return (Path *) create_projection_path(root, rel, path, target); *************** apply_projection_to_path(PlannerInfo *ro *** 2247,2256 **** /* * If the path happens to be a Gather path, we'd like to arrange for the * subpath to return the required target list so that workers can help ! * project. But if there is something that is not parallel-safe in the * target expressions, then we can't. */ ! if (IsA(path, GatherPath) &&target_parallel) { GatherPath *gpath = (GatherPath *) path; --- 2282,2292 ---- /* * If the path happens to be a Gather path, we'd like to arrange for the * subpath to return the required target list so that workers can help ! * project. But if there is something that is not parallel-safe in the * target expressions, then we can't. */ ! if (IsA(path, GatherPath) && ! !has_parallel_hazard((Node *) target->exprs, false)) { GatherPath *gpath = (GatherPath *) path; *************** apply_projection_to_path(PlannerInfo *ro *** 2258,2271 **** * We always use create_projection_path here, even if the subpath is * projection-capable, so as to avoid modifying the subpath in place. * It seems unlikely at present that there could be any other ! * references to the subpath anyway, but better safe than sorry. ! * (create_projection_plan will only insert a Result node if the ! * subpath is not projection-capable, so we only include the cost of ! * that node if it will actually be inserted. This is a bit grotty ! * but we can improve it later if it seems important.) */ - if (!is_projection_capable_path(gpath->subpath)) - gpath->path.total_cost += cpu_tuple_cost * gpath->subpath->rows; gpath->subpath = (Path *) create_projection_path(root, gpath->subpath->parent, --- 2294,2305 ---- * We always use create_projection_path here, even if the subpath is * projection-capable, so as to avoid modifying the subpath in place. * It seems unlikely at present that there could be any other ! * references to the subpath, but better safe than sorry. ! * ! * Note that we don't change the GatherPath's cost estimates; it might ! * be appropriate to do so, to reflect the fact that the bulk of the ! * target evaluation will happen in workers. */ gpath->subpath = (Path *) create_projection_path(root, gpath->subpath->parent, diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 57747fc..9470df6 100644 *** a/src/include/nodes/relation.h --- b/src/include/nodes/relation.h *************** typedef struct HashPath *** 1274,1288 **** /* * ProjectionPath represents a projection (that is, targetlist computation) * ! * This path node represents using a Result plan node to do a projection. ! * It's only needed atop a node that doesn't support projection (such as ! * Sort); otherwise we just jam the new desired PathTarget into the lower ! * path node, and adjust that node's estimated cost accordingly. */ typedef struct ProjectionPath { Path path; Path *subpath; /* path representing input source */ } ProjectionPath; /* --- 1274,1295 ---- /* * ProjectionPath represents a projection (that is, targetlist computation) * ! * Nominally, this path node represents using a Result plan node to do a ! * projection step. However, if the input plan node supports projection, ! * we can just modify its output targetlist to do the required calculations ! * directly, and not need a Result. In some places in the planner we can just ! * jam the desired PathTarget into the input path node (and adjust its cost ! * accordingly), so we don't need a ProjectionPath. But in other places ! * it's necessary to not modify the input path node, so we need a separate ! * ProjectionPath node, which is marked dummy to indicate that we intend to ! * assign the work to the input plan node. The estimated cost for the ! * ProjectionPath node will account for whether a Result will be used or not. */ typedef struct ProjectionPath { Path path; Path *subpath; /* path representing input source */ + bool dummypp; /* true if no separate Result is needed */ } ProjectionPath; /* diff --git a/src/include/optimizer/pathnode.h b/src/include/optimizer/pathnode.h index 586ecdd..5de4c34 100644 *** a/src/include/optimizer/pathnode.h --- b/src/include/optimizer/pathnode.h *************** extern ProjectionPath *create_projection *** 143,150 **** extern Path *apply_projection_to_path(PlannerInfo *root, RelOptInfo *rel, Path *path, ! PathTarget *target, ! bool target_parallel); extern SortPath *create_sort_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath, --- 143,149 ---- extern Path *apply_projection_to_path(PlannerInfo *root, RelOptInfo *rel, Path *path, ! PathTarget *target); extern SortPath *create_sort_path(PlannerInfo *root, RelOptInfo *rel, Path *subpath,
pgsql-committers by date: