Re: Parallel Aggregate - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Parallel Aggregate |
Date | |
Msg-id | CAKJS1f80=f-z1CUU7=QDmn0r=_yeU7paN2dZ6rQSnUpfEFOUNw@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Aggregate (Amit Kapila <amit.kapila16@gmail.com>) |
Responses |
Re: Parallel Aggregate
|
List | pgsql-hackers |
On 18 March 2016 at 20:25, Amit Kapila <amit.kapila16@gmail.com> wrote: > Few more comments: > > 1. > > + if (parse->groupClause) > + path = (Path *) create_sort_path(root, > + grouped_rel, > + path, > + root->group_pathkeys, > + -1.0); > > For final path, why do you want to sort just for group by case? If there's no GROUP BY then there will only be a single group, this does not require sorting, e.g SELECT SUM(col) from sometable; I added the comment: /* * Gather is always unsorted, so we'll need to sort, unless there's * no GROUP BY clause, in which case there will only be a single * group. */ > 2. > + path = (Path *) create_gather_path(root, partial_grouped_rel, path, > + NULL, &total_groups); > + > + if (parse->groupClause) > + path = (Path *) create_sort_path(root, > + grouped_rel, > + path, > + root->group_pathkeys, > + -1.0); > + > + if (parse->hasAggs) > + add_path(grouped_rel, (Path *) > + create_agg_path(root, > + grouped_rel, > + path, > + target, > + parse->groupClause ? AGG_SORTED : AGG_PLAIN, > + parse->groupClause, > + (List *) parse->havingQual, > + &agg_costs, > + partial_grouped_rel->rows, > + true, > + true)); > + else > + add_path(grouped_rel, (Path *) > + create_group_path(root, > + grouped_rel, > + path, > + target, > + parse->groupClause, > + (List *) parse->havingQual, > + total_groups)); > > In above part of patch, it seems you are using number of groups > differenetly; for create_group_path() and create_gather_path(), you have > used total_groups whereas for create_agg_path() partial_grouped_rel->rows is > used, is there a reason for the same? That's a mistake... too much code shuffling yesterday it seems. > 3. > + if (grouped_rel->partial_pathlist) > + { > + Path *path = (Path *) > linitial(grouped_rel->partial_pathlist); > + double total_groups; > + > + total_groups = > path->rows * path->parallel_degree; > + path = (Path *) create_gather_path(root, partial_grouped_rel, > path, > + NULL, &total_groups); > > A. Won't passing partial_grouped_rel lead to incomplete information required > by create_gather_path() w.r.t the case of parameterized path info? There should be no parameterized path info after joins are over, but never-the-less I took your advice about passing PathTarget to create_gather_path(), so this partial_grouped_rel no longer exists. > B. You have mentioned that passing grouped_rel will make gather path contain > the information of final path target, but what is the problem with that? I > mean to ask why Gather node is required to contain partial path target > information instead of final path target. Imagine a query such as: SELECT col,SUM(this) FROM sometable GROUP BY col HAVING SUM(somecolumn) > 0; In this case SUM(somecolumn) won't be in the final PathTarget. The partial grouping target will contain the Aggref from the HAVING clause. The other difference with te partial aggregate PathTarget is that the Aggrefs return the partial state in exprType() rather than the final value's type, which is required so the executor knows how to form and deform tuples, plus many other things. > C. Can we consider passing pathtarget to create_gather_path() as that seems > to save us from inventing new UpperRelationKind? If you are worried about > adding the new parameter (pathtarget) to create_gather_path(), then I think > we are already passing it in many other path generation functions, so why > not for gather path generation as well? That's a better idea... Changed to that... > 4A. > Overall function create_grouping_paths() looks better than previous, but I > think still it is difficult to read. I think it can be improved by > generating partial aggregate paths separately as we do for nestloop join > refer function consider_parallel_nestloop hmm, perhaps the partial path generation could be moved off to another static function, although we'd need to pass quite a few parameters to it, like can_sort, can_hash, partial_grouping_target, grouped_rel, root. Perhaps it's worth doing, but we still need the partial_grouping_target for the Gather node, so it's not like that other function can do all of the parallel stuff... We'd still need some knowledge of that in create_grouping_paths() > 4B. > Rather than directly using create_gather_path(), can't we use > generate_gather_paths as for all places where we generate gather node, > generate_gather_paths() is used. I don't think this is a good fit here, although it would be nice as it would save having to special case generating the final aggregate paths on the top of the partial paths. It does not seem that nice as it's not really that clear if we need to make a combine aggregate node, or a normal aggregate node on the path. The only way to determine that would by by checking if it was a GatherPath or not, and that does not seem like a nice way to go about doing that. Someone might go and invent something new like MergeGather one day. > 5. > +make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target) > { > .. > .. > + foreach(lc, final_target->exprs) > + { > + Expr *expr = (Expr *) lfirst(lc); > + > + > i++; > + > + if (parse->groupClause) > + { > + Index sgref = final_target- >>sortgrouprefs[i]; > + > + if (sgref && get_sortgroupref_clause_noerr(sgref, parse->groupClause) > + > != NULL) > + { > + /* > + > * It's a grouping column, so add it to the input target as-is. > + */ > + > add_column_to_pathtarget(input_target, expr, sgref); > + continue; > + > } > + } > + > + /* > + * Non-grouping column, so just remember the expression for later > + > * call to pull_var_clause. > + */ > + non_group_cols = lappend(non_group_cols, expr); > + > } > .. > } > > Do we want to achieve something different in the above foreach loop than the > similar loop in make_group_input_target(), if not then why are they not > exactly same? It seems that the problem that causes me to change that around is now gone. With the change reverted I'm unable to produce the original crash that I was getting. I know that Tom has done quite a number of changes to PathTargets while I've been working on this, so perhaps not surprising. I've reverted that change now. > 6. > + /* XXX this causes some redundant cost calculation ... */ > + input_target = set_pathtarget_cost_width(root, > input_target); > + return input_target; > > Can't we use return set_pathtarget_cost_width() directly rather than > fetching it in input_target and then returning input_target? Yes, fixed. Many thanks for the thorough review. I've attached an updated patch. I also tweaked the partial path generation in create_grouping_paths() so that it only considers sorting the cheapest path, or using any existing pre-sorted paths, rather than trying to sort every path. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
pgsql-hackers by date: